WO2022035942A1 - Systems and methods for machine learning-based document classification - Google Patents

Systems and methods for machine learning-based document classification

Info

Publication number
WO2022035942A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
image
classifiers
computing device
stamp
Prior art date
Application number
PCT/US2021/045505
Other languages
English (en)
Inventor
Sudhir Sundararam
Zach RUSK
Jagadheeswaran KATHIRVEL
Won Lee
Goutam VENKATESH
Ankit Kumar SINHA
Original Assignee
Nationstar Mortgage LLC, d/b/a Mr. Cooper
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/990,900 (US11928877B2)
Priority claimed from US16/990,892 (US11361528B2)
Priority claimed from US16/998,682 (US20220058496A1)
Application filed by Nationstar Mortgage LLC, d/b/a Mr. Cooper
Publication of WO2022035942A1

Classifications

    • G06N 20/20 Ensemble learning (under G06N 20/00 Machine learning)
    • G06F 16/35 Clustering; Classification (under G06F 16/30 Information retrieval of unstructured textual data)
    • G06V 10/761 Proximity, similarity or dissimilarity measures (under G06V 10/74 Image or video pattern matching in feature spaces)
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/806 Fusion of extracted features (combining data at the sensor, preprocessing, feature extraction or classification level)
    • G06V 10/809 Fusion of classification results, e.g. where the classifiers operate on the same input data
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06V 10/87 Selection of the recognition techniques, e.g. of a classifier in a multiple classifier system
    • G06V 30/413 Classification of content, e.g. text, photographs or tables (under G06V 30/41 Analysis of document content)
    • G06V 30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G06N 3/045 Combinations of networks (under G06N 3/04 Neural network architecture)

Definitions

  • This disclosure generally relates to systems and methods for computer vision and image classification.
  • In particular, this disclosure relates to systems and methods for machine learning-based document classification, including identifying, extracting, and classifying stamps or other regions of interest in document images; and automatic context-based annotation for structured and semi-structured documents.
  • Classifying scanned or captured images of physical paper documents may be difficult for computing systems, due to the large variation in documents, particularly very similar documents such as different pages within a multi-page document, and where metadata of the document is incomplete or absent.
  • Previous attempts at whole document classification utilizing optical character recognition and keyword extraction or natural language processing may be slow and inefficient, requiring extensive processor and memory resources. Additionally, such systems may be inaccurate, such as where similar keywords appear in unrelated documents. For example, such systems may be unable to distinguish between a first middle page of a first multipage document, and a second middle page of a second, similar multi-page document, and may inaccurately assign the first middle page to the second document or vice versa.
  • Printed documents can include text, images, and other markings that present challenges to accurate classification. Identifying and classifying stamps on printed documents is particularly challenging because of the irregularities between stamp types, positions in documents, and inconsistent impressions or markings of the stamp impressed on the document. It would be beneficial for many document classification systems to automatically identify, extract, and classify stamps included in images of printed documents.
  • Identifying and classifying important items in scanned images of documents is particularly difficult for computer-based systems, because the items of interest may be widely distributed across the page, and may appear in different locations on different pages or documents, depending on the source. For example, a date on a document may appear in the upper left on a first document, an upper right on a second document, and a lower right on a third document, and may be formatted differently, utilize different fonts and/or sizes, etc. Furthermore, some documents may have additional annotations or inclusions, such as stamps, embossing, watermarks, handwritten notes, etc., all of which may adversely affect image recognition and classification.
  • FIGs. 1A-1B are flow charts of a method for machine learning-based document classification using multiple classifiers, according to some implementations.
  • FIG. 2 is a block diagram of an embodiment of a convolutional neural network, according to some implementations.
  • FIG. 3 is a block diagram of an example classification using a decision tree, according to some implementations.
  • FIG. 4 is a block diagram of an example of the gradient descent method operating on a parabolic function, according to some implementations.
  • FIG. 5 is a block diagram of an example system using supervised learning, according to some implementations.
  • FIG. 6 is a block diagram of a system classifying received documents, according to some implementations.
  • FIG. 7 shows a block diagram of an example system for detecting and classifying stamps or regions of interest in document images.
  • FIG. 8 shows an illustrative diagram of a document undergoing a pre-processing and stamp detection process.
  • FIG. 9 shows an illustrative diagram of a document undergoing a stamp extraction and classification process.
  • FIG. 10 shows an illustrative flow chart of an example method of detecting and classifying stamps in document images.
  • FIG. 11 is an illustration of an example document having annotations, according to some implementations.
  • FIG. 12 is a block diagram of a system for automatic context-based annotation, according to some implementations.
  • FIG. 13 is a flow chart of a method for automatic context-based annotation, according to some implementations.
  • FIGs. 14A and 14B are block diagrams depicting embodiments of computing devices useful in connection with the methods and systems described herein.
  • Section A describes embodiments of systems and methods for machine learning-based document classification.
  • Section B describes embodiments of systems and methods for stamp detection and classification.
  • Section C describes embodiments of systems and methods for automatic context-based annotation.
  • Section D describes a computing environment which may be useful for practicing embodiments described herein.
  • Scanning documents may involve converting physical paper documents into digital image documents.
  • a digital document may not have the same properties that paper documents have. For example, the pages in a physical document are discrete. Further, if multiple physical documents are to be read, one document may be put down, such as a textbook, and a next document may be picked up, such as the next textbook. In contrast, a scanned digital document may have continuous pages. Further, multiple documents may be scanned into one file such that there may be no clear identifier, such as the physical act of putting one document down and picking the next one up, between one digital document and the next. Thus, the content of the scanned images may be critical in differentiating pages from one another and determining when one digital document ends and the next digital document begins.
  • Multiple documents may be scanned by a computing system or user. Requiring a computing system or user to distinguish one document from the next by reading the title page of the document and identifying the individual pages of the document may require the user or computing system to read each page in its entirety. For example, in the event a textbook is scanned, the user or computing system may need to distinguish title pages, table of content pages, publishing information pages, content pages, and appendix pages. In another example, in the event a book is scanned, the user or computing system may need to distinguish title pages, publishing information pages, chapter pages, and content pages. Further, in the event contracts are scanned, a user or computing system may need to identify content pages, blank pages, recorded (stamped) pages, and signature pages.
  • Documents may be structured, semi-structured, or unstructured. Structured documents may include specified fields to be filled with particular values or codes with fixed or limited lengths, such as tax forms or similar records.
  • Unstructured documents may fall into particular categories, but have few or no specified fields, and may comprise text or other data of any length, such as legal or mortgage documents, deeds, complaints, etc.
  • Semi-structured documents may include a mix of structured and unstructured fields, with some fields having associated definitions or length limitations and other fields having no limits, such as invoices, policy documents, etc. Techniques that may be used to identify content in some documents, such as structured documents in which optical character recognition may be applied in predefined regions with associated definitions, may not work on semi-structured or unstructured documents.
  • implementations of the systems and methods discussed herein provide for digital document identification via a multi-stage or iterative machine-learning classification process utilizing a plurality of classifiers.
  • Documents may be identified and classified at various iterations, with the digital document identified based upon agreement between a predetermined number of classifiers.
  • these classifiers may not need to scan entire documents, reducing processor and memory utilization compared to classification systems not implementing the systems and methods discussed herein.
  • the classifications provided by implementations of the systems and methods discussed herein may be more accurate than simple keyword-based analysis.
  • documents may be multi-page documents. Pages of a multi-page document may be related by virtue of being part of the same document, but may have very different characteristics: for example, a first page may be a title or cover page with particular features such as document identifiers, addresses, codes, or other such features, while subsequent pages may be freeform text, images, or other data.
  • the systems and methods discussed herein may be applied on a page by page basis, and/or on a document by document basis, to classify pages as being part of the same multi-page document and/or to classify documents as being of the same type, source, or grouping (sometimes referred to as a “domain”).
  • a document label may be predicted by various classifiers at step 102.
  • a computing device may determine whether a predetermined number of classifiers agree on a same label, and whether the agreed-upon label is a meaningful label or merely a label indicating that the classifiers cannot classify the document, at step 106.
  • the document may be classified with that label at step 150.
  • additional classifiers may be employed in an attempt to classify the document at step 112.
  • the computing device may label the document with that label at step 150.
  • classifiers may attempt to label the document given information about a parent document at step 124.
  • the document may be classified with that label at step 150.
  • image analysis may be performed at step 134.
  • In the event the image analysis returns a meaningful label, the document may be labeled with that label at step 150.
  • a new classifier may be employed at step 140.
  • In the event the new classifier is able to return a meaningful label, the document may be labeled with that label at step 150.
  • In the event the new classifier is unable to return a meaningful label, the document may be labeled with the label that may not be meaningful at step 150.
  • At step 102, several classifiers may be employed out of a plurality of classifiers in an attempt to label a document received by a computing device.
  • the plurality of classifiers may include a term frequency - inverse document frequency classifier, a gradient boosting classifier, a neural network, a time series analysis, a regular expression parser, and an image comparator.
  • the document received by the computing device may be a scanned document.
  • the scanned document received by the computing device may be an image, or a digital visually perceptible version of the physical document.
  • the digital image may be comprised of pixels, the pixels being the smallest addressable elements in the digital image.
  • the classifiers employed to label the document may each extract and utilize various features of the document.
  • the image may be preprocessed before features are learned.
  • the image may have noise removed, be binarized (i.e., pixels may be represented as a '1' for having a black color and a '0' for having a white color), be normalized, etc.
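  • As an illustrative sketch (not a definitive implementation), the binarization and normalization steps above might be written as follows, assuming the scanned page has already been loaded as a grayscale NumPy array; the threshold of 128 is an assumed value:

```python
import numpy as np

def binarize(image: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Map dark (ink) pixels to '1' and light (background) pixels to '0'."""
    return (image < threshold).astype(np.uint8)

def normalize(image: np.ndarray) -> np.ndarray:
    """Scale pixel intensities into the range [0, 1]."""
    image = image.astype(np.float32)
    return (image - image.min()) / max(float(image.max() - image.min()), 1e-8)
```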
  • Features may be learned from the document based on various analyses of the document.
  • features may be extracted from the document by extracting text from the document.
  • features may be extracted from the document based on identifying coordinates of text within the document.
  • features may be extracted from the document by identifying vertical or horizontal edges in a document. For example, features such as shape context may be extracted from the document.
  • features may be learned from the document based on various analyses of an array based on the document.
  • an image may be mapped to an array.
  • the coordinates of the image may be stored in an array.
  • features may be extracted from an array using filters.
  • a Gabor filter may be used to assess the frequency content of an image.
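  • One way such a Gabor filter bank might be applied is sketched below with OpenCV; the kernel size and the sigma, lambd, and gamma parameters are illustrative assumptions rather than values from this disclosure:

```python
import cv2
import numpy as np

def gabor_responses(gray: np.ndarray) -> list[np.ndarray]:
    """Apply Gabor filters at four orientations; each response map highlights
    frequency content at that orientation."""
    responses = []
    for theta in np.arange(0, np.pi, np.pi / 4):  # 0, 45, 90, 135 degrees
        kernel = cv2.getGaborKernel(ksize=(31, 31), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0)
        responses.append(cv2.filter2D(gray, cv2.CV_32F, kernel))
    return responses
```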
  • sub-image detection and classification may be utilized as a page or document classifier.
  • image detection may be applied to portions of a document to detect embossing or stamps upon the document, which may indicate specific document types.
  • Image detection in such implementations may comprise applying edge detection algorithms to identify structural features or shapes, and comparing them to structural features or shapes from templates of embossing or stamps.
  • transformations may be applied to the template image and/or extracted or detected image or structural features as part of matching, including scaling, translation, or rotation. Matching may be performed via a neural network trained on the template images, in some implementations, or using other correlation algorithms such as a sum of absolute differences (SAD) measurement or a scale-invariant feature transformation.
  • Each template image may be associated with a corresponding document or page type or classification, and upon identifying a match between an extracted or detected image or sub-image within a page or document and a template image, the image classifier may classify the page as having the page type or classification corresponding to the template image.
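  • A minimal sketch of SAD-based template matching for stamp or embossing classification is shown below; the templates dictionary and the max_sad threshold are hypothetical, and the candidate region is assumed to already be scaled to the template size:

```python
import numpy as np

def sad(region: np.ndarray, template: np.ndarray) -> float:
    """Sum of absolute differences; lower values indicate a closer match."""
    return float(np.abs(region.astype(np.int32) - template.astype(np.int32)).sum())

def classify_region(region: np.ndarray, templates: dict[str, np.ndarray],
                    max_sad: float) -> str | None:
    """Return the page type of the best-matching template, or None if no
    template is close enough to count as a match."""
    best_label, best_score = None, float("inf")
    for label, template in templates.items():
        score = sad(region, template)
        if score < best_score:
            best_label, best_score = label, score
    return best_label if best_score <= max_sad else None
```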
  • the classifiers employed during a first iteration may be a first subset of classifiers, the first subset of classifiers including one or more of a neural network, an elastic search model, and an XGBoost model.
  • Employing a first subset of classifiers may be called performing a first mashup at step 110.
  • other classifiers may be included in the first subset of classifiers.
  • Before the classifiers are employed on the image data, the classifiers need to be trained such that they are able to effectively classify data.
  • Supervised learning is one way in which classifiers may be trained to better classify data.
  • Referring to FIG. 5, depicted is a block diagram of an example system 500 using supervised learning.
  • Training system 504 may be trained on known input/output pairs such that training system 504 can learn how to classify an output given a certain input. Once training system 504 has learned how to classify known input/output pairs, the training system 504 can operate on unknown inputs to predict what an output should be and the class of that output.
  • Inputs 502 may be provided to training system 504. As shown, training system 504 changes over time. The training system 504 may adaptively update every iteration. In other words, each time a new input/output pair is provided to training system 504, training system 504 may perform an internal correction.
  • the predicted output value 506 of the training system 504 may be compared via comparator 508 to the actual output 510, the actual output 510 being the output that was part of the input/output pair fed into the system.
  • the comparator 508 may determine a difference between the actual output value 510 and the predicted output 506.
  • the comparator 508 may return an error signal 512 that indicates the error between the predicted output 506 and the actual output 510. Based on the error signal 512, the training system 504 may correct itself.
  • the comparator 508 will return an error signal 512 that indicates a numerical amount by which weights in the neural network may change to more closely approximate the actual output 510.
  • the weights in the neural network indicate the importance of various connections of neurons in the neural network.
  • the concept of propagating the error through the training system 504 and modifying the training system may be called the backpropagation method.
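  • The training loop of FIG. 5 can be sketched, under simplifying assumptions, as a single linear neuron trained by the delta rule; the data, learning rate, and weights below are illustrative, and for one linear layer backpropagation reduces to this correction step:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.0, 2.0])                   # unknown relationship
pairs = [(x, float(true_w @ x)) for x in rng.normal(size=(100, 3))]

w = rng.normal(size=3)                                # trainable weights
lr = 0.05                                             # learning rate
for x, actual in pairs:                               # known input/output pairs
    predicted = w @ x                                 # predicted output (506)
    error = actual - predicted                        # error signal (512)
    w += lr * error * x                               # internal correction
print(w)                                              # approaches true_w
```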
  • a neural network may be considered a series of algorithms that seek to identify relationships for a given set of inputs.
  • Various types of neural networks exist.
  • modular neural networks include a network of neural networks, where each network may function independently to accomplish a sub-task that is part of a larger set of tasks. Breaking down tasks in this manner decreases the complexity of analyzing a large set of data.
  • gated neural networks are neural networks that incorporate memory such that the network is able to remember, and classify more accurately, long datasets. These networks, for example, may be employed in speech or language classifications.
  • this disclosure employs convolutional neural networks because convolutional networks are inherently strong in performing image-based classifications. Convolutional neural networks are suited for image-based classification because the networks take advantage of the local spatial coherence of adjacent pixels in images.
  • Referring to FIG. 2, depicted is a block diagram of a convolutional neural network 200, according to some embodiments.
  • Convolutional layers may detect features in images via filters.
  • the filters may be designed to detect the presence of certain features in an image.
  • high-pass filters detect the presence of high frequency signals.
  • the output of the high-pass filter comprises the parts of the signal that have high frequency.
  • image filters may be designed to track certain features in an image.
  • the output of the specifically designed feature filters may be the parts of the image that have specific features. In some embodiments, the more filters that are applied to the image, the more features that may be tracked.
  • Two-dimensional filters in a two-dimensional convolutional layer may search for recurrent spatial patterns that best capture relationships between adjacent pixels in a two-dimensional image.
  • An image 201 or an array mapping of an image 201, may be input into the convolutional layer 202.
  • the convolutional layer 202 may detect filter-specific features in an image.
  • convolutional neural networks use convolution to highlight features in a dataset.
  • a filter may be applied to an image array 201 to generate a feature map.
  • the filter slides over the array 201 and the element-by-element dot product of the filter and the array 201 is stored as a feature map.
  • the feature map created from the convolution of the array and the filter summarizes the presence of filter-specific features in the image. Increasing the number of filters applied to the image may increase the number of features that can be tracked. The resulting feature maps may subsequently be passed through an activation function to account for nonlinear patterns in the features.
  • activation functions may be employed to detect nonlinear patterns.
  • the nonlinear sigmoid function or hyperbolic tangent function may be applied as activation functions.
  • the sigmoid function ranges from 0 to 1, while the hyperbolic tangent function ranges from -1 to 1.
  • the rectifier linear function behaves linearly for positive values, making this function easy to optimize and subsequently allowing the neural network to achieve high prediction accuracy.
  • the rectifier linear activation function also outputs zero for any negative input, meaning it is not a true linear function.
  • the output of a convolution layer 203 in a convolutional neural network is a feature map, where the values in the feature map may have been passed through a rectifier linear activation function.
  • the number of convolutional layers may be increased. Increasing the number of convolutional layers increases the complexity of the features that may be tracked.
  • the filters used in the subsequent convolutional layers may be the same as the filters employed in the first convolutional layer. Alternatively, the filters used in the subsequent convolutional layers may be different from the filters employed in the first convolutional layer.
  • the extracted feature map 203 that has been acted on by the activation function may subsequently be input into a pooling layer, as indicated by 204.
  • the pooling layer down-samples the data. Down-sampling data may allow the neural network to retain relevant information. While having an abundance of data may be advantageous because it allows the network to fine tune the accuracy of its weights, large amounts of data may cause the neural network to spend significant time processing. Down-sampling data may be important in neural networks to reduce the computations necessary in the network.
  • a pooling window may be applied to the feature map 203.
  • the pooling layer outputs the maximum value of the data in the window, down-sampling the data in the window. Max pooling highlights the most prominent feature in the pooling window.
  • the pooling layer may output the average value of the data in the window.
  • a convolutional layer may succeed the pooling layer to re-process the down-sampled data and highlight features in a new feature map.
  • the down-sampled pooling data may be further flattened before being input into the fully connected layers 206 of the convolutional neural network.
  • Flattening the data means arranging the data into a one-dimensional vector.
  • the data is flattened for purposes of matrix multiplication that occurs in the fully connected layers.
  • the fully connected layer 206 may only have one set of neurons.
  • the fully connected layer 206 may have a set of neurons 208 in a first layer, and a set of neurons 210 in subsequent hidden layers.
  • the neurons 208 in the first layer may each receive flattened one-dimensional input vectors 205.
  • the number of hidden layers in the fully connected layer may be pruned. In other words, the number of hidden layers in the neural network may adaptively change as the neural network learns how to classify the outputs 210.
  • the neurons in each of the layers 208 and 210 are connected to each other.
  • the neurons are connected by weights. As discussed herein, during training, the weights are adjusted to strengthen the effect of some neurons and weaken the effect of other neurons. The adjustment of each neuron’s strength allows the neural network to better classify outputs.
  • the number of neurons in the neural network may be pruned. In other words, the number of neurons that are active in the neural network adaptively changes as the neural network learns how to classify the output.
  • the error between the predicted values and known values may be so small that the error may be deemed acceptable and the neural network does not need to continue training.
  • the value of the weights that yielded such small error rates may be stored and subsequently used in testing.
  • the neural network must satisfy the small error rate for several iterations to ensure that the neural network did not merely learn to predict one output very well, or predict one output very well by accident. Requiring the network to maintain a small error over several iterations increases the likelihood that the network is properly classifying a diverse range of inputs.
  • Block 212 represents the output of the neural network.
  • the output of the fully connected layer is input into a second fully connected layer.
  • Additional fully connected layers may be implemented to improve the accuracy of the neural network.
  • the number of additional fully connected layers may be limited by the processing power of the computer running the neural network.
  • the addition of fully connected layers may be limited by insignificant increases in the accuracy compared to increases in the computation time to process the additional fully connected layers.
  • the output of the fully connected layer 210 may be a vector of real numbers.
  • the real numbers may be output and classified via any classifier.
  • the real numbers may be input into a softmax classifier layer 214.
  • a softmax classifier may be employed because of the classifier’s ability to classify various classes. Other classifiers, for example the sigmoid function, make binary determinations about the classification of one class (i.e., the output may be classified using label A or the output may not be classified using label A).
  • a softmax classifier uses a softmax function, or a normalized exponential function, to transform an input of real numbers into a normalized probability distribution over predicted output classes. For example, the softmax classifier may indicate the probability of the output being in class A, B, C, etc.
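  • The convolution, pooling, flattening, fully connected, and softmax stages described above might be assembled as follows; this is a sketch using the Keras API, and the input size, filter counts, and ten output classes are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(128, 128, 1)),        # binarized page image
    layers.Conv2D(16, 3, activation="relu"),  # feature maps + rectifier linear
    layers.MaxPooling2D(2),                   # max pooling down-samples
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),                         # one-dimensional vector (205)
    layers.Dense(64, activation="relu"),      # fully connected layer
    layers.Dense(10, activation="softmax"),   # probability per output class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```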
  • a random forest may be used to classify the document given the vector of real numbers output by the fully connected layer 210.
  • a random forest may be considered the result of several decision trees making decisions about a classification. If a majority of the trees in the forest make the same decision about a class, then that class will be the output of the random forest.
  • a decision tree makes a determination about a class by taking an input, and making a series of small decisions about whether the input is in that class.
  • Referring to FIG. 3, depicted is a block diagram of an example of a classification 300 by a decision tree 306.
  • Decision tree 306 shows the paths that were used to eventually come to the decision that point 304 is in class B.
  • the root node 308 represents an entire sample set and is further divided into subsets.
  • the root node 308 may represent an independent variable.
  • Root node 308 may represent the independent variable X1.
  • Splits 310 are made based on the response to the binary question in the root node 308. For example, the root node 308 may evaluate whether data point 304 includes an X1 value that is less than 10. According to classification 300, data point 304 includes an X1 value less than 10; thus, in response to the decision based on the root node 308, a split is formed and a new decision node 312 may be used to further make determinations on data point 304.
  • Decision nodes are created when a node is split into a further sub-node.
  • the root node 308 is split into the decision node 312.
  • Various algorithms may be used to determine how a decision node can further tune the classification using splits such that the ultimate classification of data point 304 may be determined.
  • the splitting criterion may be tuned.
  • the chi-squared test may be one means of determining whether the decision node is effectively classifying the data point. Chi-squared determines how likely an observed distribution is due to chance.
  • chi-squared may be used to determine the effectiveness of the decision node’s split of the data.
  • a Gini index test may be used to determine how well the decision node split data. The Gini index may be used to determine the unevenness in the split (i.e., whether or not one outcome of the decision tree is inherently more likely than the other).
  • the decision node 312 may be used to make a further classification regarding data point 304. For example, decision node 312 evaluates whether data point 304 has an X2 value that is less than 15. In the current example, data point 304 has an X2 value that is less than 15. Thus, the decision tree will come to conclusion 314 that data point 304 should be in class B.
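  • A minimal random forest sketch over stand-in feature vectors is shown below using scikit-learn; the synthetic data is purely illustrative of the majority-vote behavior described above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))             # stand-in feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in class labels

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X[:150], y[:150])              # 100 decision trees vote per input
print(forest.predict(X[150:]))            # majority decision per sample
```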
  • an elastic search model may be used to compute the predicted label in block 102.
  • An elastic search model, as used herein, is an elastic net regression model that considers both ridge regression penalties and lasso regression penalties.
  • the equation for an elastic model may be generally shown in Equation 1 below:

    Equation 1: $\hat{\beta} = \operatorname{argmin}_{\beta} \sum_{i=1}^{n} (y_i - x_i^{T}\beta)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2$
  • y may be a variable that depends on x.
  • β is a coefficient representing the weight of each feature of independent variable x.
  • β may be summed over the p features in x.
  • Regression may be considered an analysis tool that models the strength between dependent variables and independent variables.
  • In non-linear regression, non-linear approximations may be used to model the relationship between the dependent variable and independent variables.
  • Linear regression involves an analysis of independent variables to predict the outcome of a dependent variable.
  • the dependent variable may be linearly related to the independent variable. Modifying Equation 1 above to employ linear regression (dropping the penalty terms) may be shown by Equation 2 below:

    Equation 2: $\hat{\beta} = \operatorname{argmin}_{\beta} \sum_{i=1}^{n} (y_i - x_i^{T}\beta)^2$
  • Linear regression predicts the equation of a line that most closely approximates data points.
  • the equation of the line that most closely approximates the data points may be minimized by the least squares method.
  • the least squares method may be described as finding the coefficients of the line that minimize the error between the line and the data points.
  • argmin describes the argument that minimizes the relationship between x and the data points y.
  • the term $\lambda_2 \sum_{j=1}^{p} \beta_j^2$ may be described as the ridge regression penalty.
  • the penalty in ridge regression is a means of injecting bias.
  • a bias may be defined as the inability of a model to capture the true relationship of data.
  • Bias may be injected into the regression such that the regression model may be less likely to over fit the data.
  • the bias generalizes the regression model more, improving the model's long-term accuracy. Injecting a large bias may mean that the dependent variable may not be very sensitive to changes in the independent variable, while injecting a small bias may mean that the dependent variable remains sensitive to changes in the independent variable.
  • the ridge regression penalty has the effect of grouping collinear features.
  • the lambda in the penalty term may be determined via cross validation.
  • Cross validation is a means of evaluating a model’s performance after the model has been trained to accomplish a certain task.
  • Cross validation may be evaluated by subjecting a trained model to a dataset that the model was not trained on.
  • a dataset may be partitioned in several ways.
  • splitting the data into training data and testing data randomly is one method of partitioning a dataset. In some cases, this method of partitioning works well.
  • in other cases, this method of partitioning might not be advantageous, because the model may benefit by training on more data. In other words, data is sacrificed for testing the model.
  • k-fold cross validation may be employed to partition data. This method of partitioning data allows every data point to be used for training and testing. In a first step, the data may be randomly split into k folds.
  • the choice of k involves a trade-off between bias (i.e., the inability of a model to capture a relationship) and variance (i.e., overfitting the model): using fewer folds may increase bias (indicating that not enough data may have been used for training) with less variance, while using more folds may decrease bias with a larger likelihood of variance.
  • data may be trained via k-1 folds, where the kth fold may be used for validation.
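  • A k-fold partition of this kind might be sketched as follows with scikit-learn; the five folds and the toy dataset are assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)          # stand-in dataset of 10 samples
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Each fold serves as the validation set exactly once, while the remaining
# k-1 folds are used for training.
for train_idx, val_idx in kf.split(X):
    print(f"train={train_idx}, validate={val_idx}")
```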
  • the term $\lambda_1 \sum_{j=1}^{p} |\beta_j|$ may be described as the lasso regression penalty.
  • the elastic model may be employed to determine a linear or nonlinear relationship between input and output data. For example, given a two-dimensional image, a corresponding array may be generated. After training the elastic model, the array may be used as an input for the model, wherein the corresponding outputs may be predicted based on the linear or nonlinear relationship of the inputs and outputs. Given a set of numeric outputs, a classifier may be used to classify the numeric data into a useful label. In some embodiments, as discussed above, a softmax classifier may be used for output classification. In other embodiments, as discussed above, a random forest may be used for output classification.
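  • The elastic model might be fit as sketched below with scikit-learn, where alpha scales the overall penalty and l1_ratio balances the lasso and ridge terms; the synthetic features and penalty values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # e.g., flattened image features
true_coef = np.array([1.0, 0.0, -2.0, 0.5, 0.0])
y = X @ true_coef + rng.normal(scale=0.1, size=100)

model = ElasticNet(alpha=0.1, l1_ratio=0.5)   # mixes L1 (lasso) and L2 (ridge)
model.fit(X, y)
print(model.coef_)                            # weak/collinear features shrink
```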
  • a term frequency-inverse document frequency (TF-IDF) vector may be utilized to classify documents, using identified or extracted textual data from a candidate image (e.g. from optical character recognition or other identifiers).
  • the TF-IDF vector can include a series of weight values that each correspond to the frequency or relevance of a word or term in the textual data of the document under analysis.
  • each individual word may be identified in the textual data, and the number of times that word appears in the textual data of the document under analysis may be determined. The number of times a word or term appears in the textual data of the document under analysis can be referred to as its term frequency.
  • a counter associated with each word may be incremented for each appearance of the word in the document (if a word is not yet associated with a counter, a new counter may be initialized for that word, and assigned an initialization value such as one).
  • a weight value may be assigned to each of the words or terms in the textual data. Each of the weight values can be assigned to a coordinate in the TF-IDF vector data structure.
  • the resulting vector may be compared to vectors generated from template or sample documents, e.g. via a trained neural network or other classifier.
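  • A TF-IDF comparison of this kind might be sketched as follows; the template strings and the candidate OCR text are hypothetical, and cosine similarity stands in for the trained comparator:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

templates = ["deed of trust recorded county",       # hypothetical sample texts
             "signature witness notary"]
candidate = ["recorded with the county deed of trust"]  # OCR text of a page

vectorizer = TfidfVectorizer()
template_vecs = vectorizer.fit_transform(templates)     # weight per term
candidate_vec = vectorizer.transform(candidate)

# Similarity of the candidate's TF-IDF vector to each template vector.
print(cosine_similarity(candidate_vec, template_vecs))
```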
  • an Extreme Gradient Boosting model (“XGBoost”) may be used to compute the predicted label in block 102.
  • gradient boosting operates by improving the accuracy of a predicted value of y given an input x applied to the function f(x). For example, if y should be 0.9, but f(x) returns 0.7, a supplemental value can be added to f(x) such that the accuracy of f(x) may be improved.
  • the supplemental value may be determined through an analysis of gradient descent.
  • a gradient descent analysis refers to stepping against the gradient such that, during a next iteration, a function will produce an output closer to the optimal value.
  • Referring to FIG. 4, depicted is an example 400 of the gradient descent method operating on a parabolic function 401.
  • the optimal value may be indicated by 402.
  • the equation f(X1) may return a data point on parabola 401 indicated by 404.
  • data point 404 may move in the direction of 405.
  • the slope between data point 402 and 404 is negative, thus to move data point 404 in the direction of data point 402, a supplemental value may be added to data point 404 during a next iteration. Adding a supplemental value when the slope is negative may have the effect of moving data point 404 in the direction of 405.
  • equation f(X1) with the addition of the supplemental value may return a data point on parabola 401 indicated by 406.
  • data point 406 may move in the direction of 407.
  • the slope between data point 402 and 406 is positive; thus, to move data point 406 in the direction of data point 402, a supplemental value may be subtracted from data point 406 during the next iteration.
  • Subtracting a supplemental value when the slope is positive may have the effect of moving data point 406 in the direction of 407. Therefore, data points 404 and 406 must move in the direction opposite of the slope of the parabola 401 to arrive at the desired data point 402.
  • gradient descent may be performed such that data points may arrive closer to their optimal minimal value. Equation 3 below shows determining the gradient of an objective function such that f(x_n) better approximates y_n:

    Equation 3: $h(x_n) = -\dfrac{\partial O(y_n, f(x_n))}{\partial f(x_n)}$
  • y_n may be the desired value
  • f(x_n) may be the function acting on input x_n
  • O(y_n, f(x_n)) is the objective function that is used to optimize h(x_n).
  • the objective function may be the square loss function. In other embodiments, the objective function may be the absolute loss function.
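  • The descent of FIG. 4 can be reproduced in a few lines; the parabola f(x) = (x - 2)^2, the starting point, and the step size are assumed values for illustration:

```python
# Step against the sign of the slope so each iteration moves closer to the
# minimum at x = 2: add when the slope is negative, subtract when positive.
x, step = 8.0, 0.1
for _ in range(50):
    slope = 2 * (x - 2)      # derivative of (x - 2)**2
    x -= step * slope
print(x)                     # approaches 2.0
```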
  • gradient boosting may mean boosting the accuracy of a function by adding a gradient.
  • XGBoost is similar to gradient boosting but the second order gradient is employed instead of the first order gradient, each iteration is dependent on the last iteration, and regularization is employed.
  • Regularization is a means of preventing a model from being over fit.
  • the elastic net model described above employs regularization to prevent a regression model from being over fit.
  • the same parameters, the ridge regression penalty and the lasso regression penalty, may be incorporated into gradient boosting to improve model generalization. Equation 4 below shows the XGBoost objective, including a second-order Taylor series approximation and the addition of the ridge and lasso regression penalties:

    Equation 4: $\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2} + \alpha \sum_{j=1}^{T} |w_j|$

    where $g_i$ and $h_i$ are the first- and second-order gradients of the objective with respect to the prediction, $T$ is the number of leaves in the added tree, and $w_j$ are the leaf weights.
  • XGBoost may also be used for classification purposes. For example, given a two-dimensional image, a corresponding array may be generated. In some embodiments, the coordinates of the two-dimensional image may be stored in the two-dimensional array. Features may be extracted from the array. For example, convolution may be used to extract filter-specific features, as described above in the convolutional neural network. During training, a model F_i(x) may be determined for each class i that may be used in determining whether a given input may be classified by that class. In some embodiments, the probability of an input being labeled a certain class may be determined according to Equation 6 below:

    Equation 6: $P(i \mid x) = \dfrac{e^{F_i(x)}}{\sum_{j=1}^{n} e^{F_j(x)}}$
  • each class may be represented by i and the total number of classes may be represented by n.
  • the probability of the input being classified by i may be determined by Equation 6 above.
  • a true probability distribution may be determined for a given input based on the known inputs and outputs (i.e., each class would return ‘0’ while the class that corresponds to the input would return ‘ 1’).
  • the function F_i(x) may be adapted by the XGBoost model such that the function F_i(x) better approximates the relationship between x and y for a given class i.
  • the objective function minimized in XGBoost may be the Kullback-Leibler divergence function.
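  • A multi-class XGBoost classifier of this kind might be sketched as follows with the xgboost package, where reg_alpha and reg_lambda correspond to the lasso and ridge penalties; the synthetic features, three classes, and parameter values are assumptions:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # stand-in page features
y = rng.integers(0, 3, size=200)              # three document classes

clf = xgb.XGBClassifier(objective="multi:softprob",
                        reg_alpha=0.1,        # L1 (lasso) penalty
                        reg_lambda=1.0,       # L2 (ridge) penalty
                        n_estimators=50)
clf.fit(X, y)
print(clf.predict_proba(X[:2]))               # probability distribution per input
```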
  • the decision in step 104 depends on a computing device determining whether a predetermined number of classifiers predict the same label.
  • The first subset of classifiers, out of a plurality of classifiers in a first mashup 110, may compute a predicted label according to the classifiers’ various methodologies.
  • the predetermined number of classifiers may be a majority of classifiers. For example, given the neural network classifier, the elastic model, and the XGBoost model, the labels of two classifiers may be used in determining whether or not the classifiers agree to a label.
  • a first number of the selected subset of classifiers may classify the document with a first classification.
  • a second number of the selected subset of classifiers may classify the document with a second classification.
  • the classifiers may be determined to agree on a label if the classifiers independently select that label from a plurality of labels. In response to the predetermined number of classifiers predicting the same label, the process proceeds to the decision in step 106. In response to the predetermined number of classifiers not predicting the same label, the process proceeds to step 112.
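  • The agreement check of steps 104 and 106 might be expressed as the sketch below, where the required count and the "Unknown" sentinel follow the description above; the label strings are illustrative:

```python
from collections import Counter

def agreed_label(predictions: list[str], required: int) -> str | None:
    """Return the label predicted by at least `required` classifiers, if that
    label is meaningful (i.e., not "Unknown"); otherwise return None."""
    label, count = Counter(predictions).most_common(1)[0]
    return label if count >= required and label != "Unknown" else None

# Two of three classifiers agreeing suffices for a majority in the first mashup.
print(agreed_label(["title page", "title page", "Unknown"], required=2))
```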
  • the decision in step 106 depends on a computing device determining whether the label is meaningful. Meaningful labels may be used to identify the pages in a single document and may include, but are not limited to: title page, signature page, first pages, middle pages, end pages, recorded pages, etc. Further, meaningful labels may be used to identify documents from one another and may include, but are not limited to: document 1, document 2, document 3, page 1, page 2, page 3, etc., where a user could use the document labels to map the digitally scanned document to a physical document. In other embodiments, the classifier may return specific labels such as: title of document 1, title of document 2, etc., where the “title” portion of the label would correspond with the title of a physical document.
  • a classifier may be unable to classify a document, returning a label that is not meaningful.
  • a label that may be returned that is not meaningful is the label “Unknown.”
  • the process proceeds to step 112.
  • a classifier may return a confidence score.
  • the confidence score may be used to indicate the classifier’s confidence in the classifier’s label classification.
  • the classifier’s confidence score may be determined based on the classification.
  • classifiers may employ a softmax classifier to transform a numerical output produced by a model into a classification and subsequent label.
  • the softmax classifier may produce a classification label based on a probability distribution utilizing the predicted numerical values, over several output classes.
  • a label may be chosen based on the probability distributions such that the label selected may be the label associated with the highest probability in the probability distribution.
  • the confidence score may be the probability, from the probability distribution, associated with the selected label.
  • the confidence score associated with the selected label may be compared to a threshold. In response to the confidence score exceeding the threshold, the label may be considered a meaningful label and the process may proceed to step 150. In response to the confidence score not exceeding the threshold, the label selected by the classifier may not be considered meaningful. Instead, the label selected by the classifier may be replaced by, for example, the label “Unknown.” In response to the label not being a meaningful label, the process may proceed to step 112.
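  • The threshold test might be sketched as follows; the probabilities, labels, and the 0.6 threshold are illustrative assumptions:

```python
import numpy as np

def label_with_threshold(probs: np.ndarray, labels: list[str],
                         threshold: float) -> str:
    """Pick the label with the highest softmax probability; if its confidence
    does not exceed the classifier-specific threshold, return "Unknown"."""
    best = int(np.argmax(probs))
    return labels[best] if probs[best] > threshold else "Unknown"

print(label_with_threshold(np.array([0.7, 0.2, 0.1]),
                           ["title page", "signature page", "middle page"],
                           threshold=0.6))
```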
  • Each classifier in a plurality of classifiers may have its own threshold value.
  • a user may define a unique threshold value for each classifier.
  • the threshold value for various classifiers may be the same.
  • a user may tune the threshold values for each classifier to maximize the accuracy of the classifications.
  • the threshold value may be tuned as a hyper-parameter.
  • the neural network threshold may be determined to be x (e.g. 50, 55, 60, 65, 75, 90, or any other such value), the elastic net threshold may be determined to be y (e.g. 50, 55, 60, 65, 75, 90, or any other such value), and the XGBoost threshold may be determined to be z (e.g. 50, 55, 60, 65, 75, 90, or any other such value).
  • the thresholds x, y, and z may be identical or may be different (including implementations in which two thresholds are identical and one threshold is different).
  • At step 150, responsive to a meaningful and agreed-upon label by a predetermined number of classifiers, the computing device may modify the document to include the meaningful and agreed-upon label.
  • At step 112, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the first mashup 110.
  • a second mashup may be performed.
  • the second mashup may be considered a second iteration.
  • Several classifiers may be employed out of the plurality of classifiers.
  • the number of classifiers employed to label the document in the second mashup 120 may be different from the number of classifiers employed to label the document in the first mashup 110. Further, the classifiers employed in the second mashup 120 may be different from the classifiers employed in the first mashup 110.
  • the classifiers employed in the second mashup 120 may be a second subset of classifiers, the second subset of classifiers including a neural network, as discussed above, an elastic model, as discussed above, an XGBoost model, as discussed above, an Automated machine learning model, and a Regular Expression (“RegEx”) classifier, or any combination of these or other third party models.
  • the second subset of classifiers may be employed to classify the document.
  • In the event a classifier implemented in the second subset was used in the first subset, the same features may be used for classification.
  • a new set of features may be derived for each classifier in the second subset.
  • features may be learned from various analyses of the document.
  • features may be learned from various analyses of an array based on the document. Different classifiers may utilize features that may be different from or the same as those of other classifiers.
  • Automated machine learning is a technique that selectively determines how to create and optimize a neural network.
  • automated machine learning may be provided by a third party system, and may be accessed by an Application Program Interface (“API”) or similar interface.
  • An API may be considered an interface that allows a user, on the user’s local machine, or a service account on a system to communicate with a server or other computing device on a remote machine. The communication with the remote machine may allow the API to receive information and direct the information to the user or other system from the remote machine.
  • Automated machine learning may employ a first neural network to design a second neural network, the second neural network based on certain input parameters. For example, a user may provide the API with images and labels corresponding to the images.
  • a user may provide the API with images of a first set of documents and label the images associated with those documents “document 1.”
  • a model is created by the automated machine learning system that optimizes text/image classification based on the input data.
  • a model may be trained to classify text/images based on the classes that the model was trained to be recognized.
  • classes such as title page, signature page, first pages, middle pages, end pages, document 1, document 2, document 3, etc. may be learned from the text/images.
  • a neural network’s design of a neural network may be called a neural architecture search.
  • the first neural network designing the second neural network may search for an architecture that achieves the particular image classification goal that was input as one or more sets of parameters into the first neural network.
  • the first neural network which may be referred to as a controller, may design a network that optimally labels documents using the provided labels given the types of sample documents and labels input into the first neural network (e.g. by adjusting hyperparameters of the network, selecting different features, etc.).
  • the first neural network designing the second neural network (which may be referred to as a child network, in some implementations) may consider architectures that are more complicated than what a human designing a neural network might consider, but the complex architecture of the second neural network may be more optimized to perform the requested image classification.
  • a RegEx classifier may be used to classify the image.
  • a RegEx classifier is a classifier that searches for, and matches, strings.
  • RegEx classifiers apply a search pattern to alphanumeric characters, and may include specific characters or delimiters (e.g. quotes, commas, periods, hyphens, etc.) to denote various fields or breaks, wildcards or other dynamic patterns, similarity matching, etc.
  • RegEx classification may also be applied to image processing and classification.
  • an image may be converted to an array based on a mapping of the pixels to their corresponding coordinates.
  • the array may be flattened into a one-dimensional vector.
  • the image may be binarized.
  • a black and white image may be represented by binary values where a '1' represents a black pixel and a '0' represents a white pixel.
  • a regular expression (“RegEx”) may be searched for in the image array or flattened vector.
  • a RegEx may be searched for in a subset of the image array.
  • RegExes may be searched for in specific rows and analyzed based on the assumption that an analysis of pixels in the same row may be more accurate as pixels in the same row are likely more correlated than pixels across the entire image.
  • a k-length array of k RegExes may be determined, where the i-th element in the array is the number of times the i-th RegEx matched the image.
  • In the event a RegEx is defined as a pattern that represents a specific feature, features may be extracted from an image based on the features matching RegExes.
  • a classification of the image may be based on the features extracted and/or the frequency of the extracted feature of the image.
  • a RegEx searching for the string “document 1” may be applied to an image.
  • the image may be classified as “document 1”.
  • the image may be classified as document 1 based on rule classification.
  • a rule may exist such that the rule dictates that, upon one or more specific RegExes matching, the document must be identified with a certain label.
  • a rule may dictate: “in response to the strings ‘signature’ and ‘here’, the page must be labeled as a signature page.”
  • RegExes may be created that search for “signature” and “here” and if those strings are found in the document, the document may be labeled as a signature page.
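  • Rule-based RegEx labeling of this kind might be sketched as follows; the rules and label strings are hypothetical examples of the “signature page” and “document 1” rules above:

```python
import re

RULES = [
    (["signature", "here"], "signature page"),  # all patterns must match
    ([r"document\s*1"], "document 1"),
]

def regex_label(text: str) -> str:
    for patterns, label in RULES:
        if all(re.search(p, text, re.IGNORECASE) for p in patterns):
            return label
    return "Unknown"

print(regex_label("Please sign your signature here"))  # -> "signature page"
```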
  • RegExes may be determined by the user. In alternate embodiments, RegExes may be generated. For example, evolutionary algorithms may be used to determine relevant RegExes in a document.
  • Evolutionary algorithms operate by finding successful solutions based on other solutions that may be less successful.
  • a population, or solution set, may be generated.
  • a RegEx may be considered an individual in the population. Different attributes of the solution set may be tuned. For example, the length of the RegEx and the characters that the RegEx searches for may be tuned.
  • the population may be randomly generated. In other embodiments, the population may be randomly generated with some constraints. For example, in response to binarizing the data, the characters that may be used in the solution set may be limited to ‘1’ and ‘0’.
  • a fitness function may be created to evaluate how individuals in the population are performing.
  • the fitness function may evaluate the generated RegExes to determine if other RegExes may be better at classifying the image.
  • fitness functions may be designed to suit particular problems.
  • the fitness function may be used in conjunction with stopping criteria. For example, in response to a predetermined number of RegExes performing well, the training of the evolutionary algorithm (in other words, the creation and tuning of new RegExes) may terminate.
  • the number of times that a particular RegEx has been matched to text in the document may be counted and summed. RegExes that have been identified in a document may be kept, and RegExes without any matches, or where the number of matches associated with that RegEx does not meet a certain threshold, may be discarded. Subsequently, attributes from the RegExes that have been matched may be mixed with other matched RegExes.
  • the attributes that are matched may be tuned. For example, given a RegEx that has been successfully matched, an attribute from that RegEx, for example the first two characters in the RegEx string, may be mixed with attributes of a second successfully matched RegEx. In some embodiments, mixing the attributes of successfully matched RegExes may mean concatenating the attributes from other successfully matched RegExes to form a new RegEx. In other embodiments, mixing the attributes of successfully matched RegExes may mean randomly selecting one or more portions of the attribute and creating a new RegEx of randomly selected portions of successfully matched RegExes. For example, one character from the first two characters of ten successfully matched RegExes may be randomly selected and randomly inserted into a new RegEx of length ten.
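A toy evolutionary loop over such RegExes might look as follows (a sketch assuming a binarized document string; the fitness, crossover, and mutation choices are illustrative):

    import random
    import re

    ALPHABET = "01"

    def random_regex(length=6):
        return "".join(random.choice(ALPHABET) for _ in range(length))

    def fitness(pattern, document):
        """Count how many times the candidate pattern matches."""
        return len(re.findall(pattern, document))

    def crossover(a, b):
        """Mix attributes of two matched RegExes: splice the head of
        one parent onto the tail of the other."""
        cut = len(a) // 2
        return a[:cut] + b[cut:]

    def mutate(pattern, rate=0.1):
        return "".join(random.choice(ALPHABET) if random.random() < rate else ch
                       for ch in pattern)

    document = "1111000010101111000011110000"   # stand-in flattened image
    population = [random_regex() for _ in range(20)]

    for generation in range(10):
        scored = sorted(population, key=lambda p: fitness(p, document), reverse=True)
        survivors = [p for p in scored if fitness(p, document) > 0][:10]
        if len(survivors) >= 5:   # stopping criterion: enough RegExes perform well
            break
        parents = survivors or scored[:10]
        population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                      for _ in range(20)]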
  • the decision in step 114 depends on a computing device determining whether a predetermined number of classifiers predict the same label.
  • the second subset of classifiers, out of a plurality of classifiers in a second mashup 120, may compute a predicted label according to the classifiers’ various methodologies.
  • the predetermined number of classifiers may be a minority of classifiers, the number of minority classifiers being at least greater than one classifier.
  • the labels of two classifiers may be used in determining whether or not the classifiers agree on a label.
  • a first number of the selected subset of classifiers may classify the document with a first classification.
  • a second number of the selected subset of classifiers may classify the document with a second classification.
  • the classifiers may be determined to agree on a label if the classifiers independently select that label from a plurality of labels. In response to the predetermined number of classifiers predicting the same label, the process proceeds to the decision in step 116. In response to the predetermined number of classifiers not predicting the same label, the process proceeds to step 124.
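The agreement check might be sketched as follows, assuming each classifier returns a label string and the quorum (the predetermined number of classifiers) is configured per mashup:

    from collections import Counter
    from typing import List, Optional

    def agreed_label(predictions: List[str], quorum: int) -> Optional[str]:
        """Return the label predicted by at least `quorum` classifiers,
        or None when no label reaches the quorum."""
        label, votes = Counter(predictions).most_common(1)[0]
        return label if votes >= quorum else None

    # Second mashup: a minority quorum that is still greater than one.
    agreed_label(["signature page", "signature page", "title page"], quorum=2)
    # -> "signature page"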
  • step 116 depends on a computing device determining whether the label is meaningful.
  • meaningful labels may include: title page, signature page, first pages, middle pages, end pages, recorded pages, document 1, document 2, document 3, page 1, page 2, page 3, etc.
  • the process proceeds to step 124.
  • a classifier may return a confidence score.
  • the confidence scores may be compared to a threshold.
  • the label may be considered a meaningful label and the process may proceed to step 150.
  • the label selected by the classifier may not be considered meaningful. Instead, the label selected by the classifier may be replaced by, for example, the label “Unknown.”
  • the process may proceed to step 124.
  • each classifier in a plurality of classifiers may have their own threshold value.
  • classifiers employed in both the first mashup 110 and the second mashup 120 may have the same threshold value.
  • classifiers employed in both the first mashup 110 and the second mashup 120 may have different threshold values.
  • the threshold value may be tuned as a hyper-parameter.
  • the threshold values for the second mashup may include a neural network threshold value set to x, an elastic net threshold value set to y, an XGBoost threshold value set to z, an automatic machine learning threshold value set to a, and a RegEx threshold value set to b (each of which may include any of the values discussed above for thresholds x, y, and z, and may be different from or identical to any other thresholds).
  • a threshold value b for a RegEx classifier may be set to a higher value than other classifier thresholds, such as 95 or 100.
  • more RegExes may be added such that the document is more thoroughly searched for matching expressions.
  • fuzzy logic may be implemented separately or as part of the RegEx to identify partial matches of the string to filters or expressions.
  • the fuzzy logic output may comprise estimates or partial matches and corresponding values, such as “60% true” or “40% false” for a given match of a string to a filter or expression.
  • Such implementations may be particularly helpful in instances where the RegEx fails to find an exact match, e.g. either 100% matching the filter (e.g. true) or 0% matching (e.g. false).
  • the fuzzy logic may be implemented serially or in parallel with the RegEx at step 140 in some implementations, and in some implementations, both exact and fuzzy matching may be referred to as RegEx matching.
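A sketch of exact versus fuzzy matching, using Python's difflib similarity ratio as a stand-in for a fuzzy logic output (the target string and OCR text are hypothetical):

    import re
    from difflib import SequenceMatcher

    def exact_match(pattern, text):
        return re.search(pattern, text) is not None

    def fuzzy_score(target, text):
        """Best similarity of `target` against same-length windows of
        `text`, as a percentage (e.g. 60.0 for "60% true")."""
        n = len(target)
        windows = (text[i:i + n] for i in range(max(1, len(text) - n + 1)))
        return 100 * max(SequenceMatcher(None, target, w).ratio() for w in windows)

    text = "please sign here"        # OCR dropped part of "signature"
    exact_match(r"signature", text)  # False: no exact (100%) match
    fuzzy_score("signature", text)   # partial match score instead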
  • at step 150, responsive to a meaningful and agreed-upon label by a predetermined number of classifiers, the computing device may modify the document to include the meaningful and agreed-upon label.
  • at step 124, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the second mashup 120 or the first mashup 110.
  • a third mashup 130 may be performed.
  • the third mashup 130 may be considered a third iteration.
  • Several classifiers may be employed out of the plurality of classifiers.
  • the number of classifiers employed to label the document in the third mashup 130 may be different from the number of classifiers employed to label the document in second mashup 120 and the first mashup 110.
  • the classifiers employed in the third mashup 130 may be different from the classifiers employed in the second mashup 120 and the first mashup 110.
  • the classifiers employed in the third mashup 130 may be a third subset of classifiers, the third subset of classifiers including a neural network, as discussed above, an elastic search model, as discussed above, an XGBoost model, as discussed above, an automated machine learning model, as discussed above, and a Regular Expression (RegEx) classifier, as discussed above.
  • the third subset of classifiers may be employed to classify the document.
  • features used in any of the preceding subsets may be used again during the third mashup 130.
  • a new set of features may be derived for each classifier in the third subset.
  • features may be learned from various analyses of the document.
  • features may be learned from various analyses of an array based on the document.
  • Different classifiers may utilize features that may be different from or the same as those used by other classifiers.
  • features from a parent document may be learned and input into the various classifiers in the third mashup 130.
  • the parent document may be a document that has been classified and labeled.
  • document 1 may successfully be classified and labeled as document 1.
  • Document 2, immediately following document 1, may not have been successfully classified during either the first or second mashups 110 and 120, respectively.
  • features may be learned from document 1 to help improve the classification of document 2 in the third mashup 130.
  • page t of a book may provide a classifier context as to what is on page t+1 of a book.
  • features from a parent document may be considered historic inputs.
  • Historic inputs may improve the likelihood of the document being successfully classified in the third mashup 130.
  • a time series analysis may be performed by incorporating the features of the parent document. Incorporating historic data may improve the ability of the third mashup 130 to classify and label the document because it is assumed that, for example, pages within the same document are serially autocorrelated. In other words, there may be correlations between the same features over time.
  • the selected classifiers for the third mashup 130 may be the same as the selected classifiers in the preceding mashups. In alternate embodiments, the same selected classifiers for the third mashup 130 may be retrained because of the incorporation of historic data.
  • RegEx classifiers operate based on pattern matching.
  • historic data, such as successful RegExes associated with a previously classified image and that image’s classification, may help the RegEx classifier in mashup 130 to classify the document.
  • the RegEx classifier may be trained to search for specific RegExes based on the parent document.
  • knowing the RegExes used to successfully classify a parent document may help the RegEx classifier classify the current document because the strings that the RegEx searches for in the current document may be similar to the parent document based on the assumption that time series data may be serially correlated.
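A minimal sketch of folding such historic (parent-document) features into the current page's feature vector, assuming both pages pass through the same fixed-length feature extractor:

    import numpy as np

    def with_historic_features(current, parent, parent_label_id):
        """Concatenate current-page features, parent-page features, and
        the parent's label so classifiers can exploit serial correlation."""
        return np.concatenate([current, parent, [parent_label_id]])

    current_page = np.random.rand(16)   # stand-in extracted features
    parent_page = np.random.rand(16)
    features = with_historic_features(current_page, parent_page, parent_label_id=3)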
  • the decision in step 126 depends on a computing device determining whether a predetermined number of classifiers predict the same label.
  • the third subset of classifiers, out of a plurality of classifiers in a third mashup 130, may compute a predicted label according to the classifiers’ various methodologies.
  • the predetermined number of classifiers may be a majority of classifiers. For example, given a mashup including five classifiers, such as the neural network classifier, the elastic search model, the XGBoost model, the automated machine learning model, and the RegEx classifier, the labels of a majority (e.g. three, four, or five) of the classifiers may be used in determining whether or not the classifiers agree on a label.
  • a first number of the selected subset of classifiers may classify the document with a first classification.
  • a second number of the selected subset of classifiers may classify the document with a second classification.
  • the size of the majority may vary from one set of classifiers to another. In some cases, the majority may be decided by just one vote, and in another case by two votes.
  • the classifiers may be determined to agree on a label if the classifiers independently select that label from a plurality of labels. In response to the predetermined number of classifiers predicting the same label, the process proceeds to the decision in step 128. In response to the predetermined number of classifiers not predicting the same label, the process proceeds to step 134.
  • step 128 depends on a computing device determining whether the label is meaningful.
  • meaningful labels may include: title page, signature page, first pages, middle pages, end pages, recorded page, document 1, document 2, document 3, etc.
  • the process proceeds to step 134.
  • a classifier may return a confidence score.
  • the confidence scores may be compared to a threshold.
  • the label may be considered a meaningful label and the process may proceed to step 150.
  • the label selected by the classifier may not be considered meaningful. Instead, the label selected by the classifier may be replaced by, for example, the label “Unknown”.
  • the process may proceed to step 134.
  • each classifier in a plurality of classifiers may have their own threshold value. In some embodiments, classifiers employed in the preceding mashups may have the same threshold value.
  • the thresholds employed in the third mashup 130 may be the same as the thresholds employed in the second mashup 120.
  • classifiers employed in the preceding mashups may have different threshold values.
  • the threshold value may be tuned as a hyper-parameter.
  • at step 150, responsive to a meaningful and agreed-upon label by a predetermined number of classifiers, the computing device may modify the document to include the meaningful and agreed-upon label.
  • at step 134, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the previous mashups.
  • one or more image analyses or classifications may be performed.
  • An image analysis or classification may be considered the recognition of characteristics in an image and classification of the characteristics and/or the image according to one or more of a plurality of predetermined categories.
  • Image segmentation may be used to locate objects and boundaries, for example, lines and curves, in images. Pixels may be labeled by one or more characteristics, such as color, intensity, or texture. Pixels with similar labels in a group of pixels may be considered to share the same visual features. For example, if several pixels in close proximity share the same intensity, then it may be assumed that those pixels are closely related. Thus, the pixels may be part of the same character, or, for example, the same curve comprising a portion of a single character. Clusters of similar pixels may be considered image objects. In another embodiment, edge detection (e.g. Sobel filters or similar types of edge detection kernels) may be employed to extract characters and/or structural features of the document or a portion of the document, such as straight lines or boxes indicating fields within the document, check boxes, stamps or embossing, watermarks, or other features.
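Sobel-based edge extraction might be sketched as follows (assuming SciPy and a grayscale image array; the thresholding heuristic is illustrative):

    import numpy as np
    from scipy import ndimage

    image = (np.random.rand(64, 64) * 255).astype(float)   # stand-in scan

    # Horizontal and vertical gradients highlight lines, boxes, and characters.
    gx = ndimage.sobel(image, axis=1)
    gy = ndimage.sobel(image, axis=0)
    edges = np.hypot(gx, gy)

    # Keep strong structural features (field boxes, signature lines)
    # for comparison against document templates.
    structural = edges > edges.mean() + 2 * edges.std()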
  • Structural features of the document or a portion of the document may be compared with one or more templates of structural features to identify similar or matching templates, and the document may be scored as corresponding to a document from which the template was generated.
  • a template may be generated from a blank or filled out form, identifying structural features of the form such as boxes around each question or field on the form, check boxes, signature lines, etc.
  • Structural features may be similarly extracted from a candidate document and compared to the structural features of the template (including, in some implementations, applying uniform scaling, translation, or rotation to the structural features to account for inaccurate captures). Upon identifying a match or correspondence between the candidate document and the template, the candidate document may be classified as the corresponding form.
  • a convolutional neural network may be used to extract salient features from an image or portion of an image of a document, and process a feature vector according to a trained network.
  • Such networks may be trained on document templates, as discussed above.
  • a support vector machine (SVM) or k-Nearest Neighbor (kNN) algorithm may be used to compare and classify extracted features from images.
  • multiple image classifiers may be employed in serial or in parallel (e.g. simultaneously by different computing devices, with results aggregated), and a voting system may be used to aggregate the classifications, as discussed above. For example, in some implementations, if a majority of the image classifiers agree on a classification, an aggregated image classifier classification may be recorded; this aggregated image classifier classification may be further compared to classifications from other classifiers (e.g. XGBoost, automated machine learning, etc.) in a voting system as discussed above. In other implementations, each image classifier vote may be provided for voting along with other classifiers in a single voting or aggregation step (e.g. identifying a majority vote from votes from an edge detection image classifier, a convolutional neural network classifier, and the other classifiers discussed above).
  • optical character recognition may be used to recognize characters in an image.
  • the image objects, or clusters of related pixels may be compared to characters in a character and/or font database.
  • the image objects in the document may be matched to characteristics of characters.
  • an image object may comprise part of a curve that resembles the curve in the lower portion of the letter ‘c’.
  • a computing device may compare the curve with curves in other characters via a database. Subsequently, a prediction of the character may be determined based on related image objects. For example, after comparing image objects in a character database, the computing device may determine that the character is a ‘c’.
  • a computing device may predict each character and/or object in an image.
  • a computing device may be able to classify words based on an image.
  • a dictionary may be employed to check the words extracted from the document.
  • the computing device may determine a string of characters extracted from the document to be “daf”.
  • a dictionary may be employed to check the string “daf” and it may be subsequently determined that the string “cat” instead of “daf” should have been found based on the curves of the characters.
  • a dictionary may be a means of checking that the predicted character, based on the image object, was accurately determined.
  • the document may be classified based on the characters and strings determined by the computing device.
  • topic modeling may be performed to classify the documents based on the determined strings in the document.
  • one such topic modeling technique is latent semantic analysis (“LSA”).
  • LSA may determine the similarity of strings by associating strings with content and/or topics that the strings are frequently used to describe.
  • the word “client” may be associated with the word “customer” and receive a high string similarity score.
  • the words “Notebook Computer” would receive a low string similarity score in the context of “The Notebook”, the 2004 movie produced by Gran Via.
  • a score between -1 and 1 may be produced, where 1 indicates that the strings are identical in their context, while -1 means there is nothing that relates the strings to that content.
  • LSA performs string-concept similarity analysis by identifying relationships between strings and concepts in a document.
  • LSA evaluates the context of strings in a document by considering strings around each string.
  • LSA includes constructing a weighted term-document matrix, performing singular value decomposition on the matrix to reduce the matrix dimension while preserving string similarities, and subsequently identifying strings related to topics using the matrix.
  • LSA assumes that the words around a topic may describe the topic.
  • the topic “Notebook Computer” may be surrounded by words that describe technical features.
  • the topic “The Notebook” movie may be surrounded by dramatic or romantic words. Therefore, if a computing device determined that one string described the “Notebook Computer” and a different string described “The Notebook” movie, LSA may enable the computing device to determine that the strings are describing different topics.
  • the singular value decomposition matrix used in LSA may be used for document classification.
  • the vectors from the singular value decomposition matrix may be compared to a vector corresponding to different classes.
  • Cosine similarity may be applied to the vectors such that an angle may be calculated between the vectors in the matrix and the vector comprising classes.
  • a ninety-degree angle may express no similarity, while total similarity may be expressed by a zero degree angle because the strings would completely overlap.
  • a document may be classified by determining the similarity of the topics and/or strings in the document and the classes. For example, if a document topic is determined to be “document 1” based on LSA, the document may be classified as “document 1” because the cosine similarity analysis would show that the topic “document 1” completely overlaps with the label “document 1.”
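The LSA steps above (weighted term-document matrix, singular value decomposition, cosine similarity against class vectors) might be sketched with scikit-learn as follows; the documents and class descriptions are illustrative:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    documents = [
        "signature of borrower sign here",
        "title deed of trust cover page",
    ]
    class_descriptions = ["signature page", "title page"]

    tfidf = TfidfVectorizer().fit_transform(documents + class_descriptions)
    vectors = TruncatedSVD(n_components=2).fit_transform(tfidf)  # reduce dimension

    doc_vecs, class_vecs = vectors[:2], vectors[2:]
    similarities = cosine_similarity(doc_vecs, class_vecs)
    labels = [class_descriptions[row.argmax()] for row in similarities]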
  • step 136 depends on a computing device determining whether the label, determined by the image classification, is meaningful.
  • meaningful labels may include: title page, signature page, first pages, middle pages, end pages, recorded pages, document 1, document 2, document 3, page 1, page 2, page 3, etc.
  • the process proceeds to step 140.
  • a confidence score may be associated with the returned label.
  • the confidence score may be compared to a threshold.
  • in response to the confidence score exceeding the threshold, the label may be considered a meaningful label and the process may proceed to step 150.
  • in response to the confidence score not exceeding the threshold, the label selected by the classifier may not be considered meaningful. Instead, the label selected by the classifier may be replaced by, for example, the label “Unknown”. In response to the label not being a meaningful label, the process may proceed to step 140.
  • each classifier in a plurality of classifiers may have their own threshold value.
  • the threshold value may be tuned as a hyper-parameter.
  • the threshold value for image classification may be a value d (e.g. 50, 60, 70, 80, or any other such value, in various embodiments).
  • at step 150, responsive to a meaningful and agreed-upon label by a predetermined number of classifiers, the computing device may modify the document to include the meaningful and agreed-upon label.
  • at step 140, several classifiers may be employed out of a plurality of classifiers in an attempt to label the document that was unable to be labeled during the previous iterations.
  • an untrained RegEx classifier may be employed.
  • RegEx classifiers search for and match strings.
  • the RegEx classifier may be trained. For example, evolutionary algorithms may be employed to generate RegExes, the generated RegExes being more likely to be found in the document. In alternate embodiments, a large number of RegExes may be employed in an attempt to classify the document. Those RegExes may not necessarily be tuned such that the expressions are more likely to be found in the document.
  • a large number of untrained RegExes may be used to reduce the likelihood of the trained RegExes being overfit.
  • Employing untrained regexes may be analogous to the concept of injecting bias into a model in the form of regularization, as discussed above.
  • a document may be searched for RegExes.
  • in response to a document matching RegExes associated with a document label, the document may be classified with that label.
  • a document may be classified with a label in response to RegExes being matched to the label a predetermined number of times. For example, a predetermined number may be set to two.
  • if a document matches only one RegEx to a label, the document may not be classified with that label because the number of RegEx matches did not meet or exceed the predetermined number of two.
  • a confidence score may be associated with the labeled document. The confidence score may indicate the likelihood that the document was classified correctly.
  • the document may be modified with the label determined from the Untrained RegEx classifier.
  • the document may be labeled with a meaningful label such as title page, signature page, first pages, middle pages, end pages, recorded pages, document 1, document 2, document 3, page 1, page 2, page 3, etc.
  • a document may be labeled with this label in the event that the RegEx classifier matched text within the document to a meaningful label.
  • a document may be labeled a label that may not be meaningful.
  • a document may be labeled “Unknown”.
  • a document may be labeled with a label that may not be meaningful in the event that the RegEx classifier did not match text within the document to a meaningful label.
  • the document may be labeled with a label that may not be meaningful in the event that the confidence score associated with the RegEx classification did not exceed or meet the RegEx threshold value.
  • for example, if the RegEx threshold value is set to 100, a document may be labeled “Unknown” in the event the RegEx’s confidence score is not 100.
  • the systems and methods discussed herein provide a significant increase in accuracy compared to whole document natural language processing. For example, in one implementation of the systems discussed herein, nine stacks of documents were classified with an average accuracy of 84.4%.
  • Client 602 may request a stack of documents 604 to be classified.
  • the documents 604 may be documents that may have been scanned.
  • the scanned documents 604 may be images or digital visually perceptible versions of the physical documents.
  • System 606 may receive the stack of documents 604 from client 602 and classify the documents 604.
  • a processor 608 may be the logic in a device that receives software instructions.
  • a central processing unit (“CPU”) may be considered any logic circuit that responds to and processes instructions. Thus, CPUs provide flexibility in performing different applications because various instructions may be performed by the CPU.
  • One or more arithmetic logic units (“ALUs”) may be incorporated in processors to perform necessary calculations in the event an instruction requires a calculation to be performed. When a CPU performs a calculation, it performs the calculation, stores the result in memory, and reads the next instruction to determine what to do with the result.
  • a different type of processor 608 utilized in system 606 may be the graphics processing unit (“GPU”).
  • a system 606 may include both GPU and CPU processors 608.
  • a GPU is a specialized electronic circuit designed to quickly perform calculations and access memory. As GPUs are specifically designed to perform calculations quickly, GPUs may have many ALUs allowing for parallel calculations. Parallel calculations mean that calculations are performed more quickly.
  • GPUs, while specialized, are still flexible in that they are able to support various applications and software instructions. As GPUs are still relatively flexible in the applications they service, GPUs are similar to CPUs in that GPUs perform calculations and subsequently store the calculations in memory as the next instruction is read.
  • processor 608 may include a neural network engine 610 and parser 612.
  • a neural network engine 610 is an engine that utilizes the inherent parallelism in a neural network to speed up the required calculations.
  • processors 608 performing neural network instructions perform the neural network calculations sequentially because of the dependencies in a neural network.
  • the inputs to one neuron in a network may be the outputs from the previous neuron.
  • a neuron in a first layer may receive inputs, perform calculations, and pass the output to the next neuron.
  • many of the same computations are performed numerous times during the execution of the neural network. For example, multiplication, addition, and executing the transfer function are performed at every neuron.
  • neural network engines 610 may capitalize on the parallelisms of a neural network in various ways. For example, every addition, multiplication and execution of the transfer function may be performed simultaneously for different neurons in different layers.
  • a parser 612 may be a data interpreter that breaks data into smaller elements of data such that the smaller elements of data may be processed faster or more accurately.
  • a parser 612 may take a sequence of text and break the text into a parse tree.
  • a parse tree is a tree that may represent the text based on the structure of the text.
  • a parse tree may be similar in structure to the decision tree illustrated in FIG. 3, but decisions may not be performed. Instead, a parse tree may merely be used to show the structure of data to simplify the data.
  • system 606 may additionally have a tensor processing unit (“TPU”) 614.
  • TPU 614, while still a processor like a CPU or GPU, is an artificial intelligence application-specific integrated circuit, such as those circuits manufactured by Google of Mountain View, California. TPUs do not require memory as their purpose is to perform computations quickly. Thus, TPU 614 performs calculations and subsequently passes the calculations to an ALU or outputs the calculations such that more calculations may be performed. Thus, TPUs may be faster than their CPU and GPU counterparts.
  • a network storage device 616 may be a device that is connected to a network, allowing multiple users connected to the same network to store data from the device.
  • the network storage device may be communicably and operatively coupled to the network such that direct or indirect exchange of data, values, instructions, messages and the like may be permitted for multiple users.
  • implementations of the systems and methods discussed herein provide for digital document identification via a multi-stage or iterative machine-learning classification process utilizing a plurality of classifiers.
  • Documents may be identified and classified at various iterations, with the digital document identified based upon agreement between a predetermined number of classifiers.
  • these classifiers may not need to scan entire documents, reducing processor and memory utilization compared to classification systems not implementing the systems and methods discussed herein.
  • the classifications provided by implementations of the systems and methods discussed herein may be more accurate than simple keyword-based analysis.
  • documents may be multi-page documents.
  • Pages of a multi-page document may be related by virtue of being part of the same document, but may have very different characteristics: for example, a first page may be a title or cover page with particular features such as document identifiers, addresses, codes, or other such features, while subsequent pages may be freeform text, images, or other data. Accordingly, the systems and methods discussed herein may be applied on a page by page basis, and/or on a document by document basis, to classify pages as being part of the same multi-page document, sometimes referred to as a “dictionary” of pages, and/or to classify documents as being of the same type, source, or grouping, sometimes referred to as a “domain”. In some implementations, pages of different documents or domains that are similarly classified (e.g. cover or title pages of documents of the same type) may be collated together into a single target document; and conversely, in some implementations, pages coming from a single multi-page document may be collated into multiple target documents.
  • a multi-page document that comprises cover pages from a plurality of separate documents may have each page classified as a cover page from a different document or domain, and the source multi-page document may be divided into a corresponding plurality of target documents. This may allow for automatic reorganization of pages from stacks of documents even if scanned or captured out of order, or if the pages have been otherwise shuffled.
  • the disclosure is directed to a method for machine learning-based document classification executed by one or more computing devices: receiving a candidate document for classification; iteratively (a) selecting a subset of classifiers from a plurality of classifiers, (b) extracting a corresponding set of feature characteristics from the candidate document responsive to the selected subset of classifiers, (c) classifying the candidate document according to each of the selected subsets of classifiers, and (d) repeating steps (a)-(c) until a predetermined number of the selected subset of classifiers at each iteration agrees on a classification; comparing a confidence score to a threshold, the confidence score based on the classification of the candidate document, the threshold according to each of the selected subsets of classifiers; classifying the candidate document according to the agreed-upon classification, responsive to the confidence score exceeding the threshold; and modifying, by the computing device, the candidate document to include an identification of the agreed-upon classification.
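A highly simplified sketch of this iterative loop, with stub classifiers standing in for the real models and feature extraction folded into the classifier callables:

    from collections import Counter
    from typing import Callable, List, Optional

    Classifier = Callable[[str], str]

    def classify_iteratively(document: str,
                             iterations: List[List[Classifier]],
                             quorum: int) -> Optional[str]:
        """(a) select a subset of classifiers, (b)/(c) classify the
        document, (d) repeat until enough classifiers agree."""
        for subset in iterations:
            votes = Counter(clf(document) for clf in subset)
            label, count = votes.most_common(1)[0]
            if count >= quorum and label != "Unknown":
                return label          # agreed-upon, meaningful label
        return None

    # Hypothetical stand-ins for the neural network, XGBoost, RegEx, etc.
    nn = lambda d: "title page"
    xgb = lambda d: "title page"
    regex = lambda d: "Unknown"
    classify_iteratively("...", [[nn, xgb], [nn, xgb, regex]], quorum=2)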
  • a number of classifiers in the selected subset of classifiers in a first iteration is different from a number of classifiers in the selected subset of classifiers in a second iteration.
  • each classifier in a selected subset utilizes different feature characteristics of the candidate document.
  • a first number of the selected subset of classifiers classify the candidate document with a first classification
  • a second number of the selected subset of classifiers classify the candidate document with a second classification.
  • the subset of classifiers of a first iteration are different from the subset of classifiers of a second iteration.
  • step (b) further comprises extracting feature characteristics of a parent document of the candidate document; and step (c) further comprises classifying the candidate document according to the extracted feature characteristics of the parent document of the candidate document.
  • step (d) further comprises repeating steps (a)-(c) responsive to a classifier of the selected subset of classifiers returning an unknown classification. In some implementations, during at least one iteration, step (d) further comprises repeating steps (a)-(c) responsive to all of the selected subset of classifiers not agreeing on a classification.
  • extracting the corresponding set of feature characteristics from the candidate document further comprises at least one of extracting text of the candidate document, identifying coordinates of text within the candidate document, or identifying vertical or horizontal edges of an image of the candidate document.
  • the plurality of classifiers comprises an elastic search model, a gradient boosting classifier, a neural network, a time series analysis, a regular expression parser, and one or more image comparators.
  • the predetermined number of selected subset classifiers includes a majority of classifiers in at least one iteration. Further, the predetermined number of selected subset classifiers includes a minority of classifiers, the number of minority classifiers being at least greater than one classifier in at least one iteration.
  • this disclosure is directed to a system for machine learning-based classification executed by a computing device.
  • the system includes a receiver configured to receive a candidate document for classification and processing circuitry configured to: select a subset of classifiers from a plurality of classifiers; extract a set of feature characteristics from the candidate document, the extracted set of feature characteristics based on the selected subset of classifiers; classify the candidate document according to each of the selected subsets of classifiers; determine that a predetermined number of the selected subset of classifiers agrees on a classification; compare a confidence score to a threshold, the confidence score based on the classification of the candidate document, the threshold according to each of the selected subsets of classifiers; classify the candidate document according to the agreed-upon classification, responsive to the confidence score exceeding the threshold; and modify the candidate document to include an identification of the agreed-upon classification.
  • each classifier in a selected subset utilizes different feature characteristics of the candidate document.
  • the processing circuitry is further configured to: extract feature characteristics of a parent document of the candidate document; and classify the candidate document according to the extracted feature characteristics of the parent document of the candidate document.
  • extracting the corresponding set of feature characteristics from the candidate document further comprises at least one of extracting text of the candidate document, identifying coordinates of text within the candidate document, or identifying vertical or horizontal edges of an image of the candidate document.
  • the plurality of classifiers comprises an elastic search model, a gradient boosting classifier, a neural network, a time series analysis, a regular expression parser, and an image comparator.
  • the predetermined number of selected subset classifiers includes a majority of classifiers. In some implementations, the predetermined number of selected subset classifiers includes a minority of classifiers, the number of minority classifiers being at least greater than one classifier. In some implementations, the processing circuitry is further configured to return an unknown classification.
  • Printed documents can include text, images, and other information that can be challenging to classify using document analysis techniques. Certain markings, such as stamps, signatures, or other modifications to documents that occur after a document is printed can present additional challenges to document classification. Often, stamps can be inconsistent with the rest of a document, and may be partially incomplete, irregular, and have a varying location across similar documents. Because stamps and other markings are typically applied after a document is printed, those markings can occasionally overlap and obscure document text and the stamp itself. Further, the placement of markings such as stamps and signatures may vary in shade, color, and marking strength.
  • stamps or other markings can include text information that other implementations fail to extract or analyze effectively. Certain stamps or markings may lack any significant shape, and may include only text that other models fail to identify, extract, or classify.
  • the systems and methods of this technical solution can automatically detect, extract, and classify stamps and other regions of interest in document images by combining natural language processing techniques and image processing techniques.
  • the document images can include images of printed documents that may be stamped, for example, with one or more ink or toner stamps.
  • the systems and methods described herein can implement cloud computing, machine learning, and domain-specific data preprocessing techniques to detect a variety of different stamp or marking types in document images by analyzing their textual content and visual structure. This allows the systems and methods of this technical solution to proficiently identify, classify, and extract stamps or markings of interest, even if those stamps or markings lack identifying shapes and are instead predominantly text.
  • the systems and methods described herein can implement cloud computing, machine learning, and domain-specific data processing techniques to effectively extract and classify stamps in document images that other document classifiers cannot.
  • the techniques described herein can pre-process and filter un-necessary data from a document image, and utilize combined outputs of image data and text data analysis techniques to detect and extract stamps or other markings from document images.
  • the pipeline architecture techniques described herein can accurately classify stamps that vary in shape, position, and structure by combining natural language processing techniques and image data processing techniques.
  • the systems and methods described herein can re-train and adapt to unfamiliar stamp or marking types and patterns on demand. Feature extractors for those stamp types can be implemented automatically and without manual intervention.
  • stamps or other markings can be classified, for example, to differentiate between different stamp types.
  • Stamp types may include, for example, a signature, an identifying marking applied to a document after printing, a court stamp, a notary stamp, a mortgage recording stamp, an endorsement stamp, an identifying seal applied to a document after printing, a signature, or any other type of marking applied to a document after printing.
  • the systems and methods described herein can further analyze the portions of the document image that may not include the stamp to extract as much relevant information from a given document image as possible.
  • the systems and methods described herein can further apply additional document classification techniques to identify or classify the type of the document under analysis.
  • Other document data can then be extracted or inferred based on the information extracted from the document, the stamp, and the classification of the document. After extraction, this other document data can be stored in association with the document and the document classification in a data store.
  • the machine learning techniques used to extract image data can include, for example, convolutional neural networks, residual neural networks, or other neural networks or image classifier models.
  • the systems and methods described herein can implement natural language processing models, word-embedding, and bags-of-words techniques to analyze information about text present in document images.
  • the techniques of this technical solution can implement one or more pipeline architectures, allowing for the sequential or parallel processing of document images.
  • the architectures of this technical solution can retrieve or receive document images or files, and can separate the files or images by type.
  • a file or image type can be analyzed, for example, by parsing information from a header memory region of the file, or by analyzing a file extension present in the file name.
  • the architecture can pre-process the images by converting them to grayscale, and can reduce them to a predetermined dimension for further analysis (e.g., 512x512, 1024x1024, 2048x2048, or any other dimensions to match the input of a classification model as needed, etc.).
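Using Pillow, for instance, the grayscale conversion and resizing might be sketched as follows (the file name is hypothetical):

    from PIL import Image

    def preprocess(path, size=512):
        """Convert a document image to grayscale and resize it to match
        the classification model's input dimensions."""
        return Image.open(path).convert("L").resize((size, size))

    # image = preprocess("scan_0001.png")   # hypothetical document scan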
  • Text can be pre-processed and extracted or filtered from a document image using a regular expression algorithm.
  • the extracted text information can be packaged into a data structure suitable for machine learning models, for example the text and other document information can be vectorized into a term frequency-inverse document frequency (TF-IDF) vector.
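A sketch of this text pre-processing and TF-IDF packaging, with an illustrative regular expression that keeps only lowercase word and number tokens:

    import re
    from sklearn.feature_extraction.text import TfidfVectorizer

    def clean(raw):
        """Filter OCR output down to lowercase word/number tokens."""
        return " ".join(re.findall(r"[a-z0-9]+", raw.lower()))

    pages = ["Recorded: Book 123, Page 456!!", "Notary @@ Seal 2021"]
    corpus = [clean(p) for p in pages]       # stand-in OCR output

    features = TfidfVectorizer().fit_transform(corpus)  # model input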
  • Upon detection of one or more stamps or markings of interest in a document image, the document can then be processed using an object classification model, such as a deep convolutional neural network (CNN). Additional processing techniques, such as optical character recognition (OCR), can be applied to the portion of the document including the stamp or mark of interest to extract additional text information of each stamp or mark detected in the document image.
  • Rule-based or statistical natural language processing (NLP) models can be used to analyze the text information present in each stamp to aid in accurate classification of the stamp or marking.
  • the ensemble of models utilized by the systems and methods of this technical solution can process both text and images to detect and classify one or more stamps or markings that may be present in document images.
  • Text and image analysis is used because the stamps or markings present in document images often contain relevant identifying text in addition to identifying geometric and layout information.
  • the image classification models of the ensemble can include CNNs, deep neural networks (DNN), fully connected neural networks, autoencoder networks, dense neural networks, linear regression models, rule-based techniques, and other machine learning techniques usable to identify objects in images.
  • At least one image based model can include a CNN coupled with an autoencoder model with a dense neural network appended to the end.
  • one other model includes a convolutional autoencoder model that can function as a form of unsupervised transfer learning.
  • the autoencoder can be fed into a dense neural network (e.g., a fully connected neural network, etc.) that acts as a classifier based on the features extracted in the autoencoder.
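A compact PyTorch sketch of such an architecture; the layer sizes, input dimensions, and two-class head are illustrative assumptions rather than the disclosed configuration:

    import torch
    from torch import nn

    class ConvAutoencoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(   # used for unsupervised pre-training
                nn.ConvTranspose2d(16, 8, 2, stride=2), nn.ReLU(),
                nn.ConvTranspose2d(8, 1, 2, stride=2), nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    class StampClassifier(nn.Module):
        """Dense network appended to the trained encoder's features."""
        def __init__(self, encoder):
            super().__init__()
            self.encoder = encoder
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(16 * 128 * 128, 64), nn.ReLU(),
                nn.Linear(64, 2),            # e.g. stamp vs. no stamp
            )

        def forward(self, x):
            return self.head(self.encoder(x))

    autoencoder = ConvAutoencoder()              # pre-train on unlabeled pages
    model = StampClassifier(autoencoder.encoder)
    logits = model(torch.randn(1, 1, 512, 512))  # grayscale 512x512 page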
  • Language and text processing models of the ensemble can include dense neural networks, and domain-specific processing to pre-process and extract text and language information from document image files.
  • domain-specific processing techniques can include regular-expression algorithms that can clean the text extracted or filtered from the image.
  • Domain-specific word standardization techniques can be applied to the text information to increase the accuracy and performance while decreasing the dimensionality of the data input to the classification models.
  • Certain stamps may include regular language or text that may be processed or analyzed using rule-based detection and parsing, such as processing using regular expressions.
  • Such complex regular expression rules can help organize the text into distinguishable bigrams that are associated with recording stamps used in mortgage documents, particularly the smaller recording stamps that solely display the book and page of the associated county records.
  • This process allows the text-based models to better isolate the language associated with recording stamps through the creation of unique, stamp-specific bigrams or keyword pairings.
  • Text of particular interest, such as keywords, numbers, or patterns, can be paired together into bigrams that efficiently distinguish blocks of text derived from stamps.
  • Such text processing can reduce the overall size of data input into a classification model, reduce overall computational complexity, and create distinguishing word features that make it more efficient to detect a recorded stamp on a page, which is an improvement over other document classification systems.
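For example, tokens of interest might be paired into bigrams as follows (the recording-stamp text is illustrative):

    import re

    text = "Recorded in Book 123 Page 456"
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    bigrams = list(zip(tokens, tokens[1:]))
    # ('book', '123') and ('page', '456') are the kind of stamp-specific
    # keyword pairings that distinguish recording stamps.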
  • the foregoing models can be combined into a single ensemble classifier, or detector model.
  • Weights can be applied to one or more of the models described herein, for example through Tikhonov regularization or ridge regression, to perform weighted ensembling when creating the single ensemble classifier.
  • the ensemble classifier can be further optimized or fit onto one or more test sets (e.g., sets of document data or set of stamps to classify) by the systems and methods described herein using the statistical distribution of one or more test sets. This can allow the weighted ensemble classifier to become further optimized for a population of stamps, markings, or document types.
  • Multicollinearity issues between the models are therefore mitigated by using ridge regression techniques, thus improving the efficiency of parameter estimation used in creating a final output.
  • Cross-validation techniques can be used to find an optimal λ (regularization) value, allowing the ensemble model to filter out any unnecessary components.
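A sketch of ridge-weighted ensembling with a cross-validated regularization strength (scikit-learn's alpha playing the role of λ; the model scores and labels are synthetic stand-ins):

    import numpy as np
    from sklearn.linear_model import RidgeCV

    rng = np.random.default_rng(0)
    model_scores = rng.random((200, 3))    # stand-in outputs of three models
    labels = (model_scores.mean(axis=1) > 0.5).astype(float)

    ensemble = RidgeCV(alphas=[0.1, 1.0, 10.0])   # cross-validated lambda
    ensemble.fit(model_scores, labels)
    weights = ensemble.coef_               # per-model ensemble weights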
  • the systems and methods described herein can utilize the ensemble model to detect, extract, and classify stamps or markings of interest in one or more document images or files.
  • the systems and methods of this technical solution can implement a CNN model as an automated feature extractor for stacked bidirectional long shortterm memory (LSTM) based models or stacked gated recurrent unit (GRU) based models.
  • the systems and methods of this technical solution can segment or break down a document image into various regions, allowing for faster and more compact image and document analysis.
  • the systems and methods described herein can implement parallel processing techniques to improve the computational efficiency of stamp or making detection, extraction, and classification.
  • Regional segmentation of document images or files can reduce the noise and dimensions that a particular model will process, improving the overall classification accuracy of the system.
  • a hierarchy ensemble model may be implemented to determine whether a document image or file should be processed by an NLP model, an image processing model, or both.
  • the hierarchy ensemble model can determine the models with which a document image or file should be processed based on document quality (e.g., clarity of document image, light of document image, clarity or discernibility of one or more features in document image, resolution of document image, etc.), data returned from OCR processing, or other characteristics of the document.
  • the system 700 can include at least one computing device 705, at least one network 710, and one or more client devices 720A-N (sometimes referred to generally as “client device 720” or “client devices 720”).
  • the computing device 705 can include at least one database 715, at least one document data receiver 730, at least one image data pre-processor 735, at least one text data filter 740, at least one vector converter 745, at least one stamp detector 750, at least one image data extractor 755, at least one text extractor 760, at least one stamp classifier 765, and at least one stamp identification manager 770.
  • the database 715 can store, maintain, index, or otherwise include at least one image data 775, at least one extracted text 780, and at least one or more identified stamps 785 (sometimes referred to in the singular as identified stamp 785).
  • Each of the components or modules (e.g., the computing device 705, the network 710, the client devices 720, the database 715, the document data receiver 730, the image data preprocessor 735, the text data filter 740, the vector converter 745, the stamp detector 750, the image data extractor 755, the text extractor 760, the stamp classifier 765, and the stamp identification manager 770, etc.) of the system 700 can be implemented using the hardware components or a combination of software with the hardware components of any computing system (e.g., computing device 144, the computing device 705, any other computing system described herein, etc.) detailed herein.
  • the computing device 705 can include at least one processor and a memory, e.g., a processing circuit.
  • the memory can store processor-executable instructions that, when executed by the processor, cause the processor to perform one or more of the operations described herein.
  • the processor may include a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), etc., or combinations thereof.
  • the processor may include specialized processing units for parallel processing, such as one or more tensor processing units (TPU) or one or more graphical processing units (GPU).
  • the processor may include one or more co-processors, such as a floating point unit (FPU), vector processing unit (VPU), or other specialized processor configured to work in conjunction with one or more central processing units.
  • the memory may include, but is not limited to, electronic, optical, magnetic, or any other storage or transmission device capable of providing the processor with program instructions.
  • the memory may be configured to allow for high communication bandwidth between parallel processing units of the processor.
  • the memory may further include a floppy disk, CD-ROM, DVD, magnetic disk, memory chip, ASIC, FPGA, read-only memory (ROM), random-access memory (RAM), electrically erasable programmable ROM (EEPROM), erasable programmable ROM (EPROM), flash memory, optical media, or any other suitable memory from which the processor can read instructions.
  • the instructions may include code from any suitable computer programming language.
  • the computing device 705 can include one or more computing devices or servers that can perform various functions as described herein.
  • the computing device 705 can include any or all of the components and perform any or all of the functions of the computer device 1400 described herein in conjunction with FIGs. 14A-14B.
  • the network 710 can include computer networks such as the Internet, local, wide, metro or other area networks, intranets, satellite networks, other computer networks such as voice or data mobile phone communication networks, and combinations thereof.
  • the computing device 705 of the system 700 can communicate (e.g., bidirectional communication) via the network 710, for instance with at least one client device 720.
  • the network 710 may be any form of computer network that can relay information between the computing device 705, the client device 720, and one or more content sources, such as web servers, amongst others.
  • the network 710 may include the Internet and/or other types of data networks, such as a local area network (LAN), a wide area network (WAN), a cellular network, a satellite network, or other types of data networks.
  • the network 710 may also include any number of computing devices (e.g., computers, servers, routers, network switches, etc.) that are configured to receive and/or transmit data within the network 710.
  • the network 710 may further include any number of hardwired and/or wireless connections. Any or all of the computing devices described herein (e.g., the computing device 705, the computer system 1400, etc.) may communicate wirelessly (e.g., via WiFi, cellular, radio, etc.) with a transceiver that is hardwired (e.g., via a fiber optic cable, a CAT5 cable, etc.) to other computing devices in the network 710.
  • Any or all of the computing devices described herein may also communicate wirelessly with the computing devices of the network 710 via a proxy device (e.g., a router, network switch, or gateway).
  • the database 715 can be a database or data storage medium configured to store and/or maintain any of the information described herein.
  • the database 715 can maintain one or more data structures, which may contain, index, or otherwise store each of the values, pluralities, sets, variables, vectors, or thresholds described herein.
  • the database 715 can be accessed using one or more memory addresses, index values, or identifiers of any item, structure, or region maintained in the database 715.
  • the database 715 can be accessed by the components of the computing device 705, or any other computing device described herein, via the network 710.
  • the database 715 can be internal to the computing device 705.
  • the database 715 can exist external to the computing device 705, and may be accessed via the network 710.
  • the database 715 can be distributed across many different computer systems or storage elements, and may be accessed via the network 710 or a suitable computer bus interface.
  • the computing device 705 can store, in one or more regions of the memory of the computing device 705, or in the database 715, the results of any or all computations, determinations, selections, identifications, generations, constructions, or calculations in one or more data structures indexed or identified with appropriate values. Any or all values stored in the database 715 may be accessed by any computing device described herein, such as the computing device 705 or the computing device 144, to perform any of the functionalities or functions described herein.
  • the client device 720 can be a computing device configured to communicate via the network 710 to display data such as applications, web browsers, webpage content or other information resources.
  • the client device 720 can transmit requests for stamp detection and classification to the computing device 705 via the network 710.
  • the request can include, for example, an image of a document, such as an image captured by the client device 720.
  • the client devices 720 can be one or more desktop computers, laptop computers, tablet computers, smartphones, personal digital assistants, mobile devices, consumer computing devices, servers, clients, digital video recorders, set-top boxes for televisions, video game consoles, or any other computing device configured to communicate via the network 710, among others.
  • the client device 720 can be a communication device through which an end user can submit requests to classify stamps in document images, or receive stamp classification information.
  • the client device 720 can include a processor and a memory, e.g., a processing circuit.
  • the memory stores machine instructions that, when executed by the processor, cause the processor to perform one or more of the operations described herein.
  • the processor may include a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), etc., or combinations thereof.
  • the memory may include, but is not limited to, electronic, optical, magnetic, or any other storage or transmission device capable of providing the processor with program instructions.
  • the memory may further include a floppy disk, CD-ROM, DVD, magnetic disk, memory chip, ASIC, FPGA, read-only memory (ROM), random-access memory (RAM), electrically erasable programmable ROM (EEPROM), erasable programmable ROM (EPROM), flash memory, optical media, or any other suitable memory from which the processor can read instructions.
  • the instructions may include code from any suitable computer programming language.
  • the client device 720 may also include one or more user interface devices.
  • a user interface device refers to any electronic device that conveys data to a user by generating sensory information (e.g., a visualization on a display, one or more sounds, etc.) and/or converts received sensory information from a user into electronic signals (e.g., a keyboard, a mouse, a pointing device, a touch screen display, a microphone, etc.).
  • the one or more user interface devices may be internal to a housing of the client device 720 (e.g., a built-in display, microphone, etc.) or external to the housing of the client device 720 (e.g., a monitor connected to the client device 720, a speaker connected to the client device 720, etc.).
  • the client device 720 may include an electronic display, which may visually display webpages using webpage data received from one or more content sources.
  • the client device 720 may include one or more camera devices configured to capture and record images in the memory of the client device 720. The images captured by the one or more camera devices of the client device 720 can be transmitted to the computing device 705 for stamp or marking detection and classification.
  • the document data receiver 730 can receive textual data of a document and image data including a capture of the document.
  • the textual data of a document and the image data of the document can be received, for example, in a request for document classification by one or more of the client devices 720.
  • the request for classification of a document can include an image of the document, and the document data receiver 730 can retrieve the textual information of the document depicted in the image data by performing one or more image analysis techniques, such as OCR.
  • the image analysis techniques can be used to extract text that is present on the document depicted in the image.
  • the text information may include text that is included on the document but not included in any stamps present on the document.
  • the textual information can include the text present in the document and text included in one or more stamps or marks present on the document.
  • the image data received by the document data receiver 730 can include one or more image files depicting a document (e.g., a printed piece of paper, which may include text, images, or stamps or marks to be identified, etc.).
  • the image files may be JPEG image files, PNG image files, GIF image files, TIFF image files, PDF files, EPS files, RAW image files, or any other type of image file or image data.
  • the document data receiver 730 can include one or more camera devices capable of capturing images of printed documents. In such implementations, the document data receiver can receive the textual data of the document or the image data of the document from the one or more camera devices subsequent to capturing an image of a document. Any captured images can be stored, for example, in the memory of the computing device 705.
  • the image data pre-processor 735 can pre-process the image data.
  • By pre-processing the image data, the computing device 705 can more efficiently and accurately classify one or more stamps or markings of interest in the document to be classified.
  • Pre-processing the image data can include converting the image data to grayscale, or otherwise modifying the colors of the image data to better suit the classification models used by the computing device 705 to detect and classify one or more stamps present in the document under analysis.
  • the image data may be pre-processed by downscaling the image data (in grayscale, modified color, or as-is, etc.) to a predetermined size.
  • the predetermined size may have equal height and width dimensions (e.g., 512x512, 1024x1024, 2048x2048, etc.).
  • the downscaled image data may be cropped to remove irrelevant image data that would otherwise not contribute to the classification of any stamp or marking.
  • the image data pre-processor 735 may crop out areas of the image that do not include the document under analysis, such as the areas of the image that are outside the edges of the document.
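As an illustrative sketch of this pre-processing stage (the OpenCV calls, function name, and 512x512 target size are assumptions for illustration, not details of the disclosed system), the steps above might look like:

```python
import cv2

def preprocess_document_image(path, size=512):
    """Convert a captured document image to grayscale and downscale it
    to a square of equal height and width (e.g., 512x512)."""
    image = cv2.imread(path)                         # load the capture as a BGR array
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # modify colors to grayscale
    # A fuller pipeline would also crop to the document edges here,
    # removing areas of the image outside the document.
    return cv2.resize(gray, (size, size), interpolation=cv2.INTER_AREA)
```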
  • the text data filter 740 can filter the textual data to remove predetermined characters.
  • the text data filter 740 may clean the text to remove any characters, sentences, or other information from the document text information that would otherwise interfere with the detection or classification of stamps or marks of interest in the document under analysis.
  • Such cleaning can include, for example, identifying one or more characters (e.g., symbols, etc.), words, phrases, sentences, paragraphs, passages, or other text information that is not relevant to, or may otherwise obstruct, the detection or classification of the stamps or markings of interest, and removing (e.g., filtering, etc.) it from the textual data.
  • Filtering the textual data can include applying a regular expression filter to the textual data.
  • Regular expressions may identify predetermined characters or sequences of characters, and can perform one or more operations (e.g., removal, etc.) on those characters.
  • Filtering the textual data may further include standardizing at least one word of the textual data according to a predefined dictionary. Standardizing one or more words of the textual data can include replacing a word present in the text data with one deemed more appropriate by the text data filter 740. For example, if a word includes a typographical error, the text data filter 740 may utilize a predefined dictionary to identify the correct spelling of the word, and replace the misspelled word with the correct spelling.
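A minimal sketch of this filtering, assuming a regular expression that strips non-alphanumeric characters and a hypothetical spelling dictionary mapping misspellings to corrections:

```python
import re

def filter_text(text, spelling_dictionary):
    """Remove predetermined characters with a regular expression, then
    standardize words against a predefined dictionary."""
    cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # drop symbols and punctuation
    words = [spelling_dictionary.get(w.lower(), w.lower()) for w in cleaned.split()]
    return " ".join(words)

# Example: the dictionary corrects a typographical error in the document text.
print(filter_text("Recroded* by the County Clerk", {"recroded": "recorded"}))
```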
  • the vector converter 745 can convert the textual data to a term frequency-inverse document frequency (TF-IDF) vector.
  • the TF-IDF vector can include a series of weight values that each correspond to the frequency or relevance of a word or term in the textual data of the document under analysis.
  • the vector converter 745 can identify each individual word in the textual data, and determine the number of times that word appears in the textual data of the document under analysis. The number of times a word or term appears in the textual data of the document under analysis can be referred to as its term frequency.
  • the vector converter 745 can scan through every word in the textual data, and increment a counter associated with that word.
  • If the word is not yet associated with a counter, the vector converter 745 can initialize a new counter for that word, and assign it an initialization value (e.g., one, etc.). Using the term frequency, the vector converter 745 can assign a weight value to each of the words or terms in the textual data. Each of the weight values can be assigned to a coordinate in the TF-IDF vector data structure.
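The conversion can be sketched with scikit-learn's TfidfVectorizer; the three-document corpus below is invented for illustration, and in practice the vectorizer would be fit on a training corpus of document text:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "deed of trust recorded in county records",
    "notary public commission expires",
    "certified true copy of the original document",
]
vectorizer = TfidfVectorizer()
vectorizer.fit(corpus)  # learn the vocabulary and inverse document frequencies

# Each coordinate of the resulting vector holds the TF-IDF weight of one term.
tfidf_vector = vectorizer.transform(["certified copy recorded by a notary"])
print(tfidf_vector.toarray())
```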
  • the stamp detector 750 can detect, via a trained neural network from the pre-processed image data and the TF-IDF vector, a presence of a stamp on the document. Based on the text information, the stamp detector can utilize one or more natural language processing (NLP) models to determine that a document likely has a stamp or marking of interest.
  • Certain documents may include text that indicates that the document should have one or more stamps or markings (e.g., a signature line indicating a signature may be present, a notary field indicating a notary stamp may be present, or any other type of field that might indicate a stamp or marking may have been added after printing, etc.).
  • Other documents may include language that would otherwise not indicate that a stamp is present in the document.
  • the stamp detector 750 may determine that there is no such field, signature line, or other indication present.
  • the stamp detector 750 can further analyze the document by applying the image data of the document to one or more object detection models.
  • the object detection models can include, for example, CNNs, DNNs, autoencoder models, fully connected neural networks, or some combination thereof.
  • the one or more object detection models can take the image data that was pre-processed by the image data pre-processor 735 as an input.
  • the object detection models may have an input layer (e.g., a first set of neural network nodes each associated with a weight value or a bias value, etc.) that has similar dimensions to the number of pixels in the pre-processed image data.
  • For example, if the pre-processed image data is 512 pixels wide by 512 pixels high, the input layer for the object detection model may be a two-dimensional weight tensor with 512 rows and 512 columns.
  • the stamp detector 750 can multiply the pixel values (e.g., the gray scale intensity of the pre-processed image data, etc.) of the pixels by the weight value tensor of the input layer.
  • the output of the first layer of the neural network can be multiplied by a tensor representing a second layer, and so on until a number is output that represents a classification value.
  • the weight values in each tensor representing the neural network layers can be trained such that the final layer will output an indication of whether the document under analysis includes one or more stamps or markings of interest.
  • the stamp detector 750 can output the approximate location (e.g., pixel location) or boundaries (e.g., a box defining pixel coordinates of the boundaries of the detected stamp, etc.) of one or more stamps or markings of interest in the document under analysis.
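As a toy numerical sketch of the layer-by-layer propagation described above (not the disclosed architecture; the weights here are random placeholders and untrained):

```python
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.random((512, 512))                    # pre-processed grayscale intensities
w_input = rng.standard_normal((512, 512)) / 512    # input-layer weight tensor, 512 x 512

hidden = np.tanh((pixels * w_input).sum(axis=1))   # multiply pixels by weights, reduce rows
w_output = rng.standard_normal(512) / 512          # tensor representing a second layer
score = 1.0 / (1.0 + np.exp(-(hidden @ w_output))) # sigmoid output as a detection indication
print(round(float(score), 3))                      # near 0.5 with untrained weights
```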
  • the document image data 805 includes example text and an example stamp.
  • the computing device 705 or the components thereof can pre-process the image data and the textual data of the document 805 to create data that represents a pre-processed document 810.
  • the image data of the pre-processed document 810 includes an image of the stamp.
  • the stamp detector 750 has detected the presence of a stamp in the document.
  • the bounding box 815 is shown to illustrate the detection of the presence of the stamp of the document 805.
  • the bounding box 815 can represent the pixel locations of the image data of the document 805 that identify a bounding box around the detected stamp. This bounding box information can be used in later processing steps to aid in the extraction and classification of the stamp or marking.
  • the image data extractor 755 can, responsive to detection of the presence of the stamp, extract a subset of the image data including the stamp. Once the presence of the stamp has been detected, the image data extractor 755 can identify and extract a subset of the image data that includes the stamp. This image data may also include other information, such as text that the stamp or marking of interest happens to overlap on the document under analysis.
  • the image data extractor 755 can limit and improve the performance of the classification models utilized by the computing device 705. In particular, the image data extractor 755 can reduce the amount of input data to the classification models in later pipeline stages implemented by the components of the computing device 705.
  • the image data extractor 755 can utilize the bounding region identified by the stamp detector 750 to extract the subset of image data. For example, the image data extractor 755 can extract the pixels of the image data that are within the bounding region identified by the stamp detector 750 as the subset of image data that includes the detected stamp or marking of interest.
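The extraction itself reduces to an array slice over the bounding region; a sketch assuming the region is given as pixel coordinates (x_min, y_min, x_max, y_max):

```python
import numpy as np

def extract_stamp_subset(image, box):
    """Return the pixels of a 2-D (grayscale) image inside the detected
    bounding region; rows index the y axis, columns the x axis."""
    x_min, y_min, x_max, y_max = box
    return image[y_min:y_max, x_min:x_max]

page = np.zeros((512, 512), dtype=np.uint8)               # placeholder page image
subset = extract_stamp_subset(page, (120, 340, 380, 460))
print(subset.shape)                                       # (120, 260): height x width
```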
  • the text extractor 760 can extract, via optical character recognition (OCR), text from the subset of the image data.
  • stamps or marks of interest may include identifying text, such as identifier numbers, or other information that is relevant to the classification of a particular stamp or mark.
  • the text extractor 760 can utilize OCR to identify strings of characters present in the subset of image data.
  • the text extractor 760 may implement one or more neural network models to identify one or more characters or text present in the image data of the stamp. This text may not have been extracted or identified by the vector converter 745.
  • the text extracted by the text extractor 760 may be formatted or otherwise structured based on rules for expected formats of particular text information. For example, if the text extractor 760 detects a series of digits that have similar position information along one axis, such as the digits '1', '0', '1', '1', '9', '4' in a row, the text extractor 760 may aggregate those characters into a single text string or number, such as "101194". The text extractor 760 may perform the same process with alphabetic characters that are close enough together to be considered words, phrases, or sentences.
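A sketch of the OCR step using the pytesseract wrapper (the library choice, file name, and the simplified digit-aggregation rule are assumptions for illustration; the aggregation described above uses character positions along an axis):

```python
import pytesseract
from PIL import Image

# Requires the Tesseract OCR binary to be installed on the system.
stamp_crop = Image.open("stamp_region.png")      # hypothetical cropped stamp subset
raw_text = pytesseract.image_to_string(stamp_crop)

# Aggregate recognized digits into a single identifier string, e.g. "101194".
identifier = "".join(ch for ch in raw_text if ch.isdigit())
print(raw_text, identifier)
```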
  • the stamp classifier 765 can classify a stamp using a weighted ensemble model from the extracted text and the TF-IDF vector.
  • the ensemble of models utilized by the stamp classifier 765 process both text and images to detect and classify one or more stamps or markings that may be present in document images.
  • the combination of text and image analysis is used for classification because the stamps or markings present in document images often contain relevant identifying text in addition to identifying visual information.
  • By synthesizing the output of text and image analysis models, the stamp classifier 765 can accurately classify and analyze the stamps or markings of interest identified by the stamp detector 750.
  • the image classification models of the ensemble can include CNNs, DNNs, fully connected neural networks, autoencoder networks, dense neural networks, linear regression models, rule-based techniques, and other machine learning techniques usable to identify objects in images.
  • at least one image based model can include a CNN coupled with an autoencoder model with a dense neural network appended to the end.
  • one other model includes a convolutional autoencoder model that can function as a form of unsupervised transfer learning.
  • the autoencoder can be fed into a dense neural network (e.g., a fully connected neural network, etc.) that acts as a classifier based on the features extracted in the autoencoder.
  • Language and text processing models of the ensemble can include dense neural networks, and other natural language processing models, such as LSTM or GRU based models. Both the text or language processing models and the image classification models can be combined by the stamp classifier 765 into a single ensemble classifier that is configured to classify stamps based on a combination of text information and image information. Weights can be applied to one or more of those models, for example through Tikhonov regularization or ridge regression, to perform weighted ensembling when creating the single ensemble classifier.
  • the ensemble classifier can be further optimized or fit onto one or more test sets (e.g., sets of document data or sets of stamps to classify) by the stamp classifier 765 using the statistical distribution of one or more test sets.
  • the stamp classifier 765 can use cross-validation techniques to find an optimal λ (regularization strength) value, allowing the ensemble model to filter out any unnecessary components.
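One way to realize the described weighted ensembling is to stack the per-class outputs of the image and text models and fit a ridge-regularized classifier on top, cross-validating the regularization strength (λ, exposed as alpha in scikit-learn). The shapes and random data below are placeholders:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

rng = np.random.default_rng(0)
n_docs, n_classes = 200, 5
p_image = rng.random((n_docs, n_classes))   # image-model class scores (placeholder)
p_text = rng.random((n_docs, n_classes))    # text/NLP-model class scores (placeholder)
y = rng.integers(0, n_classes, n_docs)      # stamp classification labels

X = np.hstack([p_image, p_text])            # stacked ensemble features
ensemble = RidgeClassifierCV(alphas=np.logspace(-3, 3, 13))  # cross-validated lambda
ensemble.fit(X, y)
print(ensemble.alpha_)                      # selected regularization strength
```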
  • Classifying the stamp may further include classifying the stamp as corresponding to one of a predetermined plurality of classifications.
  • the stamp classifier 765 may maintain a list of predetermined classifications of stamps or markings of interest.
  • the stamp classifier 765 can apply the text data, the TF-IDF vector, or the subset of the image data to one or more input layers or nodes of the ensemble model, and propagate by applying the operations (e.g., multiplication, addition, etc.) and tensor data structures specified in the model to the input values, and the output values of each sequential layer, node, or neural network structure, to output a classification value for the stamp.
  • the output value may be a vector of probability values, where each coordinate in the vector corresponds to one of the plurality of predetermined classifications of stamps.
  • the stamp classifier 765 can identify the coordinate in the output vector that has the largest probability value, and use the identified coordinate to look- up (e.g., in a lookup table, etc.) the corresponding classification value for the detected stamp or marking of interest.
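The final step reduces to an argmax over the output vector followed by a table lookup; the class names below are invented for illustration:

```python
import numpy as np

CLASS_LOOKUP = {0: "notary stamp", 1: "recording stamp", 2: "signature"}  # hypothetical

probabilities = np.array([0.15, 0.70, 0.15])     # ensemble output vector
classification = CLASS_LOOKUP[int(np.argmax(probabilities))]
print(classification)                            # "recording stamp"
```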
  • Referring to FIG. 9, depicted is an example diagram 900 illustrating example results of the extraction and classification of one or more stamps from a processed document image 810.
  • the processed document 810 includes a detected stamp bounding region 815.
  • the image data extractor 755 can extract the subset of the image data 910 of the processed document 810 that includes the stamp. Then, based on the image data, the text data extracted from the subset of the image data 910, and the TF-IDF vector generated from the processed document 810, the stamp classifier 765 can classify the stamp to generate a classification value (shown in diagram 900 as a stamp 915 without any markings from the document 810).
  • the stamp identification manager 770 can store the subset of the image data including the stamp 775, the extracted text from the subset of the image data 780, and an identification of the classification of the stamp 785 in the database 715.
  • the stamp identification manager 770 can access the database 715 to create one or more entries that are associated with the document under analysis. Then, the stamp identification manager 770 can generate one or more data structures that each correspond to the one or more detected stamps or markings of interest in the document under analysis. The stamp identification manager 770 can then populate the data structures using the subset of the image data 775 extracted by the image data extractor 755, the extracted text 780 from the text extractor 760, and the classification of the stamp 785 received from the stamp classifier 765.
  • the stamp identification manager 770 can transmit the image data 775, the extracted text 780, and the identification of the stamps 785 that correspond to the one or more stamps or markings of interest in the analyzed document to the client device 720 that requested analysis of the document.
  • the stamp identification manager 770 can transmit the image data 775, the extracted text 780, and the identification of the stamps 785, for example, in a response message via the network 710.
  • Referring to FIG. 10, depicted is an illustrative flow chart of an example method 1000 of detecting and classifying stamps or markings in a document, in accordance with one or more implementations.
  • the method can be performed, for example, by the computing device 705 described herein above in conjunction with FIG. 7, or by the computing device 1400 described herein below in conjunction with FIGs. 14A and 14B, or by any other computing device described herein.
  • the computing system (e.g., the computing device 705, etc.) can receive document data (Step 1002).
  • the computing system can pre-process image data of the document data (Step 1004).
  • the computing system can filter the textual data of the document data (Step 1006).
  • the computing system can convert the textual data to a TF-IDF vector (Step 1008).
  • the computing system can determine if a stamp has been detected (Step 1010).
  • the computing system can extract subsets of image data (Step 1012).
  • the computing system can select the k-th subset of image data (Step 1014).
  • the computing system can extract text from the selected subset of image data (Step 1016).
  • the computing system can classify the stamp or marking of interest (Step 1018).
  • the computing system can store the classification or identification of the stamp or marking of interest (Step 1020).
  • the computing system can determine whether the counter register k is equal to the number of detected stamps or markings of interest n (Step 1022).
  • the computing system can increment the counter register k (Step 1024).
  • the computing system can finish classifying stamps or markings of interest (Step 1026).
  • the computing system can receive document data (Step 1002).
  • the computing system can receive textual data of a document and image data including a capture of the document.
  • the textual data of a document and the image data of the document can be received, for example, in a request for document classification by one or more of the client devices 720.
  • the request for classification of a document can include an image of the document, and the computing system can retrieve the textual information of the document depicted in the image data by performing one or more image analysis techniques, such as OCR.
  • the image analysis techniques can be used to extract text that is present on the document depicted in the image.
  • the text information may include text that is included on the document but not included in any stamps present on the document.
  • the textual information can include the text present in the document and text included in one or more stamps or marks present on the document.
  • the image data received by the document data receiver can include one or more image files depicting a document (e.g., a printed piece of paper, which may include text, images, or stamps or marks to be identified, etc.).
  • the image files may be JPEG image files, PNG image files, GIF image files, TIFF image files, PDF files, EPS files, RAW image files, or any other type of image file or image data.
  • the computing system can include one or more camera devices capable of capturing images of printed documents.
  • the document data receiver can receive the textual data of the document or the image data of the document from the one or more camera devices subsequent to capturing an image of a document. Any captured images can be stored, for example, in the memory of the computing system.
  • the computing system can pre-process image data of the document data (Step 1004).
  • By pre-processing the image data, the computing system can more efficiently and accurately classify one or more stamps or markings of interest in the document to be classified.
  • Pre-processing the image data can include converting the image data to grayscale, or otherwise modifying the colors of the image data to better suit the classification models used by the computing system to detect and classify one or more stamps present in the document under analysis.
  • the image data may be pre-processed by downscaling the image data (in grayscale, modified color, or as-is, etc.) to a predetermined size.
  • the predetermined size may have equal height and width dimensions (e.g., 512x512, 1024x1024, 2048x2048, etc.).
  • the downscaled image data may be cropped to remove irrelevant image data that would otherwise not contribute to the classification of any stamp or marking.
  • the computing system may crop out areas of the image that do not include the document under analysis, such as the areas of the image that are outside the edges of the document.
  • the computing system can filter the textual data of the document data (Step 1006).
  • the computing system may clean the text to remove any characters, sentences, or other information from the document text information that would otherwise interfere with the detection or classification of stamps or marks of interest in the document under analysis.
  • Such cleaning can include, for example, identifying one or more characters (e.g., symbols, etc.), words, phrases, sentences, paragraphs, passages, or other text information that is not relevant to, or may otherwise obstruct, the detection or classification of the stamps or markings of interest, and removing (e.g., filtering, etc.) it from the textual data.
  • Filtering the textual data can include applying a regular expression filter to the textual data.
  • Regular expressions may identify predetermined characters or sequences of characters, and can perform one or more operations (e.g., removal, etc.) on those characters.
  • Filtering the textual data may further include standardizing at least one word of the textual data according to a predefined dictionary. Standardizing one or more words of the textual data can include replacing a word present in the text data with one deemed more appropriate by the computing system. For example, if a word includes a typographical error, the computing system may utilize a predefined dictionary to identify the correct spelling of the word, and replace the misspelled word with the correct spelling.
  • the computing system can convert the textual data to a TF-IDF vector (Step 1008).
  • the computing system can convert the textual data to a term frequency-inverse document frequency (TF-IDF) vector.
  • TF-IDF vector can include a series of weight values that each correspond to the frequency or relevance of a word or term in the textual data of the document under analysis.
  • the computing system can identify each individual word in the textual data, and determine the number of times that word appears in the textual data of the document under analysis. The number of times a word or term appears in the textual data of the document under analysis can be referred to as its term frequency.
  • the computing system can scan through every word in the textual data, and increment a counter associated with that word. If the word is not yet associated with a counter, the computing system can initialize a new counter for that word, and assign it an initialization value (e.g., one, etc.).
  • Using the term frequency, the computing system can assign a weight value to each of the words or terms in the textual data. Each of the weight values can be assigned to a coordinate in the TF-IDF vector data structure.
  • the computing system can determine if a stamp has been detected (Step 1010).
  • the computing system can detect, via a trained neural network from the pre-processed image data and the TF-IDF vector, a presence of a stamp on the document.
  • the stamp detector can utilize one or more natural language processing (NLP) models to determine that a document likely has a stamp or marking of interest.
  • Certain documents may include text that indicates that the document should have one or more stamps or markings (e.g., a signature line indicating a signature may be present, a notary field indicating a notary stamp may be present, or any other type of field that might indicate a stamp or marking may have been added after printing, etc.).
  • Other documents may include language that would otherwise not indicate that a stamp is present in the document.
  • the computing system may determine that there is no such field, signature line, or other indication present.
  • the computing system can further analyze the document by applying the image data of the document to one or more object detection models.
  • the object detection models can include, for example, CNNs, DNNs, autoencoder models, fully connected neural networks, or some combination thereof.
  • the one or more object detection models can take the image data that was pre-processed by the computing system as an input.
  • the object detection models may have an input layer (e.g., a first set of neural network nodes each associated with a weight value or a bias value, etc.) that has similar dimensions to the number of pixels in the pre-processed image data. For example, if the pre-processed image data is 512 pixels wide by 512 pixels high, the input layer for the object detection model may be a two-dimensional weight tensor with 512 rows and 512 columns.
  • the computing system can multiply the pixel values (e.g., the gray scale intensity of the pre-processed image data, etc.) of the pixels by the weight value tensor of the input layer.
  • the output of the first layer of the neural network can be multiplied by a tensor representing a second layer, and so on until a number is output that represents a classification value.
  • the weight values in each tensor representing the neural network layers can be trained such that the final layer will output an indication of whether the document under analysis includes one or more stamps or markings of interest.
  • the computing system can output the approximate location (e.g., pixel location) or boundaries (e.g., a box defining pixel coordinates of the boundaries of the detected stamp, etc.) of one or more stamps or markings of interest in the document under analysis. If the computing system detects the presence of one or more stamps or markings of interest, the computing system can perform step 1012 of the method 1000. Otherwise, the computing system can perform step 1026 of the method 1000.
  • the computing system can extract subsets of image data (Step 1012).
  • the computing system can, responsive to detection of the presence of the stamp, extract a subset of the image data including the stamp or markings of interest. Once the presence of the stamp has been detected, the computing system can identify and extract a subset of the image data that includes the stamp. This image data may also include other information, such as text that the stamp or marking of interest happens to overlap on the document under analysis.
  • the computing system can limit and improve the performance of the classification models utilized by the computing system. In particular, the computing system can reduce the amount of input data to the classification models in later classification pipeline stages.
  • the computing system can utilize the previously identified bounding region to extract the subset of image data. For example, the computing system can extract the pixels of the image data that are within the bounding region as the subset of image data that includes the detected stamp or marking of interest.
  • the computing system can extract text from the selected subset of image data (Step 1016).
  • the computing system can extract, via optical character recognition (OCR), text from the selected subset of the image data.
  • stamps or marks of interest may include identifying text, such as identifier numbers, or other information that is relevant to the classification of a particular stamp or mark.
  • the computing system can utilize OCR to identify strings of characters present in the subset of image data.
  • the computing system may implement one or more neural network models to identify one or more characters or text present in the image data of the stamp.
  • the text extracted by the computing system may be formatted or otherwise structured based on rules for expected formats of particular text information.
  • For example, if the computing system detects a series of digits that have similar position information along one axis in a row, the computing system may aggregate those characters into a single text string or number, such as "101793".
  • the computing system may perform the same process with alphabetic characters that are close enough together to be considered words, phrases, or sentences.
  • the computing system can classify the stamp or marking of interest (Step 1018).
  • the computing system can classify the selected stamp using a weighted ensemble model from the extracted text and the TF-IDF vector.
  • the ensemble of models utilized by the computing system process both text and images to detect and classify the selected stamp or marking present in the document image.
  • the combination of text and image analysis is used for classification because the stamps or markings present in document images often contain relevant identifying text in addition to identifying visual information. By synthesizing the output of text and image analysis models, the computing system can accurately classify and analyze the selected stamp or marking of interest.
  • the image classification models of the ensemble can include CNNs, DNNs, fully connected neural networks, autoencoder networks, dense neural networks, linear regression models, rule-based techniques, and other machine learning techniques usable to identify objects in images.
  • at least one image based model can include a CNN coupled with an autoencoder model with a dense neural network appended to the end.
  • one other model includes a convolutional autoencoder model that can function as a form of unsupervised transfer learning.
  • the autoencoder can be fed into a dense neural network (e.g., a fully connected neural network, etc.) that acts as a classifier based on the features extracted in the autoencoder.
  • Language and text processing models of the ensemble can include dense neural networks, and other natural language processing models, such as LSTM or GRU based models. Both the text or language processing models and the image classification models can be combined by the computing system into a single ensemble classifier that is configured to classify stamps based on a combination of text information and image information. Weights can be applied to one or more of those models, for example through Tikhonov regularization or ridge regression, to perform weighted ensembling when creating the single ensemble classifier.
  • the ensemble classifier can be further optimized or fit onto one or more test sets (e.g., sets of document data or sets of stamps to classify) by the computing system using the statistical distribution of one or more test sets.
  • Multicollinearity issues between the individual models are therefore mitigated by using ridge regression techniques, thus improving the efficiency of the parameter estimation used to create the final output.
  • the computing system can use cross-validation techniques to find an optimal λ (regularization strength) value, allowing the ensemble model to filter out any unnecessary components.
  • Classifying the stamp may further include classifying the stamp as corresponding to one of a predetermined plurality of classifications.
  • the computing system may maintain a list of predetermined classifications of stamps or markings of interest.
  • the computing system can apply the text data, the TF-IDF vector, or the subset of the image data to one or more input layers or nodes of the ensemble model, and propagate by applying the operations (e.g., multiplication, addition, etc.) and tensor data structures specified in the model to the input values, and the output values of each sequential layer, node, or neural network structure, to output a classification value for the stamp.
  • the output value may be a vector of probability values, where each coordinate in the vector corresponds to one of the plurality of predetermined classifications of stamps.
  • the computing system can identify the coordinate in the output vector that has the largest probability value, and use the identified coordinate to look-up (e.g., in a lookup table, etc.) the corresponding classification value for the detected stamp or marking of interest.
  • the computing system can store the classification or identification of the stamp or marking of interest (Step 1020).
  • the computing system can store the subset of the image data including the stamp, the extracted text from the subset of the image data, and an identification of the classification of the stamp in a database (e.g., the database 715).
  • the computing system can access the database to create one or more entries that are associated with the document under analysis.
  • the computing system can generate a data structure that corresponds to the selected stamp or marking of interest in the document under analysis.
  • the computing system can then populate the data structure using the subset of the image data, the extracted text, and the classification of the stamp.
  • the computing system can transmit the image data, the extracted text, and the identification of the stamps that correspond to the one or more stamps or markings of interest in the analyzed document to the client device that requested analysis of the document.
  • the computing system can transmit the image data, the extracted text, and the identification of the stamp, for example, in a response message via a network (e.g., network 710).
  • the computing system can determine whether the counter register k is equal to the number of detected stamps or markings of interest n (Step 1022). To determine whether the computing system has extracted and classified each of the detected stamps or markings of interest, the computing system can compare the counter register used to select each stamp to the total number of stamps n. If the counter register k is not equal to (e.g., less than) the total number of stamps n, the computing system can execute (Step 1024). If the counter register k is equal to (e.g., equal to or greater than) the total number of stamps n, the computing system can execute (Step 1026).
  • the computing system can increment the counter register k (Step 1024). To extract and classify each of the detected stamps or markings of interest, the computing system can add one to the counter register k to indicate the number of stamps that have been classified. In some implementations, the computing system can set the counter register k to a memory address value (e.g., location in computer memory) of the next location in memory of the next unclassified stamp, for example in a data structure. If this is the first iteration of this loop, the computing system can initialize the counter register k to an initial value, for example zero, before incrementing the counter register. After incrementing the value of the counter register k, the computing system can execute (Step 1014) of the method 1000.
  • the computing system can finish classifying stamps or markings of interest (Step 1026). Finishing the classification of the stamps or markings of interest can include de-initializing or de-allocating any temporary memory values utilized in the execution of the method 1000. For example, the counter register k may be reinitialized to zero, and any temporary data structures generated by the computing system to perform any of the functionalities described herein can be de-allocated or deleted.
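The counter-register loop of Steps 1010 through 1026 can be sketched as follows; classify_stamp and store_result are hypothetical stand-ins for the classification and storage steps:

```python
def classify_all_stamps(subsets, classify_stamp, store_result):
    """Iterate counter register k over the n detected stamp subsets."""
    k, n = 0, len(subsets)
    while k < n:                              # Step 1022: compare k with n
        result = classify_stamp(subsets[k])   # Steps 1014-1018
        store_result(result)                  # Step 1020
        k += 1                                # Step 1024
    # Step 1026: temporary state such as k is released when the function returns

classify_all_stamps(["stamp_a", "stamp_b"], lambda s: s.upper(), print)
```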
  • the systems and methods of this technical solution provide a combined approach to classifying both document text and image data.
  • the systems and methods described herein present a technical improvement to document, marking, and stamp classification systems by improving the accuracy of document and stamp classification.
  • the systems and methods described herein can implement cloud computing, machine learning, and domain-specific data processing techniques to effectively extract and classify stamps in document images that other systems cannot.
  • the techniques described herein can pre-process and filter un-necessary data from a document image, and utilize combined outputs of image data and text data analysis techniques to detect and extract stamps or other markings from document images.
  • the method may include receiving, by a computing device, textual data of a document and image data including a capture of the document.
  • the method may include preprocessing, by the computing device, the image data.
  • the method may include filtering, by the computing device, the textual data to remove predetermined characters.
  • the method may include converting, by the computing device, the textual data to a term frequency-inverse document frequency (TF-IDF) vector.
  • the method may include detecting, by the computing device via a trained neural network from the pre-processed image data and the TF- IDF vector, a presence of a stamp on the document.
  • the method may include, responsive to detection of the presence of the stamp, extracting, by the computing device, a subset of the image data including the stamp.
  • the method may include extracting, by the computing device via optical character recognition, text from the subset of the image data.
  • the method may include classifying, by the computing device via a weighted ensemble model from the extracted text and the TF-IDF vector, the stamp.
  • the method may include storing, by the computing device in a database, the subset of the image data including the stamp, the extracted text from the subset of the image data, and an identification of the classification of the stamp.
  • preprocessing the image data may include converting, by the computing device, the image data to grayscale. In some implementations of the method, preprocessing the image data may include downscaling, by the computing device, the grayscale image data to a predetermined size. In some implementations of the method, filtering the textual data may further include applying a regular expression filter to the textual data.
  • filtering the textual data may further include standardizing, by the computing device, at least one word of the textual data according to a predefined dictionary.
  • classifying the stamp may further include applying a ridge regression model to the extracted text and the TF-IDF vector.
  • classifying the stamp may further include classifying the stamp as corresponding to one of a predetermined plurality of classifications.
  • the system can include a computing device comprising one or more processors and a memory, and can be configured by machine-readable instructions.
  • the system can receive, by a computing device, textual data of a document and image data including a capture of the document.
  • the system can preprocess, by the computing device, the image data.
  • the system can filter, by the computing device, the textual data to remove predetermined characters.
  • the system can convert, by the computing device, the textual data to a term frequency-inverse document frequency vector.
  • the system can detect, by the computing device via a trained neural network from the pre-processed image data and the TF-IDF vector, a presence of a stamp on the document.
  • the processor(s) may be configured to, responsive to detection of the presence of the stamp, extract, by the computing device, a subset of the image data including the stamp.
  • the system can extract, by the computing device via optical character recognition, text from the subset of the image data.
  • the system can classify, by the computing device via a weighted ensemble model from the extracted text and the TF-IDF vector, the stamp.
  • the system can store, by the computing device in a database, the subset of the image data including the stamp, the extracted text from the subset of the image data, and an identification of the classification of the stamp.
  • the system can pre-process the image data by converting the image data to grayscale. In some implementations, the system can pre-process the image data by downscaling the grayscale image data to a predetermined size. In some implementations, the system can filter the textual data by applying a regular expression filter to the textual data.
  • the system can filter the textual data by standardizing at least one word of the textual data according to a predefined dictionary.
  • the system can classify the stamp by applying a ridge regression model to the extracted text and the TF-IDF vector.
  • the system can classify the stamp by classifying the stamp as corresponding to one of a predetermined plurality of classifications.
  • At least one other aspect of this technical solution is generally directed to a non-transient computer-readable storage medium having instructions embodied thereon, the instructions being executable by one or more processors to perform a method for stamp detection and classification.
  • the method may include receiving, textual data of a document and image data including a capture of the document.
  • the method may include preprocessing, the image data.
  • the method may include filtering, the textual data to remove predetermined characters.
  • the method may include converting, the textual data to a term frequency-inverse document frequency vector.
  • the method may include detecting, via a trained neural network from the pre-processed image data and the TF-IDF vector, a presence of a stamp on the document.
  • the method may include, responsive to detection of the presence of the stamp, extracting a subset of the image data including the stamp.
  • the method may include extracting via optical character recognition, text from the subset of the image data.
  • the method may include classifying, via a weighted ensemble model from the extracted text and the TF-IDF vector, the stamp.
  • the method may include storing, in a database, the subset of the image data including the stamp, the extracted text from the subset of the image data, and an identification of the classification of the stamp.
  • preprocessing the image data may include converting the image data to grayscale. In some implementations of the method, preprocessing the image data may include downscaling the grayscale image data to a predetermined size. In some implementations of the method, filtering the textual data may further include applying a regular expression filter to the textual data.
  • filtering the textual data may further include standardizing, at least one word of the textual data according to a predefined dictionary.
  • classifying the stamp may further include applying a ridge regression model to the extracted text and the TF-IDF vector.
  • classifying the stamp may further include classifying the stamp as corresponding to one of a predetermined plurality of classifications.
  • Documents in paper form are frequently scanned by computer systems for archival purposes, classification, data retrieval, analysis, or other functions.
  • a scan of an example document 1100 may include various features 1102A-1102F which may be useful for analysis, classification, or other purposes, such as addresses 1102B-1102C, dates, values or codes 1102A, 1102E, or other such information. Identifying and classifying important items in scanned images of documents is particularly difficult for computer-based systems, because the items of interest may be widely distributed across the page, and may appear in different locations on different pages or documents, depending on the source.
  • a date on a document may appear in the upper left on a first document, an upper right on a second document, and a lower right on a third document, and may be formatted differently, utilize different fonts and/or sizes, etc.
  • some documents may have additional annotations or inclusions 1102F, such as stamps, embossing, watermarks, handwritten notes, etc., all of which may adversely affect image recognition and classification.
  • the inclusions may be of different scales or rotations, and may be partially incomplete in many implementations (e.g. where a stamp or embossing is only partially present within a scanned image).
  • Some classification attempts utilize optical character recognition and natural language processing to parse an entire document (e.g. converting text in the document to a string, and applying regular expression filters to identify keywords).
  • Such implementations may be inaccurate, may be unable to deal with codes or numeric values, and may confuse similar but different documents, particularly with documents from a large collection of sources.
  • documents may include non-standard labels for annotations that may confuse such processing attempts (e.g. “period” rather than “date range”, or similar labels).
  • the systems and methods discussed herein provide an improvement to computer vision and image analysis by leveraging a priori knowledge from annotations in template documents.
  • a large library of template documents may be generated and pre-processed in many implementations to identify annotations or other inclusions commonly present on documents related to or conforming to the template.
  • Newly scanned documents may be compared to these templates, and when a similar template is identified, annotation locations and types from the template may be applied to the newly scanned document to recognize and classify annotations and inclusions.
  • Many different templates may be created for documents, including templates for different sources of documents, different types of documents, and even different pages within multi-page documents (e.g., separate templates for each page of a multi-page document).
  • comparisons of scanned documents and template documents may be distributed amongst a plurality of computing devices for processing in parallel, with similarity results aggregated.
  • implementations of the systems and methods discussed herein receive a scanned or captured image of a page or document, referred to as a candidate image.
  • the candidate image may be pre-processed, such as by scaling, normalizing, downsampling, filtering noise, or otherwise preparing the image for comparison.
  • Structural features within the image may be detected, such as lines, boxes, text locations, etc. In some implementations, this may be done by performing edge detection (e.g. by detecting horizontal and vertical edges, e.g. via filtering with a Sobel operator or Laplacian operator, applying a Canny edge detector, etc.).
  • unwanted pixels may be removed by conducting local maxima analysis on each pixel to filter the edge detection results.
  • a hysteresis threshold may be applied using minimum and maximum values across a window or range of pixels, to filter out non-edges.
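In OpenCV these steps are bundled into the Canny detector, which computes Sobel gradients, suppresses non-maxima, and applies hysteresis thresholding between a minimum and maximum value; the thresholds below are illustrative:

```python
import cv2
import numpy as np

# Stand-in for a pre-processed candidate page (a grayscale pixel array).
gray = (np.random.default_rng(0).random((1500, 1500)) * 255).astype("uint8")

edges = cv2.Canny(gray, threshold1=50, threshold2=150)  # hysteresis min/max values
print(edges.shape, edges.max())                         # edge map: 0 or 255 per pixel
```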
  • Images captured from template documents may be similarly pre-processed and undergo feature extraction.
  • the extracted features of the candidate image may be compared to extracted features of each of the template document images, and a similarity score calculated.
  • a template document having a highest similarity score may be identified, and the locations of annotations (including sizes or dimensions and coordinates) within the template document may be applied to the candidate document.
  • the image of the candidate document may then be associated with these annotations (including, in some implementations, capturing images of or extracting alphanumeric characters within each annotation region, and storing the image or extracted characters in association with the candidate document).
  • one or more servers 1210, 1210’ may communicate via a network 1230 with each other and/or with one or more client devices 1202.
  • Client devices 1202 and/or servers 1210, 1210’ may comprise desktop computing devices, laptop computing devices, rackmount computing devices, tablet computing devices, wearable computing devices, embedded computing devices, appliances or clusters of appliances, or any other type and form of computing device.
  • client devices 1202 and/or servers 1210, 1210’ may comprise virtual computing devices executed by one or more computing devices (e.g. a cloud of virtual machines).
  • Client devices 1202 and servers 1210, 1210’ may comprise processors, memory, network interfaces, and/or other devices, as discussed in more detail below.
  • Client devices 1202 and servers 1210, 1210’ may communicate via one or more networks 1230, which may comprise a local area network (LAN), wide area network (WAN) such as the Internet, a cellular network, a broadband network, a satellite network, or any other type and form of network or combination of networks.
  • Network 1230 may include other devices not illustrated, including firewalls, switches, routers, access points, gateways, network accelerators, caches, or other such devices.
  • Client devices 1202 and/or other devices may store documents 100 in a data store 1206, which may comprise an internal or external storage device, including network storage devices.
  • Documents 100 or images of documents 100 may be in any suitable format, including uncompressed images or compressed images (e.g. JPEG or PNG compressed images), or other formats.
  • Documents 100 may comprise multi-page documents (e.g. a multi-page PDF file) or single page documents.
  • Client devices 1202 may execute an annotation editor 1204, which may comprise an application, server, service, daemon, routine, or other executable logic for communicating with an annotation classifier 1212 of a server 1210, and for providing candidate documents or images of candidate documents to an annotation classifier 1212.
  • annotation editor 1204 may comprise a web browser application, and an annotation classifier 1212 (discussed in more detail below) may comprise a web server or application server for providing a web application or Software-as-a-Service (SaaS) application. In some implementations, annotation editor 1204 may allow a user to edit or modify annotations generated by an annotation classifier 1212, as discussed below.
  • a server 1210 or processor of a server 1210 may execute an annotation classifier 1212.
  • Annotation classifier 1212 may comprise an application, server, service, daemon, routine, or other executable logic for receiving candidate images of documents 100 (or for receiving documents 100 and generating candidate images), pre-processing the candidate images, extracting features from the candidate images, and classifying the images according to similarity to one or more template documents.
  • an annotation classifier 1212 may receive a document or image of a document 100 (e.g. a scan, screen capture, photo, or other such image) from a client device 1202 or another device.
  • In some implementations in which documents are received in non-image formats, annotation classifier 1212 may convert the document to an image (e.g. via rendering of a screen capture, exporting images, etc.). In some implementations in which documents have multiple pages, annotation classifier 1212 may separate the document into multiple single-page documents or images.
  • Annotation classifier 1212 may comprise an image pre-processor 1214.
  • Image preprocessor 1214 may comprise an application, server, service, daemon, routine, or other executable logic for preparing a candidate image for comparison and classification.
  • Image pre-processor 1214 may scale candidate images to a predetermined size (e.g. 1500 x 1500 pixels, or any other such size) corresponding to a size used for template images.
  • Scaling candidate images may include stretching or shrinking a candidate image to the predetermined size, while in other implementations, scaling images may include padding a candidate image (e.g. adding black or white pixels to a border of the image to increase its size to the predetermined size) or cropping a candidate image.
  • Image pre-processor 1214 may downsample a candidate image or convert the candidate image from color to grayscale, reduce a resolution of the candidate image, apply a noise filter (e.g. a 5 x 5 Gaussian noise filter) or similar filter to reduce noise or smooth variations in the candidate image, or otherwise process the candidate image for classification.
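  • As a minimal illustrative sketch (not part of the disclosure itself), the pre-processing above could be implemented in Python with OpenCV; the helper name preprocess_candidate and its defaults are assumptions mirroring the 1500 x 1500 size and 5 x 5 filter examples given here:

        import cv2

        def preprocess_candidate(image_path, size=1500):
            """Scale, grayscale, and noise-filter a candidate image for comparison."""
            image = cv2.imread(image_path)                  # load candidate image (BGR array)
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # convert from color to grayscale
            scaled = cv2.resize(gray, (size, size))         # stretch/shrink to the template size
            return cv2.GaussianBlur(scaled, (5, 5), 0)      # 5 x 5 Gaussian noise filter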
  • Annotation classifier 1212 may also comprise a feature extractor 1216.
  • Feature extractor 1216 may comprise an application, server, service, daemon, routine, or other executable logic for detecting structural features of a candidate image.
  • Structural features may refer to edges, borders, separated groups of alphanumeric characters, shapes, or other such features.
  • Feature extractor 1216 may identify edges within candidate images to detect structural features.
  • Feature extractor 1216 may use an edge detection filter, such as a Sobel filter or Laplacian filter, in both horizontal and vertical directions to get an edge gradient in both directions for each pixel in the image.
  • A local maxima analysis may be applied to remove unwanted pixels, and in some implementations, hysteresis thresholding may be applied with a minimum and maximum value to remove non-edges or false positive edges.
  • The detected edges may then be used as extracted features for comparison to template images. This may significantly reduce the complexity of classification versus comparing the entire image to template images, and make the overall classification and annotation process much faster.
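  • The gradient, local-maxima, and hysteresis sequence described above corresponds closely to the classic Canny edge detector; as an assumed, non-authoritative sketch, OpenCV's implementation bundles these steps, with the threshold values below chosen purely for illustration:

        import cv2

        def extract_edges(preprocessed_image, low=50, high=150):
            """Detect edges as structural features.

            cv2.Canny computes horizontal and vertical Sobel gradients,
            suppresses non-maximum pixels, and applies hysteresis
            thresholding between the low and high values.
            """
            return cv2.Canny(preprocessed_image, low, high)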
  • The same feature extraction process may be performed on a collection of documents representing typical documents of various types (e.g. template documents), which may be artificially created or may be gathered from actual documents that have been previously classified and annotated.
  • The processed and edge-detected images of the template documents may be stored in a template database 1220 and accessible by the annotation classifier 1212. These may be stored in association with identifications of each annotation, along with identifications of locations of each annotation (e.g. coordinates, dimensions, etc.).
  • Template documents may be processed well in advance of classifying candidate images, and processing may be distributed amongst a plurality of servers 1210’ for scalability and efficiency.
  • Annotation classifier 1212 may include a similarity scorer 1218.
  • Similarity scorer 1218 may comprise an application, server, service, daemon, routine, or other executable logic for comparing a feature-extracted or edge-detected candidate image with similarly feature-extracted or edge-detected template images. As discussed above, by reducing the candidate image and template images to extracted features or edges, the comparison process may be significantly simplified, reducing the time taken to perform each comparison.
  • Similarity scorer 1218 may calculate a structural similarity (SSIM) between the candidate image and a template image. For example, in some implementations, structural similarity may be calculated as a weighted combination of luminance, contrast, and structure comparisons between the candidate image and template image (e.g. SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ, where l, c, and s are the luminance, contrast, and structure comparison terms and α, β, and γ are their weights).
  • The SSIM may be calculated as a correlation coefficient with a range of 0 to 1 or 0 to 100%, with 0 indicating no structural similarity and 1 or 100% indicating identical structures.
  • A complex wavelet SSIM may be utilized to account for rotation or translation in the candidate image.
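  • As an assumed sketch only, a standard SSIM implementation such as the one in scikit-image could serve as this comparison step; structural_similarity returns a coefficient in [-1, 1], rescaled here to the 0 to 100% range described above:

        from skimage.metrics import structural_similarity

        def similarity_score(candidate_edges, template_edges):
            """Return SSIM between two equal-sized edge images as a percentage."""
            score = structural_similarity(candidate_edges, template_edges)
            return max(score, 0.0) * 100.0  # clamp negatives, map to 0-100%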
  • The collection of template images may be divided amongst a plurality of servers 1210’, with each providing similarity scores for a subset of template images of the collection.
  • Controller 1224 may comprise an application, server, service, daemon, routine, or other executable logic for providing the feature extracted or edge detected candidate image to a plurality of servers 1210’ and for receiving and aggregating similarity scores from the plurality of servers 1210’.
  • Controller 1224 may be part of the annotation classifier 1212 or executed as a subroutine by the annotation classifier 1212.
  • Controller 1224 may provide an identification of a subset of templates with which a candidate image should be compared (e.g. the subset of the template collection assigned to a given server 1210’).
  • Each server 1210’ may store in a local template database 1220 a different subset of the template images and may compare the candidate image to each template image in its local template database 1220.
  • Each server 1210’ may provide the controller 1224 with an indexed set of similarity scores (e.g. “template 1: 75%; template 2: 20%; template 3: 95%; etc.”), and the controller 1224 may aggregate the set of scores to identify the highest scoring or most similar template image.
  • In some implementations, responsive to at least one similarity score exceeding a threshold (e.g. 90%), the annotation classifier 1212 of the controller server may select the template image with the highest similarity score of the aggregated set of similarity scores as being a likely match for the candidate image and document.
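  • A hypothetical sketch of this aggregation logic follows; the workers collection and its score_subset method stand in for whatever remote interface the servers 1210’ expose, and are assumptions for illustration only:

        def aggregate_scores(candidate_edges, workers, threshold=90.0):
            """Collect indexed similarity scores from each worker and pick the best match."""
            scores = {}
            for worker in workers:
                # e.g. returns {"template 1": 75.0, "template 2": 20.0, "template 3": 95.0}
                scores.update(worker.score_subset(candidate_edges))
            if not scores:
                return None
            best_template, best_score = max(scores.items(), key=lambda kv: kv[1])
            if best_score < threshold:
                return None  # no sufficiently similar template found
            return best_template, best_score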
  • Annotation classifier 1212 may include a document editor 1222.
  • Document editor 1222 may comprise an application, server, service, daemon, routine, or other executable logic for modifying a candidate document according to annotations associated with a selected template image (e.g. locations and dimensions of annotations and text labels for each annotation). For example, in some implementations, document editor 1222 may add annotations and/or labels from the selected template document to metadata of the candidate document. In other implementations, the document editor 1222 may add annotations to the candidate image (e.g. the original candidate image, before pre-processing or feature extraction) directly (e.g. highlighting annotation regions with a semi-transparent overlay or color, adding boxes around annotation regions, adding text labels adjacent to each annotation region, etc.).
  • The modified candidate image may be provided to the client device 1202 for viewing (and in some implementations editing) in an annotation editor 1204 or other such application.
  • Document editor 1222 may perform optical character recognition on the candidate image for each annotation region identified in the selected template (e.g. within a region defined by coordinates or dimensions for the annotation), and may, in some implementations, apply natural language processing or a regular expression filter to characters extracted from the annotation region. This may avoid performing such OCR and processing on the entire document, or may limit the amount of data to be processed to a small amount corresponding to the annotation region. In some implementations, such processing or filtering may be specific to the annotation type. For example, a first regular expression filter may be used to process alphanumeric characters extracted from a first annotation region, while a second regular expression filter may be used to process alphanumeric characters extracted from a second annotation region.
  • Such filters may be specified in the template document or associated annotation document (e.g. JSON file, XML file, etc.). This may allow for more specific processing for different types of annotations that conform to different syntaxes and rules (e.g. dates, addresses, identification codes, etc.).
  • The extracted and processed text may be added to metadata of the candidate document.
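  • For illustration, annotation data of this kind might be laid out as below; the field names, coordinates, and patterns are hypothetical and not taken from the disclosure:

        import re

        # Hypothetical per-template annotation data, e.g. loaded from a JSON file
        annotation_data = {
            "template 3": [
                {"label": "policy", "x": 120, "y": 80, "width": 300, "height": 40,
                 "regex": r"[A-Za-z]\d{7}"},         # e.g. a letter followed by seven digits
                {"label": "date", "x": 120, "y": 140, "width": 200, "height": 40,
                 "regex": r"\d{2}/\d{2}/\d{4}"},     # e.g. 08/11/2021
            ]
        }

        def filter_extracted_text(raw_text, pattern):
            """Apply an annotation-specific regular expression to OCR output."""
            match = re.search(pattern, raw_text)
            return match.group(0) if match else None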
  • FIG. 13 is a flow chart of a method 1300 for automatic context-based annotation, according to some implementations.
  • A computing system may receive a candidate image.
  • In some implementations, the computing system may receive a scan or capture of a document, while in other implementations, the computing system may receive a document (e.g. PDF document, word processing document, etc.) and may generate a candidate image of the document (e.g. by rendering or exporting the document to an image).
  • The computing system may receive the document or candidate image from local storage, from remote storage or a network-attached storage device, from a remote server (e.g. application server) or another annotation server, from a client device, or by any other such means.
  • The computing system may pre-process the candidate image. Pre-processing the candidate image may comprise scaling the candidate image to a predetermined size corresponding to a template size (e.g. 1500 x 1500 pixels, 1000 x 800 pixels, or any other such size); downscaling or converting the candidate image to grayscale; increasing or reducing a resolution of the candidate image to match a predetermined resolution or resolution of template images; cropping or padding the candidate image; applying a noise filter (e.g. a 5 x 5 Gaussian noise filter, or any other such size) or smoothing the image; or otherwise preparing the candidate image for feature selection.
  • The computing system may identify structures in the candidate image. In some implementations, structures may be identified via an edge detection algorithm.
  • The image may be filtered via a Sobel process or Laplacian process or other edge detector in both vertical and horizontal directions to generate an edge gradient in both directions for each pixel.
  • A low pass filter or local maxima analysis may be applied to remove unwanted pixels, and hysteresis thresholding may be applied to remove false positives and identify true edges.
  • A template image from a collection or plurality of template images may be selected for comparison.
  • The template images may have the same size, bit depth, and resolution as the pre-processed candidate image, and the same structural feature extraction discussed at step 1306 may be applied to identify structures in the template image. Accordingly, at step 1308, a pre-processed and feature-extracted or edge-detected template image from a plurality of template images may be selected for comparison to the candidate image.
  • The computing system may calculate a similarity score between the candidate image and the template image.
  • The similarity score may be calculated as a perceptual difference between the two images, as discussed above, e.g. as a weighted combination of averages, variances, and covariances of luminance, contrast, and structure within sliding windows or blocks across the images.
  • Other comparisons may be used, such as a mean-square error (MSE) between the candidate image and template image, or other such methods.
  • The similarity score may be calculated with a range from 0 to 100%, or normalized to fall within this range, in many implementations.
  • At step 1312, the computing system may determine whether the calculated score exceeds a threshold, such as 75%, 85%, or 90%. If so, then at step 1314, an identification of the template image (or template) may be added to a similar template list. At step 1316, the computing system may determine whether additional template images exist for comparison. If so, then steps 1308-1316 may be repeated iteratively for each additional template image.
  • The comparison steps 1308-1316 may be partitioned or distributed across a plurality of computing systems or servers in parallel. Accordingly, in such implementations, the candidate image (or the processed and feature-extracted candidate image) may be provided to each system, and each system may perform steps 1308-1316 for different subsets of template images (either identified when the candidate image is provided, or according to a previously set distribution, such as a subset of template images in a local database of the corresponding server or system).
  • The servers may provide their similar template lists and similarity scores to a host or controller system for aggregation, as discussed above.
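  • A sketch of this per-server comparison loop (steps 1308-1316) is shown below, assuming the similarity_score helper sketched earlier and a template_db mapping of template identifiers to edge-detected template images (both assumptions for illustration):

        def build_similar_template_list(candidate_edges, template_db, threshold=90.0):
            """Compare a candidate against a local subset of templates (steps 1308-1316)."""
            similar = []
            for template_id, template_edges in template_db.items():
                score = similarity_score(candidate_edges, template_edges)  # step 1310
                if score > threshold:                                      # step 1312
                    similar.append((template_id, score))                   # step 1314
            return similar  # reported back to the controller for aggregation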
  • The computing system may identify a highest-scoring template in the similar template list.
  • In implementations utilizing a threshold at step 1312, if no similarity score exceeds the threshold, method 1300 may exit and return an error or indication that no similar template was identified.
  • The computing system may retrieve annotation data associated with the selected template.
  • The annotation data may comprise identifications of locations of one or more annotation regions (e.g. by coordinates, dimensions, etc.).
  • The annotation data may include annotation labels, such as “date”, “address”, “policy”, etc., or any other such label.
  • The annotation data may include an identification of a regular expression or filter for parsing alphanumeric characters within the annotation region according to an annotation-specific syntax (e.g. that a policy identifier should be n characters in length and start with a letter; that an address should include a state identifier, zip code, common dividers such as line breaks or commas, etc.; or any other type and form of filter or parsing rules).
  • The computing system may modify the candidate document and/or pre-processed or original candidate image according to the retrieved annotation data.
  • The computing system may add visible highlighting, boxes, or other indicators to identify annotation regions within the candidate image; may add text labels according to the retrieved annotation data; and/or may perform optical character recognition within the annotation regions and, in some implementations, apply filters or regular expressions to parse the extracted characters.
  • Extracted characters may be added in association with corresponding annotation labels to metadata of the candidate document.
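  • Limiting OCR to each annotation region might look like the following sketch, assuming pytesseract for recognition and the hypothetical annotation record shape shown earlier:

        import pytesseract

        def ocr_annotation_region(image, annotation):
            """Run OCR only within one annotation region, not the whole page."""
            x, y = annotation["x"], annotation["y"]
            w, h = annotation["width"], annotation["height"]
            region = image[y:y + h, x:x + w]        # crop the array to the annotation region
            return pytesseract.image_to_string(region)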
  • The labeled and annotated document may be provided to a client device for review, editing (e.g. modification of extracted alphanumeric annotations, modification of labels, adjustment of annotation regions, etc.), or analysis. Edits to the annotations may be provided to the computing system from the client device, in some implementations, and stored in association with the labeled and annotated document (e.g. in metadata or associated data).
  • The systems and methods discussed herein may be utilized with any type and form of document, including any type and form of annotation.
  • These systems and methods may be used for identification and classification of annotations on medical records, mortgage documents, legal documents, financial documents, instructional documents, examination forms, journals, or any other type and form of document that conforms to a template or typical style (e.g. with specified regions for annotations or entries).
  • Such documents may be structured, semi-structured, or unstructured.
  • Structured documents may include specified fields to be filled with particular values or codes with fixed or limited lengths, such as tax forms or similar records.
  • Unstructured documents may fall into particular categories, but have few or no specified fields, and may comprise text or other data of any length, such as legal or mortgage documents, deeds, complaints, etc.
  • Semi-structured documents may include a mix of structured and unstructured fields, with some fields having associated definitions or length limitations and other fields having no limits, such as invoices, policy documents, etc.
  • Techniques that may be used to identify content in some documents, such as structured documents in which optical character recognition may be applied in predefined regions with associated definitions, may not work on semi-structured or unstructured documents.
  • Annotated documents may be utilized as a training set for machine learning-based document identification or classification systems. For example, automatically annotated documents may be verified for accuracy in some implementations, and then provided as training data for a supervised learning system to be trained to recognize and classify other similar documents, including those without such annotations.
  • Implementations of the automatic annotation systems discussed herein may analyze structural features of documents; however, once annotations are identified and classified, other non-structural data from the document such as text, images, specific values within fields, etc. may be used to classify other documents that lack such structural features.
  • Annotations may be present on the first page of a multi-page document, but absent on other pages of the document (such as a form with multiple fields on a first page, and then subsequent pages left blank for additional text entry).
  • Non-structural features of the annotated page such as grammar or syntax, word choices, embossing or stamps, watermarks, references to particular entities, etc. may be used to identify subsequent pages as related to the same document via a trained machine learning system, such as an artificial neural network or Bayesian classifier. These subsequent pages may then be associated with the same annotations of the annotated page or classified as related to the same document type.
  • The systems and methods discussed herein allow for disambiguation and identification of annotations regardless of location, even for similar documents (e.g. such as records of one type from one source compared to records of the same type from a different source, utilizing a different template to provide similar information).
  • Implementations of the systems and methods discussed herein may be significantly faster and more scalable, both by utilizing a priori templates for comparison with a limited set of features extracted from a document, and by partitioning such templates across a plurality of systems for comparison in parallel. Accordingly, implementations of these systems and methods provide an improvement in computer vision and document recognition and classification.
  • In some aspects, the present disclosure is directed to a method for automatic context-based document annotation.
  • The method includes receiving, by a computing system, a candidate image of a document for annotation identification.
  • The method also includes detecting, by the computing system, a set of structural features of the candidate image.
  • The method also includes, for each of a plurality of template images, calculating, by the computing system, a similarity score between structural features of the template image and the detected set of structural features.
  • The method also includes selecting, by the computing system, a template image having a highest similarity score.
  • The method also includes populating, by the computing system, the candidate image with one or more annotation labels according to a corresponding one or more annotation labels of the selected template image.
  • In some implementations, the method includes scaling the candidate image to a size corresponding to a size of the template images.
  • In some implementations, the method includes filtering noise from the candidate image according to a predetermined window; and detecting horizontal and vertical edges within the noise-filtered candidate image.
  • In some implementations, the method includes, for each pixel, determining a horizontal and vertical edge gradient; filtering pixels according to a local maxima; and applying hysteresis thresholding to the filtered pixels.
  • In some implementations, the method includes detecting horizontal and vertical edges within each template image of the plurality of template images; and in such implementations, calculating the similarity score between the structural features of each template image and the detected set of structural features further comprises determining a correlation coefficient between the structural features of each template image and the detected set of structural features.
  • In some implementations, the method includes providing, to each of an additional one or more computing devices, the candidate image and an identification of a subset of template images; and receiving, from each of the additional one or more computing devices, a similarity score between the candidate image and each template image of the corresponding subset of template images.
  • In some implementations, the method includes, for each of the one or more annotation labels, retrieving coordinates and dimensions of the annotation label within the selected template image.
  • In some implementations, the method includes, for each of the one or more annotation labels: extracting alphanumeric text from the candidate image within the retrieved coordinates and dimensions of the annotation label; and adding the extracted alphanumeric text to metadata of the candidate image in association with an identification of the annotation label.
  • In some implementations, the method includes applying optical character recognition to a portion of the candidate image within the retrieved coordinates and dimensions.
  • In some implementations, the method includes receiving a modification to the extracted alphanumeric text; and storing the modified alphanumeric text in metadata of the candidate image.
  • In another aspect, the present application is directed to a system for automatic context-based document annotation.
  • The system includes a first computing system comprising a processor executing an annotation classifier.
  • The annotation classifier is configured to: receive a candidate image of a document for annotation identification; detect a set of structural features of the candidate image; for each of a plurality of template images, calculate a similarity score between structural features of the template image and the detected set of structural features; select a template image having a highest similarity score; and populate the candidate image with one or more annotation labels according to a corresponding one or more annotation labels of the selected template image.
  • In some implementations, the annotation classifier is further configured to scale the candidate image to a size corresponding to a size of the template images. In some implementations, the annotation classifier is further configured to filter noise from the candidate image according to a predetermined window; and detect horizontal and vertical edges within the noise-filtered candidate image. In a further implementation, the annotation classifier is further configured to detect horizontal and vertical edges by: for each pixel, determining a horizontal and vertical edge gradient; filtering pixels according to a local maxima; and applying hysteresis thresholding to the filtered pixels.
  • In some implementations, the annotation classifier is further configured to detect horizontal and vertical edges within each template image of the plurality of template images; and in such implementations, calculating the similarity score between the structural features of each template image and the detected set of structural features further comprises determining a correlation coefficient between the structural features of each template image and the detected set of structural features.
  • In some implementations, the annotation classifier is further configured to: provide, to each of an additional one or more computing devices, the candidate image and an identification of a subset of template images; and receive, from each of the additional one or more computing devices, a similarity score between the candidate image and each template image of the corresponding subset of template images.
  • In some implementations, the annotation classifier is further configured to, for each of the one or more annotation labels, retrieve coordinates and dimensions of the annotation label within the selected template image.
  • In some implementations, the annotation classifier is further configured to, for each of the one or more annotation labels: extract alphanumeric text from the candidate image within the retrieved coordinates and dimensions of the annotation label; and add the extracted alphanumeric text to metadata of the candidate image in association with an identification of the annotation label.
  • In some implementations, the annotation classifier is further configured to apply optical character recognition to a portion of the candidate image within the retrieved coordinates and dimensions.
  • In some implementations, the annotation classifier is further configured to receive a modification to the extracted alphanumeric text; and store the modified alphanumeric text in metadata of the candidate image.
  • FIGs. 14A and 14B depict block diagrams of a computing device 1400 useful for practicing an embodiment of the client devices 1202 or servers 1210 discussed above.
  • Each computing device 1400 includes a central processing unit 1421 and a main memory unit 1422. As shown in FIG. 14A, a computing device 1400 may include a storage device 1428, an installation device 1416, a network interface 1418, an I/O controller 1423, display devices 1424a-1424n, a keyboard 1426 and a pointing device 1427, such as a mouse.
  • The storage device 1428 may include, without limitation, an operating system and/or software.
  • Each computing device 1400 may also include additional optional elements, such as a memory port 1403, a bridge 1470, one or more input/output devices 1430a-1430n (generally referred to using reference numeral 1430), and a cache memory 1440 in communication with the central processing unit 1421.
  • The central processing unit 1421 is any logic circuitry that responds to and processes instructions fetched from the main memory unit 1422.
  • The central processing unit 1421 is provided by a microprocessor unit, such as: those manufactured by Intel Corporation of Mountain View, California; those manufactured by International Business Machines of White Plains, New York; or those manufactured by Advanced Micro Devices of Sunnyvale, California.
  • The computing device 1400 may be based on any of these processors, or any other processor capable of operating as described herein.
  • A central processing unit or CPU may comprise a graphics processing unit or GPU (which may be useful not just for graphics processing, but for the types of parallel calculations frequently required for neural networks or other machine learning systems), a tensor processing unit or TPU (which may comprise a machine learning-accelerating application-specific integrated circuit (ASIC)), or other such processing units.
  • A system may comprise a plurality of processing devices of different types (e.g. one or more CPUs, one or more GPUs, and/or one or more TPUs).
  • Processing devices may also be virtual processors (e.g. vCPUs) provided by a virtual machine managed by a hypervisor of a physical computing device and deployed as a service or cloud or in similar architectures.
  • Main memory unit 1422 may be one or more memory chips capable of storing data and allowing any storage location to be directly accessed by the microprocessor 1421, such as any type or variant of Static random access memory (SRAM), Dynamic random access memory (DRAM), Ferroelectric RAM (FRAM), NAND Flash, NOR Flash and Solid State Drives (SSD).
  • The main memory 1422 may be based on any of the above described memory chips, or any other available memory chips capable of operating as described herein.
  • The processor 1421 communicates with main memory 1422 via a system bus 1450 (described in more detail below).
  • FIG. 14B depicts an embodiment of a computing device 1400 in which the processor communicates directly with main memory 1422 via a memory port 1403.
  • The main memory 1422 may be DRDRAM.
  • FIG. 14B depicts an embodiment in which the main processor 1421 communicates directly with cache memory 1440 via a secondary bus, sometimes referred to as a backside bus.
  • In other embodiments, the main processor 1421 communicates with cache memory 1440 using the system bus 1450.
  • Cache memory 1440 typically has a faster response time than main memory 1422 and is provided by, for example, SRAM, BSRAM, or EDRAM.
  • The processor 1421 communicates with various I/O devices 1430 via a local system bus 1450.
  • FIG. 14B depicts an embodiment of a computer 1400 in which the main processor 1421 may communicate directly with I/O device 1430b, for example via HYPERTRANSPORT, RAPIDIO, or INFINIBAND communications technology.
  • FIG. 14B also depicts an embodiment in which local busses and direct communication are mixed: the processor 1421 communicates with I/O device 1430a using a local interconnect bus while communicating with I/O device 1430b directly.
  • A wide variety of I/O devices 1430a-1430n may be present in the computing device 1400.
  • Input devices include keyboards, mice, trackpads, trackballs, microphones, dials, touch pads, touch screen, and drawing tablets.
  • Output devices include video displays, speakers, inkjet printers, laser printers, projectors and dye-sublimation printers.
  • The I/O devices may be controlled by an I/O controller 1423 as shown in FIG. 14A.
  • The I/O controller may control one or more I/O devices such as a keyboard 1426 and a pointing device 1427, e.g., a mouse or optical pen.
  • An I/O device may also provide storage and/or an installation medium 1416 for the computing device 1400.
  • The computing device 1400 may provide USB connections (not shown) to receive handheld USB storage devices such as the USB Flash Drive line of devices manufactured by Twintech Industry, Inc. of Los Alamitos, California.
  • The computing device 1400 may support any suitable installation device 1416, such as a disk drive, a CD-ROM drive, a CD-R/RW drive, a DVD-ROM drive, a flash memory drive, tape drives of various formats, USB device, hard-drive, a network interface, or any other device suitable for installing software and programs.
  • The computing device 1400 may further include a storage device, such as one or more hard disk drives or redundant arrays of independent disks, for storing an operating system and other related software, and for storing application software programs such as any program or software 1420 for implementing (e.g., configured and/or designed for) the systems and methods described herein.
  • Any of the installation devices 1416 could also be used as the storage device.
  • The operating system and the software can be run from a bootable medium.
  • The computing device 1400 may include a network interface 1418 to interface to the network 1404 through a variety of connections including, but not limited to, standard telephone lines, LAN or WAN links (e.g., 802.11, T1, T3, 56kb, X.25, SNA, DECNET), broadband connections (e.g., ISDN, Frame Relay, ATM, Gigabit Ethernet, Ethernet-over-SONET), wireless connections, or some combination of any or all of the above.
  • Connections can be established using a variety of communication protocols (e.g., TCP/IP, IPX, SPX, NetBIOS, Ethernet, ARCNET, SONET, SDH, Fiber Distributed Data Interface (FDDI), RS232, IEEE 802.11, IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, IEEE 802.11n, IEEE 802.11ac, IEEE 802.11ad, CDMA, GSM, WiMax and direct asynchronous connections).
  • The computing device 1400 communicates with other computing devices 1400’ via any type and/or form of gateway or tunneling protocol such as Secure Socket Layer (SSL) or Transport Layer Security (TLS).
  • The network interface 1418 may include a built-in network adapter, network interface card, PCMCIA network card, card bus network adapter, wireless network adapter, USB network adapter, modem or any other device suitable for interfacing the computing device 1400 to any type of network capable of communication and performing the operations described herein.
  • The computing device 1400 may include or be connected to one or more display devices 1424a-1424n.
  • Any of the I/O devices 1430a-1430n and/or the I/O controller 1423 may include any type and/or form of suitable hardware, software, or combination of hardware and software to support, enable or provide for the connection and use of the display device(s) 1424a-1424n by the computing device 1400.
  • The computing device 1400 may include any type and/or form of video adapter, video card, driver, and/or library to interface, communicate, connect or otherwise use the display device(s) 1424a-1424n.
  • A video adapter may include multiple connectors to interface to the display device(s) 1424a-1424n.
  • The computing device 1400 may include multiple video adapters, with each video adapter connected to the display device(s) 1424a-1424n.
  • Any portion of the operating system of the computing device 1400 may be configured for using multiple displays 1424a-1424n.
  • A computing device 1400 may be configured to have one or more display devices 1424a-1424n.
  • An I/O device 1430 may be a bridge between the system bus 1450 and an external communication bus, such as a USB bus, an Apple Desktop Bus, an RS-232 serial connection, a SCSI bus, a FireWire bus, a FireWire 800 bus, an Ethernet bus, an AppleTalk bus, a Gigabit Ethernet bus, an Asynchronous Transfer Mode bus, a FibreChannel bus, a Serial Attached small computer system interface bus, a USB connection, or a HDMI bus.
  • A computing device 1400 of the sort depicted in FIGs. 14A and 14B may operate under the control of an operating system, which controls scheduling of tasks and access to system resources.
  • The computing device 1400 can be running any operating system such as any of the versions of the MICROSOFT WINDOWS operating systems, the different releases of the Unix and Linux operating systems, any version of the MAC OS for Macintosh computers, any embedded operating system, any real-time operating system, any open source operating system, any proprietary operating system, any operating systems for mobile computing devices, or any other operating system capable of running on the computing device and performing the operations described herein.
  • Typical operating systems include, but are not limited to: Android, produced by Google Inc.; WINDOWS 7 and 8, produced by Microsoft Corporation of Redmond, Washington; MAC OS, produced by Apple Computer of Cupertino, California; WebOS, produced by Research In Motion (RIM); OS/2, produced by International Business Machines of Armonk, New York; and Linux, a freely-available operating system distributed by Caldera Corp, of Salt Lake City, Utah, or any type and/or form of a Unix operating system, among others.
  • The computer system 1400 can be any workstation, telephone, desktop computer, laptop or notebook computer, server, handheld computer, mobile telephone or other portable telecommunications device, media playing device, a gaming system, mobile computing device, or any other type and/or form of computing, telecommunications or media device that is capable of communication.
  • The computer system 1400 has sufficient processor power and memory capacity to perform the operations described herein.
  • The computing device 1400 may have different processors, operating systems, and input devices consistent with the device.
  • In some embodiments, the computing device 1400 is a smart phone, mobile device, tablet or personal digital assistant.
  • In some embodiments, the computing device 1400 is an Android-based mobile device, an iPhone smart phone manufactured by Apple Computer of Cupertino, California, or a Blackberry or WebOS-based handheld device or smart phone, such as the devices manufactured by Research In Motion Limited.
  • The computing device 1400 can be any workstation, desktop computer, laptop or notebook computer, server, handheld computer, mobile telephone, any other computer, or other form of computing or telecommunications device that is capable of communication and that has sufficient processor power and memory capacity to perform the operations described herein.
  • Software functionality or executable logic for execution by one or more processors of the system may be provided in any suitable format.
  • Logic instructions may be provided as native executable code, as instructions for a compiler of the system, or in a package or container for deployment on a virtual computing system (e.g. a Docker container, a Kubernetes Engine (GKE) container, or any other type of deployable code).
  • Containers may comprise standalone packages comprising all of the executable code necessary to run an application, including code for the application itself, code for system tools or libraries, preferences, settings, assets or resources, or other features.
  • Containers may be platform or operating system agnostic.
  • A Docker engine executed by a single host operating system and underlying hardware may execute a plurality of containerized applications, reducing resources necessary to provide the applications relative to virtual machines for each application (each of which may require a guest operating system).
  • Although the disclosure may reference one or more “users”, such “users” may refer to user-associated devices or stations (STAs), for example, consistent with the terms “user” and “multi-user” typically used in the context of a multi-user multiple-input and multiple-output (MU-MIMO) environment.
  • Although the communications systems described above may include devices and APs operating according to an 802.11 standard, it should be understood that embodiments of the systems and methods described can operate according to other standards and use wireless communications devices other than devices configured as devices and APs.
  • For example, multiple-unit communication interfaces associated with cellular networks, satellite communications, vehicle communication networks, and other non-802.11 wireless networks can utilize the systems and methods described herein to achieve improved overall capacity and/or link quality without departing from the scope of the systems and methods described herein.
  • The terms “first” and “second” may be used in connection with devices, modes of operation, transmit chains, antennas, etc., for purposes of identifying or differentiating one from another or from others. These terms are not intended to merely relate entities (e.g., a first device and a second device) temporally or according to a sequence, although in some cases, these entities may include such a relationship. Nor do these terms limit the number of possible entities (e.g., devices) that may operate within a system or environment.
  • The systems and methods described above may be provided as one or more computer-readable programs or executable instructions embodied on or in one or more articles of manufacture.
  • The article of manufacture may be a floppy disk, a hard disk, a CD-ROM, a flash memory card, a PROM, a RAM, a ROM, or a magnetic tape.
  • The computer-readable programs may be implemented in any programming language, such as LISP, PERL, C, C++, C#, PROLOG, or in any byte code language such as JAVA.
  • The software programs or executable instructions may be stored on or in one or more articles of manufacture as object code.

Abstract

According to some aspects, the present disclosure relates to methods and systems for machine learning-based document classification implementing multiple classifiers. Various classifiers may be used during different iterations of the process to promote classification of a document. The document may be classified and labeled in response to a predetermined number of classifiers agreeing on a significant label. Furthermore, the significant label may only be applied to the document where the classifiers predicted the document label with a confidence score exceeding a threshold value.
PCT/US2021/045505 2020-08-11 2021-08-11 Systems and methods for machine learning-based document classification WO2022035942A1 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US16/990,900 US11928877B2 (en) 2020-08-11 2020-08-11 Systems and methods for automatic context-based annotation
US16/990,892 US11361528B2 (en) 2020-08-11 2020-08-11 Systems and methods for stamp detection and classification
US16/990,892 2020-08-11
US16/990,900 2020-08-11
US16/998,682 US20220058496A1 (en) 2020-08-20 2020-08-20 Systems and methods for machine learning-based document classification
US16/998,682 2020-08-20

Publications (1)

Publication Number Publication Date
WO2022035942A1 true WO2022035942A1 (fr) 2022-02-17

Family

ID=80247351

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/045505 WO2022035942A1 (fr) 2021-08-11 Systems and methods for machine learning-based document classification

Country Status (1)

Country Link
WO (1) WO2022035942A1 (fr)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060218134A1 (en) * 2005-03-25 2006-09-28 Simske Steven J Document classifiers and methods for document classification
US9058382B2 (en) * 2005-11-14 2015-06-16 Microsoft Technology Licensing, Llc Augmenting a training set for document categorization
US20100082642A1 (en) * 2008-09-30 2010-04-01 George Forman Classifier Indexing
US20170116325A1 (en) * 2009-07-28 2017-04-27 Fti Consulting, Inc. Computer-Implemented System And Method For Inclusion-Based Electronically Stored Information Item Cluster Visual Representation
JP2014238626A (ja) * 2013-06-06 2014-12-18 株式会社日立ソリューションズ Document classification device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114844778A (zh) * 2022-04-25 2022-08-02 中国联合网络通信集团有限公司 Anomaly detection method and apparatus for a core network, electronic device, and readable storage medium
CN114844778B (zh) * 2022-04-25 2023-05-30 中国联合网络通信集团有限公司 Anomaly detection method and apparatus for a core network, electronic device, and readable storage medium
CN114663903A (zh) * 2022-05-25 2022-06-24 深圳大道云科技有限公司 Text data classification method, apparatus, device, and storage medium
CN115878807A (zh) * 2023-02-27 2023-03-31 中关村科学城城市大脑股份有限公司 City-brain-based one-stop government service case classification method and system

Similar Documents

Publication Publication Date Title
Preethi et al. An effective digit recognition model using enhanced convolutional neural network based chaotic grey wolf optimization
US20160253597A1 (en) Content-aware domain adaptation for cross-domain classification
Faraki et al. Fisher tensors for classifying human epithelial cells
US20220058496A1 (en) Systems and methods for machine learning-based document classification
US11521372B2 (en) Utilizing machine learning models, position based extraction, and automated data labeling to process image-based documents
US10867169B2 (en) Character recognition using hierarchical classification
US11954139B2 (en) Deep document processing with self-supervised learning
Mohamed et al. Content-based image retrieval using convolutional neural networks
WO2022035942A1 (fr) Systèmes et procédés de classification de documents basée sur l'apprentissage automatique
US20230101817A1 (en) Systems and methods for machine learning-based data extraction
US11830233B2 (en) Systems and methods for stamp detection and classification
US20170076152A1 (en) Determining a text string based on visual features of a shred
Shetty et al. Segmentation and labeling of documents using conditional random fields
CN110008365B (zh) 一种图像处理方法、装置、设备及可读存储介质
US20220375090A1 (en) Generating improved panoptic segmented digital images based on panoptic segmentation neural networks that utilize exemplar unknown object classes
Pengcheng et al. Chinese calligraphic style representation for recognition
Kumari et al. A review of deep learning techniques in document image word spotting
US11699044B1 (en) Apparatus and methods for generating and transmitting simulated communication
Nock et al. Boosting k-NN for categorization of natural scenes
Naseer et al. Meta‐feature based few‐shot Siamese learning for Urdu optical character recognition
Kumar et al. Bayesian background models for keyword spotting in handwritten documents
Bose et al. Light Weight Structure Texture Feature Analysis for Character Recognition Using Progressive Stochastic Learning Algorithm
Evangelou et al. PU learning-based recognition of structural elements in architectural floor plans
Sharma et al. Optical Character Recognition Using Hybrid CRNN Based Lexicon-Free Approach with Grey Wolf Hyperparameter Optimization
Jangpangi et al. Handwriting recognition using wasserstein metric in adversarial learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21856626

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21856626

Country of ref document: EP

Kind code of ref document: A1