AU2021101072A4 - DIP-Intelligent Neural Networks: Digital Image Processing Using Intelligent Neural Networks System - Google Patents


Info

Publication number
AU2021101072A4
Authority
AU
Australia
Prior art keywords
cnns
image
digital image
image processing
objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021101072A
Inventor
K. Anuradha
S. B. Chordiya
G. Karuna
Ishan Y. Pandya
Vrushsen Purushottam Pawar
Beg Raj
B. K. Sarkar
Pari Nidhi Singh
Pawan Kumar Singh
Kotte Sowjanya
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to AU2021101072A priority Critical patent/AU2021101072A4/en
Application granted granted Critical
Publication of AU2021101072A4 publication Critical patent/AU2021101072A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

Our Invention" DIP-Intelligent Neural Networks" is a digital image may be processed by an ensemble of convolutional neural networks (CNNs) to classify objects in the digital image and each CNN, a candidate architecture and candidate parameters may be selected to build a plurality of CNNs. The invented technology also includes the predetermined number of CNNs, each having different values for the selected candidate parameters meet a validation threshold an ensemble of CNNs may be generated from the predetermined number of CNNs and the predictions from the ensemble of CNNs may then be aggregated to accurately classify the objects in the digital image. The DIP-Intelligent Neural Networks is a digital image processing typically involves processing a digital image, for example, from a digital still image or digital video, to ascertain, detect, and classify particular features or objects in the image and the pattern recognition may be applied during the image processing to detect a particular object in the image. The invented technology also a digital image processing with pattern recognition has been used in a wide variety of applications, such as facial recognition, detection of land features from aerial photographs, vehicle license plate determination, etc. 18 Create a traiin set foimages of Select at cilt achitetr from a Select candidate parameters for the I---Pedetermine selected candidate architecture number o flagge intermediate CN S?. architecture and selected candidate eeie une par"" er inedtennied nube oN f Buil an intermediate conoluioal Classif the extnt of damgfo the enaml of Intermnediate CNs Nvalidation ihreshold?FlgteierdaeCN A rtificial Intelligence Dep Natura naguage McieNural Compuer Supervie Unsupervised Reinfrcment Sern-supervised according to an example of the present disclosure.

Description

[FIG. 5 (flow chart): create a training set from images, select a candidate architecture, select candidate parameters for the selected candidate architecture, build an intermediate convolutional neural network, evaluate it against the validation threshold and flag it, repeat until the predetermined number of flagged intermediate CNNs is reached, and classify the extent of damage from the ensemble of intermediate CNNs.]
[Diagram: taxonomy of artificial intelligence, including deep learning, natural language processing, machine/neural computing, and supervised, unsupervised, semi-supervised, and reinforcement learning, according to an example of the present disclosure.]
DIP-Intelligent Neural Networks: Digital Image Processing Using Intelligent Neural Networks System
FIELD OF THE INVENTION
Our Invention "DIP-Intelligent Neural Networks" is related to a digital image processing using intelligent neural networks system.
BACKGROUND OF THE INVENTION
Digital image processing typically involves processing a digital image, for example, from a digital still image or digital video, to ascertain, detect, and/or classify particular features or objects in the image. Pattern recognition may be applied during the image processing to detect a particular object in the image. Digital image processing with pattern recognition has been used in a wide variety of applications, such as facial recognition, detection of land features from aerial photographs, vehicle license plate determination, etc. Different types of conventional machine learning functions may be used for pattern recognition, however, many conventional machine learning functions are not adapted or may be difficult to adapt for pattern recognition in digital image processing.
This disclosure relates generally to the field of image processing and, more particularly, to various techniques for object detection and recognition within digital images using a split processing pipeline operating in both high-resolution and low-resolution modes concurrently.
The advent of portable integrated computing devices has caused a wide-spread proliferation of digital cameras. These integrated computing devices commonly take the form of smartphones or tablets and typically include general purpose computers, cameras, sophisticated user interfaces including touch-sensitive screens, and wireless communications abilities through Wi-Fi, LTE, HSDPA and other cell-based or wireless technologies.
The wide proliferation of these integrated devices provides opportunities to use the devices' capabilities to perform tasks that would otherwise require dedicated hardware and software. For example, as noted above, integrated devices such as smartphones and tablets typically have one or two embedded cameras. These cameras comprise lens/camera hardware modules that may be controlled through the general purpose computer using system software and/or downloadable software (e.g., "Apps") and a user interface including, e.g., programmable buttons placed on the touch-sensitive screen and/or "hands-free" controls such as voice controls.
One opportunity for using the features of an integrated device is to capture and evaluate images. The devices' camera(s) allows the capture of one or more images, and the general purpose computer provides processing power to perform analysis. In addition, any analysis that is performed for a network service computer can be facilitated by transmitting the image data or other data to a service computer (e.g., a server, a website, or other network-accessible computer) using the communications capabilities of the device.
These abilities of integrated devices allow for recreational, commercial and transactional uses of images and image analysis. For example, images may be captured and analyzed to decipher information from the images such as characters, symbols, and/or other objects of interest located in the captured images. The characters, symbols, and/or other objects of interest may be transmitted over a network for any useful purpose such as for use in a game, or a database, or as part of a transaction such as a credit card transaction. For these reasons and others, it is useful to enhance the abilities of these integrated devices and other devices for deciphering information from images.
In particular, when trying to read a credit card with a camera, there are multiple challenges that a user may face. Because of the widely-varying distances that the credit card may be from the camera when the user is attempting to read the credit card, one particular challenge is the difficulty in focusing the camera properly on the credit card. Another challenge is associated with the difficulties in reading characters with perspective correction, thus forcing the user to hold the card in a plane parallel to the camera to limit any potential perspective distortions. One of the solutions to these problems available today is that the user has to be guided (e.g., via the user interface on the device possessing the camera) to frame the credit card (or other object-of-interest) in a precise location and orientation, usually very close to the camera, so that sufficient image detail may be obtained.
This is challenging and often frustrating to the user, and may even result in a more difficult and time-consuming user experience than simply manually typing in the information of interest from the credit card. It would therefore be desirable to have a system that detects the credit card (or other object-of-interest) in three-dimensional space, utilizing scaling and/or perspective correction on the image, thus allowing the user more freedom in how the credit card (or other object-of-interest) may be held in relation to the camera during the detection process.
Another challenge often faced comes from the computational costs of credit card recognition (or other object-of-interest recognition) algorithms, which scale in complexity as the resolution of the camera increases. Therefore, in prior art implementations, the camera is typically running in a low resolution mode, which necessitates the close framing of the card by the user in order for the camera to read sufficient details on the card for the recognition algorithm to work successfully with sufficient regularity. However, placing the card in such a close focus range also makes it more challenging for the camera's autofocus functionality to handle the situation correctly.
A final shortcoming of prior art optical character recognition (OCR) techniques, such as those used in credit card recognition algorithms, is that they rely on single-character classifiers, which require that the incoming character sequence data be segmented before each individual character may be recognized, a requirement that is difficult, if not impossible, to satisfy in the credit card recognition context.
The inventors have realized new and non-obvious ways to make it easier for the user's device to detect and/or recognize the credit card (or other object-of-interest) by overcoming one or more of the aforementioned challenges.
As used herein, the term "detect" in reference to an object-of-interest refers to an algorithm's ability to determine whether the object-of-interest is present in the scene; whereas the term "recognize" in reference to an object-of-interest refers to an algorithm's ability to extract additional information from a detected object-of-interest in order to identify the detected object-of-interest from among the universe of potential objects-of interest.
PRIOR ART SEARCH
    • TW1657346B * 2018-02-14 2019-04-21 Data reduction and method for establishing data identification model, computer system and computer-readable recording medium
    • WO2019097456A1 * 2017-11-17 2019-05-23 C 3 Limited Object measurement system
    • US20190294928A1 * 2018-03-21 2019-09-26 Megvii (Beijing) Technology Co., Ltd. Image processing method and apparatus, and computer-readable storage medium
    • US10474925B2 2017-07-31 2019-11-12 Industrial Technology Research Institute Deep neural network with side branches for recognizing and classifying media data and method for using the same
    • US10616364B2 2017-06-01 2020-04-07 Samsung Electronics Co., Ltd. Electronic apparatus and method of operating the same
    • RU2724797C1 * 2020-01-22 2020-06- Cash register system and method for identification of courses on tray
    • US10769532B2 * 2017-04-05 2020-09-08 Accenture Global Solutions Limited Network rating prediction engine
    • US10831702B2 2018-09-20 2020-11-10 Ceva D.S.P. Ltd. Efficient utilization of systolic arrays in computational processing
    • US10846556B2 * 2017-07-31 2020-11-24 Advanced New Technologies Co., Ltd. Vehicle insurance image processing method, apparatus, server, and system

Family To Family Citations

    • JP6706788B2 * 2015-03-06 2020-06-10 Image recognition method, image recognition device and program
    • CN106156807B * 2015-04-02 2020-06-02 Training method and device of convolutional neural network model
    • US10095950B2 * 2015-06-03 2018-10-09 Hyperverge Inc. Systems and methods for image processing
    • CN105138963A * 2015-07-31 2015-12-09 Picture scene judging method, picture scene judging device and server
    • US10282623B1 * 2015-09-25 2019-05-07 Apple Inc. Depth perception sensor data processing
    • US10825095B1 * 2015-10-15 2020-11-03 State Farm Mutual Automobile Insurance Company Using images and voice recordings to facilitate underwriting life insurance.
OBJECTIVES OF THE INVENTION
1. The objective of the invention is to process a digital image with an ensemble of convolutional neural networks (CNNs) to classify objects in the digital image, where, for each CNN, a candidate architecture and candidate parameters may be selected to build a plurality of CNNs.
2. The other objective of the invention is to generate, once a predetermined number of CNNs, each having different values for the selected candidate parameters, meet a validation threshold, an ensemble of CNNs from the predetermined number of CNNs, so that the predictions from the ensemble of CNNs may then be aggregated to accurately classify the objects in the digital image.
3. The other objective of the invention is to provide digital image processing that involves processing a digital image, for example from a digital still image or digital video, to ascertain, detect, and classify particular features or objects in the image, with pattern recognition applied during the image processing to detect a particular object in the image.
4. The other objective of the invention is to apply digital image processing with pattern recognition to a wide variety of applications, such as facial recognition, detection of land features from aerial photographs, and vehicle license plate determination.
SUMMARY OF THE INVENTION
For simplicity and illustrative purposes, the present disclosure is described by referring mainly to an example thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure. As used herein, the terms "a" and "an" are intended to denote at least one of a particular element, the term "includes" means includes but not limited to, the term "including" means including but not limited to, and the term "based on" means based at least in part on.
An image processing system, according to an example, builds and trains an ensemble of deep learning models, such as convolutional neural networks (CNNs), to accurately and automatically perform image processing to detect particular attributes of objects in a digital image, and to classify the objects according to the detected attributes. CNNs, however, include many functional components, which make it difficult to determine the necessary network architecture that performs accurately to detect and classify particular features of images relevant for the problem at hand.
Furthermore, each component of the CNN typically has a multitude of parameters associated with it. The specific values of those parameters necessary for a successful and accurate image classification are not known a priori without any application of a robust image processing system. The image processing system, therefore, provides a method for building and fine tuning CNNs proven to output an accurate classification of an image. Through an iterative process, a candidate architecture and candidate parameters for CNN may be selected to build, train, and optimize a CNN. For example, the iterative process may include selecting the candidate architecture from a plurality of candidate architectures and validating a set of candidate parameters for the selected candidate architecture.
The candidate architecture may include a number of convolution layers and subsampling layers and a classifier type. The candidate parameters may include a learning rate, a batch size, a maximum number of training epochs, an input image size, a number of feature maps at every layer of the CNN, a convolutional filter size, a sub-sampling pool size, a number of hidden layers, a number of units in each hidden layer, a selected classifier algorithm, and a number of output classes. In addition, a pre-processing protocol may also be selected to enhance particular content in the images for the selected candidate architecture and selected candidate parameters.
The iterative process may include building an intermediate CNN using the training set and evaluating the performance of the intermediate CNN on a validation set. The evaluation, for instance, determines whether the intermediate CNN meets a validation threshold such as less than a 20% error rate. The iterative process is repeated until a predetermined number of intermediate CNNs (e.g., 25) meet the validation threshold. According to an example, each intermediate CNN has different values for the selected candidate parameters.
An ensemble of the most accurate intermediate CNNs is then generated from the predetermined number of intermediate CNNs. The ensemble, for example, may comprise the top-performing intermediate CNNs. The next step may include selecting an ensemble algorithm to aggregate and/or combine the predictions of each intermediate CNN in the ensemble to form an ensemble prediction. The prediction of each intermediate CNN in the ensemble may then be used to classify an image or an object in the image.
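As an illustration of this iterative build-and-validate loop, the following Python sketch keeps building intermediate CNNs until a predetermined number of them meet the validation threshold and then retains the most accurate ones as the ensemble. The helper callable `train_and_validate`, the threshold, and the counts are assumptions drawn from the examples in this disclosure, not a prescribed implementation.

```python
def build_validated_ensemble(train_and_validate, target_count=25,
                             error_threshold=0.20, ensemble_size=5):
    """Iteratively build intermediate CNNs until `target_count` of them meet
    the validation threshold, then keep the `ensemble_size` most accurate.

    `train_and_validate` is a hypothetical callable that samples a candidate
    architecture and candidate parameters, builds one intermediate CNN on the
    training set, and returns (model, validation_error_rate)."""
    flagged = []
    while len(flagged) < target_count:
        model, val_error = train_and_validate()
        if val_error < error_threshold:          # e.g. less than 20% error
            flagged.append((val_error, model))   # flag the intermediate CNN
    flagged.sort(key=lambda pair: pair[0])       # most accurate first
    return [model for _, model in flagged[:ensemble_size]]
```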
The technical benefits and advantages of the disclosed examples include providing an advanced deep learning architecture that exhibits superior classification accuracy to assess property damage and an iterative image processing system that determines that advanced deep learning architecture. A CNN generated by the image processing system through an iterative process is easier to train than other regular, feed-forward neural networks and has fewer parameters to estimate, making it a more efficient architecture to use to assess property damage.
According to an example, a CNN generated by the image processing system may be used for classifying the extent of damage to a property that is captured in a digital image. Damage may refer to any kind of injury or harm that impairs the appearance of the property. An image or digital image may include both a still image and a moving image (e.g., video). The property may be any tangible object including, but not limited to, a house, furniture, clothing, vehicle equipment, land, computing device, toy, etc. In an example where an insured customer has accidental damage to tangible property, the insured customer may document the damage to the damaged property by taking digital photographs with a smart phone and/or camera.
The digital images of the damaged property may then be fed into the image processing system. The image processing system may automatically classify the damaged property based on amount of damage determined from the image processing of the received digital images. In this example, the image processing system provides a machine vision method and apparatus to automatically detect the extent of damage to the property as captured in digital images.
According to an example, the image processing system generates an ensemble model (e.g., including multiple optimized CNNs) to classify an image or an object in the image with improved accuracy. In an example, the image processing system using the ensemble model yielded an accuracy of nearly 90% on the images in the validation set.
As discussed above, according to an example, the image processing system may be used for classifying the extent of damage to property captured in an image. However, the image processing system may be used for substantially any application to classify features in a digital image into predefined categories.
Some images contain decipherable characters, symbols, or other objects-of-interest that users may desire to detect and/or recognize. For example, some systems may desire to recognize such characters and/or symbols so that they can be directly accessed by a computer in a convenient manner, such as in ASCII format. Some embodiments of this disclosure seek to enhance a computer's ability to detect and/or recognize such objects of-interest in order to gain direct access to characters or symbols visibly embodied in images. Further, by using an integrated device, such as a smartphone, tablet or other computing device having an embedded camera(s), a user may capture an image, have the image processed to decipher characters, and use the deciphered information in a transaction.
One example of using an integrated device as described above to detect and/or recognize an object-of-interest is to capture an image of an object having a sequence of characters, such as a typical credit card, business card, receipt, menu, or sign. Some embodiments of this disclosure provide for a user initiating a process on an integrated device by activating an application or by choosing a feature within an application to begin a transaction.
Upon this user prompt, the device may display a user interface that allows the user to initiate an image capture or that automatically initiates an image capture, with the subject of the image being of an object having one or more sub-regions comprising sequences of characters that the user wishes to detect, such as the holder name, expiration date, and account number fields on a typical credit card. The sequences of characters may also be comprised of raised or embossed characters, especially in the case of a typical credit card.
Differing embodiments of this disclosure may employ one or all of the several techniques described herein to perform credit card recognition using electronic devices with integrated cameras. According to some embodiments, the credit card recognition process may comprise: obtaining a first representation of a first image, wherein the first representation comprises a first plurality of pixels; identifying a first credit card region within the first representation; extracting a first plurality of sub-regions from within the identified first credit card region, wherein a first sub-region comprises a credit card number, wherein a second sub-region comprises an expiration date, and wherein a third sub-region comprises a card holder name; generating a predicted character sequence for the first, second, and third sub-regions; and validating the predicted character sequences for at least the first, second, and third sub-regions using various credit card-related heuristics, e.g., expected character sequence length, expected character sequence format, and checksums.
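As one illustration of the credit card-related validation heuristics mentioned above, the sketch below combines a simple length check with the standard Luhn checksum. The accepted lengths and the choice of the Luhn algorithm are assumptions for illustration; the disclosure names checksums only generically.

```python
def luhn_valid(card_number: str) -> bool:
    """Example validation heuristic: length check plus Luhn checksum."""
    digits = [int(c) for c in card_number if c.isdigit()]
    if len(digits) not in (13, 14, 15, 16, 19):   # typical card lengths (assumed)
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0      # valid when the checksum is a multiple of 10
```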
Still other embodiments of this disclosure may employ one or all of several techniques to use a "split" image processing pipeline that runs the camera at its full resolution (also referred to herein as "high-resolution"), while feeding scaled-down and cropped versions of the captured image frames to a credit card recognition algorithm. (It is to be understood that, although the techniques described herein will be discussed predominantly in the context of a credit card detector and recognition algorithm, the split image processing pipeline techniques described herein could be applied equally to any other object-of-interest for which sufficient detection and/or recognition heuristics may be identified and exploited, e.g., faces, weapons, business cards, human bodies, etc.).
Thus, one part of the "split" image processing pipeline described herein may run the credit card recognition algorithm on scaled down (also referred to herein as "low resolution") frames from the camera, wherein the scale is determined by the optimum performance of that algorithm. Meanwhile, the second part of the "split" image processing pipeline may run a rectangle detector algorithm (or other object-of interest detector algorithm) with credit card-specific constrains (or other object-of interest-specific constraints) in the background. If the rectangle detector finds a rectangle matching the expected aspect ratio and minimum size of a credit card that can be read, then it may crop the card out of the "high-resolution" camera buffer, perform a perspective correction, and/or scale the rectangle to the desired size needed by the credit card recognition algorithm and then send the scaled, high-resolution representation of the card to the detection algorithm for further processing.
One reason for using the split image processing pipeline to operate on the "high resolution" and "low resolution" representations of the object-of-interest concurrently (rather than using solely the "full" or "high resolution" pipeline) is that there are known failure cases associated with object-of-interest detector algorithms (e.g., rectangle detector algorithms). Examples of failure cases include: 1.) The user holding the credit card too close to the camera, resulting in some edges being outside the frame.
This may fail in the rectangle detector (i.e., not enough edges located to be reliably identified as a valid rectangle shape) but work fine in the direct path of feeding the "low resolution" version of the image directly to the credit card recognition engine. 2.) Some particular kinds of credit cards or lighting and background scenarios will make it very difficult for the edge detector portion of the rectangle detector to reliably identify the boundaries of the credit card. In this second case, the user would likely be instructed to attempt to frame the card very closely to the camera, so that the credit card recognition engine alone can read the character sequences of the card. In some embodiments, if no valid credit card has been found by the rectangle detector after a predetermined amount of time, the user interface (UI) on the device may be employed to "guide" the user to frame the card closely.
Advantages of this split image processing pipeline approach to object-of-interest recognition include the ability of the user to hold the card more freely when the camera is attempting to detect the card and read the character sequences (as opposed to forcing the user to hold the card at a particular distance, angle, orientation, etc.). The techniques described herein also give the user better ability to move the credit card around in order to avoid specular reflections (e.g., reflections off of holograms or other shiny card surfaces). In most cases, the credit card will also be read earlier than with the prior art approaches in use today.
Still other embodiments of this disclosure may be employed to perform character sequence recognition with no explicit character segmentation. According to some such embodiments, the character sequence recognition process may comprise generating a predicted character sequence for a first representation of a first image comprising a first plurality of pixels by: sliding a well-trained single-character classifier, e.g., a Convolutional Neural Network (CNN), over the first representation of the first image one pixel position at a time until reaching an extent of the first representation of the first image in a first dimension (e.g., image width); recording a likelihood value for each of k potential output classes at each pixel position, wherein one of the k potential output classes comprises a "background class"; determining a sequence of most likely output classes at each pixel position; decoding the sequence by removing identical consecutive output class determinations and background class determinations from the determined sequence; and validating the decoded sequence using one or more predetermined heuristics, such as credit card-related heuristics.
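A minimal sketch of the segmentation-free decoding step described above, assuming the per-position likelihoods have already been produced by the sliding single-character classifier and that class index 0 is the background class (an assumption):

```python
import numpy as np

BACKGROUND = 0  # assumed index of the "background" class

def decode_sliding_window(per_position_scores):
    """Greedy decoding of sliding-window classifier output.

    `per_position_scores` is an (n_positions x k) array of class likelihoods,
    one row per pixel position. The arg-max sequence is collapsed by dropping
    repeated consecutive classes and background determinations."""
    best = np.argmax(per_position_scores, axis=1)   # most likely class per position
    decoded, prev = [], None
    for cls in best:
        if cls != prev and cls != BACKGROUND:
            decoded.append(int(cls))
        prev = cls
    return decoded
```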
BRIEF DESCRIPTION OF THE DIAGRAM
FIG. 1 shows a system diagram of an image processing system, according to an example of the present disclosure;
FIG. 2 shows classification categories that indicate the extent of damage to property, according to an example of the present disclosure;
FIG. 3 shows a data store of an image processing server, according to an example of the present disclosure;
FIG. 4 shows a block diagram of a computing device for classifying objects in a digital image using convolutional neural networks (CNNs), according to an example of the present disclosure;
FIG. 5 shows a flow chart diagram of a method to classify objects in a digital image using CNNs, according to an example of the present disclosure.
FIG. 6 shows a flow chart diagram of an optimized CNN, according to an example of the present disclosure.
DESCRIPTION OF THE INVENTION
FIG. 1: there is shown a system diagram of an image processing system 100, according to an example of the present disclosure. It should be understood that the system 100 may include additional components and that one or more of the components described herein may be removed and/or modified without departing from a scope of the system 100. The system 100 may include at least one image capture device 110, a communications network 120, an image processing server 130, and a data store 140.
The image capture device 110 may communicate with the image processing server 130 via the communications network 120. The image capture device 110 may be any computing device that includes a camera such as, but not limited to, a smartphone, a computing tablet, a laptop computer, a desktop computer, or any wearable computing device. According to an example, the image capture device 110 may capture an image of a tangible property 150 and send the image of the tangible property 150 to the image processing server 130 to automatically classify the extent of damage to the tangible property 150.
The communications network 120 may include local area networks (LANs) and wide area networks (WANs), such as the Internet. The communications network 120 may include signal bearing mediums that may be controlled by software, applications and/or logic. The communications network 120 may include a combination of network elements to support data communication services. For example, the communications network 120 may connect the image capture device 110 to the image processing server 130 through the use of a physical connection such as copper cable, coaxial cable, and fiber cable, or through wireless technology such as radio, microwave, or satellite.
The image processing server 130, for example, may receive digital images from a training set at an image pre-processor 105. The image pre-processor may crop and enhance particular content in the images from the training set to input into the intermediate CNN builder 115. The intermediate CNN builder 115 may select various architectures and parameters to train an intermediate CNN 125. The intermediate CNN 125 may be then be evaluated on a validation set that is generated by the validation circuit 135. The validation circuit 135 may determine whether to flag the intermediate CNN 125 as meeting a designated validation threshold.
If the intermediate CNN 125 does not meet the validation threshold, the intermediate CNN is not flagged and continues to be trained on the digital images from the training set by the intermediate CNN builder 115. However, if the intermediate CNN 125 does meet the validation threshold, the intermediate CNN 125 is now a flagged intermediate CNN 145. As a result, the flagged intermediate CNN 145 is eligible to be selected as part of an ensemble of optimized CNNs that is generated by the ensemble generator 155. The ensemble generator 155, for example, may create an ensemble 165 of optimized CNNs. The predictions aggregated from the ensemble 165 may be used to accurately classify objects 175 from an inputted digital image. The processing functions of the image processing server 130 are further detailed below in FIGS. 4, 5, and 6.
According to an example, the image processing server 130 may receive an image of the tangible property 150 and automatically classify an extent of damage to the tangible property 150 using CNNs to recognize and classify the damage in the image of the tangible property 150. According to an example, the image processing server 130 may classify the extent of damage to the tangible property 150 into various predetermined classification categories 200 such as, but not limited to, undamaged, damaged, and severely damaged or totaled as illustrated in FIG. 2.
The image processing server 130 may be coupled to the data store 140, as further detailed below in FIG. 4. As illustrated in FIG. 3, the data store 140 may store data which is relied upon to classify the extent of damage to the tangible property 150 by the image processing server 130. For example, the data store 140 may store training sets and validation sets that comprise digital images of property 310, damaged property 320, and property that is a total loss 330. These digital images are relied upon by the image processing server 130 to build a model that accurately assesses and classifies the extent of damage to the tangible property 150.
With reference to FIG. 4, there is shown a block diagram of a computing device 400 for image processing using convolutional neural networks (CNNs) according to an example of the present disclosure. According to an example, the computing device 400 is the image processing server 130. It should be understood that the computing device 400 may include additional components and that one or more of the components described herein may be removed and/or modified without departing from a scope of the computing device 400.
The computing device 400 is depicted as including a processor 402, a data store 140, an input/output (I/O) interface 406, and an image processing platform 410. The components of the computing device 400 are shown on a single computer or server as an example and in other examples the components may exist on multiple computers or servers. The computing device 400 may store data in the data store 140 and/or may manage the storage of data stored in a separate computing device, for instance, through the I/O interface 406. The data store 140 may include physical memory such as a hard drive, an optical drive, a flash drive, an array of drives, or any combinations thereof, and may include volatile and/or non-volatile data storage.
The image processing platform 410 is depicted as including a training circuit 412, a model builder 414, a validation circuit 416, and a classifier 418. The processor 402, which may comprise a microprocessor, a micro-controller, an application specific integrated circuit (ASIC), Graphical Processing Unit (GPU) or the like, is to perform various processing functions in the computing device 400. The processing functions may include the functions of the training circuit 412, the model builder 414, the validation circuit 416, and the classifier 418 of the image processing platform 410.
The training circuit 412, for example, may create a training set from images of damaged property or objects. This training set may be used by the model builder 414 to build a CNN model. The model builder 414, for example, may build a CNN model on the training set according to a selected candidate architecture and candidate parameters for the CNN model. The validation circuit 416, for example, may evaluate performance of the CNN model built by the model builder 414 on a validation set and determine whether the CNN model meets a validation threshold. The classifier 418, for example, may classify an extent of damage for an object in each image in the validation set. The classifier may also aggregate predictions from an ensemble of optimized CNN models to more accurately assess the damaged objects in the digital images.
In an example, the image processing platform 410 includes machine readable instructions stored on a non-transitory computer readable medium 413 and executed by the processor 402. Examples of the non-transitory computer readable medium include dynamic random access memory (DRAM), electrically erasable programmable read-only memory (EEPROM), magneto resistive random access memory (MRAM), memristor, flash memory, hard drive, and the like. The computer readable medium 413 may be included in the data store 140 or may be a separate storage device. In another example, the image processing platform 410 includes a hardware device, such as a circuit or multiple circuits arranged on a board. In this example, the training circuit 412, the model builder 414, the validation circuit 416, and the classifier 418 comprise circuit components or individual circuits, such as an embedded system, an ASIC, or a field-programmable gate array (FPGA).
The processor 402 may be coupled to the data store 140 and the I/O interface 406 by a bus 405, where the bus 405 may be a communication system that transfers data between various components of the computing device 400. In examples, the bus 405 may be a Peripheral Component Interconnect (PCI), Industry Standard Architecture (ISA), PCI Express, HyperTransport, NuBus, a proprietary bus, and the like.
The I/O interface 406 includes a hardware and/or a software interface. The I/O interface 406 may be a network interface connected to a network through a network device, over which the image processing platform 410 may receive and communicate information, for instance, information regarding an extent of damage to a property. For example, the input/output interface 406 may be a wireless local area network (WLAN) or a network interface controller (NIC).
The WLAN may link the computing device 400 to the network device through a radio signal. Similarly, the NIC may link the computing device 400 to a network device through a physical connection, such as a cable. The computing device 400 may also link to a network device through a wireless wide area network (WWAN), which uses a mobile data signal to communicate with mobile phone towers. The processor 402 may store information received through the input/output interface 406 in the data store 140 and may use the information in implementing the training circuit 412, the model builder 414, the validation circuit 416, and the classifier 418 of the image processing platform 410.
The methods disclosed below in FIGS. 5 and 6 describe examples of methods for digital image processing using CNNs, for example, to classify an extent of damage to property captured in an image. It should be apparent to those of ordinary skill in the art that the methods represent generalized illustrations and that other sequences may be added or existing sequences may be removed, modified or rearranged without departing from the scopes of the methods.
FIG. 5 shows a flow chart diagram of a method 500 of digital image processing using CNNs, according to an example of the present disclosure. A CNN may be utilized to advance the performance of a classification of objects in an image. Accordingly, the method 500 illustrated in FIG. 5 provides a method for training and building CNNs to output an accurate classification of objects in an image. For example, the processor 402 of the image processing server 130 may implement the image processing platform 410 to accurately assess property damage in images.
In block 505, the training circuit 412, for instance, may create a training set from images of damaged property or objects. According to an example, the training set data may comprise images of new (undamaged) objects, damaged objects, and totaled objects. This training set may be processed by the model builder 414 to discover predictive relationships and tune a model such as a CNN.
After the training set has been created, the method 500 may iteratively select candidate architectures and candidate parameters to optimize the CNN's ability to, for example, accurately classify an extent of damage for an object in an image. The iterative process may include blocks 510-545 of method 500.
In block 510, the model builder 414, for instance, may select a candidate architecture from a plurality of candidate architectures. According to an example, the plurality of candidate architectures may include different combinations of a number of convolution layers and subsampling layers and a classifier type. The classifier type may include a multilayer perceptron (MLP), a support vector machine (SVM), and the like.
In block 515, the model builder 414, for instance, may select candidate parameters for the selected candidate architecture. According to an example, the candidate parameters may include a learning rate, a batch size, a maximum number of training epochs, a convolutional filter size, a number of feature maps at every layer of the CNN, a sub sampling pool size, an input image size, a number of hidden layers, a number of units in each hidden layer, a selected classifier algorithm, and a number of output classes.
Examples of learning parameters include the learning rate, the batch size, and the maximum number of training epochs. The learning rate parameter is a rate at which the CNN learns optimal filter coefficients from the training set. Ideally, the learning rate is not too high (where the CNN overlearns and is less generalizable) or too low. According to an example, the range for the learning rate parameter includes, but is not limited to, 0.05 to 0.10. The batch size parameter is the number of images processed together (as opposed to using images one-at-a-time) when computing an estimate of a gradient descent in a minimization. Bunching a number of images in a batch during training speeds up the computing by using a three-dimensional (3D) matrix representation (batch size x height x width) instead of a two-dimensional (2D) matrix representation of an image (height x width).
According to an example, the range for the batch size parameter includes, but is not limited to, 2 to 128 images for each batch. The maximum number of training epochs parameter is the maximum number of times that the entire training set is re-used in updating minimization parameters. The number of training images divided by batch size is the total number of iterations in one epoch. According to an example, a range for the maximum number of training epochs parameter is between 100 and 200.
Examples of convolution and sub-sampling parameters include the convolutional filter size, the number of feature maps at each layer of the CNN, and the sub-sampling pool size. The convolutional filter size parameter is the size of the filters in a convolution layer. According to an example, the range for the convolutional filter size parameter is between 2x2 pixels and 114x114 pixels. The number of feature maps parameter is the number of feature maps output from the number of filters or kernels in each convolution layer. According to an example, the range for the number of feature maps parameter is between to 512 feature maps for a first convolutional layer. The sub-sampling pool size parameter is the size of a square patch of pixels in the image down-sampled into, and replaced by, one pixel after the operation via maximum pooling, which sets the value of the resulting pixel as the maximum value of the pixels in the initial square patch of pixels. According to an example, the range of values for the sub-sampling pool size parameter includes, but is not limited to, a range between 2x2 to 4x4. The parameters of the network of the convolutional layers are selected to reduce the input image size into 1x1 pixel value on the output of the final convolutional layer according to an example.
Examples of classifier parameters include the image input size, the number of hidden layers, the number of units in each layer, the selected classifier algorithm, and the number of output classes. The image input size is the dimension of the space where the data from the final convolution layer will be classified, and is therefore equal to the product of the number of feature maps and the image size of the last convolution layer. According to an example, the input image size is the number of feature maps on the last convolutional layer times 1x1. The hidden layers are fully connected MLP layers, and the number of hidden layers includes 2 according to an example.
The number of hidden layers should be limited to at most three. The number of units in each hidden layer is the number of units in a hidden layer that uses the information learned in the convolution and subsampling layers to detect the extent of damage. According to an example, the range for the number of units in each hidden layer parameter includes, but is not limited to, between 6 and 1024 units. The selected classifier algorithm may include, but is not limited to, a multilayer perceptron (MLP), a support vector machine (SVM), and the like. The number of output classes is the number of classes the input images are classified into. According to an example, the number of output classes may include, but is not limited to, 3.
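For illustration, the candidate parameter ranges listed above could be represented and sampled as follows. The uniform random sampling strategy and the dictionary layout are assumptions for this sketch, not a prescribed search procedure.

```python
import random

# Candidate parameter ranges taken from the ranges described above; the
# uniform random sampling below is an assumption for illustration only.
CANDIDATE_RANGES = {
    "learning_rate":       (0.05, 0.10),
    "batch_size":          (2, 128),
    "max_training_epochs": (100, 200),
    "conv_filter_size":    (2, 114),    # square filters, pixels per side
    "subsample_pool_size": (2, 4),
    "hidden_layers":       (1, 3),
    "units_per_hidden":    (6, 1024),
}

def sample_candidate_parameters():
    """Draw one set of candidate parameters for an intermediate CNN."""
    params = {"learning_rate": random.uniform(*CANDIDATE_RANGES["learning_rate"])}
    for name, (lo, hi) in CANDIDATE_RANGES.items():
        if name != "learning_rate":
            params[name] = random.randint(lo, hi)
    params["classifier"] = random.choice(["MLP", "SVM"])
    params["output_classes"] = 3
    return params
```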
The model builder 414, for instance, may then select a pre-processing protocol to enhance the information content in the images of the damaged objects for the selected candidate architecture and selected candidate parameters as shown in block 520. The pre-processing protocol may include, but is not limited to, local contrast normalization or Zero-phase Component Analysis (ZCA) scaling, and independent component analysis (ICA) for whitening.
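A minimal NumPy sketch of one of the named pre-processing protocols, ZCA whitening, follows; the epsilon regularizer and the flattened-image input layout are assumptions.

```python
import numpy as np

def zca_whiten(images, eps=1e-5):
    """ZCA whitening sketch.

    `images` is an (n_samples x n_features) array of flattened images;
    `eps` is a small regularizer (value assumed) that avoids dividing by
    near-zero eigenvalues."""
    X = images - images.mean(axis=0)                 # centre each feature
    cov = np.cov(X, rowvar=False)                    # feature covariance
    U, S, _ = np.linalg.svd(cov)                     # eigendecomposition
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T    # ZCA whitening matrix
    return X @ W
```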
In block 525, the model builder 414, for instance, may train and build an intermediate CNN using the training set. After the intermediate CNN is trained and built, the validation circuit 416, for instance, may evaluate the performance of the intermediate CNN on a validation set as shown in block 530. According to an example, the validation set comprises a set of images of new (undamaged) objects, damaged objects, and totaled objects that are separate and distinct from the set of images from the training set. In this regard, the validation set is used to assess the accuracy of the intermediate CNN with respect to classifying the extent of damage in each of the images of the validation set.
In block 535, the validation circuit 416, for instance, may determine whether the intermediate CNN meets a validation threshold. The validation threshold may be a validation error rate. According to this example, the intermediate CNN may meet or satisfy the validation threshold if its validation error rate is less than 20% with respect to classification predictions. If the intermediate CNN does not meet the validation threshold, then the iterative process begins again at block 510.
On the other hand, if the intermediate CNN meets the validation threshold, then validation circuit 416 may flag the intermediate CNN to indicate that it has met the validation threshold as shown in block 540. In block 545, the validation circuit 416 may determine whether a predetermined number of intermediate CNNs have been flagged as meeting the validation threshold. The predetermined number of flagged intermediate CNNs for example may be 25 flagged intermediate CNNs. According to an example, each of the flagged intermediate CNNs are built with different values for the selected candidate parameters. If the number of flagged intermediate CNNs has not reached the predetermined number (e.g., 25), then the iterative process begins again at block 510.
Alternatively, if the number of flagged intermediate CNNs has reached the predetermined number (e.g., 25), then the validation circuit 416 may create an ensemble of intermediate CNNs from the predetermined number of intermediate CNNs as shown in block 550. For example, the 5 most accurate intermediate CNNs may be selected as an ensemble.
In block 555, the classifier 418, for instance, may classify the extent of damage for the object in each image in the validation set. According to an example, the classifying includes aggregating predictions from the ensemble of flagged intermediate CNNs to achieve greater accuracy in the classification of the extent of damage for the object in each image in the validation set. For example, aggregating predictions or taking a majority vote from the ensemble of flagged intermediate CNNs may result in an accuracy of approximately 90%, which is well above the individual CNN performance results of approximately 80-85%.
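A small sketch of the majority-vote aggregation described in this block, assuming integer class labels (0 = new, 1 = damaged, 2 = totaled) and lowest-index tie-breaking:

```python
import numpy as np

def ensemble_vote(per_model_predictions):
    """Aggregate class labels by majority vote across the ensemble.

    Input is an (n_models x n_images) integer array of predicted class
    labels; ties are broken in favour of the lowest class index."""
    votes = np.asarray(per_model_predictions)
    n_classes = int(votes.max()) + 1
    return np.array([np.bincount(votes[:, j], minlength=n_classes).argmax()
                     for j in range(votes.shape[1])])

# e.g. five flagged CNNs voting on four images:
labels = ensemble_vote([[1, 0, 2, 2],
                        [1, 0, 2, 1],
                        [1, 1, 2, 2],
                        [0, 0, 2, 2],
                        [1, 0, 0, 2]])   # -> [1, 0, 2, 2]
```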
FIG. 6 shows a flow chart diagram of an optimized convolutional neural network (CNN) 600, according to an example of the present disclosure. The CNN 600 is an optimized CNN that was built according to the method 500 described above. The architecture for this CNN 600 includes 4 convolution and sub-sampling layers, 2 hidden layers, and a logistic regression classifier, such as a MLP. In this regard, for instance, this CNN 600 may classify the extent of damage to property that is captured in an image with an accuracy of approximately 88%.
As discussed above, an insured customer may submit an image of property in a claim to an insurance company. The insurance company may utilize this CNN 600 to automatically classify the extent of damage to the property using the submitted image. For example, the submitted image may be input into the CNN 600.
The submitted image of the damaged property may be pre-processed 610 to enhance the information content in the image for processing by the CNN 600. In this example, the submitted image is 480x640 pixels. For example, the pre-processing 610 may crop the submitted image of the damaged property to 96x96 pixels and extract 3 RGB channel layers from the submitted image of the damaged property to present as an input image to the CNN 600.
In the first convolutional layer (C1) 620, the CNN 600 may convolve the input image with 60 different first layer filters, each of size 5x5, to produce 60 feature maps of size 92x92. Each filter application of a convolution layer reduces the resolution of the input image. If the input image is of resolution NxN and the convolution filter is of size MxM, then the resulting image will be of resolution (N-M+1)x(N-M+1).
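A one-line check of this valid-convolution output size (stride 1, no padding), using the first layer above as a worked example:

```python
def conv_output_size(n: int, m: int) -> int:
    """Valid (no padding, stride 1) convolution output side length: N - M + 1."""
    return n - m + 1

# 96x96 input convolved with a 5x5 filter yields 92x92 feature maps, as in C1.
assert conv_output_size(96, 5) == 92
```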
The CNN 600 may then perform a max-pooling on the feature maps, which is a form of non-linear sub-sampling. Max-pooling partitions the input image into a set of non-overlapping square patches, replacing each patch with a single pixel of value equal to the maximum value of all the pixels in the initial square patch. In an example, the CNN may perform a max-pooling over a 2x2 region of the 60 feature maps on C1 620. The resulting feature maps of size 46x46 in C1 620 are then further convolved and max-pooled in the second convolutional layer (C2) 630.
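A NumPy sketch of this non-overlapping max-pooling operation, assuming the feature-map sides are divisible by the pool size:

```python
import numpy as np

def max_pool(feature_map, pool=2):
    """Replace each non-overlapping pool x pool patch by its maximum value."""
    h, w = feature_map.shape
    patches = feature_map.reshape(h // pool, pool, w // pool, pool)
    return patches.max(axis=(1, 3))

# 92x92 maps pooled over 2x2 regions become 46x46, as in C1 above.
assert max_pool(np.zeros((92, 92)), pool=2).shape == (46, 46)
```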
In C2 630, the resulting 60 feature maps of size 46x46 from C1 620 are convolved with second layer convolutional filters, each of size 3x3, to produce 128 feature maps of size 44x44. A max-pooling may then be performed over a 4x4 region of the 128 feature maps. The resulting 128 feature maps of size 11x11 in C2 630 are then further convolved and max-pooled in the third convolutional layer (C3) 640.
In C3 640, the resulting 128 feature maps of size 11x11 from C2 630 are convolved with third layer convolutional filters, each of size 4x4, to produce 128 feature maps of size 8x8. A max-pooling may then be performed over a 2x2 region of the 128 feature maps. The resulting 128 feature maps of size 4x4 in C3 640 are then further convolved and max pooled in the fourth convolutional layer (C4) 650.
In C4 650, the resulting 128 feature maps of size 4x4 from C3 640 are convolved with fourth layer filters, each of size 3x3, to produce 256 feature maps of size 2x2. A max pooling may then be performed over a 2x2 region of the 256 feature maps. The resulting 256 feature maps of size 1x1 in C4 650 are then input to the first hidden layer (H1) 660 to initiate the classification process.
To perform classification, CNN 600 applies fully-connected neural-network layers behind the convolutional layers. In the first classification layer of H1 660, for example, each of the 512 units takes in a value of every pixel from all 256 feature maps resulting from C4 650, multiplies each value by a predetermined weight, and de-linearizes the sum. In effect, the output of each of the 512 units, for example, represents a judgment about the originally submitted image of the damaged property.
The second hidden layer (H2) 670 is added to derive more abstract conclusions about the submitted image of the damaged property from the output of each of the 100 units in the second classification layer of H2 670. As a result, the logistic regression classifier 680 of the CNN 600 may then accurately classify the extent of damage of the property in the submitted image as either new, damaged, or totaled according to the output of the 3 units in the third classification layer.
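To make the layer dimensions of CNN 600 concrete, the following PyTorch sketch reproduces the sizes described above. The framework choice and the Tanh activations are assumptions, since the disclosure does not specify an implementation; the layer widths and filter/pool sizes follow the description of C1 through H2 and the 3-class output.

```python
import torch
import torch.nn as nn

class Cnn600Sketch(nn.Module):
    """Sketch of the CNN 600 layer sizes described above (96x96 RGB input)."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 60, 5),    nn.Tanh(), nn.MaxPool2d(2),  # C1: 92x92 -> 46x46
            nn.Conv2d(60, 128, 3),  nn.Tanh(), nn.MaxPool2d(4),  # C2: 44x44 -> 11x11
            nn.Conv2d(128, 128, 4), nn.Tanh(), nn.MaxPool2d(2),  # C3: 8x8  -> 4x4
            nn.Conv2d(128, 256, 3), nn.Tanh(), nn.MaxPool2d(2),  # C4: 2x2  -> 1x1
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256, 512), nn.Tanh(),    # H1: 512 units
            nn.Linear(512, 100), nn.Tanh(),    # H2: 100 units
            nn.Linear(100, num_classes),       # 3 output classes: new/damaged/totaled
        )

    def forward(self, x):                      # x: (batch, 3, 96, 96)
        return self.classifier(self.features(x))

logits = Cnn600Sketch()(torch.zeros(1, 3, 96, 96))   # -> shape (1, 3)
```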

Claims (5)

WE CLAIM
1. Our Invention "DIP-Intelligent Neural Networks" is a system in which a digital image may be processed by an ensemble of convolutional neural networks (CNNs) to classify objects in the digital image; for each CNN, a candidate architecture and candidate parameters may be selected to build a plurality of CNNs. The invented technology also provides that, when a predetermined number of CNNs, each having different values for the selected candidate parameters, meet a validation threshold, an ensemble of CNNs may be generated from the predetermined number of CNNs, and the predictions from the ensemble of CNNs may then be aggregated to accurately classify the objects in the digital image. The DIP-Intelligent Neural Networks system performs digital image processing, which typically involves processing a digital image, for example from a digital still image or digital video, to ascertain, detect, and classify particular features or objects in the image; pattern recognition may be applied during the image processing to detect a particular object in the image. The invented technology also applies digital image processing with pattern recognition, which has been used in a wide variety of applications, such as facial recognition, detection of land features from aerial photographs, vehicle license plate determination, etc.
2. According to claim 1, the invention provides that a digital image may be processed by an ensemble of convolutional neural networks (CNNs) to classify objects in the digital image and that, for each CNN, a candidate architecture and candidate parameters may be selected to build a plurality of CNNs.
3. According to claims 1 and 2, the invention provides that, when a predetermined number of CNNs, each having different values for the selected candidate parameters, meet a validation threshold, an ensemble of CNNs may be generated from the predetermined number of CNNs, and the predictions from the ensemble of CNNs may then be aggregated to accurately classify the objects in the digital image.
4. According to claims 1, 2 and 3, the invention provides that the digital image processing typically involves processing a digital image, for example from a digital still image or digital video, to ascertain, detect, and classify particular features or objects in the image, and that pattern recognition may be applied during the image processing to detect a particular object in the image.
5. According to claims 1, 2 and 4, the invention provides that digital image processing with pattern recognition has been used in a wide variety of applications, such as facial recognition, detection of land features from aerial photographs, vehicle license plate determination, etc.
FIG. 1 shows a system diagram of an image processing system, according to an example of the present disclosure;
FIG. 2 shows classification categories that indicate the extent of damage to property, according to an example of the present disclosure;
FIG. 3 shows a data store of an image processing server, according to an example of the present disclosure;
FIG. 4 shows a block diagram of a computing device for classifying objects in a digital image using convolutional neural networks (CNNs), according to an example of the present disclosure;
FIG. 5 shows a flow chart diagram of a method to classify objects in a digital image using CNNs, according to an example of the present disclosure; and
FIG. 6 shows a flow chart diagram of an optimized CNN, according to an example of the present disclosure.
AU2021101072A 2021-02-27 2021-02-27 DIP-Intelligent Neural Networks: Digital Image Processing Using Intelligent Neural Networks System Ceased AU2021101072A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021101072A AU2021101072A4 (en) 2021-02-27 2021-02-27 DIP-Intelligent Neural Networks: Digital Image Processing Using Intelligent Neural Networks System

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021101072A AU2021101072A4 (en) 2021-02-27 2021-02-27 DIP-Intelligent Neural Networks: Digital Image Processing Using Intelligent Neural Networks System

Publications (1)

Publication Number Publication Date
AU2021101072A4 true AU2021101072A4 (en) 2021-04-29

Family

ID=75625814

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021101072A Ceased AU2021101072A4 (en) 2021-02-27 2021-02-27 DIP-Intelligent Neural Networks: Digital Image Processing Using Intelligent Neural Networks System

Country Status (1)

Country Link
AU (1) AU2021101072A4 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11625665B1 (en) * 2022-03-29 2023-04-11 Todd Martin Contactless authorized event entry and item delivery system and method
US11755986B1 (en) 2022-03-29 2023-09-12 Todd Martin Combined flow-thru facial recognition for mass spectator event entry and item fulfillment system and method

Similar Documents

Publication Publication Date Title
US9524450B2 (en) Digital image processing using convolutional neural networks
US20230260321A1 (en) System And Method For Scalable Cloud-Robotics Based Face Recognition And Face Analysis
US20200285886A1 (en) Feature density object classification, systems and methods
US10210415B2 (en) Method and system for recognizing information on a card
US9449239B2 (en) Credit card auto-fill
US8733650B1 (en) Decoding barcodes from images with varying degrees of focus
JP6309549B2 (en) Deformable expression detector
US20150347861A1 (en) Object-Of-Interest Detection And Recognition With Split, Full-Resolution Image Processing Pipeline
US20150347860A1 (en) Systems And Methods For Character Sequence Recognition With No Explicit Segmentation
US11450087B2 (en) System and method for multimedia analytic processing and display
Alonso‐Fernandez et al. Near‐infrared and visible‐light periocular recognition with Gabor features using frequency‐adaptive automatic eye detection
US9576210B1 (en) Sharpness-based frame selection for OCR
US9575566B2 (en) Technologies for robust two-dimensional gesture recognition
US9275448B2 (en) Flash/no-flash imaging for binarization
EP3762899B1 (en) Object segmentation in a sequence of color image frames based on adaptive foreground mask upsampling
US20230005108A1 (en) Method and system for replacing scene text in a video sequence
CN109963072B (en) Focusing method, focusing device, storage medium and electronic equipment
US11574492B2 (en) Efficient location and identification of documents in images
AU2021101072A4 (en) DIP-Intelligent Neural Networks: Digital Image Processing Using Intelligent Neural Networks System
Singh et al. LBP and CNN feature fusion for face anti-spoofing
US11087121B2 (en) High accuracy and volume facial recognition on mobile platforms
Alonso-Fernandez et al. Learning-based local-patch resolution reconstruction of iris smart-phone images
Tofighi Informed Learning for Image Restoration and Understanding
Nyrönen Convolutional neural network based super-resolution for mobile devices

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry