WO2022211781A1 - Object detection inference pipeline - Google Patents

Object detection inference pipeline Download PDF

Info

Publication number
WO2022211781A1
Authority
WO
WIPO (PCT)
Prior art keywords
defects
model
video frames
sewer pipe
output
Prior art date
Application number
PCT/US2021/024696
Other languages
French (fr)
Inventor
Amit Gupta
Arun Kumar SAGOTRA
Original Assignee
Hitachi Vantara Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Vantara Llc filed Critical Hitachi Vantara Llc
Priority to PCT/US2021/024696 priority Critical patent/WO2022211781A1/en
Publication of WO2022211781A1 publication Critical patent/WO2022211781A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/84Systems specially adapted for particular applications
    • G01N21/88Investigating the presence of flaws or contamination
    • G01N21/95Investigating the presence of flaws or contamination characterised by the material or shape of the object to be examined
    • G01N21/954Inspecting the inner surface of hollow bodies, e.g. bores
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/84Systems specially adapted for particular applications
    • G01N21/88Investigating the presence of flaws or contamination
    • G01N21/8851Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges
    • G01N2021/8883Scan or image signal processing specially adapted therefor, e.g. for scan signal adjustment, for detecting different kinds of defects, for compensating for structures, markings, edges involving the calculation of gauges, generating models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • G06T2207/30132Masonry; Concrete

Definitions

  • the present disclosure relates generally to object detection, and more specifically, to classifying and detecting defects in pipelines such as sewer lines.
  • the present disclosure is directed to example implementations to address the above problems in the related art.
  • the example implementations described herein involve systems and methods that classify and localize defects in sewer lines by using deep learning and computer vision algorithms.
  • the example implementations described herein allow applications to automatically create output similar to the Pipeline Assessment Certification Program (PACP) standard that is currently done manually by human inspectors.
  • PACP Pipeline Assessment Certification Program
  • the example implementations allow inspectors to feed the camera recording to the inference pipeline and generate the report, with high accuracy for the majority of defects.
  • the model allows for detection of numerous human visible defects such as cracks, fractures, roots, deposits, obstructions, grease, encrustations, infiltrations, broken and deformed pipes.
  • aspects of the present disclosure can involve a method, which can include, for receipt of a plurality of video frames of a sewer pipe, processing the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline including a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output involving identified defects of the sewer pipe from the plurality of video frames; processing the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and processing the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
  • aspects of the present disclosure can involve a system, which can include, for receipt of a plurality of video frames of a sewer pipe, means for processing the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline including a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output involving identified defects of the sewer pipe from the plurality of video frames; means for processing the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and means for processing the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
  • aspects of the present disclosure can involve a computer program, storing instructions for execution by one or more special purpose processors, which can include, for receipt of a plurality of video frames of a sewer pipe, processing the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline including a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output involving identified defects of the sewer pipe from the plurality of video frames; processing the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and processing the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
  • the computer program and its instructions can be stored in a non-transitory computer readable medium for execution by the one or more special purpose processors.
  • aspects of the present disclosure can involve an apparatus, which can include one or more special purpose hardware processors, configured to, for receipt of a plurality of video frames of a sewer pipe, process the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline involving a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output involving identified defects of the sewer pipe from the plurality of video frames; process the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and process the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
  • FIG. 1(A) illustrates an overall architecture of the example implementations described herein.
  • FIG. 1(B) illustrates a flow diagram incorporating deep learning models and computer vision techniques to conduct sewer defect classification and localization with high accuracy, in accordance with an example implementation.
  • FIG. 3 illustrates an example of a video frame of a sewer pipe from a sideview.
  • FIG. 4 illustrates an example video frame of a deformed elliptical sewer pipe.
  • FIG. 5 illustrates an example video frame of a sewer pipe with bulging projections.
  • FIG. 6 illustrates an example video frame of a broken sewer pipe.
  • FIGS. 9(A) to 9(I) illustrate example video frames with various defects detected by the main model, in accordance with an example implementation.
  • FIGS. 10(A) to 10(D) illustrate an example application of a spatial attention module, in accordance with an example implementation.
  • FIGS. 11(A) and 11(B) illustrate an example implementation of detection of severity of defects in video frames, in accordance with an example implementation.
  • FIG. 14 illustrates an example integration to a user stack, in accordance with an example implementation.
  • FIGS. 15(A) and 15(B) illustrate example video frames with surface damage defects, in accordance with an example implementation.
  • FIGS. 16(A) and 16(B) illustrate example video frames with water streamers and gushers, in accordance with an example implementation.
  • FIG. 1(A) illustrates an overall architecture of the example implementations described herein.
  • video frames are processed through an architecture involving a model layer 150, a post-processing layer 151, and an output layer 152.
  • the model layer 150 is configured to conduct defect detection and classification through various convolutional neural network (CNN) models and computer vision algorithms, and to provide the detected and classified defects to the post-processing layer 151.
  • CNN convolutional neural network
  • the model layer 150 manages a multiple model pipeline as illustrated in FIG. 1(B), which can involve various convolutional neural networks, computer vision, and other model algorithms as arranged in a pipeline.
  • a boreview/sideview binary classifier model there can be a boreview/sideview binary classifier model, a main object detection model based on YOLOv4, a classification model to determine a level of surface damage (e.g., level of concrete erosion), joint detection and blockage algorithms, and so on.
  • the resulting output involves video frames that are classified according to the models, with identified defects in the sewer pipe or other pipelines, which is fed into the post-processing layer 151.
  • the output layer 152 can thereby format the output into the preferred standard (e.g., comma-separated values (CSV) format), indicating the frames in the video that have the defects along with their coordinates, and provide an output of images of the sewer lines with labeled defects (e.g., in the form of PACP defect codes) as well as the CSV report.
  • CSV comma-separated values
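The CSV formatting performed by the output layer can be sketched with Python's standard csv module. This is a minimal illustration only; the field names and the write_defect_report helper below are assumptions, not the patent's actual report schema.

```python
import csv
import io

def write_defect_report(detections):
    """Write a PACP-style defect report as CSV.

    `detections` is a list of dicts; the field names here are
    illustrative, not the actual schema used by the output layer.
    """
    fields = ["frame", "defect_code", "x", "y", "width", "height", "clock_position"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for det in detections:
        writer.writerow(det)
    return buf.getvalue()

report = write_defect_report([
    {"frame": 120, "defect_code": "CC", "x": 210, "y": 95,
     "width": 80, "height": 12, "clock_position": "12"},
])
```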
  • FIG. 1(B) illustrates a flow diagram incorporating machine learning models and computer vision techniques to conduct sewer defect classification and localization with high accuracy, in accordance with an example implementation.
  • the input is videos captured from cameras sent through the sewer line or other pipeline. These cameras travel from one access point/manhole to another access point, recording the sewer line. As the operator/engineer is capturing videos along the boreview of the sewer, whenever they see any defect of interest they pan and tilt the camera to capture sideview of the sewer, sometimes zooming onto the defect.
  • a series of frames in a video are processed with an image hashing algorithm 101 to gather only dissimilar frames in a video on which the inference pipeline should detect the defects.
  • Implementations for the image hashing algorithm 101 can be conducted in accordance with any algorithm in the related art. Such processing prevents excessive output while avoiding missing any defects in a video.
  • the example implementations can be extended to conduct defect detection on every single video frame if needed in accordance with the desired implementation and requirements.
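The frame de-duplication at 101 can be sketched as follows. The specific hashing algorithm is not prescribed, so this hypothetical sketch uses a simple average hash with a Hamming-distance threshold; any perceptual hash from the related art (aHash, pHash, dHash) could stand in.

```python
def average_hash(gray):
    """Average hash of a small grayscale image (list of rows of 0-255 ints).

    A real pipeline would first resize the frame to e.g. 8x8 pixels.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def dissimilar_frames(frames, threshold=2):
    """Keep only frames whose hash differs enough from the last kept frame.

    The threshold value is an assumption for illustration.
    """
    kept, last = [], None
    for i, frame in enumerate(frames):
        h = average_hash(frame)
        if last is None or hamming(h, last) > threshold:
            kept.append(i)
            last = h
    return kept

dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 10]]
selected = dissimilar_frames([dark, dark, bright])
```

Consecutive near-identical frames are skipped, so downstream models only see frames likely to contain new content.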
  • DL Deep learning
  • the various models used are a binary classifier to detect if the video is in boreview or sideview 103, a binary classifier to detect if the video frame is deformed or normal 104, a binary classifier to detect if the pipe is deformed into an elliptical/oval shape (DFE) or has rounded bulging projections (DFBR) 108, segmentation models to detect the joints from boreview 119 or sideview 120, a classification model to detect codes such as surface damages 121, a segmentation model to detect water level, streams and gushers 109, and main object detection models 110, 105 that can be based on YOLOv4 or others depending on the desired implementation.
  • DFE elliptical/oval shape
  • DFBR rounded bulging projections
  • the inference pipeline is also configured to convert the defect location (e.g., pixel coordinates from YOLO model) to clock positions 113, compute the severity of defects (e.g., severity of root blockage 115), mark continuous defects (e.g., that continue for more than three feet of distance) as per PACP requirements, and create a pipe rating index and summary for each pipe as per PACP standards.
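The pixel-to-clock conversion at 113 might look like the following sketch, assuming the pipe bore centre in the frame is known (e.g., from a fitted joint ellipse). The to_clock_position helper and the 12-o'clock-up convention are assumptions for illustration.

```python
import math

def to_clock_position(cx, cy, pipe_cx, pipe_cy):
    """Map a defect's pixel centre to a 1-12 clock position.

    (pipe_cx, pipe_cy) is the pipe bore centre in the frame;
    12 o'clock is taken to be straight up in the image.
    """
    # Angle measured clockwise from the upward vertical, in degrees.
    angle = math.degrees(math.atan2(cx - pipe_cx, pipe_cy - cy)) % 360.0
    hour = round(angle / 30.0) % 12
    return 12 if hour == 0 else hour
```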
  • hashed image frames are processed with a boreview/sideview binary classifier 103 to determine if the video frame is boreview or sideview (e.g., when camera pans and tilts away from the bore).
  • This is the first model exercised on a video frame in the inference pipeline, and determines whether the camera is capturing a boreview or a sideview of the pipeline.
  • FIG. 2 illustrates an example of a video frame of a sewer pipe from a boreview.
  • FIG. 3 illustrates an example of a video frame of a sewer pipe from a sideview.
  • a classifier is needed to determine if the video frame is recording from the boreview or sideview.
  • the classifier determines the pathway of the models through the pipeline as illustrated in FIG. 1(B).
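The pathway selection can be sketched as a simple dispatcher: the boreview/sideview verdict chooses which downstream models run, as in FIG. 1(B). The route_frame helper and its signature are hypothetical.

```python
def route_frame(frame, boreview_classifier, boreview_models, sideview_models):
    """Dispatch a frame down the boreview or sideview model branch.

    `boreview_models` / `sideview_models` are lists of (name, model)
    pairs; only the selected branch is executed for the frame.
    """
    branch = boreview_models if boreview_classifier(frame) == "boreview" else sideview_models
    results = {}
    for name, model in branch:
        results[name] = model(frame)
    return results

# Stand-in classifier and models for illustration only.
classifier = lambda f: "boreview" if f["view"] == "bore" else "sideview"
bore_branch = [("deform", lambda f: "normal"), ("joints", lambda f: [])]
side_branch = [("joints_side", lambda f: [])]
out = route_frame({"view": "bore"}, classifier, bore_branch, side_branch)
```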
  • CNN convolutional neural network
  • hashed image frames are processed with a binary classifier constructed from a bilinear CNN with spatial and channel attention to determine if the video frame involves a deformed pipeline or a normal pipeline at 104. This helps in determining if the boreview is deformed (e.g., due to fractures, and other engineering or disaster issues).
  • a bilinear attention convolutional neural network (CNN) model is used wherein labeled deformed pipeline images and normal images are separated and fed into the bilinear attention CNN until the classifier can classify deformed/normal video.
  • CNN bilinear attention convolutional neural network
  • FIG. 6 illustrates an example video frame of a broken sewer pipe. As shown in FIG. 6, as the pipeline is determined to be deformed and fractures are present in the video frame, then the classification of the video frame of FIG. 6 can be determined to be as broken.
  • hashed image frames are processed with a binary classifier constructed from a bilinear CNN with spatial and channel attention, to determine if the deformed video frame is DFE (deformed elliptical) or DFBR (bulging) 108.
  • FIG. 4 illustrates an example video frame of a deformed elliptical sewer pipe.
  • FIG. 5 illustrates an example video frame of a sewer pipe with bulging projections.
  • segmentation models, which can be constructed with a Unet++ model with custom loss, are utilized to detect joints.
  • a boreview joint detection model 119 detects joints on the boreview of the pipeline, and can determine three types of classes: full joint, partial joint, and background.
  • FIGS. 7(A) and 7(B) illustrate an example segmentation mask of a joint for a boreview and as applied to a video frame, respectively, in accordance with an example implementation. As illustrated in FIG. 7(A), a polygon segmentation mask is applied to video frames regarding the expected shape of a joint from the boreview.
  • FIG. 7(B) illustrates the circular shape/joint being detected within a boreview classified video frame.
  • example implementations fit an ellipse on the predicted mask to detect joints.
  • the detection of joints is an important part of the pipeline, as it allows for the computation of the “J” modifier as per PACP standard.
  • a “J” modifier in a PACP output indicates proximity of a defect to a joint, which is important for maintenance purposes.
  • the fitting of the ellipse also allows for calculation of the overlap between defects that look like a joint, and the joint itself to eliminate False Positives (FP).
  • FP False Positives
  • One such example is a Circumferential Crack (CC) as illustrated in FIG. 9, covered with bounding boxes. Since the shape of a circumferential crack is easily confused with a joint, the main model tends to produce many FPs for CCs on a joint. If a CC overlaps a joint, it is eliminated from the output, thereby reducing the FPs.
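The overlap-based FP elimination can be sketched as below. A bounding box of the fitted joint ellipse stands in for the ellipse itself here, and the 0.3 IoU threshold and the drop_cc_on_joints helper are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def drop_cc_on_joints(detections, joint_boxes, min_iou=0.3):
    """Remove circumferential-crack detections that overlap a joint."""
    return [d for d in detections
            if not (d["code"] == "CC" and
                    any(iou(d["box"], j) >= min_iou for j in joint_boxes))]

detections = [{"code": "CC", "box": (0, 0, 10, 10)},
              {"code": "FL", "box": (50, 50, 60, 60)}]
kept = drop_cc_on_joints(detections, joint_boxes=[(0, 0, 10, 10)])
```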
  • CC Circumferential Crack
  • FIGS. 10(A) to 10(D) illustrate an example application of a spatial attention module, in accordance with an example implementation.
  • the main object detection model 105 can be created from YOLO or other variants such as YOLOv4.
  • YOLO You Only Look Once
  • YOLOv4 YOLO fourth version
  • YOLO and its variants are, at their core, real-time object detection systems in the related art configured to recognize various objects in an image frame by dividing the image frame into grid cells.
  • Each of the grid cells is then classified with probabilities based on known object classes, from which bounding boxes are generated around objects in the image based on the classifications made in the grid cells.
  • YOLO or its variants can be trained or implemented on a graphics processing unit (GPU).
  • GPU graphics processing unit
  • YOLO or its variants is modified to detect defects based on a database of ground truth images of known defects.
  • YOLO or its variants may be composed of a backbone, a neck, and a head.
  • the backbone is a neural network composed of convolution layers that are configured to extract the essential features from an image.
  • the backbone architecture can be trained using pre-trained neural networks or otherwise in accordance with the desired implementation.
  • the neck involves bottom-up paths and/or top-down paths to collect feature maps from various stages of the backbone.
  • the head is the detector that conducts dense prediction in the form of a vector containing the coordinates of the predicted bounding box of a detected object along with the confidence score of the prediction and the corresponding label of the detected object.
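Decoding one such dense-prediction vector might look like the following sketch. The exact layout assumed here ([cx, cy, w, h, objectness, class probabilities...], with coordinates already in pixels) is an illustration; real YOLO heads emit one such vector per anchor per grid cell, with further coordinate decoding.

```python
def decode_prediction(vector, class_names, conf_threshold=0.5):
    """Decode one YOLO-style head output vector into a detection.

    Returns None when the combined confidence falls below the
    (assumed) threshold.
    """
    cx, cy, w, h, objectness = vector[:5]
    probs = vector[5:]
    best = max(range(len(probs)), key=lambda i: probs[i])
    score = objectness * probs[best]
    if score < conf_threshold:
        return None
    return {"box": (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2),
            "label": class_names[best], "score": score}

det = decode_prediction([100, 100, 20, 10, 0.9, 0.1, 0.8], ["CC", "FL"])
weak = decode_prediction([100, 100, 20, 10, 0.2, 0.1, 0.8], ["CC", "FL"])
```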
  • Example implementations described herein are based on the YOLOv4 variant, but other variants can be used in accordance with the desired implementation and the present disclosure is not limited thereto.
  • various defects are detected from the main object detection model 105, including, but not limited to, roots, fractures, cracks, grease, encrustations, obstructions, deposits, taps, surface damage aggregate visible (SRV), and other PACP defects.
  • SAM spatial attention module
  • SAM is a technique that facilitates spatial attention in CNNs and generates a spatial attention map based on the inter-spatial relationship of features.
  • the SAM is configured to allow the network to ignore the background and provide more attention to the detected features based on the location of the features extracted by the convolutional layers 1000 of the backbone. Attention modules are used to make the CNN learn and focus more on the “important information”, rather than learning non-useful “background information” as shown in FIG. 10(C).
  • useful information includes the objects or target-class crops that are classified and localized in an image.
  • the SAM can be in the form of a convolutional layer and a desired sigmoid function to generate a mask, heatmap, or other concatenated feature map from the input feature map, and can be formed from known techniques in accordance with the desired implementation. Sample defects detected by the main model are shown in FIGS. 9(A) to 9(I).
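A minimal stand-in for the SAM is sketched below in plain Python: the channel-wise average and max at each spatial location are passed through a sigmoid to weight that location. A real module (e.g., as in CBAM) would convolve the pooled maps rather than simply summing them; the sum here is a simplification for illustration.

```python
import math

def spatial_attention(fmap):
    """Apply a minimal spatial attention map to a feature map.

    `fmap` is channels x height x width (nested lists). Locations with
    strong features keep their values; weak locations are suppressed.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for y in range(H):
        for x in range(W):
            vals = [fmap[c][y][x] for c in range(C)]
            pooled = sum(vals) / C + max(vals)       # avg-pool + max-pool
            weight = 1.0 / (1.0 + math.exp(-pooled))  # sigmoid gate
            for c in range(C):
                out[c][y][x] = fmap[c][y][x] * weight
    return out

attended = spatial_attention([[[0.0, 4.0]]])
```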
  • FIG. 9(A) illustrates an example detection of cracks and fractures. Examples of cracks and fractures that can be detected can involve Crack Longitudinal (CL), Crack Circumferential (CC), Fracture Longitudinal (FL), and so on.
  • FIG. 9(B) illustrates an example detection of Fracture Multiple (FM).
  • FIG. 9(C) illustrates detection of roots. Example of roots that can be detected can involve RBJ, RFJ, RMJ, and so on.
  • FIG. 9(D) illustrates an example detection of Encrustations (DAE)
  • FIG. 9(E) illustrates an example detection of Deposit Attached Grease (DAGS)
  • DAE Encrustations
  • DAGS Deposit Attached Grease
  • the SAM 1001 can also be added to the convolutional layers 1000 of the YOLO head, as illustrated in FIG. 10(B), so as to not lose the important information of the object being detected.
  • Label Smoothing is a regularization technique that introduces noise for the labels to prevent overfitting or overconfidence in the predictions to facilitate the generation of a more robust model.
  • noise is introduced to train the model to differentiate between similar looking defects. Cracks and fractures are a good example in the sewer industry, where the objects look similar, but have subtle differences. A crack that has opened up is classified as a fracture. Setting label_smooth_eps to 10% allows the model to tackle overconfidence and overfitting.
  • the label smoothing can be adjusted depending on the desired implementation and pipeline, and the present disclosure is not limited thereto.
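The label smoothing itself is a one-liner: a fraction eps of the target mass is spread uniformly over all classes. With eps = 0.1 (the 10% above), a hard two-class crack/fracture target [0, 1] becomes approximately [0.05, 0.95].

```python
def smooth_labels(one_hot, eps=0.1):
    """Smooth a one-hot label vector: y * (1 - eps) + eps / K.

    eps=0.1 corresponds to the label_smooth_eps = 10% setting above.
    """
    k = len(one_hot)
    return [y * (1.0 - eps) + eps / k for y in one_hot]

smoothed = smooth_labels([0.0, 1.0])
```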
  • various types of cracks and fractures such as Crack Longitudinal (CL), Crack Circumferential (CC), Fracture Longitudinal (FL) and so on can be used as labels per PACP standard codes.
  • the classification model 121 is trained with several images of various surface damage defects, and accurately detects these defects in unseen videos. Presently this model is invoked on the video frame only if the frame has not been classified as one of the intended defects of the multiclassifier model.
  • the classification model 121 is constructed from a bilinear CNN with a SAM and a channel attention module as shown in FIG. 10(D).
  • a bilinear CNN involves two CNNs utilized on the frame for image classification, wherein the outputs are aggregated and pooled to form a bilinear vector.
  • the convolutional layers of the CNNs can be processed with a SAM as described herein and a channel attention module.
  • the channel attention module focuses on the semantic attributes of the features as opposed to the location of the features in the SAM, and compresses the spatial dimensions of the input feature map to produce a channel attention map.
  • the channel attention module can be in the form of a convolutional layer and a desired sigmoid function. Such construction allows for the classification of video frames/images into various surface damage codes that include worn down pipes, and more serious surface damage including cases where sufficient concrete is missing. Sample detections are shown in FIGS. 15(A) and 15(B).
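A minimal stand-in for the channel attention module is sketched below: each channel is weighted by a sigmoid of its global average, so semantically strong channels are preserved and weak ones are suppressed. Real modules (e.g., SE or CBAM blocks) pass the pooled vector through small fully connected layers; that step is omitted here for brevity.

```python
import math

def channel_attention(fmap):
    """Weight each channel of a channels x H x W feature map.

    The spatial dimensions are compressed by global average pooling,
    and the pooled value gates the whole channel via a sigmoid.
    """
    out = []
    for channel in fmap:
        flat = [v for row in channel for v in row]
        weight = 1.0 / (1.0 + math.exp(-sum(flat) / len(flat)))
        out.append([[v * weight for v in row] for row in channel])
    return out

gated = channel_attention([[[0.0, 0.0]], [[4.0, 4.0]]])
```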
  • a segmentation model 109 can be trained to detect water streamers (IR) and gushers (IG). Sample detections are shown in FIGS. 16(A) and 16(B).
  • the segmentation masks for the water streamers (IR) and gushers (IG) can be formed in accordance with the desired implementation.
  • the segmentation model 109 is the U-net++ segmentation model; however, other segmentation models may also be utilized in accordance with the desired implementation and the present disclosure is not limited thereto.
  • FIGS. 11(A) and 11(B) illustrate an example implementation of detection of severity of defects in video frames, in accordance with an example implementation.
  • One of these values is Roots Blockage Percentage computation at 115.
  • a red green blue (RGB) image is cropped out of the detected root defects (e.g., RBJ, RFJ, RMJ) as illustrated in FIG. 11(A) and converted to grayscale as illustrated in FIG. 11(B). Then the ratio of white to black pixels is computed to calculate the root blockage percentage.
  • RGB red green blue
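The white/black pixel-ratio computation at 115 can be sketched as below. The luma coefficients are the standard RGB-to-grayscale conversion; the 128 brightness threshold separating "white" from "black" pixels is an assumption.

```python
def root_blockage_percent(rgb_crop, threshold=128):
    """Estimate root blockage from an RGB crop of a detected root defect.

    `rgb_crop` is rows of (r, g, b) tuples. Each pixel is converted to
    grayscale and counted as white (roots) or black (open bore); the
    returned value is the percentage of white pixels.
    """
    total = bright = 0
    for row in rgb_crop:
        for r, g, b in row:
            gray = 0.299 * r + 0.587 * g + 0.114 * b  # luma approximation
            bright += gray >= threshold
            total += 1
    return 100.0 * bright / total

crop = [[(255, 255, 255), (0, 0, 0)],
        [(255, 255, 255), (0, 0, 0)]]
pct = root_blockage_percent(crop)
```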
  • FIG. 12 illustrates an example merger of bounding boxes for defects, in accordance with an example implementation. As illustrated in FIG. 12, the various bounding boxes of labeled defects can be merged into a bigger box, if the boxes overlap or are in the vicinity of each other.
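The merging of overlapping or nearby boxes can be sketched as a repeated pairwise union; the 5-pixel vicinity margin and the merge_boxes helper are assumptions for illustration.

```python
def merge_boxes(boxes, margin=5):
    """Merge (x1, y1, x2, y2) boxes that overlap or sit within `margin` px.

    Any two touching boxes are replaced by their union, and the pass
    repeats until no pair can be merged.
    """
    def near(a, b):
        return (a[0] <= b[2] + margin and b[0] <= a[2] + margin and
                a[1] <= b[3] + margin and b[1] <= a[3] + margin)

    merged = [list(b) for b in boxes]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if near(merged[i], merged[j]):
                    a, b = merged[i], merged[j]
                    merged[i] = [min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3])]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return [tuple(b) for b in merged]

result = merge_boxes([(0, 0, 10, 10), (12, 0, 20, 10), (100, 100, 110, 110)])
```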
  • FIG. 13 illustrates an example of a cloud deployment of the example implementations described herein.
  • the cloud 1300 is composed of various computing and storage resources to execute a process to construct the models in the backend for deployment on instances of microservices 1301.
  • the cloud 1300 is configured to prepare and load the data for training the model for deployment to the instances of microservices 1301 as shown at 1310- 1312.
  • the cloud 1300 can conduct model tuning 1313 and model retraining 1314 for re-deployment based on the desired criteria.
  • Each of the microservices 1301 is composed of one or more special purpose hardware processors such as graphics processing unit (GPU) servers and storage systems as illustrated in FIG. 15.
  • the microservice 1301 can utilize specialized hardware to facilitate the low latency execution of such models, and facilitate each of the model layer, the post processing layer, and the output layer.
  • the output of the output layer from the microservices 1301 can involve output video frames with bounding boxes and labels indicating the defects, along with a CSV report, which can be sent to the user via a storage blob, a cloud service, or on-premise storage.
  • the one or more special purpose hardware processors can be configured to, for receipt of a plurality of video frames of a sewer pipe, process the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline including a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output involving identified defects of the sewer pipe from the plurality of video frames; process the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and process the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
  • the one or more models can involve an object detection model configured to detect the one or more defects, the object detection model being a YOLO feature detection model modified with one or more spatial attention modules configured to adjust convolutional layers in the YOLO feature detection model to de-emphasize background information.
  • the one or more models can involve a normal/deformed binary classifier model configured to determine ones of the plurality of video frames classified as the boreview as having a deformed sewer pipe or a normal sewer pipe.
  • the one or more special purpose hardware processors can be configured to classify the ones of the plurality of video frames determined as having the deformed sewer pipe as being a collapsed pipe for the ones of the plurality of video frames occurring at an end of a survey of the sewer pipe, or as being a broken sewer pipe otherwise.
  • the one or more models can involve an object detection model configured to detect one or more of cracks, fractures, roots, grease, encrustations, obstructions, deposits, taps, or corroded metal pipe (SCP) in each of the plurality of video frames.
  • the one or more models involves a segmentation model configured to identify one or more of water level, streamers, and gushers in the sewer pipe from the plurality of video frames.
  • the multiple model pipeline involves one or more segmentation models configured to identify joints of the sewer pipe in the plurality of video frames, wherein the post-processing layer removes the false positives from the identified defects by removing identified fractures or cracks in the first output that fall on the identified joints.
  • the one or more models can involve a classification model configured to determine surface damage to the sewer pipe based on a detected level of concrete erosion.
  • levels of concrete erosion can involve, but is not limited to, minor surface damage to the concrete, major surface damage to the concrete, metal pipe exposure due to the concrete being worn off, and so on.
  • the post-processing layer can be configured to determine a severity of the identified defects based on a blockage percentage detected in the sewer pipe from the plurality of video frames.
  • the post-processing layer can be configured to identify continuous defects in the identified defects by merging continuous bounding boxes of identified defects into larger bounding boxes, and to convert bounding boxes of the identified defects into clock position for the second output.
  • the output layer is configured to generate a defect report including defect codes for the sewer pipe.
  • FIG. 14 illustrates an example integration to a user stack, in accordance with an example implementation.
  • the microservices 1301 are configured to provide the output to user front end services or a user interface application via application programming interfaces (APIs).
  • the example implementations are directed to object detection in pipelines, in particular sewer lines.
  • Defects can be detected in sewer lines worldwide, using country specific standards and coding schemes in accordance with the desired implementation.
  • while example implementations are described with respect to pipelines, other uses such as production lines, energy assets, mining lines, and so on, can also utilize the approaches described herein.
  • Example implementations may also relate to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs.
  • Such computer programs may be stored in a computer readable medium, such as a computer-readable storage medium or a computer-readable signal medium.
  • a computer-readable storage medium may involve tangible mediums such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of tangible or non-transitory media suitable for storing electronic information.
  • a computer readable signal medium may include mediums such as carrier waves.
  • the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus.
  • Computer programs can involve pure software implementations that involve instructions that perform the operations of the desired implementation.
  • the operations described above can be performed by hardware, software, or some combination of software and hardware.
  • Various aspects of the example implementations may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application.
  • some example implementations of the present application may be performed solely in hardware, whereas other example implementations may be performed solely in software.
  • the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways.
  • the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.

Abstract

Example implementations described herein involve systems and methods which can involve, for receipt of a plurality of video frames of a sewer pipe, processing the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline having one or more models configured to detect one or more defects, the model layer configured to generate a first output involving identified defects of the sewer pipe from the plurality of video frames; processing the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and processing the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.

Description

OBJECT DETECTION INFERENCE PIPELINE
BACKGROUND
Field
[0001] The present disclosure relates generally to object detection, and more specifically, to classifying and detecting defects in pipelines such as sewer lines.
Related Art
[0002] In related art implementations, human inspectors watch video recordings of sewer lines and create reports by annotating the defects as the camera travels through the bore. This manual process of viewing the videos and creating a report of defects is time consuming, resource intensive, costly, and prone to human bias and error.
SUMMARY
[0003] The present disclosure is directed to example implementations to address the above problems in the related art. In particular, the example implementations described herein involve systems and methods that classify and localize the defects in sewer lines by using deep learning and computer vision algorithms. The example implementations described herein allow applications to automatically create output similar to the Pipeline Assessment Certification Program (PACP) standard that is currently done manually by human inspectors.
[0004] The example implementations allow inspectors to feed the camera recording to the inference pipeline and generate the report, with high accuracy for a majority of defects. The model allows for detection of numerous human-visible defects such as cracks, fractures, roots, deposits, obstructions, grease, encrustations, infiltrations, and broken and deformed pipes.
[0005] Aspects of the present disclosure can involve a method, which can include, for receipt of a plurality of video frames of a sewer pipe, processing the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline including a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output involving identified defects of the sewer pipe from the plurality of video frames; processing the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and processing the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
[0006] Aspects of the present disclosure can involve a system, which can include, for receipt of a plurality of video frames of a sewer pipe, means for processing the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline including a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output involving identified defects of the sewer pipe from the plurality of video frames; means for processing the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and means for processing the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
[0007] Aspects of the present disclosure can involve a computer program, storing instructions for execution by one or more special purpose processors, which can include, for receipt of a plurality of video frames of a sewer pipe, processing the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline including a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output involving identified defects of the sewer pipe from the plurality of video frames; processing the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and processing the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe. The computer program and its instructions can be stored in a non-transitory computer readable medium for execution by the one or more special purpose hardware processors.
[0008] Aspects of the present disclosure can involve an apparatus, which can include one or more special purpose hardware processors, configured to, for receipt of a plurality of video frames of a sewer pipe, process the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline involving a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output involving identified defects of the sewer pipe from the plurality of video frames; process the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and process the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
BRIEF DESCRIPTION OF DRAWINGS
[0009] FIG. 1(A) illustrates an overall architecture of the example implementations described herein.
[0010] FIG. 1(B) illustrates a flow diagram incorporating deep learning models and computer vision techniques to conduct sewer defect classification and localization with high accuracy, in accordance with an example implementation.
[0011] FIG. 2 illustrates an example of a video frame of a sewer pipe from a boreview.
[0012] FIG. 3 illustrates an example of a video frame of a sewer pipe from a sideview.
[0013] FIG. 4 illustrates an example video frame of a deformed elliptical sewer pipe.
[0014] FIG. 5 illustrates an example video frame of a sewer pipe with bulging projections.
[0015] FIG. 6 illustrates an example video frame of a broken sewer pipe.
[0016] FIGS. 7(A) and 7(B) illustrate an example segmentation mask for a boreview and as applied to a video frame, respectively, in accordance with an example implementation.
[0017] FIGS. 8(A) and 8(B) illustrate an example segmentation mask of a joint for a sideview and as applied to a video frame, respectively, in accordance with an example implementation.
[0018] FIGS. 9(A) to 9(I) illustrate example video frames with various defects detected by the main model, in accordance with an example implementation.
[0019] FIGS. 10(A) to 10(D) illustrate an example application of a spatial attention module, in accordance with an example implementation.
[0020] FIGS. 11(A) and 11(B) illustrate an example implementation of detection of severity of defects in video frames, in accordance with an example implementation.
[0021] FIG. 12 illustrates an example merger of bounding boxes for defects, in accordance with an example implementation.
[0022] FIG. 13 illustrates an example of a cloud deployment of the example implementations described herein.
[0023] FIG. 14 illustrates an example integration to a user stack, in accordance with an example implementation.
[0024] FIGS. 15(A) and 15(B) illustrate example video frames with surface damage defects, in accordance with an example implementation.
[0025] FIGS. 16(A) and 16(B) illustrate example video frames with water streamers and gushers, in accordance with an example implementation.
DETAILED DESCRIPTION
[0026] The following detailed description provides details of the figures and example implementations of the present application. Reference numerals and descriptions of redundant elements between figures are omitted for clarity. Terms used throughout the description are provided as examples and are not intended to be limiting. For example, the use of the term “automatic” may involve fully automatic or semi-automatic implementations involving user or administrator control over certain aspects of the implementation, depending on the desired implementation of one of ordinary skill in the art practicing implementations of the present application. Selection can be conducted by a user through a user interface or other input means, or can be implemented through a desired algorithm. Example implementations as described herein can be utilized either singularly or in combination and the functionality of the example implementations can be implemented through any means according to the desired implementations.
[0027] FIG. 1(A) illustrates an overall architecture of the example implementations described herein. In example implementations described herein, video frames are processed through an architecture involving a model layer 150, a post-processing layer 151, and an output layer 152. The model layer 150 is configured to conduct defect detection and classification through various convolutional neural network (CNN) models and computer vision algorithms, and to provide the detected and classified defects to the post-processing layer 151. The model layer 150 manages a multiple model pipeline as illustrated in FIG. 1(B), which can involve various convolutional neural networks, computer vision, and other model algorithms as arranged in a pipeline. As will be described herein, there can be a boreview/sideview binary classifier model, a main object detection model based on YOLOv4, a classification model to determine a level of surface damage (e.g., level of concrete erosion), joint detection and blockage algorithms, and so on. Once the video frames are processed through the model layer 150, the resulting output involves video frames that are classified according to the models, with identified defects in the sewer pipe or other pipelines, which is fed into the post-processing layer 151.
[0028] The post-processing layer 151 is configured to receive the output and process the output to conduct post-processing activities, such as algorithms to remove false positives from the identified defects, identify and merge continuous defects within the identified defects, determine a severity of the identified defects, determine confidence thresholds, orient bounding boxes to clock position, apply business rules, and so on, to produce another output of post-processed video frames, labels, and data to be digested by the output layer 152. Further details of the post-processing layer 151 functionalities are provided in FIG. 1(B). The output layer 152 can thereby format the output into the preferred standard (e.g., comma-separated values (CSV) format), indicating the frames in the video that have the defects along with their coordinates, and provide an output of images of sewer lines with labeled defects (e.g., in the form of PACP defect codes) as well as the CSV report.
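As a concrete sketch of the output layer's CSV formatting, the snippet below writes post-processed defect records as rows of a CSV report. The column names and the record fields are hypothetical illustrations, not the actual PACP export schema.

```python
import csv
import io

# Hypothetical report columns; a real PACP export defines its own schema.
FIELDS = ["frame", "distance_ft", "defect_code", "clock_from", "clock_to", "severity"]

def write_defect_report(rows, fp):
    """Write post-processed defect records as a CSV defect report."""
    writer = csv.DictWriter(fp, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

buf = io.StringIO()
write_defect_report(
    [{"frame": 120, "distance_ft": 34.5, "defect_code": "CC",
      "clock_from": 9, "clock_to": 3, "severity": "minor"}],
    buf,
)
print(buf.getvalue().splitlines()[0])  # header row
```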
[0029] The PACP standard outlines the way human inspectors are supposed to detect and record defects in sewer lines. FIG. 1(B) illustrates a flow diagram incorporating machine learning models and computer vision techniques to conduct sewer defect classification and localization with high accuracy, in accordance with an example implementation.
[0030] The input is videos captured from cameras sent through the sewer line or other pipeline. These cameras travel from one access point/manhole to another access point, recording the sewer line. As the operator/engineer is capturing videos along the boreview of the sewer, whenever they see any defect of interest they pan and tilt the camera to capture sideview of the sewer, sometimes zooming onto the defect.
[0031] A series of frames in a video are processed with an image hashing algorithm 101 to gather only dissimilar frames in a video on which the inference pipeline should detect the defects. Implementations for the image hashing algorithm 101 can be conducted in accordance with any algorithm in the related art. Such processing prevents excessive output while avoiding missing any defects in a video. However, the example implementations can be extended to conduct defect detection on every single video frame if needed in accordance with the desired implementation and requirements.
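The image hashing step can be sketched with a difference hash (dHash), one common perceptual-hashing technique; the present disclosure does not fix a particular algorithm, so the hash size, block-mean downsampling, and Hamming-distance threshold below are all illustrative assumptions. Frames whose hashes differ by only a few bits are treated as near-duplicates and skipped.

```python
import numpy as np

def dhash(gray, hash_size=8):
    """Difference hash: downsample to (hash_size, hash_size+1) by block
    means, then compare horizontally adjacent cells to get a bit vector."""
    h, w = gray.shape
    rows, cols = hash_size, hash_size + 1
    # Crop so the frame divides evenly into blocks (a simplification;
    # production code would interpolate, e.g. via Pillow or OpenCV).
    g = gray[: h - h % rows, : w - w % cols].astype(float)
    small = g.reshape(rows, g.shape[0] // rows,
                      cols, g.shape[1] // cols).mean(axis=(1, 3))
    return (small[:, 1:] > small[:, :-1]).flatten()

def is_dissimilar(a, b, threshold=10):
    """Keep a frame only if its hash differs from the previously kept
    frame by more than `threshold` bits (Hamming distance)."""
    return int(np.count_nonzero(dhash(a) != dhash(b))) > threshold

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (120, 160))
print(is_dissimilar(frame, frame))  # identical frames → False
```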
[0032] These frames are processed via several Deep learning (DL) models, to create PACP compliant inference pipeline for sewer videos. The various models used are a binary classifier to detect if the video is in boreview or sideview 103, a binary classifier to detect if video frame is deformed or normal 104, a binary classifier to detect if the pipe is deformed into an elliptical/oval shape (DFE) or has rounded bulging projections (DFBR) 108, segmentation models to detect the joints from boreview 119 or sideview 120, a classification model to detect codes such as surface damages 121, a segmentation model to detect water level, streams and gushers 109, and main object detection models 110, 105 that can be based on YOLOv4 or others depending on the desired implementation.
[0033] In addition to using DL models, the inference pipeline is also configured to convert the defect location (e.g., pixel coordinates from YOLO model) to clock positions 113, compute the severity of defects (e.g., severity of root blockage 115), mark continuous defects (e.g., that continue for more than three feet of distance) as per PACP requirements, and create a pipe rating index and summary for each pipe as per PACP standards.
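The conversion from pixel coordinates to clock positions 113 can be sketched as below, under the simplifying assumption that the bore center coincides with the frame center (an actual pipeline may estimate the bore center from the frame):

```python
import math

def to_clock_position(cx, cy, frame_w, frame_h):
    """Convert a defect bounding-box center (pixel coordinates) to a
    1-12 clock position, with 12 o'clock at the top of the frame."""
    dx = cx - frame_w / 2.0
    dy = (frame_h / 2.0) - cy  # image y grows downward; flip so up is +
    angle = math.degrees(math.atan2(dx, dy)) % 360  # 0 deg = 12 o'clock, clockwise
    hour = int(round(angle / 30.0)) % 12
    return 12 if hour == 0 else hour

print(to_clock_position(320, 40, 640, 480))   # directly above center → 12
print(to_clock_position(600, 240, 640, 480))  # directly right of center → 3
```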
[0034] In example implementations, hashed image frames are processed with a boreview/sideview binary classifier 103 to determine if the video frame is boreview or sideview (e.g., when the camera pans and tilts away from the bore). This is the first model exercised on a video frame in the inference pipeline, and it determines if the camera is capturing a boreview or a sideview of the pipeline. FIG. 2 illustrates an example of a video frame of a sewer pipe from a boreview. FIG. 3 illustrates an example of a video frame of a sewer pipe from a sideview. Whenever an operator sees a defect, the operator may stop the rover/camera and then pan, tilt, or zoom to look at the defects, which may change the perspective from boreview to sideview and back. Hence, a classifier is needed to determine if the video frame is recording from the boreview or sideview. The classifier determines the pathway of the models through the pipeline as illustrated in FIG. 1(B). To train the boreview/sideview classifier model, a convolutional neural network (CNN) is used, wherein labeled boreview images and sideview images are separated and fed into the deep CNN model until the classifier can classify video frames as boreview or sideview.
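A minimal PyTorch sketch of such a binary view classifier is shown below. The layer count, channel widths, and input size are illustrative assumptions, not the architecture used in the example implementations.

```python
import torch
import torch.nn as nn

class ViewClassifier(nn.Module):
    """Toy CNN for the boreview/sideview binary classification;
    sizes are illustrative only."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),       # global average pool
        )
        self.head = nn.Linear(32, 2)       # logits: [boreview, sideview]

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = ViewClassifier()
logits = model(torch.randn(4, 3, 224, 224))  # batch of 4 RGB frames
print(logits.shape)  # torch.Size([4, 2])
```

Training would proceed as described above: feed labeled boreview and sideview images through the network with a standard cross-entropy loss until the two classes separate.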
[0035] In example implementations, hashed image frames are processed with a binary classifier constructed from a bilinear CNN with spatial and channel attention to determine if the video frame involves a deformed pipeline or a normal pipeline at 104. This helps in determining if the boreview is deformed (e.g., due to fractures, and other engineering or disaster issues). To train the deformed/normal classifier model, a bilinear attention convolutional neural network (CNN) model is used wherein labeled deformed pipeline images and normal images are separated and fed into the bilinear attention CNN until the classifier can classify deformed/normal video. If this model determines that the pipe is deformed, it is combined with detection of fractures from a main object detection model 105 to further classify if it is a broken pipeline 107 or collapsed pipeline 106. If no fractures are present, it usually indicates that even though the pipe is deformed, it does not have displaced pieces that lead to broken or collapsed pipe. FIG. 6 illustrates an example video frame of a broken sewer pipe. As shown in FIG. 6, as the pipeline is determined to be deformed and fractures are present in the video frame, then the classification of the video frame of FIG. 6 can be determined to be as broken.
[0036] If a broken pipe occurs at the end of survey, it can be safely marked as collapsed 106, since inspections are abandoned in such cases.
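The combination of the deformed/normal classifier output with fracture detections described in the two paragraphs above reduces to a simple decision rule, sketched here:

```python
def classify_deformation(is_deformed, has_fractures, at_end_of_survey):
    """Rule-based combination: a deformed pipe with fractures is broken,
    or collapsed if it occurs at the end of the survey (inspections are
    abandoned at a collapse). A deformed pipe without fractures has no
    displaced pieces and remains merely deformed (DFE/DFBR)."""
    if not is_deformed:
        return "normal"
    if not has_fractures:
        return "deformed"
    return "collapsed" if at_end_of_survey else "broken"

print(classify_deformation(True, True, False))  # broken
print(classify_deformation(True, True, True))   # collapsed
```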
[0037] In example implementations, hashed image frames are processed with a binary classifier constructed from a bilinear CNN with spatial and channel attention, to determine if the deformed video frame is DFE (deformed elliptical) or DFBR (bulging) 108. FIG. 4 illustrates an example video frame of a deformed elliptical sewer pipe. FIG. 5 illustrates an example video frame of a sewer pipe with bulging projections.
[0038] In example implementations, segmentation models, which can be constructed with a Unet++ model with custom loss, are utilized to detect joints. A boreview joint detection model 119 detects joints on the boreview of the pipeline, and can determine three types of classes: full joint, partial joint, and background. FIGS. 7(A) and 7(B) illustrate an example segmentation mask of a joint for a boreview and as applied to a video frame, respectively, in accordance with an example implementation. As illustrated in FIG. 7(A), a polygon segmentation mask is applied to video frames regarding the expected shape of a joint from the boreview. FIG. 7(B) illustrates the circular shape/joint being detected within a boreview classified video frame.
[0039] A sideview joint detection model 120 detects joints from the sideview of the pipeline and has two classes, sideview and background. FIGS. 8(A) and 8(B) illustrate an example segmentation mask of a joint for a sideview and as applied to a video frame, respectively, in accordance with an example implementation. As illustrated in FIG. 8(A), a polygon segmentation mask is applied to video frames regarding the expected shape of a joint from the sideview. FIG. 8(B) illustrates the linear shape being detected within a sideview classified video frame.
[0040] As described above, example implementations fit an ellipse on the predicted mask to detect joints. The detection of joints is an important part of the pipeline, as it allows for the computation of the “J” modifier as per the PACP standard. A “J” modifier in a PACP output indicates proximity of a defect to a joint, which is important for maintenance purposes. The fitting of the ellipse also allows for calculation of the overlap between defects that look like a joint and the joint itself, to eliminate false positives (FPs). One such example is a Circumferential Crack (CC) as illustrated in FIG. 9 covered with boxes. Since the shape of a circumferential crack is easily confused with a joint, the main model tends to produce many FPs for CCs on a joint. If a CC overlaps a joint, it is eliminated from the output, thereby reducing the FPs.
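The false-positive elimination can be sketched as an overlap test between a detected CC bounding box and the rasterized joint ellipse. The ring rasterization below is a stand-in for fitting an ellipse to the segmentation mask (e.g., with OpenCV's `cv2.fitEllipse`), and the ring thickness and overlap threshold are illustrative assumptions.

```python
import numpy as np

def ellipse_ring_mask(shape, center, axes, thickness=4):
    """Rasterize an elliptical ring (the fitted joint) as a boolean mask."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    cx, cy = center
    a, b = axes
    r = np.sqrt(((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2)
    return np.abs(r - 1.0) * min(a, b) < thickness

def overlaps_joint(bbox, joint_mask, min_overlap=0.3):
    """True if a circumferential-crack box mostly lies on the joint ring,
    in which case the detection is dropped as a likely false positive."""
    x1, y1, x2, y2 = bbox
    region = joint_mask[y1:y2, x1:x2]
    return bool(region.size > 0 and region.mean() >= min_overlap)

mask = ellipse_ring_mask((200, 200), (100, 100), (80, 60))
print(overlaps_joint((80, 36, 120, 44), mask))   # CC box on the ring → True
print(overlaps_joint((90, 90, 110, 110), mask))  # box away from the ring → False
```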
[0041] FIGS. 10(A) to 10(D) illustrate an example application of a spatial attention module, in accordance with an example implementation. In example implementations, the main object detection model 105 can be created from YOLO or other variants such as YOLOv4. As an example, YOLO (You Only Look Once) and its variants such as YOLOv4 (YOLO fourth version) are at its core a real-time object detection system in the related art that is configured to recognize various objects in an image frame through dividing the image frame into grid cells. Each of the grid cells are then classified with probabilities based on known object classes, from which bounding boxes are generated on the image on objects based on the classification made in the grids. YOLO or its variants can be trained or implemented on a graphics processing unit (GPU). In example implementations described herein, YOLO or its variants is modified to detect defects based on a database of ground truth images of known defects.
[0042] YOLO or its variants may be composed of a backbone, a neck, and a head. The backbone is a neural network composed of convolution layers that are configured to extract the essential features from an image. The backbone architecture can be trained using pre-trained neural networks or otherwise in accordance with the desired implementation. The neck involves bottom-up paths and/or top-down paths to collect feature maps from various stages of the backbone. The head is the detector that conducts dense prediction in the form of a vector containing the coordinates of the predicted bounding box of a detected object along with the confidence score of the prediction and the corresponding label of the detected object. Example implementations described herein are based on the YOLOv4 variant, but other variants can be used in accordance with the desired implementation and the present disclosure is not limited there to.
[0043] In example implementations, various defects are detected from the main object detection model 105, including, but not limited to, roots, fractures, cracks, grease, encrustations, obstructions, deposits, taps, surface damage aggregate visible(SRV), and other PACP defects. To adjust the main object detection model 105, 110 for pipeline and sewer line purposes, modifications are made to the YOLOv4 with a spatial attention module (SAM) 1001 added to the convolutional layers 1000 of the backbone as illustrated in FIG. 10(A). SAM is a technique that facilitate spatial attention in CNN and generates a spatial attention map based on the inter-spatial relationship of features. Specifically, the SAM is configured to allow the network to ignore the background and provide more attention to the detected features based on the location of the features extracted by the convolutional layers 1000 of the backbone. Attention modules are used to make the CNN learn and focus more on the “important information”, rather than learning non-useful “background information” as shown in FIG. 10(C). In the case of object detection, useful information includes the objects or target class crop are classified and localized in an image. Depending on the desired implementation, the SAM can be in the form of a convolutional layer and a desired sigmoid function to generate a mask, heatmap, or other concatenated feature map from the input feature map, and can be formed from known techniques in accordance with the desired implementation. Sample defects detected by the main model are shown in FIGS. 9(A) - FIG. 9(1).
[0044] In the example detections of defects of FIGS. 9(A) to 9(I), FIG. 9(A) illustrates an example detection of cracks and fractures. Examples of cracks and fractures that can be detected can involve Crack Longitudinal (CL), Crack Circumferential (CC), Fracture Longitudinal (FL), and so on. FIG. 9(B) illustrates an example detection of Fracture Multiple (FM). FIG. 9(C) illustrates detection of roots. Examples of roots that can be detected can involve RBJ, RFJ, RMJ, and so on. FIG. 9(D) illustrates an example detection of Encrustations (DAE), FIG. 9(E) illustrates an example detection of Deposit Attached Grease (DAGS), FIG. 9(F) illustrates an example detection of Deposits (DSZ), FIG. 9(G) illustrates an example detection of TAPS, FIG. 9(H) illustrates an example detection of Obstructions (OBZ), and FIG. 9(I) illustrates an example detection of surface damage. Examples of surface damage that can be detected can involve Surface Damage Aggregate Visible (SRV), Joint Offsets (JOL, JOM), and so on. The above are examples of defects that can be detected through the techniques described herein and the present disclosure is not limited thereto. One of ordinary skill in the art can modify the object detection model to detect other desired defects according to the description herein to achieve the desired implementation.
[0045] In example implementations, the SAM 1001 can also be added to the convolutional layers 1000 of the YOLO head, as illustrated in FIG. 10(B), so as to not lose the important information of the object being detected.
[0046] In example implementations, label smoothing is utilized (e.g., label_smooth_eps=0.1 in the YOLO configuration) to differentiate between similar looking defects. Label smoothing is a regularization technique that introduces noise for the labels to prevent overfitting or overconfidence in the predictions to facilitate the generation of a more robust model. In the context of the YOLO configuration for example implementations, noise is introduced to train the model to differentiate between similar looking defects. Cracks and fractures are a good example in the sewer industry, where the objects look similar, but have subtle differences. A crack that has opened up is classified as a fracture. Setting label_smooth_eps to 10% allows the model to tackle overconfidence and overfitting. However, the label smoothing can be adjusted depending on the desired implementation and pipeline, and the present disclosure is not limited thereto. In example implementations, various types of cracks and fractures such as Crack Longitudinal (CL), Crack Circumferential (CC), Fracture Longitudinal (FL) and so on can be used as labels per PACP standard codes.
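The effect of label smoothing on the training targets can be sketched as follows, using the common formulation that mixes the one-hot target with a uniform distribution over the classes (other formulations, such as distributing the mass only over the non-target classes, also exist):

```python
import numpy as np

def smooth_labels(class_index, num_classes, eps=0.1):
    """Label smoothing: replace the one-hot target with a mixture of
    (1 - eps) times the one-hot vector and eps times a uniform
    distribution, so the model never trains toward full certainty."""
    target = np.full(num_classes, eps / num_classes)
    target[class_index] += 1.0 - eps
    return target

# With eps = 0.1 and 4 classes, the true class gets 0.925 instead of 1.0.
t = smooth_labels(0, 4, eps=0.1)
print(t)
```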
[0047] The classification model 121 is trained with several images of various surface damage defects, and accurately detects these defects in unseen videos. Presently, this model is invoked on the video frame only if the frame has not been classified as one of the intended defects of the multiclassifier model. In example implementations, the classification model 121 is constructed from a bilinear CNN with a SAM and a channel attention module as shown in FIG. 10(D). A bilinear CNN involves two CNNs utilized on the frame for image classification, wherein the outputs are aggregated and pooled to form a bilinear vector. The convolutional layers of the CNNs can be processed with a SAM as described herein and a channel attention module. The channel attention module focuses on the semantic attributes of the features, as opposed to the location of the features in the SAM, and compresses the spatial dimensions of the input feature map to produce a channel attention map. Depending on the desired implementation, the channel attention module can be in the form of a convolutional layer and a desired sigmoid function. Such construction allows for the classification of video frames/images into various surface damage codes that include worn-down pipes, and more serious surface damage including the cases where sufficient concrete is missing. Sample detections are shown in FIGS. 15(A) and 15(B).
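A channel attention module of the kind described can be sketched in PyTorch as below. The squeeze-and-gate structure (spatial mean/max pooling, a shared bottleneck MLP, and a sigmoid gate) and the reduction ratio are common choices, not the exact architecture of the example implementations.

```python
import torch
import torch.nn as nn

class ChannelAttentionModule(nn.Module):
    """Channel attention: compress the spatial dimensions with average
    and max pooling, pass each summary through a shared bottleneck MLP,
    and gate the channels with a sigmoid."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # (B, C) from spatial mean
        mx = self.mlp(x.amax(dim=(2, 3)))    # (B, C) from spatial max
        attn = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * attn  # per-channel gating, same shape as input

cam = ChannelAttentionModule(64)
out = cam(torch.randn(2, 64, 16, 16))
print(out.shape)  # torch.Size([2, 64, 16, 16])
```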
[0048] A segmentation model 109 can be trained to detect water streamers (IR) and gushers (IG). Sample detections are shown in FIGS. 16(A) and 16(B). The segmentation masks for the water streamers (IR) and gushers (IG) can be formed in accordance with the desired implementation. In the example implementations described herein, the segmentation model 109 is the U-net++ segmentation model; however, other segmentation models may also be utilized in accordance with the desired implementation and the present disclosure is not limited thereto.
[0049] FIGS. 11(A) and 11(B) illustrate an example implementation of detection of severity of defects in video frames, in accordance with an example implementation. In example implementations, several values are computed as additional data for various defects. One of these values is the Roots Blockage Percentage computation at 115. A red green blue (RGB) image is cropped out of the detected root defects (e.g., RBJ, RFJ, RMJ) as illustrated in FIG. 11(A) and converted to grayscale as illustrated in FIG. 11(B). The ratio of white to black pixels is then computed to calculate the roots blockage percentage. These values are typically required by the PACP standard, but may not be required by other standards worldwide. Hence various such values are provided as an end-to-end solution. As the inference pipeline evolves after customer rollouts, various such value computations can be added and current ones modified in accordance with the desired implementation. Other examples of such values are the percentage blockage of water flow in the sewer pipeline due to grease (DAGS), encrustations (DAE) and obstructions.
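The crop-to-grayscale-to-pixel-ratio computation can be sketched as follows. The binarization threshold of 128 is an assumption for illustration; the grayscale weights are the common ITU-R BT.601 luma coefficients, which may differ from the actual implementation:

```python
import numpy as np

def roots_blockage_percentage(crop_rgb: np.ndarray, threshold: int = 128) -> float:
    """crop_rgb: (H, W, 3) uint8 crop around a detected root defect.
    Convert to grayscale, binarize at `threshold`, and return the share
    of 'white' (root) pixels as a percentage of the crop."""
    gray = crop_rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    white = (gray >= threshold).sum()
    return 100.0 * white / gray.size

# Synthetic crop: top half bright (roots), bottom half dark (open pipe)
crop = np.zeros((10, 10, 3), dtype=np.uint8)
crop[:5] = 255
print(roots_blockage_percentage(crop))  # 50.0
```

In practice the crop would come from the bounding box of a detected RBJ/RFJ/RMJ defect rather than a synthetic array.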
[0050] FIG. 12 illustrates an example merger of bounding boxes for defects, in accordance with an example implementation. As illustrated in FIG. 12, the various bounding boxes of labeled defects can be merged into a bigger box if the boxes overlap or are in the vicinity of each other.
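One way to realize this merging is sketched below; the pixel margin that defines "in the vicinity" is an illustrative assumption, not a value from the disclosure:

```python
def near(a, b, margin=10):
    """True if boxes (x1, y1, x2, y2) overlap or lie within `margin` pixels."""
    return not (a[2] + margin < b[0] or b[2] + margin < a[0] or
                a[3] + margin < b[1] or b[3] + margin < a[1])

def union(a, b):
    """Smallest box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def merge_boxes(boxes, margin=10):
    """Repeatedly merge overlapping or nearby boxes into larger boxes
    until no further merges are possible."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        out = []
        while boxes:
            a = boxes.pop()
            i = 0
            while i < len(boxes):
                if near(a, boxes[i], margin):
                    a = union(a, boxes.pop(i))
                    merged = True
                else:
                    i += 1
            out.append(a)
        boxes = out
    return boxes

# Two overlapping crack boxes merge; the distant box is left alone
print(sorted(merge_boxes([(0, 0, 50, 50), (40, 40, 90, 90), (300, 300, 320, 320)])))
# [(0, 0, 90, 90), (300, 300, 320, 320)]
```

Merging before reporting keeps one continuous crack from being reported as several separate defects.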
[0051] FIG. 13 illustrates an example of a cloud deployment of the example implementations described herein. The cloud 1300 is composed of various computing and storage resources to execute a process to construct the models in the backend for deployment on instances of microservices 1301. The cloud 1300 is configured to prepare and load the data for training the model for deployment to the instances of microservices 1301 as shown at 1310-1312. Once deployed, the cloud 1300 can conduct model tuning 1313 and model retraining 1314 for re-deployment based on the desired criteria.
[0052] Each of the microservices 1301 is composed of one or more special purpose hardware processors such as graphics processing unit (GPU) servers and storage systems as illustrated in FIG. 15. The microservice 1301 can utilize specialized hardware to facilitate the low latency execution of such models, and facilitate each of the model layer, the post processing layer, and the output layer. The output of the output layer from the microservices 1301 can involve output video frames with bounding boxes and labels indicating the defects, along with a CSV report, which can be sent to the user via a storage blob, a cloud service, or on-premise storage.
[0053] As illustrated in FIG. 1(A) and 1(B), the one or more special purpose hardware processors can be configured to, for receipt of a plurality of video frames of a sewer pipe, process the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline including a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output involving identified defects of the sewer pipe from the plurality of video frames; process the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and process the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
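The three-layer flow recited above can be summarized in a short sketch. Every function here, including the toy stand-ins for the model, post-processing, and output layers, is a hypothetical placeholder for the trained models and report generator, not the patented implementation:

```python
def run_pipeline(frames, model_layer, post_layer, output_layer):
    """Model layer -> post-processing layer -> output layer."""
    first_output = model_layer(frames)        # identified defects per frame
    second_output = post_layer(first_output)  # filtered, merged, with severity
    return output_layer(second_output)        # defect report

# Toy stand-ins (hypothetical; the real layers wrap trained models):
def model_layer(frames):
    # Pretend every even-numbered frame shows a longitudinal crack (CL)
    return [{"frame": i, "view": "boreview",
             "defects": ["CL"] if i % 2 == 0 else []}
            for i, _ in enumerate(frames)]

def post_layer(first_output):
    # Drop frames without defects (false-positive removal stand-in)
    # and attach a placeholder severity
    return [{**d, "severity": "minor"} for d in first_output if d["defects"]]

def output_layer(second_output):
    # Emit one defect-report row per surviving frame
    return [f"frame {d['frame']}: {','.join(d['defects'])} ({d['severity']})"
            for d in second_output]

report = run_pipeline([None] * 4, model_layer, post_layer, output_layer)
print(report)  # ['frame 0: CL (minor)', 'frame 2: CL (minor)']
```

The separation matters operationally: the model layer can be retrained or swapped per country-specific coding standard while the post-processing and output layers stay unchanged.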
[0054] As illustrated in FIGS. 10(A) to 10(D), the one or more models can involve an object detection model configured to detect the one or more defects, the object detection model being a YOLO feature detection model modified with one or more spatial attention modules configured to adjust convolutional layers in the YOLO feature detection model to de- emphasize background information.
[0055] As illustrated in FIG. 1(B), the one or more models can involve a normal/deformed binary classifier model configured to determine ones of the plurality of video frames classified as the boreview as having a deformed sewer pipe or a normal sewer pipe.
[0056] As illustrated in FIG. 1(B), the one or more models can involve an elliptical (DFE)/bulge (DFBR) binary classifier model configured to determine ones of the plurality of video frames classified as having the deformed sewer pipe as having elliptical deformation or bulging projections.
[0057] As illustrated in FIG. 1(B), for ones of the plurality of video frames determined as having the deformed sewer pipe, the one or more special purpose hardware processors can be configured to classify the ones of the plurality of video frames determined as having the deformed sewer pipe as being a collapsed pipe for the ones of the plurality of video frames occurring at an end of a survey of the sewer pipe, or as being a broken sewer pipe otherwise.
[0058] As illustrated in FIG. 1(B) and FIGS. 9(A) to 9(I), the one or more models can involve an object detection model configured to detect one or more of cracks, fractures, roots, grease, encrustations, obstructions, deposits, taps, or corroded metal pipe (SCP) in each of the plurality of video frames.
[0059] As illustrated in FIGS. 1(B), 7(A), 7(B), 8(A), and 8(B), the one or more models can involve a segmentation model configured to identify one or more of water level, streamers, and gushers in the sewer pipe from the plurality of video frames.
[0060] As illustrated in FIG. 1(B), the multiple model pipeline involves one or more segmentation models configured to identify joints of the sewer pipe in the plurality of video frames, wherein the post-processing layer removes the false positives from the identified defects by removing identified fractures or cracks in the first output on the identified joints.
[0061] As illustrated in FIG. 1(A) and 1(B), the one or more models can involve a classification model configured to determine surface damage to the sewer pipe based on a detected level of concrete erosion. Depending on the desired implementation, levels of concrete erosion can involve, but is not limited to, minor surface damage to the concrete, major surface damage to the concrete, metal pipe exposure due to the concrete being worn off, and so on.
[0062] As illustrated in FIGS. 11(A) and 11(B), the post-processing layer can be configured to determine a severity of the identified defects based on a blockage percentage detected in the sewer pipe from the plurality of video frames.
[0063] As illustrated in FIG. 1(B) and FIG. 12, the post-processing layer can be configured to identify continuous defects in the identified defects by merging continuous bounding boxes of identified defects into larger bounding boxes, and to convert bounding boxes of the identified defects into clock position for the second output.
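The bounding-box-to-clock-position conversion can be sketched as follows, using the standard sewer-inspection convention that 12 o'clock is the top of the pipe. The pipe center coordinates in the example are illustrative assumptions:

```python
import math

def clock_position(box, pipe_center):
    """Map a bounding box (x1, y1, x2, y2) to a 1-12 clock position around
    the pipe center (12 = top, 3 = right, 6 = bottom, 9 = left).
    Image y grows downward, so 'up' is the negative-y direction."""
    bx = (box[0] + box[2]) / 2.0
    by = (box[1] + box[3]) / 2.0
    dx, dy = bx - pipe_center[0], by - pipe_center[1]
    theta = math.atan2(dx, -dy) % (2 * math.pi)  # clockwise angle from 12 o'clock
    hour = round(theta / (2 * math.pi / 12)) % 12
    return 12 if hour == 0 else hour

center = (320, 240)  # assumed pipe center in a 640x480 boreview frame
print(clock_position((400, 220, 440, 260), center))  # defect to the right -> 3
print(clock_position((300, 100, 340, 140), center))  # defect at the top -> 12
```

Reporting clock positions rather than raw pixel boxes matches how PACP-style reports locate defects around the pipe circumference.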
[0064] As illustrated in FIG. 1(B), the output layer is configured to generate a defect report including defect codes for the sewer pipe.
[0065] FIG. 14 illustrates an example integration to a user stack, in accordance with an example implementation. The microservices 1301 are configured to provide the output to user front end services or a user interface application via application programming interfaces (APIs).
[0066] Through the example implementations described herein, object detection in pipelines, in particular sewer lines, can thereby be automatically conducted to generate a PACP standard output. Defects can be detected in sewer lines worldwide, using country specific standards and coding schemes in accordance with the desired implementation. Although example implementations are described with respect to pipelines, other uses such as production lines, energy assets, mining lines, and so on, can also utilize the approaches described herein.
[0067] Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to convey the essence of their innovations to others skilled in the art. An algorithm is a series of defined steps leading to a desired end state or result. In example implementations, the steps carried out require physical manipulations of tangible quantities for achieving a tangible result.
[0068] Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” “displaying,” or the like, can include the actions and processes of a computer system or other information processing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system’s registers and memories into other data similarly represented as physical quantities within the computer system’s memories or registers or other information storage, transmission or display devices.
[0069] Example implementations may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such computer programs may be stored in a computer readable medium, such as a computer-readable storage medium or a computer-readable signal medium. A computer-readable storage medium may involve tangible mediums such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of tangible or non-transitory media suitable for storing electronic information. A computer readable signal medium may include mediums such as carrier waves. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Computer programs can involve pure software implementations that involve instructions that perform the operations of the desired implementation.
[0070] Various general-purpose systems may be used with programs and modules in accordance with the examples herein, or it may prove convenient to construct a more specialized apparatus to perform desired method steps. In addition, the example implementations are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the techniques of the example implementations as described herein. The instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers.
[0071] As is known in the art, the operations described above can be performed by hardware, software, or some combination of software and hardware. Various aspects of the example implementations may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application. Further, some example implementations of the present application may be performed solely in hardware, whereas other example implementations may be performed solely in software. Moreover, the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways. When performed by software, the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.
[0072] Moreover, other implementations of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the techniques of the present application. Various aspects and/or components of the described example implementations may be used singly or in any combination. It is intended that the specification and example implementations be considered as examples only, with the true scope and spirit of the present application being indicated by the following claims.

Claims

CLAIMS

What is claimed is:
1. A method, comprising: for receipt of a plurality of video frames of a sewer pipe: processing the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline comprising a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output comprising identified defects of the sewer pipe from the plurality of video frames; processing the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and processing the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
2. The method of claim 1, wherein the one or more models comprises an object detection model configured to detect the one or more defects, the object detection model being a YOLO feature detection model modified with one or more spatial attention modules configured to adjust convolutional layers in the YOLO feature detection model to de- emphasize background information.
3. The method of claim 1, wherein the one or more models comprises a normal/deformed binary classifier model configured to determine ones of the plurality of video frames classified as the boreview as having a deformed sewer pipe or a normal sewer pipe.
4. The method of claim 3, wherein the one or more models comprises an elliptical (DFE)/bulge (DFBR) binary classifier model configured to determine ones of the plurality of video frames classified as having the deformed sewer pipe as having elliptical deformation or bulging projections.
5. The method of claim 2, wherein for ones of the plurality of video frames determined as having the deformed sewer pipe, classifying the ones of the plurality of video frames determined as having the deformed sewer pipe as being a collapsed pipe for the ones of the plurality of video frames occurring at an end of a survey of the sewer pipe, or as being a broken sewer pipe otherwise.
6. The method of claim 1, wherein the one or more models comprises an object detection model configured to detect one or more of cracks, fractures, roots, grease, encrustations, obstructions, deposits, taps, or corroded metal pipe (SCP) in each of the plurality of video frames.
7. The method of claim 1, wherein the one or more models comprises a segmentation model configured to identify one or more of water level, streamers, and gushers in the sewer pipe from the plurality of video frames.
8. The method of claim 1, wherein the multiple model pipeline comprises one or more segmentation models configured to identify joints of the sewer pipe in the plurality of video frames, wherein the post-processing layer removes the false positives from the identified defects by removing identified fractures or cracks in the first output on the identified joints.
9. The method of claim 1, wherein the one or more models comprises a classification model configured to determine surface damage to the sewer pipe based on a detected level of concrete erosion.
10. The method of claim 1, wherein the post-processing layer is configured to determine a severity of the identified defects based on a blockage percentage detected in the sewer pipe from the plurality of video frames.
11. The method of claim 1, wherein the post-processing layer is configured to identify continuous defects in the identified defects by merging continuous bounding boxes of identified defects into larger bounding boxes, and wherein the post-processing layer is configured to convert bounding boxes of the identified defects into clock position for the second output.
12. The method of claim 1, wherein the output layer is configured to generate a defect report comprising defect codes for the sewer pipe.
13. A computer program, storing instructions that are executed by one or more special purpose hardware processors comprising: for receipt of a plurality of video frames of a sewer pipe: processing the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline comprising a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output comprising identified defects of the sewer pipe from the plurality of video frames; processing the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and processing the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
14. An apparatus, comprising: one or more special purpose hardware processors, configured to: for receipt of a plurality of video frames of a sewer pipe: processing the plurality of video frames through a model layer managing a multiple model pipeline, the multiple model pipeline comprising a boreview/sideview binary classifier model configured to determine each of the plurality of video frames as a boreview or a sideview and one or more models configured to detect one or more defects, the model layer configured to generate a first output comprising identified defects of the sewer pipe from the plurality of video frames; processing the first output of the model layer through a post-processing layer configured to remove false positives from the identified defects, identify continuous defects in the identified defects and determine a severity of the identified defects to produce a second output; and processing the second output of the post-processing layer through an output layer configured to generate a defect report of the sewer pipe.
PCT/US2021/024696 2021-03-29 2021-03-29 Object detection inference pipeline WO2022211781A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2021/024696 WO2022211781A1 (en) 2021-03-29 2021-03-29 Object detection inference pipeline

Publications (1)

Publication Number Publication Date
WO2022211781A1 true WO2022211781A1 (en) 2022-10-06

Family

ID=83459791

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/024696 WO2022211781A1 (en) 2021-03-29 2021-03-29 Object detection inference pipeline

Country Status (1)

Country Link
WO (1) WO2022211781A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578602A (en) * 2022-11-09 2023-01-06 北京信息科技大学 Natural tree species identification method based on improved YOLOv7
CN116363441A (en) * 2023-05-31 2023-06-30 克拉玛依市百事达技术开发有限公司 Pipeline corrosion detection system with marking function
CN116468730A (en) * 2023-06-20 2023-07-21 齐鲁工业大学(山东省科学院) Aerial insulator image defect detection method based on YOLOv5 algorithm

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HE WANGPENT, HUANG, WEI, LI, GUO: "TF-YOLO: An Improved Incremental Network for Real-Time Object Detection", APPLIED SCIENCES, vol. 9, no. 16, 1 August 2019 (2019-08-01), XP055977210, DOI: 10.3390/app9163225 *
HENGMEECHAI JANTIRA: "AUTOMATED ANALYSIS OF SEWER INSPECTION CLOSED CIRCUIT TELEVISION VIDEOS USING IMAGE PROCESSING TECHNIQUES", 1 May 2013 (2013-05-01), XP055977203, [retrieved on 20221102] *
KADDOURA KHALID: "Automated Sewer Inspection Analysis and Condition Assessment", THESIS, 1 November 2015 (2015-11-01), XP055977214, [retrieved on 20221102] *
KIRKHAM, ROBIN ET AL.: "PIRAT-A System for Quantitative Sewer Pipe Assessment.", THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, vol. 19, no. 11, November 2000 (2000-11-01), XP055094957, Retrieved from the Internet <URL:https://journals.sagepub.com/doi/abs/10.1177/02783640022067959> [retrieved on 20210528], DOI: https://doi.org/10.1177%2F02783640022067959 *
WEIMER JERRY: "Understanding deformation in flexible pipe.", WATER ENVIRONMENT & TECHNOLOGY, 1 January 2017 (2017-01-01), XP055977219, [retrieved on 20221102] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578602A (en) * 2022-11-09 2023-01-06 北京信息科技大学 Natural tree species identification method based on improved YOLOv7
CN116363441A (en) * 2023-05-31 2023-06-30 克拉玛依市百事达技术开发有限公司 Pipeline corrosion detection system with marking function
CN116363441B (en) * 2023-05-31 2023-08-08 克拉玛依市百事达技术开发有限公司 Pipeline corrosion detection system with marking function
CN116468730A (en) * 2023-06-20 2023-07-21 齐鲁工业大学(山东省科学院) Aerial insulator image defect detection method based on YOLOv5 algorithm
CN116468730B (en) * 2023-06-20 2023-09-05 齐鲁工业大学(山东省科学院) Aerial Insulator Image Defect Detection Method Based on YOLOv5 Algorithm

Similar Documents

Publication Publication Date Title
WO2022211781A1 (en) Object detection inference pipeline
Chun et al. Automatic detection method of cracks from concrete surface imagery using two‐step light gradient boosting machine
Tan et al. Automatic detection of sewer defects based on improved you only look once algorithm
WO2022135511A1 (en) Method and apparatus for positioning moving object, and electronic device and storage medium
Halfawy et al. Integrated vision-based system for automated defect detection in sewer closed circuit television inspection videos
US20220004818A1 (en) Systems and Methods for Evaluating Perception System Quality
WO2021031954A1 (en) Object quantity determination method and apparatus, and storage medium and electronic device
TWI667621B (en) Face recognition method
US11790518B2 (en) Identification of defect types in liquid pipelines for classification and computing severity thereof
US20230101112A1 (en) Technology configured to enable fault detection and condition assessment of underground stormwater and sewer pipes
CN104301630B (en) A kind of video image joining method and device
CN114220009A (en) Infrared image-based wire windage yaw identification method and system
JP2018041406A (en) Water surface boundary detection device, water surface boundary detection method and computer program
Li et al. Noise doesn't lie: towards universal detection of deep inpainting
Oh et al. Robust sewer defect detection with text analysis based on deep learning
Kumar et al. Semi-supervised transfer learning-based automatic weld defect detection and visual inspection
Chern et al. Context-aware safety assessment system for far-field monitoring
Dang et al. Lightweight pixel-level semantic segmentation and analysis for sewer defects using deep learning
CN114037087B (en) Model training method and device, depth prediction method and device, equipment and medium
Rayhana et al. Automated defect-detection system for water pipelines based on CCTV inspection videos of autonomous robotic platforms
Ljubičić et al. SSIMS-Flow: Image velocimetry workbench for open-channel flow rate estimation
JP7078295B2 (en) Deformity detection device, deformation detection method, and program
Liu et al. A dual attention network for automatic metallic corrosion detection in natural environment
Beyene et al. Unsupervised domain adaptation-based crack segmentation using transformer network
CN103765477A (en) Line tracking with automatic model initialization by graph matching and cycle detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21935372

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21935372

Country of ref document: EP

Kind code of ref document: A1