WO2019106095A1 - Hierarchical image interpretation system - Google Patents

Hierarchical image interpretation system Download PDF

Info

Publication number
WO2019106095A1
WO2019106095A1 · PCT/EP2018/083023 · EP2018083023W
Authority
WO
WIPO (PCT)
Prior art keywords
sub
image
region
node
hierarchy
Prior art date
Application number
PCT/EP2018/083023
Other languages
French (fr)
Inventor
Daniel Hubert
Ben BOUTCHER-WEST
Gabriel J. Brostow
Original Assignee
Yellow Line Parking Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yellow Line Parking Ltd. filed Critical Yellow Line Parking Ltd.
Publication of WO2019106095A1 publication Critical patent/WO2019106095A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63Scene text, e.g. street names
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/18143Extracting features based on salient regional features, e.g. scale invariant feature transform [SIFT] keypoints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/196Recognition using electronic means using sequential comparisons of the image signals with a plurality of references
    • G06V30/1983Syntactic or structural pattern recognition, e.g. symbolic string recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/416Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Definitions

  • the present disclosure relates to a system and method for parsing parking signs.
  • OCR Optical Character Recognition
  • An aspect of the invention provides a method for parsing parking signs, comprising receiving image data representing an image, processing the image data to determine a first information region in the image and associating the first information region with a parent node of a hierarchy, processing the image data to determine one or more information sub-regions wholly contained within the first information region and associating each determined sub-region with a sub-node of the hierarchy, wherein each sub-node is a child to the parent node, and outputting data indicative of the hierarchy.
  • the method may further comprise iteratively determining one or more further sub-regions wholly contained within one or more previously determined sub-regions and associating each further determined sub-region with a further sub-node of the hierarchy, wherein each further sub-node is a child to the corresponding previously determined parent sub-node.
  • images comprising multiple regions of interest and therefore multiple levels of association may be parsed.
  • Determining a first information region or sub-region of the image may comprise using one or more feature detection algorithms.
  • the feature detection algorithm comprises one or more of Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), Speeded up Robust Feature (SURF), Haar-like features, or a neural network.
  • HOG Histogram of Oriented Gradients
  • SIFT Scale-Invariant Feature Transform
  • SURF Speeded up Robust Feature
  • Haar-like features or a neural network.
  • information regions of interest may be efficiently determined.
  • the method may further comprise determining a semantic classification of each information region or sub-region and associating the semantic classification with the corresponding node or sub-node.
  • determining a semantic classification of each information region may comprise using a classification algorithm.
  • the classification algorithm comprises using one or more of a neural network, decision forest, or logistic regression algorithm.
  • the hierarchical associations may be preserved through to semantic understanding.
  • the method may further comprise using a non-maximal suppression method to prevent overlap of determined sub-regions.
  • the method may comprise co-training the classification algorithm with the feature detection algorithm.
  • the method may comprise training one or more of the feature detection and classification algorithms using data indicative of a predicted hierarchy.
  • the method may comprise training one or more of the feature detection and classification algorithms using data indicative of a position of one or more information regions.
  • this increases the accuracy of the feature detection and classification algorithms for determining and classifying spatial information regions.
  • the determined information regions are of different sizes.
  • this allows for increased diversification of input images to be parsed.
  • An aspect of the invention provides a system for parsing parking signs, comprising input means arranged to receive image data representing an image; processing means arranged to: determine a first information region in the image and associate the first information region with a parent node of a hierarchy, determine one or more information sub-regions wholly contained within the first information region and associating each determined sub-region with a sub-node of the hierarchy, wherein each sub-node is a child to the parent node; and output means arranged to output data indicative of the hierarchy.
  • Figure 1 shows an example of an image to be parsed
  • Figure 2 shows a system for parsing images
  • Figure 3 shows a method for parsing images
  • Figure 4 shows an example of a parsed image
  • Figure 5 shows an output hierarchy
  • Figure 6 shows an example of a semantic classification method.
  • FIG. 1 shows an image of a parking sign 100.
  • the parking sign consists of multiple information regions 110, 120, 130, wherein each region comprises one or more rules 140 (e.g. No loading, pay at machine, etc.) and times 150 (7-10am, Mon-Fri, etc.).
  • each rule is associated with one or more respective times, and governs the state of a parking space.
  • other rules are possible, such as the type/purpose of the parking space or other instructions on use.
  • a hierarchy therefore exists for the rules and times in the image of the parking sign, and a hierarchy may therefore be defined as a representation of the associations between regions of the image containing semantic information. A hierarchy may also be defined for other types of data, such as the associations between characters or words in a text string, or subjects in a video.
  • the information represented in the parking sign 100 may thus be represented hierarchically by representing the specific associations between individual rules and times.
  • FIG. 2 illustrates a system 200 for image parsing. Parsing in this context refers to extracting or interpreting information or semantic understanding from an input, such as an image. However, other forms of input may be envisaged, such as text or video.
  • the system 200 comprises an input means 210 arranged to receive image data 205 representing an image containing one or more information regions, a processing means 220 to process the image data to determine a first information region in the image and associate the first information region with a parent node of a hierarchy and determine one or more information sub-regions wholly contained within the first information region and associating each determined sub-region with a sub-node of the hierarchy, wherein each sub-node is a child to the parent node, and an output means 230 to output data indicative of the hierarchy.
  • the hierarchy may comprise any suitable data structure operable to the processing means 220, such as a tree, linked list, or relational database.
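  • The tree variant of the hierarchy described above could be sketched as follows. This is an illustrative data structure, not taken from the patent; the class and field names are assumptions.

```python
# Hypothetical sketch of the hierarchy: a tree whose nodes record a
# region's bounding box and an optional semantic classification.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RegionNode:
    box: tuple                      # (x, y, width, height) of the region
    label: Optional[str] = None     # semantic classification, if determined
    children: list = field(default_factory=list)

    def add_child(self, node: "RegionNode") -> "RegionNode":
        self.children.append(node)
        return node

    def to_dict(self) -> dict:
        """Serialise the hierarchy for output."""
        return {"box": self.box, "label": self.label,
                "children": [c.to_dict() for c in self.children]}

# Parent node for the whole sign, with one section and one rule sub-node.
root = RegionNode((0, 0, 400, 600), "sign")
section = root.add_child(RegionNode((10, 40, 380, 180), "section"))
section.add_child(RegionNode((20, 50, 200, 40), "rule"))
print(root.to_dict())
```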
  • the input means 210 is arranged to receive image data 205 representing an image containing one or more information regions, each comprising a physical region of the image in which inter-associated information is contained, as will be explained later.
  • the image data 205 may be received from any suitable image capture means or data storage means and through any suitable wired or wireless connection to the input means 210.
  • the image capture means may comprise a sensor, such as a LIDAR sensor, or any other suitable image capture device. Typically such sensors will operate based on visual information, however as parking signs develop and communicate using other media, including electronic, it is envisaged that the image data 205 may be received from other suitable sensors and/or devices.
  • the input means 210 is digitally coupled to the processing means 220, which may comprise one or more processing devices.
  • the processing means 220 may be coupled to a database 225.
  • the processing means 220 may be arranged to perform one or more computer vision methods on the received image data 205 to determine one or more information regions or sub-regions and designate each region as a node in a hierarchy. In this way, each region may be associated with a node in a hierarchy, as will be explained later.
  • the processing means 220 is digitally coupled to the output means 230.
  • the output means 230 may comprise any suitable means for outputting data indicative of the hierarchy 235, for example a connected device, e.g. a computing device, data storage means, distributed platform, or display means.
  • Figure 3 illustrates a method 300 for image parsing.
  • the method 300 may be performed using the system 200.
  • the method 300 comprises the step 310 of receiving image data representing an image, the image containing one or more information regions.
  • the step 310 may be performed using the input means 210 as shown in Figure 2.
  • Information regions may comprise an area containing text, objects, or other image features of interest. Examples of information regions include the rules and times shown in the parking sign in Figure 1. Other examples may include images comprising information regions showing faces, clothing, brands, buildings, vehicles, or other image features or semantic objects to be determined.
  • the method 300 further comprises the step 320 of determining a first information region in the image and associating the first information region with a parent node (i.e. designating the first information region as the parent node) of a hierarchy.
  • the step 320 may be performed using the processing means 220 as shown in Figure 2.
  • Determining the first information region may be performed using any suitable object detection algorithm, such as a feature detection algorithm.
  • feature detection algorithms include, amongst others, Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), Speeded up Robust Feature (SURF), Haar-like features, or a neural network, which provide indications of relevant points of the image.
  • the object detection algorithm may further comprise any suitable machine-learning algorithm arranged to determine a spatial area of the image, such as Fast R-CNN.
  • the determined information region may be any suitably shaped subset of pixels of the image comprising the object or objects of interest.
  • the information region may be rectangular or square, or comprise any other non-regular shape.
  • the feature detection algorithm may have been trained on previous datasets indicative of the feature to be detected, such as images of parking signs.
  • Upon processing the image data to determine the first information region in the image, the first information region is associated with a parent node of a hierarchy.
  • the parent node may be the root node of the hierarchy, and may represent the main region of interest determined in the image.
  • the hierarchy may be represented by any suitable data format operable to the processing means 220, and may be stored in the database 225.
  • the parent node may be associated with information relating to one or more of the size, position, or contents of the information region determined.
  • image processing may then be applied to the image data corresponding to the determined first information region to prepare for determining an information sub-region.
  • the image processing may comprise any suitable image processing operation such as image rectification, cropping, rotation, warping, hue/saturation/contrast alterations, de-noising, sharpening, blurring etc.
  • the image processing operations may be applied to the image data manually by a user, or may be applied automatically by the processing means 220 based on determined parameter values of the image data.
  • the determined parameter values may comprise one or more of an image skewness, distortion, hue, saturation, contrast, brightness, noise level, sharpness, and blur level, however other image parameter values will be envisaged.
  • an image processing operation may be applied to the image data to increase the brightness.
  • the parameter values may be determined algorithmically. For example, if an image processing algorithm determines the subject-matter of an image is rotated beyond a predetermined axis, a rotation operation may be applied to the image data to align the axis of its subject-matter with the predetermined axis.
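  • The automatic, parameter-driven processing above could be sketched as follows: an illustrative (not patent-specified) brightness check on a greyscale image, where a determined parameter value triggers an image-processing operation. The threshold and gain values are assumptions.

```python
# Brighten a greyscale image (list of pixel rows) only when its determined
# mean-brightness parameter falls below a threshold.
def mean_brightness(pixels):
    flat = [v for row in pixels for v in row]
    return sum(flat) / len(flat)

def auto_brighten(pixels, threshold=100, gain=1.5):
    """If the image is darker than `threshold`, scale pixel values up."""
    if mean_brightness(pixels) >= threshold:
        return pixels
    return [[min(255, int(v * gain)) for v in row] for row in pixels]

dark = [[40, 60], [50, 70]]          # mean 55 < 100, so it is brightened
print(mean_brightness(auto_brighten(dark)))
```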
  • the method 300 further comprises the step 330 of processing the image data to determine one or more information sub-regions wholly contained within the first information region and associating (i.e. designating) each determined sub-region with/as a sub-node, wherein each sub-node is a child to the parent node, i.e. the node associated with the first information region.
  • the step of determining the information sub-regions may again be performed using any suitable object detection algorithm, such as a feature detection algorithm.
  • the feature detection algorithm for determining the information sub-region may be trained to determine a different feature from the first feature detection algorithm arranged to determine a first information region, such as sections of a parking sign. Each determined information sub-region may be any suitably shaped subset of pixels of the first information region.
  • Each sub-region may be of different sizes or shapes.
  • a non-maximal suppression method may be applied when determining information sub- regions to prevent overlap of the determined sub-regions.
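  • The non-maximal suppression step above might look like the following minimal sketch, assuming each candidate sub-region is an `(x1, y1, x2, y2, score)` tuple; the greedy strategy and overlap threshold are illustrative, since the patent does not fix a particular method.

```python
# Greedy non-maximal suppression over candidate sub-regions.
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def nms(boxes, thresh=0.5):
    """Keep the highest-scoring boxes, suppressing heavy overlaps."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box[:4], k[:4]) < thresh for k in kept):
            kept.append(box)
    return kept

candidates = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (20, 20, 30, 30, 0.7)]
print(len(nms(candidates)))  # the two heavily overlapping boxes collapse to one
```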
  • Each sub-node may be associated with information relating to one or more of the size, position, or contents of the corresponding information sub-region determined. In this way, a hierarchy of nodes may be formed, comprising a representation of associations between information sub-regions.
  • the method 300 may optionally also comprise the step 340 of determining a semantic classification for each determined information region or sub-region.
  • Any suitable classifier may be used to determine the semantic classification of each region or sub-region.
  • each sub-region of the first information region may have a classifier applied to determine the class of objects present within each region.
  • the classifier may comprise any suitable classification algorithm, such as neural network, decision forest, and logistic regression algorithms, however other algorithms will be envisaged.
  • the determined semantic classification may be compared to an existing dataset of semantic classifications to identify inconsistencies or anomalies in the determined semantic classification. In this way, the determined semantic classification may be validated based on the historical context of previous classifications.
  • the semantic classification of a parking sign rule may be compared against a dataset of existing semantic classifications for parking signs in order to determine a valid classification. This prevents errors arising in the semantic classification due to noise (such as poor quality images, objects blocking the view of a sign, graffiti on signs, etc.) or algorithmic error.
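  • The validation against existing classifications described above could be sketched as below. The vocabulary of known rules and the flagging behaviour are assumptions for illustration only.

```python
# Validate a determined classification against historically seen ones.
KNOWN_RULES = {"no loading", "no parking", "pay at machine", "display ticket"}

def validate(classification, known=KNOWN_RULES):
    """Return the classification and whether it is historically plausible."""
    cleaned = classification.strip().lower()
    if cleaned in known:
        return cleaned, True
    return cleaned, False   # anomaly: e.g. noise, occlusion, or graffiti

print(validate("No Loading"))   # a valid classification
print(validate("N0 L0ad1ng"))   # flagged for review
```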
  • a confidence score may also be produced by the classifier which is indicative of the degree of certainty or error by the classifier. This confidence score may reflect the quality of the input. For example, when applied to parking signs, the confidence score may reflect the physical quality of the parking sign, which may be useful for physical sign maintenance purposes.
  • the classifier may have been co-trained with the feature detection algorithms.
  • the determined semantic classification may also be associated with the corresponding node, to provide semantic meaning to each node in the hierarchy. Further image processing may be applied to each determined information region or sub-region before classification, such as image rectification, cropping, rotation, warping, hue/saturation alterations, etc.
  • the semantic classification may be computer-parsable.
  • the method 300 may also comprise the step 335 of iteratively determining one or more further sub-regions wholly contained within one or more previously determined sub-regions, and associating each further determined sub-region with a further sub-node of the hierarchy, wherein each further sub-node is a child to the corresponding previously determined parent sub-node.
  • a hierarchy comprising multiple levels of nodes may be provided.
  • the number of iterations to determine sub-regions may be predetermined, or may be algorithmically determined.
  • semantic classifiers may be applied to each determined sub-region and associated with the corresponding node in the hierarchy.
  • Each iteration of the step 335 may apply a different feature detection algorithm. In this way, each layer of nodes in the hierarchy may correspond to a different class of detected feature.
  • Each iteration of the step 335 may also apply a different semantic classifier.
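  • The per-level detection idea above could be sketched recursively, with a different detector applied at each depth of the hierarchy. The detector functions here are stubs standing in for trained feature detectors; their names and outputs are assumptions.

```python
# Recursively build {region: children} using one stub detector per level.
def detect_sections(region):        # level-1 detector stub
    return ["section-a", "section-b"] if region == "sign" else []

def detect_rules(region):           # level-2 detector stub
    return [region + "/rule"] if region.startswith("section") else []

DETECTORS = [detect_sections, detect_rules]

def parse(region, depth=0):
    """Apply the detector for this depth and recurse into its results."""
    if depth >= len(DETECTORS):
        return {}
    return {sub: parse(sub, depth + 1) for sub in DETECTORS[depth](region)}

hierarchy = {"sign": parse("sign")}
print(hierarchy)
```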
  • Training of feature detection algorithms may be performed by specifying a structure of hierarchy to be output from the method 300, i.e. in a supervised manner. Training of the feature detection algorithms may further involve segmenting each image to explicitly indicate the positions and sizes of each information region. Training the feature detection algorithms may comprise training from a collective dataset indicative of all the features to be determined, or each feature detection algorithm may be trained on a different dataset respectively.
  • the method 300 comprises the step 350 of outputting data indicative of the hierarchy.
  • the data indicative of the hierarchy may comprise data relating to one or more of the size, shape, contents, and semantic classification associated with each node, as well as the specific parent-child relationships between each node.
  • the data indicative of the hierarchy may be formatted in any suitable data structure operable to the processing means 220, such as a tree, linked list, relational table, or otherwise.
  • data indicative of a hierarchy may be stored in a data storage means such that the data may be retrieved for further processing in future.
  • the data indicative of the hierarchy may comprise any suitable data structure operable to the processing means 220.
  • the data structure may be arranged such that individual nodes and associated content may be edited, removed, or added to the hierarchy whilst preserving the remaining hierarchy structure.
  • an individual node of the hierarchy may be updated upon request from the processing means 220 to a different semantic classification.
  • an explicit instruction may be received by the processing means 220 to update, remove, overwrite, overrule, or add a node to an existing hierarchy.
  • the processing means 220 may receive an electronic instruction to modify the parking times associated with a parking sign that has been parsed as a hierarchy.
  • the electronic instruction may be received over a wired or wireless network from an external device, such as a computing server or computing device.
  • the semantic classification contained within the node corresponding to the parking time associated with that parking sign will be updated.
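  • The in-place node update described above could be sketched as follows, assuming the dict-shaped hierarchy used earlier in this page's examples; the matching rule (by label) is an illustrative simplification.

```python
# Update one node's semantic classification, preserving the rest of the tree.
def update_label(node, old, new):
    """Replace `old` classification with `new` wherever it appears."""
    changed = 0
    if node.get("label") == old:
        node["label"] = new
        changed += 1
    for child in node.get("children", []):
        changed += update_label(child, old, new)
    return changed

h = {"label": "sign", "children": [
        {"label": "Applies times: 7-10am", "children": []}]}
n = update_label(h, "Applies times: 7-10am", "Applies times: 8-11am")
print(h["children"][0]["label"])
```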
  • the method 300 may be applied to a second input, and may output data indicative of a hierarchy that is different to a previously determined hierarchy.
  • the method 300 may be re-applied to an image of a parking sign that has already been parsed.
  • the differences between semantic classifications contained in each of the nodes of the previously determined hierarchy and the newly determined hierarchy may be identified, and the previous hierarchy may be updated to include the new semantic classifications.
  • the second input may comprise an image, however other forms of input may be envisaged.
  • multiple related images comprising identically located information regions having differing semantic information may be received by the method 300.
  • multiple images of varying states of a digital parking sign may be received, wherein each image contains information regions having different semantic information, reflecting the changing nature of the digital parking sign.
  • a digital parking sign may show “No parking” on a Monday between 9am and 5pm. However, outside of these times, the digital parking sign may show “1 hour parking - no return within 2 hours”.
  • multiple parent nodes may be created for the hierarchy with respect to each set of semantic information.
  • Figure 4 shows an exemplary application 400 of the method 300 applied to the image of the parking sign 100 shown in Figure 1.
  • Figure 5 similarly shows the hierarchy 500 determined from 400.
  • the method 300 is particularly useful for processing parking signs, as they are naturally arranged as a hierarchy of rules, and therefore by explicitly analysing the grouping of components of the sign and retaining the associated hierarchy, the semantic understanding of the sign is not lost.
  • other forms of input may be used, such as road markings, signalling equipment, etc.
  • Examples of traffic-related inputs for which the present invention may be applied to include images of parking signs, warning signs, regulatory signs, speed limit signs, low bridge signs, level crossing signs and signals, train signs, signals and road markings, bus and cycle signs and road markings, pedestrian zone signs, traffic calming signs, motorway signs, signals, and road markings, directional signs, information signs, traffic signals, road work signs, and others.
  • Applying the step 320 to the parking sign 100 provides the first information region 410.
  • the first information region 410 determined corresponds to the entire area of the sign itself, which therefore comprises the parent node 510 of the hierarchy 500.
  • a feature detection algorithm trained to detect signs was used to determine the first information region 410.
  • the information region 410 comprises multiple semantic objects of interest.
  • the first information region may be warped such that the determined sign is fronto-parallel.
  • the warping is performed by regressing a two-dimensional offset applied to each corner of the determined image region corresponding to the sign.
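  • The corner-offset step could be sketched as below: a regressor (stubbed here with fixed values) predicts a 2D offset for each corner of the detected box, and applying the offsets yields the source quadrilateral for a perspective warp. The offset values and box coordinates are invented for illustration.

```python
# Apply regressed 2D offsets to the corners of a detected region.
def apply_offsets(corners, offsets):
    """Shift each (x, y) corner by its regressed (dx, dy) offset."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(corners, offsets)]

# Axis-aligned box around the sign, clockwise from top-left.
box = [(50, 80), (350, 80), (350, 580), (50, 580)]
# Offsets a regressor might predict for a slightly skewed sign (stub values).
predicted = [(4, -2), (-6, 3), (-3, 5), (2, -4)]

quad = apply_offsets(box, predicted)
print(quad)  # the source quadrilateral for a perspective warp
```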
  • One or more information sub-regions 422, 424, 426 wholly contained within the processed first information region 420 are determined, in accordance with step 330.
  • feature detection algorithms trained to detect individual sections of a sign were used to determine the information sub-regions 422, 424, 426.
  • each of the determined sub-regions is associated with a sub-node of the hierarchy, as shown by sub-nodes 520, 530, 540 in Figure 5.
  • Each of the sub-nodes 520, 530 and 540 is a child to the parent node 510, and the parent-child relationships between each node can clearly be seen.
  • One or more further information sub-regions wholly contained within each of the previously determined sub-regions is then determined, as per step 335.
  • Further sub-regions 432, 434, 436, 438 for example have been determined from sub-region 422.
  • feature detection algorithms trained to detect individual rules and times were used to determine the further sub-regions 432, 434, 436, 438.
  • the region 440 corresponding to the instructional message ‘Display ticket’ has not been determined as a sub-region, due to the feature detection algorithm being selected to ignore this type of region.
  • each of the determined further sub-regions is associated with a sub-node of the hierarchy, as shown for example by sub-nodes 522, 524, 526, 528 in Figure 5.
  • Each of the sub-nodes 522, 524, 526, 528 is a child to the parent sub-node 520, and the parent-child relationships between each node can clearly be seen.
  • In Figure 5 it can be seen that a semantic classification of each of the further sub-regions 432, 434, 436, 438 has been performed and the resulting classification has been associated with the corresponding sub-nodes 522, 524, 526, 528, such that each of the sub-nodes has been categorised into ‘Rule’, ‘Applies weekdays’, ‘Applies times’, and ‘Return period’ categories.
  • the semantic classification may therefore comprise a determined rule or time period relating to the state of the parking space, and the determined hierarchy is indicative of the times the rules are applicable.
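  • A consumer of the output hierarchy could use the rule/time associations as sketched below; the time encoding (weekday index, hour) and the rule records are illustrative assumptions, not a format from the patent.

```python
# Decide which parsed rules apply at a given moment.
RULES = [
    {"rule": "No loading", "days": range(0, 5), "hours": range(7, 10)},
    {"rule": "Pay at machine", "days": range(0, 5), "hours": range(10, 18)},
]

def active_rules(weekday, hour, rules=RULES):
    """Return rules whose day/time conditions cover the given moment."""
    return [r["rule"] for r in rules
            if weekday in r["days"] and hour in r["hours"]]

print(active_rules(weekday=1, hour=8))    # a weekday morning
print(active_rules(weekday=6, hour=8))    # a Sunday morning
```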
  • Figure 5 therefore shows a resulting hierarchy, wherein each node corresponds to a region or object of interest.
  • the spatial/positional associations of each rule and time has advantageously been retained in the hierarchical structure, such that the semantic understanding of the parking sign 100 has not been lost.
  • Figure 6 shows an example of the semantic classifiers applied to some of the determined sub-regions, wherein each semantic classification is performed using a classification algorithm trained to return a relevant rule or time category.
  • 610, 620, and 630 show an example application of ‘Day’, ‘Rule’, and ‘Permit code’ semantic classifiers using Convolutional Neural Networks and OCR parsers.
  • the resulting semantic classifications 640 are output in a computer-readable format.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

There is provided a system and method for parsing parking signs, comprising receiving image data representing an image, processing the image data to determine a first information region in the image and associating the first information region with a parent node of a hierarchy, processing the image data to determine one or more information sub-regions wholly contained within the first information region, and associating each determined sub-region with a sub-node of the hierarchy, wherein each sub-node is a child to the parent node, and outputting data indicative of the hierarchy.

Description

HIERARCHICAL IMAGE INTERPRETATION SYSTEM
TECHNICAL FIELD
The present disclosure relates to a system and method for parsing parking signs.
BACKGROUND
It is generally desired to provide image parsing systems capable of extracting useful information from digital images, for example using computer vision techniques. Traditionally, computer vision involves the acquisition, extraction and analysis of information present in one or more images through algorithmic or analytical methods to achieve a visual understanding of the image. The applications of computer vision are numerous, and examples include Optical Character Recognition (OCR), object detection and recognition, biometrics, and others.
It is an aim of the invention to mitigate one or more problems of the prior art.
SUMMARY OF THE INVENTION
An aspect of the invention provides a method for parsing parking signs, comprising receiving image data representing an image, processing the image data to determine a first information region in the image and associating the first information region with a parent node of a hierarchy, processing the image data to determine one or more information sub-regions wholly contained within the first information region and associating each determined sub-region with a sub-node of the hierarchy, wherein each sub-node is a child to the parent node, and outputting data indicative of the hierarchy.
The method may further comprise iteratively determining one or more further sub-regions wholly contained within one or more previously determined sub-regions and associating each further determined sub-region with a further sub-node of the hierarchy, wherein each further sub-node is a child to the corresponding previously determined parent sub-node. Advantageously, images comprising multiple regions of interest and therefore multiple levels of association may be parsed.
Determining a first information region or sub-region of the image may comprise using one or more feature detection algorithms. Optionally, the feature detection algorithm comprises one or more of Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), Speeded up Robust Feature (SURF), Haar-like features, or a neural network. Advantageously, information regions of interest may be efficiently determined.
The method may further comprise determining a semantic classification of each information region or sub-region and associating the semantic classification with the corresponding node or sub-node. Optionally, determining a semantic classification of each information region may comprise using a classification algorithm. Optionally, the classification algorithm comprises using one or more of a neural network, decision forest, or logistic regression algorithm. Advantageously, the hierarchical associations may be preserved through to semantic understanding.
The method may further comprise using a non-maximal suppression method to prevent overlap of determined sub-regions. Advantageously, this prevents information from being parsed repeatedly.
The method may comprise co-training the classification algorithm with the feature detection algorithm.
The method may comprise training one or more of the feature detection and classification algorithms using data indicative of a predicted hierarchy.
The method may comprise training one or more of the feature detection and classification algorithms using data indicative of a position of one or more information regions. Advantageously, this increases the accuracy of the feature detection and classification algorithms for determining and classifying spatial information regions.
Optionally, the determined information regions are of different sizes. Advantageously, this allows for increased diversification of input images to be parsed.
An aspect of the invention provides a system for parsing parking signs, comprising input means arranged to receive image data representing an image; processing means arranged to: determine a first information region in the image and associate the first information region with a parent node of a hierarchy, determine one or more information sub-regions wholly contained within the first information region and associating each determined sub-region with a sub-node of the hierarchy, wherein each sub-node is a child to the parent node; and output means arranged to output data indicative of the hierarchy.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows an example of an image to be parsed;
Figure 2 shows a system for parsing images;
Figure 3 shows a method for parsing images;
Figure 4 shows an example of a parsed image;
Figure 5 shows an output hierarchy;
Figure 6 shows an example of a semantic classification method.
DETAILED DESCRIPTION
In the field of computer vision, the extraction and analysis of information present in images is particularly important for gaining an understanding of the image. However, semantic information can often be lost with traditional computer vision techniques.
Many images contain information whose natural representation is a hierarchy, i.e. an image may contain multiple regions of information that are inherently associated with each other. Figure 1 for example shows an image of a parking sign 100. The parking sign consists of multiple information regions 110, 120, 130, wherein each region comprises one or more rules 140 (e.g. No loading, pay at machine, etc.) and times 150 (7-10am, Mon-Fri, etc.). It will be appreciated that each rule is associated with one or more respective times, and governs the state of a parking space. It will be appreciated that other rules are possible, such as the type/purpose of the parking space or other instructions on use. A hierarchy therefore exists for the rules and times in the image of the parking sign, and a hierarchy may therefore be defined as a representation of the associations between regions of the image containing semantic information. A hierarchy may also be defined for other types of data, such as the associations between characters or words in a text string, or subjects in a video. The information represented in the parking sign 100 may thus be represented hierarchically by representing the specific associations between individual rules and times.
When using traditional image parsing techniques such as OCR or neural networks, it becomes difficult to retain the hierarchical associations between information. For example, a standard OCR method may perfectly convert the image of the parking sign in Figure 1 into text, but will lose understanding of the required positional or spatial associations between elements, and therefore will lose context of the associations between individual rules and times. In this case, further semantic processing is required in order to provide a full understanding of the purpose of the sign. Without a hierarchical representation of the information, it becomes difficult for an image parsing system to understand specifically what times are associated with 'No loading' or 'Pay at machine', or even to distinguish that 'No loading' and 'Pay at machine' are not associated together. It will be appreciated therefore that an image parsing system capable of extracting and retaining hierarchically represented information is required.
Figure 2 illustrates a system 200 for image parsing. Parsing in this context refers to extracting or interpreting information or semantic understanding from an input, such as an image. However, other forms of input may be envisaged, such as text or video. The system 200 comprises an input means 210 arranged to receive image data 205 representing an image containing one or more information regions, a processing means 220 to process the image data to determine a first information region in the image and associate the first information region with a parent node of a hierarchy and determine one or more information sub-regions wholly contained within the first information region and associating each determined sub-region with a sub-node of the hierarchy, wherein each sub-node is a child to the parent node, and an output means 230 to output data indicative of the hierarchy. The hierarchy may comprise any suitable data structure operable to the processing means 220, such as a tree, linked list, or relational database. The input means 210 is arranged to receive image data 205 representing an image containing one or more information regions comprising a physical region of the image in which inter-associated information is contained, as will be explained later. The image data 205 may be received from any suitable image capture means or data storage means and through any suitable wired or wireless connection to the input means 210. The image capture means may comprise a sensor, such as a LIDAR sensor, or any other suitable image capture device. Typically such sensors will operate based on visual information, however as parking signs develop and communicate using other media, including electronic, it is envisaged that the image data 205 may be received from other suitable sensors and/or devices.
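By way of illustration only, the hierarchy data structure referred to above may be sketched as a simple tree of nodes. The field names (`label`, `bbox`, `children`) and the hand-built example are assumptions for illustration and do not form part of the disclosed system; any suitable tree, linked list, or relational structure may be used, as the description states.

```python
from dataclasses import dataclass, field

@dataclass
class HierarchyNode:
    """One node of the hierarchy: an information region and its children.

    Field names are illustrative, not taken from the source document.
    """
    label: str                       # semantic classification, if determined
    bbox: tuple                      # (x, y, width, height) of the region in pixels
    children: list = field(default_factory=list)

    def add_child(self, node):
        """Attach a sub-node (child) to this parent node."""
        self.children.append(node)
        return node

# Build, by hand, a hierarchy of the kind Figure 1's parking sign suggests:
root = HierarchyNode("sign", (0, 0, 400, 900))          # parent (root) node
section = root.add_child(HierarchyNode("section", (10, 10, 380, 280)))
section.add_child(HierarchyNode("rule", (20, 20, 360, 60)))
section.add_child(HierarchyNode("applies_times", (20, 90, 360, 60)))
```

In use, the parent-child links preserve exactly the rule-to-time associations that flat OCR output loses.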
The input means 210 is digitally coupled to the processing means 220, which may comprise one or more processing devices. The processing means 220 may be coupled to a database 225. The processing means 220 may be arranged to perform one or more computer vision methods on the received image data 205 to determine one or more information regions or sub-regions and designate each region as a node in a hierarchy. In this way, each region may be associated with a node in a hierarchy, as will be explained later.
The processing means 220 is digitally coupled to the output means 230. The output means 230 may comprise any suitable means for outputting data indicative of the hierarchy 235, for example a connected device, e.g. a computing device, data storage means, distributed platform, or display means.
Figure 3 illustrates a method 300 for image parsing. The method 300 may be performed using the system 200.
The method 300 comprises the step 310 of receiving image data representing an image, the image containing one or more information regions. The step 310 may be performed using the input means 210 as shown in Figure 2. Information regions may comprise an area containing text, objects, or other image features of interest. Examples of information regions include the rules and times shown in the parking sign in Figure 1. Other examples may include images comprising information regions showing faces, clothing, brands, buildings, vehicles, or other image features or semantic objects to be determined.
The method 300 further comprises the step 320 of determining a first information region in the image and associating the first information region with a parent node (i.e. designating the first information region as the parent node) of a hierarchy. The step 320 may be performed using the processing means 220 as shown in Figure 2. Determining the first information region may be performed using any suitable object detection algorithm, such as a feature detection algorithm. Examples of feature detection algorithms include, amongst others, Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), Speeded up Robust Feature (SURF), Haar-like features, or a neural network, which provide indications of relevant points of the image. The object detection algorithm may further comprise any suitable machine-learning algorithm arranged to determine a spatial area of the image, such as Fast R-CNN. The determined information region may be any suitably shaped subset of pixels of the image comprising the object or objects of interest. For example, the information region may be rectangular or square, or comprise any other non-regular shape. The feature detection algorithm may have been trained on previous datasets indicative of the feature to be detected, such as images of parking signs.
Upon processing the image data to determine the first information region in the image, the first information region is associated with a parent node of a hierarchy. The parent node may be the root node of the hierarchy, and may represent the main region of interest determined in the image. The hierarchy may be represented by any suitable data format operable to the processing means 220, and may be stored in the database 225. The parent node may be associated with infonnation relating to one or more of the size, position, or contents of the infonnation region determined.
Optionally, image processing may then be applied to the image data corresponding to the determined first information region to prepare for determining an information sub-region. The image processing may comprise any suitable image processing operation such as image rectification, cropping, rotation, warping, hue/saturation/contrast alterations, de-noising, sharpening, blurring etc. The image processing operations may be applied to the image data manually by a user, or may be applied automatically by the processing means 220 based on determined parameter values of the image data. The determined parameter values may comprise one or more of an image skewness, distortion, hue, saturation, contrast, brightness, noise level, sharpness, and blur level, however other image parameter values will be envisaged. For example, if a value of a brightness parameter of the image data is determined to be below a predetermined threshold, an image processing operation may be applied to the image data to increase the brightness. In some embodiments, the parameter values may be determined algorithmically. For example, if an image processing algorithm determines the subject-matter of an image is rotated beyond a predetermined axis, a rotation operation may be applied to the image data to align the axis of its subject-matter with the predetermined axis.
The method 300 further comprises the step 330 of processing the image data to determine one or more information sub-regions wholly contained within the first information region and associating (i.e. designating) each determined sub-region with/as a sub-node, wherein each sub-node is a child to the parent node, i.e. the node associated with the first information region. The step of determining the information sub-regions may again be performed using any suitable object detection algorithm, such as a feature detection algorithm. The feature detection algorithm for determining the information sub-region may be trained to determine a different feature from the first feature detection algorithm arranged to determine a first information region, such as sections of a parking sign. Each determined information sub-region may be any suitably shaped subset of pixels of the first information region. Each sub-region may be of different sizes or shapes. A non-maximal suppression method may be applied when determining information sub-regions to prevent overlap of the determined sub-regions. Each sub-node may be associated with information relating to one or more of the size, position, or contents of the corresponding information sub-region determined. In this way, a hierarchy of nodes may be formed, comprising a representation of associations between information sub-regions.
The method 300 may optionally also comprise the step 340 of determining a semantic classification for each determined information region or sub-region. Any suitable classifier may be used to determine the semantic classification of each region or sub-region. For example, in a first information region comprising multiple objects of interest, each sub-region of the first information region may have a classifier applied to determine the class of objects present within each region. The classifier may comprise any suitable classification algorithm, such as neural network, decision forest, and logistic regression algorithms, however other algorithms will be envisaged. The determined semantic classification may be compared to an existing dataset of semantic classifications to identify inconsistencies or anomalies in the determined semantic classification. In this way, the determined semantic classification may be validated based on the historical context of previous classifications. For example, the semantic classification of a parking sign rule may be compared against a dataset of existing semantic classifications for parking signs in order to determine a valid classification. This prevents errors arising in the semantic classification due to noise (such as poor quality images, objects blocking the view of a sign, graffiti on signs, etc.) or algorithmic error. In some embodiments, a confidence score may also be produced by the classifier which is indicative of the degree of certainty or error by the classifier. This confidence score may reflect the quality of the input. For example, when applied to parking signs, the confidence score may reflect the physical quality of the parking sign, which may be useful for physical sign maintenance purposes. The classifier may have been co-trained with the feature detection algorithms. The determined semantic classification may also be associated with the corresponding node, to provide semantic meaning to each node in the hierarchy.
Further image processing may be applied to each determined information region or sub-region before classification, such as image rectification, cropping, rotation, warping, hue/saturation alterations, etc. The semantic classification may be computer-parsable.
The method 300 may also comprise the step 335 of iteratively determining one or more further sub-regions wholly contained within one or more previously determined sub-regions, and associating each further determined sub-region with a further sub-node of the hierarchy, wherein each further sub-node is a child to the corresponding previously determined parent sub-node. In this way, a hierarchy comprising multiple levels of nodes may be provided. The number of iterations to determine sub-regions may be predetermined, or may be algorithmically determined. As before, semantic classifiers may be applied to each determined sub-region and associated with the corresponding node in the hierarchy. Each iteration of the step 335 may apply a different feature detection algorithm. In this way, each layer of nodes in the hierarchy may correspond to a different class of detected feature. Each iteration of the step 335 may also apply a different semantic classifier.
Training of feature detection algorithms may be performed by specifying a structure of hierarchy to be output from the method 300, i.e. in a supervised manner. Training of the feature detection algorithms may further involve segmenting each image to explicitly indicate the position and/or sizes of each information region. Training the feature detection algorithms may comprise training from a collective dataset indicative of all the features to be determined, or each feature detection algorithm may be trained on a different dataset respectively.
One particular issue with current methods of parsing images and text is that when an image processing algorithm is applied incorrectly, for example due to poor quality training data, the resulting output is often uninterpretable. This makes it difficult to understand why an image processing algorithm made a particular output decision. In contrast, the invention as disclosed advantageously allows for clearer debugging of the image processing method, as each node of the predicted hierarchy (and associated semantic classification) may be compared directly with the respective node of the correct hierarchy, allowing for more granular validation of the image processing algorithm. Other advantages include the ability to deduce errors in the training stage by directly comparing predicted hierarchies with correct hierarchies.
Finally, the method 300 comprises the step 350 of outputting data indicative of the hierarchy. The data indicative of the hierarchy may comprise data relating to one or more of the size, shape, contents, and semantic classification associated with each node, as well as the specific parent-child relationships between each node. The data indicative of the hierarchy may be formatted in any suitable data structure operable to the processing means 220, such as a tree, linked list, relational table, or otherwise.
Once data indicative of a hierarchy is output, it may be stored in a data storage means such that the data may be retrieved for further processing in future. As noted, the data indicative of the hierarchy may comprise any suitable data structure operable to the processing means 220. The data structure may be arranged such that individual nodes and associated content may be edited, removed, or added to the hierarchy whilst preserving the remaining hierarchy structure.
For example, an individual node of the hierarchy may be updated upon request from the processing means 220 to a different semantic classification. In some instances, an explicit instruction may be received by the processing means 220 to update, remove, overwrite, overrule, or add a node to an existing hierarchy. For example, the processing means 220 may receive an electronic instruction to modify the parking times associated with a parking sign that has been parsed as a hierarchy. The electronic instruction may be received over a wired or wireless network from an external device, such as a computing server or computing device. In these instances, the semantic classification contained within the node corresponding to the parking time associated with that parking sign will be updated. In some instances, the method 300 may be applied to a second input, and may output data indicative of a hierarchy that is different to a previously determined hierarchy. For example, the method 300 may be re-applied to an image of a parking sign that has already been parsed. The differences between semantic classifications contained in each of the nodes of the previously determined hierarchy and the newly determined hierarchy may be identified, and the previous hierarchy may be updated to include the new semantic classifications. In some embodiments, the second input may comprise an image, however other forms of input may be envisaged.
In some embodiments, multiple related images comprising identically located information regions having differing semantic information may be received by the method 300. For example, multiple images of varying states of a digital parking sign may be received, wherein each image contains information regions having different semantic information, reflecting the changing nature of the digital parking sign. For example, a digital parking sign may show 'No parking' on a Monday between 9am and 5pm. However, outside of these times, the digital parking sign may show '1 hour parking - no return within 2 hours'. In example scenarios such as these, multiple parent nodes may be created for the hierarchy with respect to each set of semantic information.
Figure 4 shows an exemplary application 400 of the method 300 applied to the image of the parking sign 100 shown in Figure 1. Figure 5 similarly shows the hierarchy 500 determined from 400. The method 300 is particularly useful for processing parking signs, as they are naturally arranged as a hierarchy of rules, and therefore by explicitly analysing the grouping of components of the sign and retaining the associated hierarchy, the semantic understanding of the sign is not lost. However, it will be appreciated that other forms of input may be used, such as road markings, signalling equipment, etc. Examples of traffic-related inputs for which the present invention may be applied to include images of parking signs, warning signs, regulatory signs, speed limit signs, low bridge signs, level crossing signs and signals, train signs, signals and road markings, bus and cycle signs and road markings, pedestrian zone signs, traffic calming signs, motorway signs, signals, and road markings, directional signs, information signs, traffic signals, road work signs, and others.
Applying the step 320 to the parking sign 100 provides the first information region 410. The first information region 410 determined corresponds to the entire area of the sign itself, which therefore comprises the parent node 510 of the hierarchy 500. In this example, a feature detection algorithm trained to detect signs was used to determine the first information region 410. As can be seen, the information region 410 comprises multiple semantic objects of interest.
Further image processing is then applied to the first information region 410 to provide the image 420. In this example, warping is applied to the image corresponding to the determined region to ensure the determined sign is fronto-parallel. The warping is performed by regressing a two-dimensional offset applied to each corner of the determined image region corresponding to the sign.
One or more information sub-regions 422, 424, 426 wholly contained within the processed first information region 420 are determined, in accordance with step 330. In this example, feature detection algorithms trained to detect individual sections of a sign were used to determine the information sub-regions 422, 424, 426. As per step 330, each of the determined sub-regions is associated with a sub-node of the hierarchy, as shown by sub-nodes 520, 530, 540 in Figure 5. Each of the sub-nodes 520, 530 and 540 is a child to the parent node 510, and the parent-child relationships between each node can clearly be seen.
One or more further information sub-regions wholly contained within each of the previously determined sub-regions are then determined, as per step 335. Further sub-regions 432, 434, 436, 438 for example have been determined from sub-region 422. In this example, feature detection algorithms trained to detect individual rules and times were used to determine the further sub-regions 432, 434, 436, 438. It should also be noted that the region 440 corresponding to the instructional message 'Display ticket' has not been determined as a sub-region, due to the feature detection algorithm being selected to ignore this type of region. As per step 335, each of the determined further sub-regions is associated with a sub-node of the hierarchy, as shown for example by sub-nodes 522, 524, 526, 528 in Figure 5. Each of the sub-nodes 522, 524, 526, 528 is a child to the parent sub-node 520, and the parent-child relationships between each node can clearly be seen.
In Figure 5 it can be seen that a semantic classification of each of the further sub-regions 432, 434, 436 has been performed and the resulting classification has been associated with the corresponding sub-nodes 522, 524, 528, such that each of the sub-nodes has been categorised into 'Rule', 'Applies weekdays', 'Applies times', and 'Return period' categories. The semantic classification may therefore comprise a determined rule or time period relating to the state of the parking space, and the determined hierarchy is indicative of the times the rules are applicable. Figure 5 therefore shows a resulting hierarchy, wherein each node corresponds to a region or object of interest. As can be seen, the spatial/positional associations of each rule and time have advantageously been retained in the hierarchical structure, such that the semantic understanding of the parking sign 100 has not been lost.
Figure 6 shows an example of the semantic classifiers applied to some of the determined sub-regions, wherein each semantic classification is performed using a classification algorithm trained to return a relevant rule or time category. 610, 620, and 630 show an example application of 'Day', 'Rule', and 'Permit code' semantic classifiers using Convolutional Neural Networks and OCR parsers. The resulting semantic classifications 640 are output in a computer-readable format. Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term 'comprising' does not exclude the presence of other elements or steps.
Furthermore, the order of features in the claims does not imply any specific order in which the features must be performed and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus, references to 'a', 'an', 'first', 'second', etc. do not preclude a plurality. In the claims, the term 'comprising' or 'including' does not exclude the presence of other elements.

Claims

CLAIMS
What is claimed is:
1. A method for parsing parking signs, comprising:
receiving image data representing an image;
processing the image data to determine a first information region in the image and associating the first information region with a parent node of a hierarchy;
processing the image data to determine one or more information sub-regions wholly contained within the first information region and associating each determined sub-region with a sub-node of the hierarchy, wherein each sub-node is a child to the parent node; and
outputting data indicative of the hierarchy.
2. The method of claim 1, comprising iteratively determining one or more further sub-regions wholly contained within one or more previously determined sub-regions and associating each further determined sub-region with a further sub-node of the hierarchy, wherein each further sub-node is a child to the corresponding previously determined parent sub-node.
3. The method of any preceding claim, wherein determining a first information region or sub-region of the image comprises using one or more feature detection algorithms.
4. The method of claim 3, wherein using the feature detection algorithm comprises using one or more of Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), Speeded up Robust Feature (SURF), Haar-like features, or a neural network.
5. The method of any preceding claim, comprising determining a semantic classification of each information region or sub-region and associating the semantic classification with the corresponding node or sub-node.
6. The method of claim 5, wherein determining a semantic classification of each information region or sub-region comprises using a classification algorithm.
7. The method of claim 6, wherein using the classification algorithm comprises using one or more of a neural network, decision forest, or logistic regression algorithm.
8. The method of any preceding claim, comprising using a non-maximal suppression method to prevent overlap of determined sub-regions.
9. The method of any of claims 5 to 8, comprising co-training the classification algorithm with the feature detection algorithm.
10. The method of any of claims 5 to 9, comprising training one or more of the feature detection and classification algorithms using data indicative of a predicted hierarchy.
11. The method of any of claims 5 to 10, comprising training one or more of the feature detection and classification algorithms using data indicative of a position of one or more information regions.
12. The method of any preceding claim, wherein determined information regions are of different sizes.
13. The method of any preceding claim, comprising applying one or more image processing operations to the image data.
14. The method of claim 13, wherein the image processing operation comprises one or more of image rectification, cropping, rotation, warping, hue alterations, saturation alterations, contrast alterations, de-noising, sharpening, and blurring.
15. The method of claim 13 or 14, wherein the one or more image processing operations are based on one or more algorithmically determined parameter values of the image data.
16. The method of claim 15, wherein the determined parameter values comprise one or more of an image skewness, distortion, hue, saturation, contrast, brightness, noise level, sharpness, and blur level.
17. The method of any preceding claim, wherein the data indicative of the hierarchy comprises a tree, linked list, or relational table data structure.
18. The method of any of claims 5 to 17, wherein the semantic classification is compared against a dataset of existing semantic classifications to determine a valid classification.
19. The method of claim 5, further comprising determining a confidence score indicative of the degree of certainty or error of the semantic classification.
20. A system for parsing parking signs, comprising:
input means arranged to receive image data representing an image;
processing means arranged to: determine a first information region in the image and associate the first information region with a parent node of a hierarchy,
determine one or more information sub-regions wholly contained within the first information region and associating each determined sub-region with a sub-node of the hierarchy, wherein each sub-node is a child to the parent node; and
output means arranged to output data indicative of the hierarchy.
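The processing step of claim 20 can be illustrated with a short sketch. This is not the patented implementation; it merely shows one way, under assumed tuple-based bounding boxes, to nest each detected region under the smallest previously placed region that wholly contains it, yielding the parent-node/sub-node hierarchy the claim describes.

```python
# Hypothetical sketch of claim 20's processing means: given bounding
# boxes detected in a parking-sign image, nest each box under the
# smallest already-placed box that wholly contains it.

def contains(outer, inner):
    """True if `inner` is wholly contained within `outer` (x, y, w, h boxes)."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def build_hierarchy(boxes):
    """Return nested dicts {'bbox': box, 'children': [...]}; largest box is the root."""
    boxes = sorted(boxes, key=lambda b: b[2] * b[3], reverse=True)
    root = {"bbox": boxes[0], "children": []}
    placed = [root]
    for box in boxes[1:]:
        node = {"bbox": box, "children": []}
        # attach to the smallest node that wholly contains this box
        containers = [n for n in placed if contains(n["bbox"], box)]
        parent = min(containers, key=lambda n: n["bbox"][2] * n["bbox"][3])
        parent["children"].append(node)
        placed.append(node)
    return root

sign = build_hierarchy([
    (0, 0, 100, 200),    # whole sign (parent node of the hierarchy)
    (10, 10, 80, 40),    # first information sub-region
    (10, 60, 80, 40),    # second information sub-region
    (15, 65, 20, 10),    # region wholly contained in the second sub-region
])
```

The resulting nested structure is the "data indicative of the hierarchy" that the output means would emit.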
PCT/EP2018/083023 2017-11-29 2018-11-29 Hierarchical image interpretation system WO2019106095A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB1719862.3A GB201719862D0 (en) 2017-11-29 2017-11-29 Hierarchical image interpretation system
GB1719862.3 2017-11-29

Publications (1)

Publication Number Publication Date
WO2019106095A1 true WO2019106095A1 (en) 2019-06-06

Family

ID=60950761

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/083023 WO2019106095A1 (en) 2017-11-29 2018-11-29 Hierarchical image interpretation system

Country Status (2)

Country Link
GB (2) GB201719862D0 (en)
WO (1) WO2019106095A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110052063A1 (en) * 2009-08-25 2011-03-03 Xerox Corporation Consistent hierarchical labeling of image and image regions

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5680479A (en) * 1992-04-24 1997-10-21 Canon Kabushiki Kaisha Method and apparatus for character recognition
US5841900A (en) * 1996-01-11 1998-11-24 Xerox Corporation Method for graph-based table recognition
US8374390B2 (en) * 2009-06-24 2013-02-12 Navteq B.V. Generating a graphic model of a geographic object and systems thereof
DE102010020330A1 (en) * 2010-05-14 2011-11-17 Conti Temic Microelectronic Gmbh Method for detecting traffic signs
US20140132767A1 (en) * 2010-07-31 2014-05-15 Eric Sonnabend Parking Information Collection System and Method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
IASONAS KOKKINOS ET AL: "HOP: Hierarchical object parsing", IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION. PROCEEDINGS, 1 June 2009 (2009-06-01), US, pages 802 - 809, XP055557199, ISSN: 1063-6919, DOI: 10.1109/CVPR.2009.5206639 *
MÍRIAM BELLVER BUENO ET AL: "Hierarchical Object Detection with Deep Reinforcement Learning", 25 November 2016 (2016-11-25), XP055557203, Retrieved from the Internet <URL:https://arxiv.org/pdf/1611.03718.pdf> [retrieved on 20190214] *
XU YONGCHAO ET AL: "Hierarchical Segmentation Using Tree-Based Shape Spaces", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, IEEE COMPUTER SOCIETY, USA, vol. 39, no. 3, 1 March 2017 (2017-03-01), pages 457 - 469, XP011640257, ISSN: 0162-8828, [retrieved on 20170203], DOI: 10.1109/TPAMI.2016.2554550 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021046153A1 (en) * 2019-09-04 2021-03-11 Material Technologies Corporation Object feature visualization apparatus and methods
US11503256B2 (en) 2019-09-04 2022-11-15 Material Technologies Corporation Object feature visualization apparatus and methods
US11622096B2 (en) 2019-09-04 2023-04-04 Material Technologies Corporation Object feature visualization apparatus and methods
US11683459B2 (en) 2019-09-04 2023-06-20 Material Technologies Corporation Object feature visualization apparatus and methods
US11681751B2 (en) 2019-09-04 2023-06-20 Material Technologies Corporation Object feature visualization apparatus and methods
US12020353B2 (en) 2019-09-04 2024-06-25 Material Technologies Corporation Object feature visualization apparatus and methods
FR3113432A1 (en) 2020-08-12 2022-02-18 Thibault Autheman AUTOMATIC IMAGE CLASSIFICATION PROCESS
CN115269107A (en) * 2022-09-30 2022-11-01 北京弘玑信息技术有限公司 Method, medium and electronic device for processing interface image

Also Published As

Publication number Publication date
GB2570762A (en) 2019-08-07
GB201719862D0 (en) 2018-01-10
GB201819450D0 (en) 2019-01-16

Similar Documents

Publication Publication Date Title
WO2019106095A1 (en) Hierarchical image interpretation system
US10572725B1 (en) Form image field extraction
US10755149B2 (en) Zero shot machine vision system via joint sparse representations
US10423827B1 (en) Image text recognition
Siriborvornratanakul An automatic road distress visual inspection system using an onboard in‐car camera
US20190026550A1 (en) Semantic page segmentation of vector graphics documents
EP3806064A1 (en) Method and apparatus for detecting parking space usage condition, electronic device, and storage medium
CN112200081A (en) Abnormal behavior identification method and device, electronic equipment and storage medium
CN111797886B (en) Generating training data for OCR for neural networks by parsing PDL files
WO2019055114A1 (en) Attribute aware zero shot machine vision system via joint sparse representations
US10699751B1 (en) Method, system and device for fitting target object in video frame
CN115034200A (en) Drawing information extraction method and device, electronic equipment and storage medium
US20220358747A1 (en) Method and Generator for Generating Disturbed Input Data for a Neural Network
US20220266854A1 (en) Method for Operating a Driver Assistance System of a Vehicle and Driver Assistance System for a Vehicle
CN112699711B (en) Lane line detection method and device, storage medium and electronic equipment
US20230386221A1 (en) Method for detecting road conditions and electronic device
CN110909674A (en) Traffic sign identification method, device, equipment and storage medium
CN116704542A (en) Layer classification method, device, equipment and storage medium
CN111783812A (en) Method and device for identifying forbidden images and computer readable storage medium
KR102026280B1 (en) Method and system for scene text detection using deep learning
CN117392577A (en) Behavior recognition method for judicial video scene, storage medium and electronic device
CN112232335A (en) Determination of distribution and/or sorting information for the automated distribution and/or sorting of mailpieces
CN111340139A (en) Method and device for judging complexity of image content
EP3709666A1 (en) Method for fitting target object in video frame, system, and device
CN116246161A (en) Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18814824

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18814824

Country of ref document: EP

Kind code of ref document: A1