WO2021243205A1 - Method and system for geo-semantic recognition of a structural object - Google Patents

Method and system for geo-semantic recognition of a structural object

Info

Publication number
WO2021243205A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
semantic information
semantic
database
object features
Prior art date
Application number
PCT/US2021/034854
Other languages
English (en)
Inventor
Taegyu Lim
Byungsoo Kim
Original Assignee
Motion2Ai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motion2Ai filed Critical Motion2Ai
Publication of WO2021243205A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/457Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by analysing connectivity, e.g. edge linking, connected component analysis or slices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • This disclosure relates generally to image recognition technology for acquiring geometric information and semantic information of structural objects shown in an image acquired by sensors.
  • Image processing has been used in attempts to identify various objects.
  • One example of image processing is a family of techniques for extracting lines or edges as features of an object.
  • However, such techniques provide inaccurate results or take an amount of time that is impractical for many applications.
  • They also suffer from problems due to fragmentation of the extracted features. Accordingly, there is a need to improve the ability to perform object identification.
  • This disclosure provides an apparatus for geo-semantic recognition of a structural object, comprising: a sensor configured to acquire at least one image; a pre-processing processor configured to process the acquired image to conform to a predetermined format; a visual feature extraction processor configured to extract object features from the processed image; a geo-semantic line detection processor configured to identify geometric information corresponding to the extracted object features and to retrieve semantic information corresponding to the extracted object features; and a region detection processor configured to recognize a region shown in the image based on the geometric information and the semantic information.
  • This disclosure provides a method for geo-semantic recognition of a structural object, comprising: acquiring at least one image by at least one sensor; processing the acquired image to conform to a predetermined format; extracting object features from the processed image; identifying geometric information corresponding to the extracted object features; retrieving semantic information corresponding to the extracted object features; and recognizing a region shown in the image based on the geometric information and the semantic information.
  • FIGS. 1A and 1B illustrate results of image processing for structural object recognitions in the conventional art.
  • FIGS. 2A and 2B illustrate results of image processing for structural object recognitions, in accordance with some embodiments of the present disclosure.
  • FIG. 3 illustrates a simplified block diagram of an exemplary device for implementing image processing and identifying structural objects, in accordance with some embodiments.
  • FIG. 4 illustrates a simplified block diagram of an exemplary process for implementing image processing and identifying structural objects, in accordance with some embodiments.
  • FIG. 5 illustrates a utilization of the result of image processing for structural object recognitions, in accordance with some embodiments of the present disclosure.
  • FIGS. 6A and 6B illustrate results of image processing for structural object recognitions in an outdoor setting, in accordance with some embodiments of the present disclosure.
  • FIGS. 7A and 7B illustrate results of image processing for structural object recognitions in an indoor setting, in accordance with some embodiments of the present disclosure.
  • Systems, apparatuses, and methods are provided herein that provide image recognition technology for acquiring geometric information and semantic information of structural objects shown in an image acquired by sensors.
  • Structural objects can be recognized as consisting of points, lines connecting two or more points, and regions surrounded by lines. The geometric information and the semantic information of the lines and regions can then be identified and/or retrieved.
  • The term “geo-semantic information” as used in the present application refers to a combination of the geometric information and the semantic information. Such geo-semantic information can be used as specialized information to accurately recognize the structural objects.
  • Some embodiments of the present disclosure provide an apparatus for geo-semantic recognition of a structural object, the apparatus comprising: a sensor that acquires at least one image; a pre-processing processor that processes the acquired image to conform to a predetermined format; a visual feature extraction processor that extracts object features from the processed image; a geo-semantic line detection processor that identifies geometric information corresponding to the extracted object features and retrieves semantic information corresponding to the extracted object features; and a region detection processor that recognizes a region shown in the image based on the geometric information and the semantic information.
  • This apparatus may be described as an image processing device in the present disclosure.
  • Some embodiments of the present disclosure alternatively or additionally provide an apparatus further comprising a database that stores machine learning data including the semantic information and having at least one label corresponding to one or more of the extracted object features.
  • The geo-semantic line detection processor retrieves the semantic information from the database.
  • Some embodiments of the present disclosure alternatively or additionally provide the visual feature extraction processor that detects lines of the structural object shown in the image, as the object features.
  • Some embodiments of the present disclosure alternatively or additionally provide the geo-semantic line detection processor that generates line indexes of the detected lines.
  • A respective line index of the line indexes has a format of (x1, y1, x2, y2, class), where “x1,” “y1,” “x2,” and “y2” are coordinate values of a respective line of the detected lines, and “class” is the semantic information of the respective line.
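  • For illustration only (not part of the disclosed embodiments), the (x1, y1, x2, y2, class) line index described above could be modeled as a small data structure; the field and label names below are hypothetical.

```python
from typing import NamedTuple

class LineIndex(NamedTuple):
    """An (x1, y1, x2, y2, class) line index; "cls" stands in for the
    reserved word "class" and holds the semantic information."""
    x1: float
    y1: float
    x2: float
    y2: float
    cls: str  # e.g., "horizontal_bottom", "vertical" (hypothetical labels)

line = LineIndex(12.0, 340.0, 618.0, 352.0, "horizontal_bottom")
print(line.cls, (line.x1, line.y1), (line.x2, line.y2))
```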
  • Some embodiments of the present disclosure alternatively or additionally provide the geo-semantic line detection processor that inputs images adjacent to the detected lines and also inputs coordinate values of the detected lines, to the database, and the database that retrieves the semantic information of the detected lines, and outputs the retrieved semantic information to the geo-semantic line detection processor.
  • Some embodiments of the present disclosure alternatively or additionally provide the geo-semantic line detection processor that identifies one of a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line of the structural object, as the semantic information of the respective line.
  • Some embodiments of the present disclosure alternatively or additionally provide the region detection processor that identifies the region based on the detected lines.
  • Some embodiments of the present disclosure alternatively or additionally provide the geo-semantic line detection processor that inputs images adjacent to the identified region and also inputs coordinate values of the identified region, to the database, and the database that retrieves the semantic information of the identified region, and outputs the retrieved semantic information to the geo-semantic line detection processor.
  • Some embodiments of the present disclosure alternatively or additionally provide the geo-semantic line detection processor that identifies one of a rack surface and a wall surface as the semantic information of the region.
  • Some embodiments of the present disclosure alternatively or additionally provide a display that displays the identified lines with different colors.
  • Some embodiments of the present disclosure alternatively or additionally provide the sensor that acquires at least one of a still image and a video image.
  • Some embodiments of the present disclosure alternatively or additionally provide the sensor that acquires at least one of a gray-scale image and a color image.
  • Some embodiments of the present disclosure alternatively or additionally provide the pre-processing processor that processes the acquired image to conform to a predetermined image size.
  • Some embodiments of the present disclosure alternatively or additionally provide a machine learning training processor that trains object feature extraction based on the extracted object features.
  • Some embodiments of the present disclosure provide a method comprising: acquiring at least one image by at least one sensor; processing the acquired image to conform to a predetermined format; extracting object features from the processed image; identifying geometric information corresponding to the extracted object features; retrieving semantic information corresponding to the extracted object features; and recognizing a region shown in the image based on the geometric information and the semantic information.
  • This method may be described as operations of an image processing device in the present disclosure.
  • Some embodiments of the present disclosure alternatively or additionally provide retrieving the semantic information from a database that stores machine learning data having at least one label corresponding to one or more of the extracted object features.
  • A respective line index of the line indexes has a format of (x1, y1, x2, y2, class), where “x1,” “y1,” “x2,” and “y2” are coordinate values of a respective line of the detected lines, and “class” is the semantic information of the respective line.
  • Some embodiments of the present disclosure alternatively or additionally provide: inputting images adjacent to the detected lines and also inputting coordinate values of the detected lines, to a database that stores machine learning data including the semantic information and having at least one label corresponding to one or more of the extracted object features; and retrieving the semantic information of the detected lines from the database.
  • Some embodiments of the present disclosure alternatively or additionally provide identifying one of a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line of the structural object, as the semantic information of the respective line.
  • Some embodiments of the present disclosure alternatively or additionally provide identifying the region based on the detected lines.
  • Some embodiments of the present disclosure alternatively or additionally provide: inputting images adjacent to the identified region and also inputting coordinate values of the identified region, to a database that stores machine learning data including the semantic information and having at least one label corresponding to one or more of the extracted object features; and retrieving the semantic information of the identified region from the database.
  • Some embodiments of the present disclosure alternatively or additionally provide identifying one of a rack surface and a wall surface as the semantic information of the region.
  • Some embodiments of the present disclosure alternatively or additionally provide displaying the identified lines with different colors.
  • Some embodiments of the present disclosure alternatively or additionally provide acquiring at least one of a still image and a video image.
  • Some embodiments of the present disclosure alternatively or additionally provide acquiring at least one of a gray-scale image and a color image.
  • Some embodiments of the present disclosure alternatively or additionally provide processing the acquired image to conform to a predetermined image size.
  • Some embodiments of the present disclosure alternatively or additionally provide performing a machine learning training for an object feature extraction by using the extracted object features.
  • FIGS. 1A and 1B illustrate results of image processing for structural object recognitions in the conventional art.
  • The present disclosure finds correspondences between a detected line and the geometric structure of the site by assuming that the detected line matches one of the physical lines of the warehouse infrastructure, and based on such finding and assumption, the present disclosure provides a machine-learned contextual image processing technique.
  • The machine-learned contextual image processing technique of the present application may provide an accurate and efficient image processing procedure.
  • FIGS. 2A and 2B illustrate results of image processing for structural object recognitions, in accordance with some embodiments of the present disclosure.
  • the structural objects shown in the image can be recognized by using the geometric information and the semantic information of the lines and region of the structural objects.
  • Geometric information includes coordinate values of the extracted object features. For example, when the extracted object feature is a line, coordinate values of the line can be identified as the geometric information.
  • the image processing device detects lines of the structural object shown in the image, as the object features, and then, identifies one or more of a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line of the structural object, from the detected lines.
  • In the example of FIG. 2A, a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line of the structural object are identified.
  • In the example of FIG. 2B, a horizontal bottom line, a horizontal top line, and a vertical line of the structural object are identified.
  • Such exemplary names of the lines may be considered as the semantic information of the lines of the structural objects.
  • the image processing device may display the identified lines with different colors, as also shown in FIGS. 2A and 2B.
  • The image processing device may identify the region based on the identified lines. For example, as shown in FIG. 2A, the region may be identified as “Rack,” or as shown in FIG. 2B, the region may be identified as “Wall.” Such exemplary names of the regions may be considered as the semantic information of the regions of the structural objects.
  • the present disclosure may provide more effective and accurate image recognition technology.
  • FIG. 3 illustrates a simplified block diagram of an exemplary device for implementing image processing and identifying structural objects, in accordance with some embodiments.
  • the image processing device may be implemented by one or more sensors, a sensor input processor 100, a pre-processing processor 200, a visual feature extraction processor 300, a geo-semantic line detection processor 400, a region detection processor 500, a display 600, a machine learning training processor 700, and a database 800.
  • The sensor input processor 100, the pre-processing processor 200, the visual feature extraction processor 300, the geo-semantic line detection processor 400, the region detection processor 500, the display 600, the machine learning training processor 700, and the database 800 may be configured in one unified hardware processor. According to some other embodiments, the sensor input processor 100, the pre-processing processor 200, the visual feature extraction processor 300, the geo-semantic line detection processor 400, the region detection processor 500, the display 600, the machine learning training processor 700, and the database 800 may be configured in two or more units.
  • one or more images are acquired by one or more sensors.
  • the one or more sensors may include one or more camera modules.
  • Such camera modules may be equipped in a mobile device, a vehicle- equipped device, and/or a fixed device.
  • the image may be a color image or a gray image, and may also be a still image or video image.
  • the sensor input processor 100 communicates with the sensors and acquires images captured by the sensors.
  • the pre-processing processor 200 may process the acquired image to conform to a predetermined format. For example, the pre-processing processor 200 may convert the acquired image to an image having a predetermined image size, resolution or scale. Further, normalization of the images may be performed by the pre-processing processor 200.
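  • As a minimal sketch of such pre-processing (assuming OpenCV; the function name and target size are illustrative, not taken from the disclosure), an input image could be resized and normalized as follows:

```python
import cv2
import numpy as np

def preprocess(image: np.ndarray, size=(640, 480)) -> np.ndarray:
    """Conform an image to a predetermined format: fixed size, [0, 1] range."""
    resized = cv2.resize(image, size, interpolation=cv2.INTER_AREA)  # size is (width, height)
    return resized.astype(np.float32) / 255.0
```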
  • The visual feature extraction processor 300 extracts object features from the processed image.
  • The visual feature extraction processor 300 may utilize algorithms to detect and isolate various desired portions or shapes (features, e.g., dots, lines, surfaces) of objects in a digitized image or video stream.
  • the visual feature extraction processor 300 may utilize numerical programming environments such as MATLAB, to perform the visual feature extraction.
  • neural networks are used by the visual feature extraction processor 300 to perform the visual feature extraction.
  • The machine learning training processor 700 performs machine learning based on the results of the feature extraction.
  • The image feature vector can be used as an input for machine learning algorithms through training and test models, and it can improve performance.
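  • As a hedged illustration of feature vectors feeding a learning algorithm (toy data and hypothetical class labels; the disclosure does not prescribe a particular model), an off-the-shelf classifier could be trained as follows:

```python
from sklearn.svm import SVC  # any off-the-shelf classifier would serve here

# X: image feature vectors (one row per extracted feature); y: semantic labels.
X = [[0.10, 0.90, 0.30], [0.80, 0.20, 0.50],
     [0.15, 0.85, 0.35], [0.75, 0.25, 0.55]]
y = ["horizontal_bottom", "vertical", "horizontal_bottom", "vertical"]

clf = SVC().fit(X, y)
print(clf.predict([[0.12, 0.88, 0.32]]))  # -> ['horizontal_bottom']
```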
  • the geo-semantic line detection processor 400 identifies geometric information corresponding to the extracted object features and retrieves semantic information corresponding to the extracted object features.
  • Geometric information includes coordinate values of the extracted object features. For example, when the extracted object feature is a line, coordinate values of the line can be identified as the geometric information.
  • Semantic information includes corners, edges, blobs, and ridges.
  • The geo-semantic line detection processor 400 may use algorithms for corner detection, curve fitting, edge detection, global structure extraction, feature histograms, line detection, connected-component labeling, and the like.
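  • To make one of the listed algorithms concrete (a sketch assuming OpenCV, not the processor 400's actual implementation), Shi-Tomasi corner detection on a synthetic image looks like this:

```python
import cv2
import numpy as np

# A bright square on a dark background has four detectable corners.
gray = np.zeros((100, 100), dtype=np.uint8)
cv2.rectangle(gray, (20, 20), (80, 80), 255, -1)

corners = cv2.goodFeaturesToTrack(gray, maxCorners=10,
                                  qualityLevel=0.01, minDistance=5)
print(corners.reshape(-1, 2))  # (x, y) coordinates of the detected corners
```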
  • Semantic information refers to an appearance-based inference using color, texture, and/or shape information, along with a type of context inference (or representation) that can combine and transform this machine-extracted evidence into what could be called a scene description.
  • indexes of the detected lines are generated.
  • A respective line index of the line indexes may have a format of (x1, y1, x2, y2, class), where “x1,” “y1,” “x2,” and “y2” are coordinate values of a respective line of the detected lines, and “class” is the semantic information of the respective line.
  • neural networks and machine learning algorithms are used by the geo-semantic line detection processor 400 to identify geometric information corresponding to the extracted object features, and to retrieve semantic information corresponding to the extracted object features.
  • the geo-semantic region detection processor 500 recognizes a region shown in the image based on the geometric information and the semantic information.
  • The geo-semantic region detection processor 500 may perform operations similar to those of the geo-semantic line detection processor 400 mentioned above to acquire geometric and semantic information of the recognized region.
  • the display 600 displays the geometric information and the semantic information as an overlay image on the image captured by the sensors.
  • the database 800 stores machine learning data including the semantic information and having at least one label corresponding to one or more of the extracted object features. For example, when the extracted object feature is a line, the database 800 stores the semantic information corresponding to coordinate values of the line.
  • the geo-semantic line detection processor 400 utilizes a line detection algorithm to extract the geometric information of lines.
  • the line detection algorithm takes a collection of n edge points and finds all the lines on which these edge points lie.
  • the line detection algorithm includes the Hough transform and convolution- based techniques.
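  • A minimal sketch of such line detection (assuming OpenCV; parameter values are illustrative): Canny edge detection followed by the probabilistic Hough transform, which returns segments directly in the (x1, y1, x2, y2) form used by the line indexes above.

```python
import cv2
import numpy as np

def detect_lines(gray: np.ndarray) -> np.ndarray:
    """Find line segments: Canny edges, then probabilistic Hough transform."""
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=40, maxLineGap=5)
    # Each entry is [[x1, y1, x2, y2]] in image coordinates.
    return lines if lines is not None else np.empty((0, 1, 4), dtype=np.int32)
```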
  • The geo-semantic line detection processor 400 utilizes a neural net algorithm, which is based on end-to-end wireframe parsing, to extract the geometric information of lines. That is, the geo-semantic line detection processor 400 is end-to-end trainable and can directly output a vectorized wireframe that contains semantically meaningful and geometrically salient junctions and lines (see, e.g., End-to-End Wireframe Parsing, Yichao Zhou, Haozhi Qi, Yi Ma; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 962-971).
  • The geo-semantic line detection processor 400 retrieves semantic information corresponding to the extracted object features from the database 800. In some embodiments, the geo-semantic line detection processor 400 retrieves semantic information corresponding to coordinate values of the extracted object features. In some embodiments, the geo-semantic line detection processor 400 retrieves the semantic information that most closely corresponds to coordinate values of the extracted object features. A matching rate between the semantic information and the geometric information of the extracted object feature can be calculated and used for retrieving the semantic information corresponding to the extracted object feature.
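  • One way such a matching rate could work (a sketch under assumed data; the database rows and the inverse-distance score are hypothetical, not the disclosed algorithm):

```python
import math

# Hypothetical database rows: labeled reference lines (x1, y1, x2, y2, class).
DB = [
    (0, 300, 640, 310, "horizontal_bottom"),
    (0, 150, 640, 155, "horizontal_top"),
    (320, 0, 322, 480, "vertical"),
]

def matching_rate(a, b):
    """Crude geometric matching score: inverse of summed endpoint distances."""
    d = math.dist(a[0:2], b[0:2]) + math.dist(a[2:4], b[2:4])
    return 1.0 / (1.0 + d)

def retrieve_semantics(detected):
    """Return the class of the stored line whose geometry matches best."""
    best = max(DB, key=lambda row: matching_rate(detected, row[:4]))
    return best[4]

print(retrieve_semantics((5, 295, 630, 305)))  # -> horizontal_bottom
```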
  • the geo-semantic line detection processor 400 inputs images adjacent to the detected lines and also inputs coordinate values of the detected lines, to the database 800.
  • The database 800 retrieves the semantic information of the detected lines based on input data, and outputs the retrieved semantic information to the geo-semantic line detection processor.
  • the database 800 may utilize an image comparison algorithm to retrieve the semantic information corresponding to the images adjacent to the detected lines, and the database 800 may additionally or alternatively utilize a coordinate value comparison algorithm to retrieve the semantic information corresponding to the coordinate values of the detected lines.
  • The database 800 stores labels, for example, a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line, and also stores example image portions corresponding to each label. Those labels and example image portions are initially input by a manufacturer or a user of the database, and then further developed and accumulated by a machine training scheme.
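  • A hedged sketch of comparing an image portion against stored example patches (assuming OpenCV; the label names and random placeholder patches are illustrative, not the database 800's actual contents):

```python
import cv2
import numpy as np

# Hypothetical label -> example image patch store.
patch_db = {
    "horizontal_bottom": np.random.randint(0, 255, (16, 64), dtype=np.uint8),
    "vertical":          np.random.randint(0, 255, (64, 16), dtype=np.uint8),
}

def best_label(query: np.ndarray) -> str:
    """Score the query patch against each stored example by normalized
    cross-correlation and return the best-matching label."""
    scores = {}
    for label, example in patch_db.items():
        resized = cv2.resize(query, (example.shape[1], example.shape[0]))
        scores[label] = cv2.matchTemplate(resized, example,
                                          cv2.TM_CCOEFF_NORMED).max()
    return max(scores, key=scores.get)
```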
  • the geo-semantic line detection processor 400 inputs images adjacent to the identified region and also inputs coordinate values of the identified region, to the database 800.
  • the database 800 retrieves the semantic information of the identified region based on input data, and outputs the retrieved semantic information to the geo-semantic line detection processor.
  • the database 800 may utilize an image comparison algorithm to retrieve the semantic information corresponding to the images adjacent to the identified region, and the database 800 may additionally or alternatively utilize a coordinate value comparison algorithm to retrieve the semantic information corresponding to the coordinate values of the identified region.
  • The database 800 stores labels, for example, a wall surface and a rack surface, and also stores example image portions corresponding to each label. Those labels and example image portions are initially input by a manufacturer or a user of the database, and then further developed and accumulated by a machine training scheme.
  • FIG. 4 illustrates a simplified block diagram of an exemplary process for implementing image processing and identifying structural objects, in accordance with some embodiments.
  • step S100 of acquiring at least one image by at least one sensor is performed.
  • one or more images are acquired by one or more sensors.
  • the one or more sensors may include one or more camera modules. Such camera modules may be equipped in a mobile device, a vehicle-equipped device, and/or a fixed device.
  • the image may be a color image or a gray image, and may also be a still image or video image.
  • step S200 of processing the acquired image to conform to a predetermined format is performed.
  • In step S200, converting the acquired image to an image having a predetermined image size, resolution, or scale may also be performed. Normalization of the images may also be performed.
  • step S300 of extracting object features from the processed image is performed.
  • In step S300, detecting lines of the structural object shown in the image, as the object features, may also be performed.
  • step S400 of identifying geometric information corresponding to the extracted object features is performed.
  • In step S400, identifying one or more of a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line of the structural object from the detected lines may also be performed.
  • Such exemplary names of the lines may be considered as the semantic information of the lines of the structural objects.
  • step S500 of retrieving semantic information corresponding to the extracted object features is performed.
  • In step S500, retrieving the semantic information from a database that stores machine learning data having at least one label corresponding to one or more of the extracted object features may also be performed.
  • a step of generating line indexes of the detected lines can be performed.
  • A respective line index of the line indexes may have a format of (x1, y1, x2, y2, class), where “x1,” “y1,” “x2,” and “y2” are coordinate values of a respective line of the detected lines, and “class” is the semantic information of the respective line.
  • step S600 of recognizing a region shown in the image based on the geometric information and the semantic information is performed.
  • In step S600, identifying the region based on the identified lines may also be performed.
  • the method may further include a step of displaying the geometric information and the semantic information as an overlay image on the image captured by the sensors.
  • the identified lines may be displayed with different colors.
  • FIG. 5 illustrates a utilization of the result of image processing for structural object recognitions, in accordance with some embodiments of the present disclosure.
  • The result of image processing for structural object recognitions is utilized in providing two-dimensional (2D) or three-dimensional (3D) navigation, as shown in FIG. 5.
  • the semantic information or the geometric information of the lines and the regions may be displayed on a screen of a navigation system.
  • the lines of the structural object may be displayed with different colors, depending on the semantic information or the geometric information of the lines. With this feature, a user may more accurately recognize the space where the user would move.
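  • A minimal sketch of such a color-coded overlay (assuming OpenCV; the class names and BGR colors are hypothetical):

```python
import cv2

# Hypothetical class -> BGR color map for the overlay display.
COLORS = {
    "horizontal_bottom": (0, 0, 255),    # red
    "horizontal_middle": (0, 255, 0),    # green
    "horizontal_top":    (255, 0, 0),    # blue
    "vertical":          (0, 255, 255),  # yellow
}

def draw_overlay(image, line_indexes):
    """Draw each (x1, y1, x2, y2, class) line in its class color."""
    out = image.copy()
    for x1, y1, x2, y2, cls in line_indexes:
        cv2.line(out, (int(x1), int(y1)), (int(x2), int(y2)), COLORS[cls], 2)
    return out
```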
  • FIGS. 6A and 6B illustrate results of image processing for structural object recognitions in an outdoor setting, in accordance with some embodiments of the present disclosure.
  • FIGS. 7A and 7B illustrate results of image processing for structural object recognitions in an indoor setting, in accordance with some embodiments of the present disclosure.
  • the devices and the method described in the present disclosure may be used to perform a recognition of the structural objects (e.g., buildings and vehicles) in the outdoor setting.
  • the devices and the method described in the present disclosure may be used to perform a recognition of the structural objects (e.g., walls and corridors) in the indoor setting.
  • Embodiments described in the present disclosure can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Embodiments described in the present disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus.
  • the computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • The program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • The term “processor” or “processing unit” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field-programmable gate array) or an ASIC (application- specific integrated circuit).
  • the apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a computer program which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a digital computing environment.
  • a computer program may, but need not, correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code.
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.
  • the processes and logic flows described in this specification can be performed by one or more programmable digital computers, operating with one or more quantum processors, as appropriate, executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.
  • For a system of one or more digital computers to be “configured to” perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions.
  • For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by digital data processing apparatus, cause the apparatus to perform the operations or actions.
  • Digital computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.
  • a central processing unit will receive instructions and data from a read-only memory or a random access memory or both.
  • the essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.
  • the central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • A digital computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • Control of the various systems described in this specification, or portions of them, can be implemented in a computer program product that includes instructions that are stored on one or more non-transitory machine-readable storage media, and that are executable on one or more digital processing devices.
  • the systems described in this specification, or portions of them, can each be implemented as an apparatus, method, or electronic system that may include one or more digital processing devices and memory to store executable instructions to perform the operations described in this specification.
  • Other examples of the present disclosure may include any number and combination of machine-learning models having any number and combination of characteristics.
  • the machine-learning model(s) can be trained in a supervised, semi- supervised, or unsupervised manner, or any combination of these.
  • the machine learning model(s) can be implemented using a single computing device or multiple computing devices.
  • Implementing some examples of the present disclosure at least in part by using machine-learning models can reduce the total number of processing iterations, time, memory, electrical power, or any combination of these consumed by a computing device when analyzing data.
  • a neural network may more readily identify patterns in data than other approaches. This may enable the neural network to analyze the data using fewer processing cycles and less memory than other approaches, while obtaining a similar or greater level of accuracy.
  • Some machine-learning approaches may be more efficiently and speedily executed and processed with machine-learning specific processors (e.g., not a generic CPU). Such processors may also provide an energy savings when compared to generic CPUs.
  • Some of these processors can include a graphical processing unit (GPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), an artificial intelligence (AI) accelerator, a neural computing core, a neural computing engine, a neural processing unit, a purpose-built chip architecture for deep learning, and/or some other machine-learning specific processor that implements a machine learning approach or one or more neural networks using semiconductor (e.g., silicon (Si), gallium arsenide (GaAs)) devices.
  • processors may also be employed in heterogeneous computing architectures with a number of and a variety of different types of cores, engines, nodes, and/or layers to achieve various energy efficiencies, processing speed improvements, data communication speed improvements, and/or data efficiency targets and improvements throughout various parts of the system when compared to a homogeneous computing architecture that employs CPUs for general purpose computing.
  • The image processing device and method of the present disclosure are a practical application of image processing techniques for the purpose of object recognition.
  • a user of the image processing device and method of the present disclosure may practically use acquired geometric information and semantic information for a variety of purposes.
  • the acquired geometric information and semantic information of the object in the image may be displayed on a navigation screen of a user device (e.g., a mobile device or a vehicle navigation system) so that the user may drive a vehicle or move toward a target place (e.g., a rack or a particular wall).
  • a working vehicle may be controlled by a control system to avoid a wall or a rack in a warehouse, and to find the best route to a loading place or an unloading place.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to an image recognition apparatus and methods for acquiring geometric information and semantic information of structural objects shown in an image acquired by sensors. The apparatus comprises: a sensor configured to acquire at least one image; a pre-processing processor configured to process the acquired image to conform to a predetermined format; a visual feature extraction processor configured to extract object features from the processed image; a geo-semantic line detection processor configured to identify geometric information corresponding to the extracted object features and to retrieve semantic information corresponding to the extracted object features; and a region detection processor configured to recognize a region shown in the image based on the geometric information and the semantic information.
PCT/US2021/034854 2020-05-29 2021-05-28 Method and system for geo-semantic recognition of a structural object WO2021243205A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063032514P 2020-05-29 2020-05-29
US63/032,514 2020-05-29

Publications (1)

Publication Number Publication Date
WO2021243205A1 (fr) 2021-12-02

Family

ID=78722850

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/034854 WO2021243205A1 (fr) 2020-05-29 2021-05-28 Method and system for geo-semantic recognition of a structural object

Country Status (1)

Country Link
WO (1) WO2021243205A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005215988A (ja) * 2004-01-29 2005-08-11 Canon Inc Learning method for pattern recognition, learning apparatus for pattern recognition, image input apparatus, computer program, and computer-readable recording medium
US20160267331A1 (en) * 2015-03-12 2016-09-15 Toyota Jidosha Kabushiki Kaisha Detecting roadway objects in real-time images
US20190377966A1 (en) * 2016-02-15 2019-12-12 Pictometry International Corp. Automated system and methodology for feature extraction
US20190178436A1 (en) * 2016-08-03 2019-06-13 Sz Dji Osmo Technology Co., Ltd. Method and system for controlling gimbal
US20190035101A1 (en) * 2017-07-27 2019-01-31 Here Global B.V. Method, apparatus, and system for real-time object detection using a cursor recurrent neural network


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21814421

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.04.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21814421

Country of ref document: EP

Kind code of ref document: A1