WO2021243205A1 - Method and system for geo-semantic recognition of a structural object
- Publication number: WO2021243205A1
- Application number: PCT/US2021/034854
- Authority: WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/457—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by analysing connectivity, e.g. edge linking, connected component analysis or slices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- This disclosure relates generally to image recognition technology for acquiring geometric information and semantic information of structural objects shown in an image acquired by sensors.
- Image processing has been used in attempts to identify various objects. One known class of processing is techniques for extracting lines or edges as features of an object. However, such techniques provide inaccurate results or take an amount of time that is impractical for many applications, and they suffer problems due to fragmentation of the extracted features. Accordingly, there is a need to improve the ability to perform object identification.
- This disclosure provides an apparatus for geo-semantic recognition of a structural object, comprising: a sensor configured to acquire at least one image; a pre-processing processor configured to process the acquired image to conform to a predetermined format; a visual feature extraction processor configured to extract object features from the processed image; a geo-semantic line detection processor configured to identify geometric information corresponding to the extracted object features and to retrieve semantic information corresponding to the extracted object features; and a region detection processor configured to recognize a region shown in the image based on the geometric information and the semantic information.
- This disclosure provides a method for geo-semantic recognition of a structural object, comprising: acquiring at least one image by at least one sensor; processing the acquired image to conform to a predetermined format; extracting object features from the processed image; identifying geometric information corresponding to the extracted object features; retrieving semantic information corresponding to the extracted object features; and recognizing a region shown in the image based on the geometric information and the semantic information.
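The claimed method is a linear pipeline. As a non-limiting illustration, it can be sketched with each step injected as a replaceable callable; all type aliases and names below are hypothetical and are not taken from the disclosure:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical type aliases; the disclosure does not prescribe concrete
# data types for images or extracted features.
Image = List[List[int]]            # grayscale pixel grid
Line = Tuple[int, int, int, int]   # (x1, y1, x2, y2) endpoints

@dataclass
class RecognitionResult:
    lines: List[Line]      # geometric information
    semantics: List[str]   # semantic class per line
    region: str            # recognized region, e.g. "Rack" or "Wall"

def recognize(image: Image,
              preprocess: Callable[[Image], Image],
              extract_lines: Callable[[Image], List[Line]],
              retrieve_semantics: Callable[[List[Line]], List[str]],
              detect_region: Callable[[List[Line], List[str]], str],
              ) -> RecognitionResult:
    """Run the claimed steps in order: pre-process the acquired image,
    extract object features, identify/retrieve geo-semantic information,
    and recognize the region from the combined information."""
    processed = preprocess(image)
    lines = extract_lines(processed)
    semantics = retrieve_semantics(lines)
    region = detect_region(lines, semantics)
    return RecognitionResult(lines, semantics, region)
```

Any concrete detector (Hough transform, neural wireframe parser, database lookup) can be slotted in for the corresponding callable.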
- FIGS. 1A and 1B illustrate results of image processing for structural object recognition in the conventional art.
- FIGS. 2A and 2B illustrate results of image processing for structural object recognition, in accordance with some embodiments of the present disclosure.
- FIG. 3 illustrates a simplified block diagram of an exemplary device for implementing image processing and identifying structural objects, in accordance with some embodiments.
- FIG. 4 illustrates a simplified block diagram of an exemplary process for implementing image processing and identifying structural objects, in accordance with some embodiments.
- FIG. 5 illustrates a utilization of the result of image processing for structural object recognition, in accordance with some embodiments of the present disclosure.
- FIGS. 6A and 6B illustrate results of image processing for structural object recognition in an outdoor setting, in accordance with some embodiments of the present disclosure.
- FIGS. 7A and 7B illustrate results of image processing for structural object recognition in an indoor setting, in accordance with some embodiments of the present disclosure.
- Systems, apparatuses, and methods are provided herein that provide image recognition technology for acquiring geometric information and semantic information of structural objects shown in an image acquired by sensors.
- Structural objects can be recognized as consisting of points, lines connecting two or more points, and regions surrounded by lines. Then, the geometric information and the semantic information of the lines and regions can be identified and/or retrieved.
- The term “geo-semantic information” as used in the present application refers to a combination of the geometric information and the semantic information. Such geo-semantic information can be used as specialized information to accurately recognize the structural objects.
- Some embodiments of the present disclosure provide an apparatus for geo-semantic recognition of a structural object, the apparatus comprising: a sensor that acquires at least one image; a pre-processing processor that processes the acquired image to conform to a predetermined format; a visual feature extraction processor that extracts object features from the processed image; a geo-semantic line detection processor that identifies geometric information corresponding to the extracted object features and retrieves semantic information corresponding to the extracted object features; and a region detection processor that recognizes a region shown in the image based on the geometric information and the semantic information.
- This apparatus may be described as an image processing device in the present disclosure.
- Some embodiments of the present disclosure alternatively or additionally provide an apparatus further comprising a database that stores machine learning data including the semantic information and having at least one label corresponding to one or more of the extracted object features.
- The geo-semantic line detection processor retrieves the semantic information from the database.
- Some embodiments of the present disclosure alternatively or additionally provide the visual feature extraction processor that detects lines of the structural object shown in the image, as the object features.
- Some embodiments of the present disclosure alternatively or additionally provide the geo-semantic line detection processor that generates line indexes of the detected lines.
- A respective line index of the line indexes has a format of (x1, y1, x2, y2, class), where “x1,” “y1,” “x2,” and “y2” are coordinate values of a respective line of the detected lines, and “class” is the semantic information of the respective line.
- Some embodiments of the present disclosure alternatively or additionally provide the geo-semantic line detection processor that inputs images adjacent to the detected lines and also inputs coordinate values of the detected lines, to the database, and the database that retrieves the semantic information of the detected lines, and outputs the retrieved semantic information to the geo-semantic line detection processor.
- Some embodiments of the present disclosure alternatively or additionally provide the geo-semantic line detection processor that identifies one of a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line of the structural object, as the semantic information of the respective line.
- Some embodiments of the present disclosure alternatively or additionally provide the region detection processor that identifies the region based on the detected lines.
- Some embodiments of the present disclosure alternatively or additionally provide the geo-semantic line detection processor that inputs images adjacent to the identified region and also inputs coordinate values of the identified region, to the database, and the database that retrieves the semantic information of the identified region, and outputs the retrieved semantic information to the geo-semantic line detection processor.
- Some embodiments of the present disclosure alternatively or additionally provide the geo-semantic line detection processor that identifies one of a rack surface and a wall surface as the semantic information of the region.
- Some embodiments of the present disclosure alternatively or additionally provide a display that displays the identified lines with different colors.
- Some embodiments of the present disclosure alternatively or additionally provide the sensor that acquires at least one of a still image and a video image.
- Some embodiments of the present disclosure alternatively or additionally provide the sensor that acquires at least one of a gray-scale image and a color image.
- Some embodiments of the present disclosure alternatively or additionally provide the pre-processing processor that processes the acquired image to conform to a predetermined image size.
- Some embodiments of the present disclosure alternatively or additionally provide a machine learning training processor that trains object feature extraction based on the extracted object features.
- Some embodiments of the present disclosure provide a method comprising: acquiring at least one image by at least one sensor; processing the acquired image to conform to a predetermined format; extracting object features from the processed image; identifying geometric information corresponding to the extracted object features; retrieving semantic information corresponding to the extracted object features; and recognizing a region shown in the image based on the geometric information and the semantic information.
- This method may be described as operations of an image processing device in the present disclosure.
- Some embodiments of the present disclosure alternatively or additionally provide retrieving the semantic information from a database that stores machine learning data having at least one label corresponding to one or more of the extracted object features.
- A respective line index of the line indexes has a format of (x1, y1, x2, y2, class), where “x1,” “y1,” “x2,” and “y2” are coordinate values of a respective line of the detected lines, and “class” is the semantic information of the respective line.
- Some embodiments of the present disclosure alternatively or additionally provide: inputting images adjacent to the detected lines and also inputting coordinate values of the detected lines, to a database that stores machine learning data including the semantic information and having at least one label corresponding to one or more of the extracted object features; and retrieving the semantic information of the detected lines from the database.
- Some embodiments of the present disclosure alternatively or additionally provide identifying one of a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line of the structural object, as the semantic information of the respective line.
- Some embodiments of the present disclosure alternatively or additionally provide identifying the region based on the detected lines.
- Some embodiments of the present disclosure alternatively or additionally provide: inputting images adjacent to the identified region and also inputting coordinate values of the identified region, to a database that stores machine learning data including the semantic information and having at least one label corresponding to one or more of the extracted object features; and retrieving the semantic information of the identified region from the database.
- Some embodiments of the present disclosure alternatively or additionally provide identifying one of a rack surface and a wall surface as the semantic information of the region.
- Some embodiments of the present disclosure alternatively or additionally provide displaying the identified lines with different colors.
- Some embodiments of the present disclosure alternatively or additionally provide acquiring at least one of a still image and a video image.
- Some embodiments of the present disclosure alternatively or additionally provide acquiring at least one of a gray-scale image and a color image.
- Some embodiments of the present disclosure alternatively or additionally provide processing the acquired image to conform to a predetermined image size.
- Some embodiments of the present disclosure alternatively or additionally provide performing a machine learning training for an object feature extraction by using the extracted object features.
- FIGS. 1A and 1B illustrate results of image processing for structural object recognition in the conventional art.
- The present disclosure finds the correspondences between a detected line and the geometric structure of the site by assuming that the detected line matches one of the physical lines of the warehouse infrastructure, and based on such finding and assumption, the present disclosure provides a machine-learned contextual image processing technique.
- The machine-learned contextual image processing technique of the present application may provide an accurate and efficient image processing procedure.
- FIGS. 2A and 2B illustrate results of image processing for structural object recognition, in accordance with some embodiments of the present disclosure.
- the structural objects shown in the image can be recognized by using the geometric information and the semantic information of the lines and region of the structural objects.
- Geometric information includes coordinate values of the extracted object features. For example, when the extracted object feature is a line, the coordinate values of the line can be identified as the geometric information.
- the image processing device detects lines of the structural object shown in the image, as the object features, and then, identifies one or more of a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line of the structural object, from the detected lines.
- a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line of the structural object are identified.
- a horizontal bottom line, a horizontal top line, and a vertical line of the structural object are identified.
- Such exemplary names of the lines may be considered as the semantic information of the lines of the structural objects.
- the image processing device may display the identified lines with different colors, as also shown in FIGS. 2A and 2B.
- The image processing device may identify the region based on the identified lines. For example, as shown in FIG. 2A, the region may be identified as “Rack,” or as shown in FIG. 2B, the region may be identified as “Wall.” Such exemplary names of the regions may be considered as the semantic information of the regions of the structural objects.
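The disclosure states that the region label is inferred from the detected lines but does not fix a rule. A minimal sketch of one plausible heuristic, grounded in FIGS. 2A and 2B (a rack exhibits horizontal middle lines between shelves, while a wall has only bottom and top lines), could look like the following; the rule itself is an assumption for illustration:

```python
from typing import List

def classify_region(line_classes: List[str]) -> str:
    """Hypothetical heuristic mapping the semantic classes of the
    detected lines to a region label. A horizontal middle line suggests
    shelving (a rack); bottom plus top lines alone suggest a wall."""
    if "horizontal_middle" in line_classes:
        return "Rack"
    if {"horizontal_bottom", "horizontal_top"} <= set(line_classes):
        return "Wall"
    return "Unknown"
```

In practice the disclosure's database-backed retrieval (image patches plus coordinates) would replace this hand-written rule.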
- the present disclosure may provide more effective and accurate image recognition technology.
- FIG. 3 illustrates a simplified block diagram of an exemplary device for implementing image processing and identifying structural objects, in accordance with some embodiments.
- the image processing device may be implemented by one or more sensors, a sensor input processor 100, a pre-processing processor 200, a visual feature extraction processor 300, a geo-semantic line detection processor 400, a region detection processor 500, a display 600, a machine learning training processor 700, and a database 800.
- The sensor input processor 100, the pre-processing processor 200, the visual feature extraction processor 300, the geo-semantic line detection processor 400, the region detection processor 500, the display 600, the machine learning training processor 700, and the database 800 may be configured in one unified hardware processor. According to some other embodiments, the sensor input processor 100, the pre-processing processor 200, the visual feature extraction processor 300, the geo-semantic line detection processor 400, the region detection processor 500, the display 600, the machine learning training processor 700, and the database 800 may be configured in two or more units.
- one or more images are acquired by one or more sensors.
- the one or more sensors may include one or more camera modules.
- Such camera modules may be equipped in a mobile device, a vehicle- equipped device, and/or a fixed device.
- the image may be a color image or a gray image, and may also be a still image or video image.
- the sensor input processor 100 communicates with the sensors and acquires images captured by the sensors.
- the pre-processing processor 200 may process the acquired image to conform to a predetermined format. For example, the pre-processing processor 200 may convert the acquired image to an image having a predetermined image size, resolution or scale. Further, normalization of the images may be performed by the pre-processing processor 200.
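A minimal sketch of the pre-processing step described above, assuming nearest-neighbour resizing to a predetermined size and normalization of 8-bit pixel values into [0, 1] (the default 224x224 target is a common but arbitrary choice, not taken from the disclosure):

```python
from typing import List

def preprocess(image: List[List[int]],
               out_h: int = 224, out_w: int = 224) -> List[List[float]]:
    """Conform an image to a predetermined size (nearest-neighbour
    resize) and normalize pixel values from [0, 255] into [0.0, 1.0]."""
    in_h, in_w = len(image), len(image[0])
    resized = [
        [image[(y * in_h) // out_h][(x * in_w) // out_w]
         for x in range(out_w)]
        for y in range(out_h)
    ]
    return [[px / 255.0 for px in row] for row in resized]
```

A production implementation would typically use an image library's resampling routines rather than this pure-Python loop.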
- The visual feature extraction processor 300 extracts object features from the processed image.
- The visual feature extraction processor 300 may utilize algorithms to detect and isolate various desired portions or shapes (features, e.g., dots, lines, surfaces) of objects in a digitized image or video stream.
- the visual feature extraction processor 300 may utilize numerical programming environments such as MATLAB, to perform the visual feature extraction.
- neural networks are used by the visual feature extraction processor 300 to perform the visual feature extraction.
- The machine learning training processor 700 performs machine learning based on the results of the feature extraction.
- The image feature vector can be used as an input for machine learning algorithms through training and test models, and it can improve performance.
- the geo-semantic line detection processor 400 identifies geometric information corresponding to the extracted object features and retrieves semantic information corresponding to the extracted object features.
- Geometric information includes coordinate values of the extracted object features. For example, when the extracted object feature is a line, the coordinate values of the line can be identified as the geometric information.
- Semantic information includes corners, edges, blobs, and ridges.
- The geo-semantic line detection processor 400 may use algorithms of corner detection, curve fitting, edge detection, global structure extraction, feature histograms, line detection, connected-component labeling, and the like.
- Semantic information refers to an appearance-based inference using color, texture, and/or shape information, along with a type of context inference (or representation) that can combine and transform this machine-extracted evidence into what could be called a scene description.
- indexes of the detected lines are generated.
- A respective line index of the line indexes may have a format of (x1, y1, x2, y2, class), where “x1,” “y1,” “x2,” and “y2” are coordinate values of a respective line of the detected lines, and “class” is the semantic information of the respective line.
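The (x1, y1, x2, y2, class) format above maps naturally onto a small record type. A minimal sketch (the field and class names are illustrative; the disclosure specifies only the tuple layout):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class LineIndex:
    """One entry of the (x1, y1, x2, y2, class) line-index format:
    two endpoint coordinates plus a semantic class label."""
    x1: float
    y1: float
    x2: float
    y2: float
    cls: str  # e.g. "horizontal_bottom", "horizontal_top", "vertical"

    def as_tuple(self) -> Tuple[float, float, float, float, str]:
        """Serialize back to the flat (x1, y1, x2, y2, class) layout."""
        return (self.x1, self.y1, self.x2, self.y2, self.cls)
```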
- neural networks and machine learning algorithms are used by the geo-semantic line detection processor 400 to identify geometric information corresponding to the extracted object features, and to retrieve semantic information corresponding to the extracted object features.
- the geo-semantic region detection processor 500 recognizes a region shown in the image based on the geometric information and the semantic information.
- The geo-semantic region detection processor 500 may perform operations similar to those of the geo-semantic line detection processor 400 described above, to acquire geometric and semantic information of the recognized region.
- the display 600 displays the geometric information and the semantic information as an overlay image on the image captured by the sensors.
- the database 800 stores machine learning data including the semantic information and having at least one label corresponding to one or more of the extracted object features. For example, when the extracted object feature is a line, the database 800 stores the semantic information corresponding to coordinate values of the line.
- the geo-semantic line detection processor 400 utilizes a line detection algorithm to extract the geometric information of lines.
- the line detection algorithm takes a collection of n edge points and finds all the lines on which these edge points lie.
- The line detection algorithm includes the Hough transform and convolution-based techniques.
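The Hough transform mentioned above parameterizes a candidate line as rho = x*cos(theta) + y*sin(theta); each edge point votes for every (theta, rho) pair consistent with it, and accumulator peaks correspond to lines through many points. A minimal pure-Python sketch (parameter defaults are illustrative; a real system would use an optimized library implementation):

```python
import math
from collections import Counter
from typing import List, Tuple

def hough_lines(points: List[Tuple[int, int]],
                n_theta: int = 180,
                rho_step: float = 1.0,
                min_votes: int = 3) -> List[Tuple[float, float]]:
    """Return (theta, rho) pairs whose accumulator cells collected at
    least min_votes votes from the given edge points."""
    acc: Counter = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            # Quantize rho so nearby votes fall into the same cell.
            acc[(t, round(rho / rho_step))] += 1
    return [(math.pi * t / n_theta, r * rho_step)
            for (t, r), votes in acc.items() if votes >= min_votes]
```

Note that quantization makes neighbouring cells of a strong line fire as well; a full implementation would add non-maximum suppression over the accumulator.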
- The geo-semantic line detection processor 400 utilizes a neural net algorithm, which is based on end-to-end wireframe parsing, to extract the geometric information of lines. That is, the geo-semantic line detection processor 400 is end-to-end trainable and can directly output a vectorized wireframe that contains semantically meaningful and geometrically salient junctions and lines (see, e.g., End-to-End Wireframe Parsing, Yichao Zhou, Haozhi Qi, Yi Ma; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 962-971).
- The geo-semantic line detection processor 400 retrieves semantic information corresponding to the extracted object features from the database 800. In some embodiments, the geo-semantic line detection processor 400 retrieves semantic information corresponding to coordinate values of the extracted object features. In some embodiments, the geo-semantic line detection processor 400 retrieves the semantic information that most closely corresponds to the coordinate values of the extracted object features. A matching rate between the semantic information and the geometric information of the extracted object feature can be calculated and used for retrieving the semantic information corresponding to the extracted object feature.
- the geo-semantic line detection processor 400 inputs images adjacent to the detected lines and also inputs coordinate values of the detected lines, to the database 800.
- The database 800 retrieves the semantic information of the detected lines based on the input data, and outputs the retrieved semantic information to the geo-semantic line detection processor.
- the database 800 may utilize an image comparison algorithm to retrieve the semantic information corresponding to the images adjacent to the detected lines, and the database 800 may additionally or alternatively utilize a coordinate value comparison algorithm to retrieve the semantic information corresponding to the coordinate values of the detected lines.
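The coordinate-value comparison and the matching rate described above can be sketched as a nearest-neighbour lookup. The disclosure states only that a matching rate is calculated; the metric below (a score in (0, 1] that decays with summed endpoint distance) is an assumption for illustration:

```python
import math
from typing import Dict, Tuple

Line = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def matching_rate(a: Line, b: Line) -> float:
    """Hypothetical matching rate: 1.0 for identical endpoints,
    decreasing toward 0 as the endpoints move apart."""
    d = math.dist(a[:2], b[:2]) + math.dist(a[2:], b[2:])
    return 1.0 / (1.0 + d)

def retrieve_semantic(detected: Line, database: Dict[Line, str]) -> str:
    """Return the label of the stored line whose coordinates most
    closely match the detected line."""
    best = max(database, key=lambda stored: matching_rate(detected, stored))
    return database[best]
```

A deployed database would also weigh the image-patch comparison the disclosure mentions, not coordinates alone.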
- The database 800 stores labels, for example, a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line, and also stores example image portions corresponding to each label. Those labels and example image portions are initially inputted by a manufacturer or a user of the database, and then further developed and accumulated by a machine training scheme.
- the geo-semantic line detection processor 400 inputs images adjacent to the identified region and also inputs coordinate values of the identified region, to the database 800.
- the database 800 retrieves the semantic information of the identified region based on input data, and outputs the retrieved semantic information to the geo-semantic line detection processor.
- the database 800 may utilize an image comparison algorithm to retrieve the semantic information corresponding to the images adjacent to the identified region, and the database 800 may additionally or alternatively utilize a coordinate value comparison algorithm to retrieve the semantic information corresponding to the coordinate values of the identified region.
- The database 800 stores labels, for example, a wall surface and a rack surface, and also stores example image portions corresponding to each label. Those labels and example image portions are initially inputted by a manufacturer or a user of the database, and then further developed and accumulated by a machine training scheme.
- FIG. 4 illustrates a simplified block diagram of an exemplary process for implementing image processing and identifying structural objects, in accordance with some embodiments.
- step S100 of acquiring at least one image by at least one sensor is performed.
- one or more images are acquired by one or more sensors.
- the one or more sensors may include one or more camera modules. Such camera modules may be equipped in a mobile device, a vehicle-equipped device, and/or a fixed device.
- the image may be a color image or a gray image, and may also be a still image or video image.
- step S200 of processing the acquired image to conform to a predetermined format is performed.
- In step S200, converting the acquired image to an image having a predetermined image size, resolution, or scale may also be performed. Normalization of the images may also be performed.
- step S300 of extracting object features from the processed image is performed.
- In step S300, detecting lines of the structural object shown in the image, as the object features, may also be performed.
- step S400 of identifying geometric information corresponding to the extracted object features is performed.
- In step S400, identifying one or more of a horizontal bottom line, a horizontal middle line, a horizontal top line, and a vertical line of the structural object from the detected lines may also be performed.
- Such exemplary names of the lines may be considered as the semantic information of the lines of the structural objects.
- step S500 of retrieving semantic information corresponding to the extracted object features is performed.
- In step S500, retrieving the semantic information from a database that stores machine learning data having at least one label corresponding to one or more of the extracted object features may also be performed.
- a step of generating line indexes of the detected lines can be performed.
- A respective line index of the line indexes may have a format of (x1, y1, x2, y2, class), where “x1,” “y1,” “x2,” and “y2” are coordinate values of a respective line of the detected lines, and “class” is the semantic information of the respective line.
- step S600 of recognizing a region shown in the image based on the geometric information and the semantic information is performed.
- In step S600, identifying the region based on the identified lines may also be performed.
- the method may further include a step of displaying the geometric information and the semantic information as an overlay image on the image captured by the sensors.
- the identified lines may be displayed with different colors.
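Displaying the identified lines in different colors reduces to a lookup from semantic class to display color. The particular color assignments below are hypothetical; the disclosure only states that different colors are used:

```python
from typing import Dict, Tuple

Color = Tuple[int, int, int]  # RGB

# Hypothetical class-to-color mapping for the overlay display.
CLASS_COLORS: Dict[str, Color] = {
    "horizontal_bottom": (255, 0, 0),    # red
    "horizontal_middle": (0, 255, 0),    # green
    "horizontal_top":    (0, 0, 255),    # blue
    "vertical":          (255, 255, 0),  # yellow
}

def color_for(line_class: str) -> Color:
    """Color for a semantic line class, with a grey fallback for
    classes the map does not cover."""
    return CLASS_COLORS.get(line_class, (128, 128, 128))
```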
- FIG. 5 illustrates a utilization of the result of image processing for structural object recognition, in accordance with some embodiments of the present disclosure.
- The result of image processing for structural object recognition is utilized in providing two-dimensional (2D) or three-dimensional (3D) navigation, as shown in FIG. 5.
- the semantic information or the geometric information of the lines and the regions may be displayed on a screen of a navigation system.
- the lines of the structural object may be displayed with different colors, depending on the semantic information or the geometric information of the lines. With this feature, a user may more accurately recognize the space where the user would move.
- FIGS. 6A and 6B illustrate results of image processing for structural object recognition in an outdoor setting, in accordance with some embodiments of the present disclosure.
- FIGS. 7A and 7B illustrate results of image processing for structural object recognition in an indoor setting, in accordance with some embodiments of the present disclosure.
- the devices and the method described in the present disclosure may be used to perform a recognition of the structural objects (e.g., buildings and vehicles) in the outdoor setting.
- the devices and the method described in the present disclosure may be used to perform a recognition of the structural objects (e.g., walls and corridors) in the indoor setting.
- Embodiments described in the present disclosure can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
- Embodiments described in the present disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus.
- the computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
- the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
- “processor” or “processing unit” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
- the apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit).
- the apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
- a computer program which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a digital computing environment.
- a computer program may, but need not, correspond to a file in a file system.
- a program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code.
- a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.
- the processes and logic flows described in this specification can be performed by one or more programmable digital computers, operating with one or more quantum processors, as appropriate, executing one or more computer programs to perform functions by operating on input data and generating output.
- the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.
- For a system of one or more digital computers to be “configured to” perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions.
- For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by digital data processing apparatus, cause the apparatus to perform the operations or actions.
- Digital computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.
- a central processing unit will receive instructions and data from a read-only memory or a random access memory or both.
- the essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.
- the central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- a digital computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
- Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
- Control of the various systems described in this specification, or portions of them, can be implemented in a computer program product that includes instructions that are stored on one or more non-transitory machine-readable storage media, and that are executable on one or more digital processing devices.
- the systems described in this specification, or portions of them, can each be implemented as an apparatus, method, or electronic system that may include one or more digital processing devices and memory to store executable instructions to perform the operations described in this specification.
- Other examples of the present disclosure may include any number and combination of machine-learning models having any number and combination of characteristics.
- the machine-learning model(s) can be trained in a supervised, semi- supervised, or unsupervised manner, or any combination of these.
- the machine learning model(s) can be implemented using a single computing device or multiple computing devices.
- Implementing some examples of the present disclosure at least in part by using machine-learning models can reduce the total number of processing iterations, time, memory, electrical power, or any combination of these consumed by a computing device when analyzing data.
- a neural network may more readily identify patterns in data than other approaches. This may enable the neural network to analyze the data using fewer processing cycles and less memory than other approaches, while obtaining a similar or greater level of accuracy.
- Some machine-learning approaches may be more efficiently and speedily executed and processed with machine-learning specific processors (e.g., not a generic CPU). Such processors may also provide an energy savings when compared to generic CPUs.
- some of these processors can include a graphical processing unit (GPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), an artificial intelligence (AI) accelerator, a neural computing core, a neural computing engine, a neural processing unit, a purpose-built chip architecture for deep learning, and/or some other machine-learning specific processor that implements a machine learning approach or one or more neural networks using semiconductor (e.g., silicon (Si), gallium arsenide (GaAs)) devices.
- processors may also be employed in heterogeneous computing architectures with a number of and a variety of different types of cores, engines, nodes, and/or layers to achieve various energy efficiencies, processing speed improvements, data communication speed improvements, and/or data efficiency targets and improvements throughout various parts of the system when compared to a homogeneous computing architecture that employs CPUs for general purpose computing.
- the image processing device and method of the present disclosure are a practical application of image processing techniques to object recognition.
- a user of the image processing device and method of the present disclosure may practically use acquired geometric information and semantic information for a variety of purposes.
- the acquired geometric information and semantic information of the object in the image may be displayed on a navigation screen of a user device (e.g., a mobile device or a vehicle navigation system) so that the user may drive a vehicle or move toward a target place (e.g., a rack or a particular wall).
- a working vehicle may be controlled by a control system to avoid a wall or a rack in a warehouse, and to find the best route to a loading place or an unloading place.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to image recognition apparatus and methods for acquiring geometric information and semantic information of structural objects shown in an image acquired by sensors. The apparatus comprises: a sensor configured to acquire at least one image; a pre-processing processor configured to process the acquired image to conform to a predetermined format; a visual feature extraction processor configured to extract object features from the processed image; a geo-semantic line detection processor configured to identify geometric information corresponding to the extracted object features and to retrieve semantic information corresponding to the extracted object features; and a region detection processor configured to recognize a region shown in the image based on the geometric information and the semantic information.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202063032514P | 2020-05-29 | 2020-05-29 | |
US63/032,514 | 2020-05-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021243205A1 true WO2021243205A1 (fr) | 2021-12-02 |
Family
ID=78722850
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2021/034854 WO2021243205A1 (fr) | 2021-05-28 | Method and system for geo-semantic recognition of a structural object
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021243205A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005215988A (ja) * | 2004-01-29 | 2005-08-11 | Canon Inc | Learning method for pattern recognition, learning apparatus for pattern recognition, image input apparatus, computer program, and computer-readable recording medium |
US20160267331A1 (en) * | 2015-03-12 | 2016-09-15 | Toyota Jidosha Kabushiki Kaisha | Detecting roadway objects in real-time images |
US20190035101A1 (en) * | 2017-07-27 | 2019-01-31 | Here Global B.V. | Method, apparatus, and system for real-time object detection using a cursor recurrent neural network |
US20190178436A1 (en) * | 2016-08-03 | 2019-06-13 | Sz Dji Osmo Technology Co., Ltd. | Method and system for controlling gimbal |
US20190377966A1 (en) * | 2016-02-15 | 2019-12-12 | Pictometry International Corp. | Automated system and methodology for feature extraction |
2021
- 2021-05-28 WO PCT/US2021/034854 patent/WO2021243205A1/fr active Application Filing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10691952B2 (en) | Adapting to appearance variations when tracking a target object in video sequence | |
US10902615B2 (en) | Hybrid and self-aware long-term object tracking | |
Ahmad et al. | Design & implementation of real time autonomous car by using image processing & IoT | |
US10817752B2 (en) | Virtually boosted training | |
US10964033B2 (en) | Decoupled motion models for object tracking | |
US20230154170A1 (en) | Method and apparatus with multi-modal feature fusion | |
Raghavan et al. | Optimized building extraction from high-resolution satellite imagery using deep learning | |
US9691132B2 (en) | Method and apparatus for inferring facial composite | |
Satzoda et al. | Vision-based lane analysis: Exploration of issues and approaches for embedded realization | |
Parmar et al. | Deeprange: deep‐learning‐based object detection and ranging in autonomous driving | |
US20230154157A1 (en) | Saliency-based input resampling for efficient object detection | |
KR20220095169A (ko) | Method of operating a device for three-dimensional object detection, and the device | |
Khellal et al. | Pedestrian classification and detection in far infrared images | |
Shao et al. | Semantic segmentation for free space and lane based on grid-based interest point detection | |
Nakamura et al. | An effective combination of loss gradients for multi-task learning applied on instance segmentation and depth estimation | |
US11328170B2 (en) | Unknown object identification for robotic device | |
Oberländer et al. | Hierarchical SLAM using spectral submap matching with opportunities for long-term operation | |
WO2021243205A1 (fr) | Method and system for geo-semantic recognition of a structural object | |
Aamir et al. | Real-Time Object Detection in Occluded Environment with Background Cluttering Effects Using Deep Learning | |
Libiao et al. | Semantic segmentation based on DeeplabV3+ with multiple fusions of low-level features | |
Jia | LRD‐SLAM: A Lightweight Robust Dynamic SLAM Method by Semantic Segmentation Network | |
KR102224218B1 (ko) | Deep-learning-based object detection method and apparatus utilizing video temporal information | |
Kim et al. | Development of a real-time automatic passenger counting system using head detection based on deep learning | |
Alsharabi | Real-time object detection overview: Advancements, challenges, and applications | |
CN111950475A (zh) | A CLAHE-histogram-enhanced target recognition algorithm based on YOLOv3 | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21814421 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.04.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21814421 Country of ref document: EP Kind code of ref document: A1 |