WO2022052052A1 - Method and system for identifying objects - Google Patents

Method and system for identifying objects

Info

Publication number
WO2022052052A1
WO2022052052A1 (PCT/CN2020/114844)
Authority
WO
WIPO (PCT)
Prior art keywords
pictures
synthesized images
eigenvectors
fusing
view angles
Prior art date
Application number
PCT/CN2020/114844
Other languages
French (fr)
Inventor
Fanbo Meng
Xiang Li
Xiaofeng Wang
Original Assignee
Siemens Aktiengesellschaft
Siemens Ltd., China
Application filed by Siemens Aktiengesellschaft and Siemens Ltd., China
Priority to US18/044,443 priority Critical patent/US20230360380A1/en
Priority to EP20952840.5A priority patent/EP4193297A4/en
Priority to PCT/CN2020/114844 priority patent/WO2022052052A1/en
Priority to CN202080103768.5A priority patent/CN116783630A/en
Publication of WO2022052052A1 publication Critical patent/WO2022052052A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/76Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries based on eigen-space representations, e.g. from pose or different illumination conditions; Shape manifolds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/12Acquisition of 3D measurements of objects

Definitions

  • a plurality of synthesized images have different view angles, and correspondingly, a plurality of pictures also have different view angles. In this way, more characteristics may be embodied.
  • the plurality of pictures respectively have the same view angles as at least a portion of the plurality of synthesized images, such that the interference caused by different angles is reduced.
  • the method achieves a high identification accuracy.
  • camera parameters for acquiring the plurality of pictures are determined according to the view angles of the plurality of synthesized images, or software parameters for generating the plurality of synthesized images are determined according to the plurality of pictures, such that the plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images.
  • the first fused vector is generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures
  • the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures.
  • the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures in combination with auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of pictures and the auxiliary vectors is equal to a quantity of the synthesized images; and the first fused vector is generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures in combination with the auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures and the auxiliary vectors is equal to the quantity of the synthesized images; or, in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images, the first fused vector is generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures.
  • the plurality of synthesized images are generated by CAD software according to the three-dimensional digital model.
  • the eigenvectors of the plurality of synthesized images and the eigenvectors of the plurality of pictures are respectively extracted by a CNN.
  • the classifier includes a classifier based on deep learning.
  • a scheme of the fusion is determined based on an AutoML technology or a neural architecture search technology.
  • the plurality of synthesized images are domain-randomized, and the eigenvectors of the plurality of synthesized images are respectively extracted; and the plurality of pictures are domain-randomized, and the eigenvectors of the plurality of pictures are respectively extracted.
  • the processor is configured to control the photographing mechanism or the image generating module such that the plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images.
  • the characteristic extracting module is further configured to respectively extract eigenvectors of the plurality of pictures.
  • the fusing module is further configured to generate a second fused vector by fusing the eigenvectors of the plurality of pictures.
  • the trained classifier module is configured to obtain a classification result of the object according to the second fused vector input.
  • a plurality of synthesized images have different view angles, and correspondingly, a plurality of pictures also have different view angles.
  • the processor is capable of controlling the photographing mechanism or the image generating module such that the plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images. In this way, interference caused due to different angles is reduced.
  • the system achieves a high identification accuracy.
  • the photographing mechanism includes a camera and a stand.
  • the camera is movably connected to the stand.
  • the system further includes a driving mechanism, configured to drive the camera to move relative to the stand.
  • the processor is further configured to output a set of control signals according to the view angles of the plurality of synthesized images.
  • the driving mechanism is further configured to control movements of the camera according to the control signals to acquire the plurality of pictures respectively having the same view angles as the at least a portion of the plurality of synthesized images.
  • the fusing module is further configured to generate the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and generate the second fused vector by fusing the extracted eigenvectors of the plurality of pictures.
  • the fusing module is further configured to generate the second fused vector by fusing the extracted eigenvectors of the plurality of pictures in combination with auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of pictures and the auxiliary vectors is equal to a quantity of the synthesized images; and generate the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures in combination with the auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures and the auxiliary vectors is equal to the quantity of the synthesized images; or, in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images, the fusing module is further configured to generate the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and to generate the second fused vector by fusing the extracted eigenvectors of the plurality of pictures.
  • the image generating module is further configured to generate the plurality of synthesized images by CAD software according to the three-dimensional digital model.
  • the characteristic extracting module is further configured to respectively extract the eigenvectors of the plurality of synthesized images and the eigenvectors of the plurality of pictures by a CNN.
  • the classifier module includes a classifier module based on deep learning.
  • the fusing module is further configured to determine a scheme of the fusion based on an AutoML technology or a neural architecture search technology.
  • the characteristic extracting module is further configured to domain-randomize the plurality of synthesized images, and respectively extract the eigenvectors of the plurality of synthesized images.
  • the characteristic extracting module is further configured to domain-randomize the plurality of pictures, and respectively extract the eigenvectors of the plurality of pictures.
  • the present disclosure is further intended to provide a computer-readable storage medium which stores code for use by the system and enables the object to be identified accurately.
  • the system executes the above method when the code is executed by the processor.
  • FIG. 4 is a schematic structural diagram of a system for identifying an object according to an exemplary embodiment of the present disclosure
  • FIG. 5 schematically illustrates an operating process of the system for identifying the object as illustrated in FIG. 4;
  • FIG. 6 illustrates an exemplary embodiment of a characteristic extracting module
  • FIG. 7 is a schematic structural diagram of a system for identifying an object according to another exemplary embodiment of the present disclosure.
  • the terms "first", "second", and the like do not represent degrees of importance or a sequence, but are used only for differentiation and for ease of description.
  • the plurality of synthesized images are generated by computer aided design (CAD) software according to the three-dimensional digital model.
  • CAD computer aided design
  • the CAD software may be, for example, AutoCAD developed by Autodesk.
  • other software capable of generating the synthesized images according to the three-dimensional digital model may also be used, for example, Unigraphics NX (UG) developed by Siemens PLM Software.
  • step S11 includes the following sub-steps that are performed in sequence in the CAD software:
  • S112 A plurality of virtual cameras are added and camera parameters of these virtual cameras are set.
  • the quantity of virtual cameras is consistent with the quantity of synthesized images to be generated, and the camera parameters of the virtual cameras determine the view angles of the synthesized images.
  • S113 Images are captured by the virtual cameras to obtain the synthesized images.
  • S12 Eigenvectors of the plurality of synthesized images are respectively extracted.
  • the eigenvectors of the plurality of synthesized images are respectively extracted by a convolutional neural network (CNN) .
  • CNN convolutional neural network
  • the eigenvectors of the plurality of synthesized images may also be extracted in other fashions.
  • the convolutional neural network is a feedforward neural network involving convolutional computation and having a deep structure, and is one of the representative algorithms of deep learning.
  • the convolutional neural network has capabilities of characterization learning, and is capable of performing translation invariant classification for input information according to a hierarchical structure thereof, which is thus referred to as a "translation invariant artificial neural network” .
  • the CNN facilitates extraction of eigenvectors of key characteristics, to further improve the accuracy of the method for identifying the object.
  • AutoML incorporates steps in machine learning such as data pre-processing, characteristic selection, and algorithm selection, together with steps in deep learning such as model architecture design and model training, and deploys them in a "black box".
  • a desired prediction result may be obtained as long as the data is input.
  • the method of "designing one neural network by using another neural network" is referred to as the neural architecture search (NAS) technology, and generally, this method designs a new neural network by using reinforcement learning or an evolutionary algorithm.
  • the NAS may automate architecture engineering, and is capable of automatically obtaining an optimal architecture as long as a data set is provided.
  • the first fused vector is input into a classifier to train the classifier.
  • the classifier includes a classifier based on deep learning.
  • a plurality of pictures of the object are acquired by the cameras.
  • the plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images. That is, the quantity of pictures is less than the quantity of synthesized images. If the quantity of synthesized images is 5, the quantity of pictures may be, for example, 5 or 3.
  • the plurality of synthesized images includes synthesized images having the same view angles as the pictures.
  • S22 Eigenvectors of the plurality of pictures are respectively extracted.
  • the eigenvectors of the plurality of pictures are respectively extracted by a CNN.
  • a second fused vector is generated by fusing the eigenvectors of the plurality of pictures.
  • the scheme of the fusion is determined based on the AutoML technology or the neural architecture search technology.
  • the scheme of the fusion used in this step is the same as the scheme of the fusion used in step S13.
  • the plurality of synthesized images may generally be generated first, and then the camera parameters (for example, the positions and the angles) for acquiring the plurality of pictures may be determined according to the view angles of the plurality of synthesized images, such that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images. Alternatively, the plurality of pictures of the object may be acquired first, and then the software parameters for generating the plurality of synthesized images may be determined according to the view angles of the plurality of pictures.
  • the first fused vector is generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures
  • the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures.
  • the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures in combination with the auxiliary vectors (for example, unit vectors or zero vectors), wherein a total quantity of the eigenvectors of the plurality of pictures and the auxiliary vectors is equal to the quantity of the synthesized images.
  • the first fused vector may be generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures in combination with the auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures and the auxiliary vectors is equal to the quantity of the synthesized images. For example, if the quantity of synthesized images is 5 and the quantity of pictures is 3, the quantity of auxiliary vectors desired in the above two steps is 2.
  • the scheme of the fusion does not need to be re-determined, but the first fused vector only needs to be generated by re-fusion according to the original fusion scheme, and the classifier needs to be re-trained according to the re-generated first fused vector.
  • the auxiliary vector is, for example, a unit vector (that is, a vector with a modulus equal to 1) or a zero vector.
  • the first fused vector may be generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures
  • the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures. Since the quantity of vectors input during the fusion is changed, the scheme of the fusion needs to be re-determined, the first fused vector needs to be re-generated according to the new fusion scheme, and the classifier needs to be re-trained according to the re-generated first fused vector.
  • a plurality of synthesized images have different view angles, and correspondingly, a plurality of pictures also have different view angles. In this way, more characteristics may be embodied.
  • the plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images. In this way, interference caused due to different angles is reduced.
  • the method achieves a high identification accuracy.
  • FIG. 3 is a flowchart of a method for identifying an object according to another exemplary embodiment of the present disclosure.
  • the common points between the method for identifying the object according to this exemplary embodiment and the method for identifying the object as illustrated in FIG. 1 are not described herein any further, and the differences between these two methods are described hereinafter.
  • upon completion of step S11, step S15 is first performed to domain-randomize the plurality of synthesized images, and then step S12 is performed.
  • upon completion of step S21, step S24 is first performed to domain-randomize the plurality of pictures, and then step S22 is performed.
  • by domain-randomization, known characteristics (for example, the environment of the object, the color of the object, and the like) that are not useful for differentiating objects may be excluded in practice.
  • the accuracy and efficiency of the method for identifying the object are improved.
  • FIG. 4 is a schematic structural diagram of a system for identifying an object according to an exemplary embodiment of the present disclosure.
  • the system for identifying the object includes a processor 20 and a photographing mechanism 40.
  • the processor 20 includes an image generating module 21, a characteristic extracting module 22, a fusing module 23, and a classifier module 24.
  • the image generating module 21 is capable of generating a plurality of synthesized images according to a three-dimensional digital model.
  • the plurality of synthesized images have different view angles.
  • the image generating module 21 generates the plurality of synthesized images, for example, by computer aided design (CAD) software according to the three-dimensional digital model.
  • CAD computer aided design
  • the characteristic extracting module 22 is configured to respectively extract eigenvectors of the plurality of synthesized images.
  • the characteristic extracting module 22, for example, respectively extracts the eigenvectors of the plurality of synthesized images by a CNN.
  • the characteristic extracting module 22 may also extract the eigenvectors of the plurality of synthesized images by using other algorithms.
  • the fusing module 23 is capable of generating a first fused vector by fusing the eigenvectors of the plurality of synthesized images.
  • the fusing module 23, for example, determines a scheme of the fusion based on the AutoML technology or the neural architecture search technology, which facilitates determination of an optimal scheme of the fusion.
  • the determination of the scheme of the fusion is not limited herein.
  • the classifier module 24 is capable of being trained according to the first fused vector input.
  • the photographing mechanism 40 is capable of acquiring a plurality of pictures of an object 80.
  • the photographing mechanism 40 includes a camera 41 and a stand 42.
  • the camera 41 is movably connected to the stand 42.
  • the system further includes a driving mechanism 50, capable of driving the camera 41 to move relative to the stand 42.
  • the processor 20 is capable of outputting a set of control signals according to the view angles of the plurality of synthesized images.
  • the driving mechanism 50 is capable of controlling movements of the camera 41 according to the control signals to acquire the plurality of pictures respectively having the same view angles as the at least a portion of the plurality of synthesized images. Accordingly, photographing positions and angles of the camera 41 may be controlled according to the view angles of the synthesized images, which saves manpower. In this case, one camera 41 needs to capture the plurality of pictures by changing positions and angles. However, in other exemplary embodiments, a plurality of cameras 41 may be deployed. In this way, the time for acquiring the pictures may be saved.
  • the characteristic extracting module 22 is capable of respectively extracting eigenvectors of the plurality of pictures.
  • the fusing module 23 is capable of generating a second fused vector by fusing the eigenvectors of the plurality of pictures.
  • the trained classifier module 24 is capable of obtaining a classification result of the object according to the second fused vector input.
  • the plurality of pictures have the same view angles as a portion of the plurality of synthesized images. That is, the quantity of pictures is less than the quantity of synthesized images.
  • the classifier has been trained by using 5 synthesized images (for example, a front view, a rear view, a plan view, a bottom view, and a three-dimensional view), but during photographing of the object, the same quantity of pictures having the same view angles cannot be acquired because the cameras cannot be deployed due to, for example, space restrictions, and instead only a portion of the pictures having the same view angles are acquired, for example, 3 pictures (for example, a front view, a rear view, and a three-dimensional view).
  • the scheme of the fusion does not need to be re-determined, but the first fused vector only needs to be generated by re-fusion according to the original fusion scheme, and the classifier needs to be re-trained according to the re-generated first fused vector.
  • the fusing module 23 is capable of generating the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and generating the second fused vector by fusing the extracted eigenvectors of the plurality of pictures. Since the quantity of vectors input during the fusion is changed, the fusing module 23 needs to re-determine the scheme of the fusion and re-generate the first fused vector according to the new fusion scheme, and the classifier module 24 needs to be re-trained according to the re-generated first fused vector.
  • the characteristic extracting module 22 is capable of domain-randomizing the plurality of synthesized images, and respectively extracting the eigenvectors of the plurality of synthesized images.
  • the characteristic extracting module 22 is capable of domain-randomizing the plurality of pictures, and respectively extracting the eigenvectors of the plurality of pictures.
  • by domain-randomization, known characteristics (for example, the environment of the object, the color of the object, and the like) that are not useful for differentiating objects may be excluded in practice. In this way, the accuracy and efficiency of identifying the object are improved.
  • FIG. 5 schematically illustrates an operating process of the system for identifying the object as illustrated in FIG. 4, which is not intended to limit the present disclosure.
  • a three-dimensional digital model M is input into the image generating module 21, and the image generating module 21 generates a synthesized image S1, a synthesized image S2, and a synthesized image S3 according to the three-dimensional digital model M.
  • the synthesized image S1, the synthesized image S2, and the synthesized image S3 are input into the characteristic extracting module 22, and the characteristic extracting module 22 extracts an eigenvector Sv1, an eigenvector Sv2, and an eigenvector Sv3.
  • the photographing mechanism 40 acquires a picture P1, a picture P2, and a picture P3 by photographing the object 80.
  • the picture P1 has the same view angle as the synthesized image S1
  • the picture P2 has the same view angle as the synthesized image S2
  • the picture P3 has the same view angle as the synthesized image S3.
  • the picture P1, the picture P2, and the picture P3 are input into the characteristic extracting module 22, and the characteristic extracting module 22 extracts an eigenvector Pv1, an eigenvector Pv2, and an eigenvector Pv3.
  • the characteristic extracting module 22 includes a plurality of convolutional neural networks, that is, CNN1, CNN2, and CNN3, which are configured to respectively process different synthesized images to obtain corresponding eigenvectors.
  • the plurality of CNNs may have the same parameter or different parameters.
  • a plurality of synthesized images have different view angles, and correspondingly, a plurality of pictures also have different view angles.
  • the processor is capable of controlling the photographing mechanism or the image generating module such that the plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images. In this way, interference caused due to different angles is reduced.
  • the system achieves a high identification accuracy.
  • FIG. 7 is a schematic structural diagram of a system for identifying an object according to another exemplary embodiment of the present disclosure.
  • the photographing mechanism 40 includes a plurality of cameras 41.
  • the quantity of cameras is consistent with the quantity of pictures to be acquired.
  • the system further includes a position sensing unit 60.
  • the position sensing unit 60 is capable of detecting spatial positions and photographing angles of the plurality of cameras 41 and generating a set of view angle signals according to the spatial positions and the photographing angles of the plurality of cameras 41.
  • the processor 20 is capable of determining parameters for generating the plurality of synthesized images according to the view angle signals, such that the plurality of pictures respectively have the same view angles as at least a portion of the plurality of synthesized images. In this way, the parameters for generating the plurality of synthesized images may be automatically determined according to the spatial positions and the photographing angles of the cameras, which saves manpower.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A method for identifying an object is disclosed. The method includes: generating a plurality of synthesized images according to a three-dimensional digital model, the plurality of synthesized images having different view angles; respectively extracting eigenvectors of the plurality of synthesized images; generating a first fused vector by fusing the eigenvectors of the plurality of synthesized images; inputting the first fused vector into a classifier to train the classifier; acquiring a plurality of pictures of the object, the plurality of pictures respectively having same view angles as at least a portion of the plurality of synthesized images; respectively extracting eigenvectors of the plurality of pictures; generating a second fused vector by fusing the eigenvectors of the plurality of pictures; and inputting the second fused vector into the trained classifier to obtain a classification result of the object.

Description

METHOD AND SYSTEM FOR IDENTIFYING OBJECTS TECHNICAL FIELD
The present disclosure relates to the technical field of computer vision.
BACKGROUND
Object identification pertains to the technical field of computer vision, and is mainly intended to identify objects in images. At present, mainstream methods for identifying objects are those based on 2D real-image training and prediction, or those that use a three-dimensional digital model as an auxiliary means of recognition. However, the conventional methods for identifying objects based on three-dimensional model data fail to satisfy the accuracy requirements of workpiece classification in factories.
SUMMARY
The present disclosure is intended to provide a method for identifying an object, which achieves a high identification accuracy.
The present disclosure is further intended to provide a system for identifying an object, which achieves high identification accuracy.
The present disclosure is further intended to provide a computer-readable storage medium storing code which, when executed, enables the object to be identified accurately.
The method for identifying the object includes:
generating a plurality of synthesized images according to a three-dimensional digital model, the plurality of synthesized images having different view angles;
respectively extracting eigenvectors of the plurality of synthesized images;
generating a first fused vector by fusing the eigenvectors of the plurality of  synthesized images;
inputting the first fused vector into a classifier to train the classifier;
acquiring a plurality of pictures of the object, the plurality of pictures respectively having same view angles as at least a portion of the plurality of synthesized images;
respectively extracting eigenvectors of the plurality of pictures;
generating a second fused vector by fusing the eigenvectors of the plurality of pictures; and
inputting the second fused vector into the trained classifier to obtain a classification result of the object.
In the method for identifying the object, a plurality of synthesized images have different view angles, and correspondingly, a plurality of pictures also have different view angles. In this way, more characteristics may be embodied. The plurality of pictures respectively have the same view angles as at least a portion of the plurality of synthesized images, such that the interference caused by different angles is reduced. The method achieves a high identification accuracy.
In an optional embodiment of the method, camera parameters for acquiring the plurality of pictures are determined according to the view angles of the plurality of synthesized images, or software parameters for generating the plurality of synthesized images are determined according to the plurality of pictures, such that the plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images.
In another exemplary embodiment of the method, in the case that the plurality of pictures respectively have the same view angles as all the plurality of synthesized images, the first fused vector is generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures.
In still another exemplary embodiment of the method, in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images, the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures in combination with auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of pictures and the auxiliary vectors is equal to a quantity of the  synthesized images; and the first fused vector is generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures in combination with the auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures and the auxiliary vectors is equal to the quantity of the synthesized images; or in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images, the first fused vector is generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures.
In still another exemplary embodiment of the method, the plurality of synthesized images are generated by CAD software according to the three-dimensional digital model.
In still another exemplary embodiment of the method, the eigenvectors of the plurality of synthesized images and the eigenvectors of the plurality of pictures are respectively extracted by a CNN. The classifier includes a classifier based on deep learning.
In still another exemplary embodiment of the method, a scheme of the fusion is determined based on an AutoML technology or a neural architecture search technology.
In still another exemplary embodiment of the method, the plurality of synthesized images are domain-randomized, and the eigenvectors of the plurality of synthesized images are respectively extracted; and the plurality of pictures are domain-randomized, and the eigenvectors of the plurality of pictures are respectively extracted.
The system for identifying the object includes a processor and a photographing mechanism. The processor includes an image generating module, a characteristic extracting module, a fusing module, and a classifier module. The image generating module is configured to generate a plurality of synthesized images according to a three-dimensional digital model. The plurality of synthesized images have different view angles. The characteristic extracting module is configured to respectively extract eigenvectors of the plurality of synthesized images. The fusing module is configured to generate a first fused vector by fusing the eigenvectors of the plurality of synthesized images. The classifier module is configured to be trained according to the first fused vector input. The photographing mechanism is configured to acquire a plurality of pictures. The processor is configured to control the photographing mechanism or the image generating module such that the plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images. The characteristic extracting module is further configured to respectively extract eigenvectors of the plurality of pictures. The fusing module is further configured to generate a second fused vector by fusing the eigenvectors of the plurality of pictures. The trained classifier module is configured to obtain a classification result of the object according to the second fused vector input.
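Purely as an illustration of how the four modules cooperate, the composition can be sketched as a set of callables wired inside the processor; every name below (Processor, generate_images, and so on) is an assumption introduced here rather than a name taken from the disclosure.

```python
# A minimal sketch, assuming Python, of the processor wiring its four modules.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Processor:
    generate_images: Callable[[object], List[object]]           # image generating module
    extract_eigenvector: Callable[[object], Sequence[float]]    # characteristic extracting module
    fuse: Callable[[List[Sequence[float]]], Sequence[float]]    # fusing module
    classify: Callable[[Sequence[float]], int]                  # classifier module

    def first_fused_vector(self, digital_model) -> Sequence[float]:
        """Synthesized images -> eigenvectors -> first fused vector (training input)."""
        images = self.generate_images(digital_model)
        return self.fuse([self.extract_eigenvector(img) for img in images])

    def identify(self, pictures) -> int:
        """Pictures -> eigenvectors -> second fused vector -> classification result."""
        fused = self.fuse([self.extract_eigenvector(p) for p in pictures])
        return self.classify(fused)
```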
In the system for identifying the object, a plurality of synthesized images have different view angles, and correspondingly, a plurality of pictures also have different view angles. In this way, more characteristics may be embodied. The processor is capable of controlling the photographing mechanism or the image generating module such that the plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images. In this way, interference caused due to different angles is reduced. The system achieves a high identification accuracy.
In an exemplary embodiment of the system, the photographing mechanism includes a camera and a stand. The camera is movably connected to the stand. The system further includes a driving mechanism, configured to drive the camera to move relative to the stand. The processor is further configured to output a set of control signals according to the view angles of the plurality of synthesized images. The driving mechanism is further configured to control movements of the camera according to the control signals to acquire the plurality of pictures respectively having the same view angles as the at least a portion of the plurality of synthesized images.
In another exemplary embodiment of the system, the photographing mechanism includes a plurality of cameras. The system further includes a position sensing unit. The position sensing unit is configured to detect spatial positions and photographing angles of the plurality of cameras and generate a set of view angle signals according to the spatial positions and the photographing angles of the plurality of cameras. The processor is further configured to determine parameters for generating the plurality of synthesized images according to the view angle signals, such that the plurality of pictures respectively have the same view angles as at least a portion of the plurality of synthesized images.
In another exemplary embodiment of the system, in the case that the plurality of pictures respectively have the same view angles as all the plurality of synthesized images, the fusing module is further configured to generate the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and generate the second fused vector by fusing the extracted eigenvectors of the plurality of pictures.
In still another exemplary embodiment of the system, in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images, the fusing module is further configured to generate the second fused vector by fusing the extracted eigenvectors of the plurality of pictures in combination with auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of pictures and the auxiliary vectors is equal to a quantity of the synthesized images; and generate the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures in combination with the auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures and the auxiliary vectors is equal to the quantity of the synthesized images; or in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images, the fusing module is further configured to generate the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures.
In still another exemplary embodiment of the system, the image generating module is further configured to generate the plurality of synthesized images by CAD software according to the three-dimensional digital model.
In still another exemplary embodiment of the system, the characteristic extracting module is further configured to respectively extract the eigenvectors of the plurality of synthesized images and the eigenvectors of the plurality of pictures by a CNN. The classifier module includes a classifier module based on deep learning.
In still another exemplary embodiment of the system, the fusing module is further configured to determine a scheme of the fusion based on an AutoML technology or a neural  architecture search technology.
In still another exemplary embodiment of the system, the characteristic extracting module is further configured to domain-randomize the plurality of synthesized images, and respectively extract the eigenvectors of the plurality of synthesized images. The characteristic extracting module is further configured to domain-randomize the plurality of pictures, and respectively extract the eigenvectors of the plurality of pictures.
The present disclosure is further intended to provide a computer-readable storage medium which stores code for use by the system. The system executes the above method when the code is executed by the processor, such that the object can be identified accurately.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are merely for schematic and illustrative description and demonstration of the present disclosure, instead of limiting the scope of the present disclosure.
FIG. 1 is a flowchart of a method for identifying an object according to an exemplary embodiment of the present disclosure;
FIG. 2 is a flowchart of step S11 in the method for identifying the object as illustrated in FIG. 1;
FIG. 3 is a flowchart of a method for identifying an object according to another exemplary embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a system for identifying an object according to an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates an operating process of the system for identifying the object as illustrated in FIG. 4;
FIG. 6 illustrates an exemplary embodiment of a characteristic extracting module; and
FIG. 7 is a schematic structural diagram of a system for identifying an object according to another exemplary embodiment of the present disclosure.
Reference numerals and denotations thereof:
20-Processor
21-Image generating module
22-Characteristic extracting module
23-Fusing module
24-Classifier module
40-Photographing mechanism
41-Camera
42-Stand
50-Driving mechanism
60-Position sensing unit
80-Object
M-Three-dimensional digital model
S1, S2, S3-Synthesized images
Sv1, Sv2, Sv3-Eigenvectors of the synthesized images
Fv1-First fused vector
P1, P2, P3-Pictures
Pv1, Pv2, Pv3-Eigenvectors of the pictures
Fv2-Second fused vector
CNN1, CNN2, CNN3-Convolutional neural networks
R-Classification result
DETAILED DESCRIPTION
For clearer descriptions of the technical features, objectives, and the technical effects of the present disclosure, the specific embodiments of the present disclosure are hereinafter described with reference to the accompanying drawings. In the drawings, like reference numerals denote elements having the same structure or having the similar structure but the same function.
In this text, the term "exemplary" is used herein to mean "serving as an example, instance, or illustration", and any illustration or embodiment described herein as "exemplary" shall not necessarily be construed as preferred or advantageous over other illustrations or embodiments.
In this text, the terms "first", "second", and the like do not represent degrees of importance or a sequence, but are used only for differentiation and for ease of description.
For brevity, only the parts relevant to the present disclosure are illustrated in the drawings, and these parts do not denote the actual structure of the product.
FIG. 1 is a flowchart of a method for identifying an object according to an exemplary embodiment of the present disclosure. As illustrated in FIG. 1, the method according to this exemplary embodiment includes the following steps, wherein the sequence of the steps is not limited to the following:
S11: A plurality of synthesized images are generated according to a three-dimensional digital model. The plurality of synthesized images have different view angles, and preferably have a plurality of view angles that represent more characteristics. In this text, the term "plurality of" is interpreted as "at least two" .
In an exemplary embodiment of the method, the plurality of synthesized images are generated by computer aided design (CAD) software according to the three-dimensional digital model. The CAD software may be, for example, AutoCAD developed by Autodesk. In other exemplary embodiments, other software capable of generating the synthesized images according to the three-dimensional digital model may also be used, for example, Unigraphics NX (UG) developed by Siemens PLM Software.
Specifically, taking the CAD software as an example, as illustrated in FIG. 2, step S11, for example, includes the following sub-steps that are performed in sequence in the CAD software:
S111: The three-dimensional digital model is rendered.
S112: A plurality of virtual cameras are added and camera parameters of these virtual cameras are set. The quantity of virtual cameras is consistent with the quantity of synthesized images to be generated, and the camera parameters of the virtual cameras determine the view angles of the synthesized images.
S113: Images are captured by the virtual cameras to obtain the synthesized images.
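For illustration only, sub-steps S112 and S113 can be sketched in Python; the look_at and camera_poses helpers below, and the choice of evenly spaced azimuth angles, are assumptions introduced here, and the resulting poses would be handed to whatever CAD software or offscreen renderer actually produces the synthesized images.

```python
# A minimal sketch of placing virtual cameras around a three-dimensional digital
# model so that each synthesized image has a distinct view angle. The render call
# itself is left to the CAD or rendering backend and is not shown here.
import numpy as np


def look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Build a 4x4 camera-to-world pose matrix for one virtual camera."""
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1], pose[:3, 2] = right, true_up, -forward
    pose[:3, 3] = eye
    return pose


def camera_poses(num_views=5, radius=2.0, elevation_deg=30.0):
    """One virtual camera per synthesized image, evenly spaced in azimuth (step S112)."""
    elev = np.deg2rad(elevation_deg)
    poses = []
    for k in range(num_views):
        azim = 2.0 * np.pi * k / num_views
        eye = radius * np.array([np.cos(azim) * np.cos(elev),
                                 np.sin(azim) * np.cos(elev),
                                 np.sin(elev)])
        poses.append(look_at(eye))
    return poses


# Each pose would parameterize a virtual camera, from which one image is
# captured to obtain one synthesized image (step S113).
for i, pose in enumerate(camera_poses()):
    print(f"virtual camera {i}: position {pose[:3, 3]}")
```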
S12: Eigenvectors of the plurality of synthesized images are respectively extracted. In an exemplary embodiment, for example, the eigenvectors of the plurality of synthesized images are respectively extracted by a convolutional neural network (CNN) . However, in other exemplary embodiments, the eigenvectors of the plurality of synthesized images may also be  extracted in other fashions.
The convolutional neural network is a feedforward neural network involving convolutional computation and having a deep structure, and is one of the representative algorithms of deep learning. The convolutional neural network has capabilities of characterization learning, and is capable of performing translation-invariant classification for input information according to a hierarchical structure thereof, which is thus referred to as a "translation invariant artificial neural network". The CNN facilitates extraction of eigenvectors of key characteristics, to further improve the accuracy of the method for identifying the object.
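A minimal sketch of step S12, assuming PyTorch and torchvision are available: a pre-trained ResNet-18 with its classification head removed stands in for the CNN that maps each view to one eigenvector. The backbone choice and the 512-dimensional output are illustrative assumptions, not part of the disclosure.

```python
# One eigenvector (feature vector) per synthesized image or picture.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # drop the classification head, keep the 512-d features
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


def extract_eigenvector(image: Image.Image) -> torch.Tensor:
    """Return the eigenvector of a single view (shape: (512,))."""
    with torch.no_grad():
        return backbone(preprocess(image).unsqueeze(0)).squeeze(0)
```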
S13: A first fused vector is generated by fusing the eigenvectors of the plurality of synthesized images. In an exemplary embodiment, for example, a scheme of the fusion is determined by the automated machine learning (AutoML) technology or the neural architecture search technology, which facilitates determination of an optimal scheme of the fusion. However, the determination of the scheme of the fusion is not limited herein.
AutoML incorporates steps in machine learning such as data pre-processing, characteristic selection, and algorithm selection, together with steps in deep learning such as model architecture design and model training, and deploys them in a "black box". With this "black box", a desired prediction result may be obtained as long as the data is input.
The method of "designing one neural network by using another neural network" is referred to as the neural architecture search (NAS) technology, and generally, this method designs a new neural network by using reinforcement learning or an evolutionary algorithm. The NAS may automate architecture engineering, and is capable of automatically obtaining an optimal architecture as long as a data set is provided.
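As one illustrative candidate for the fusion of step S13 (assuming PyTorch), concatenation followed by a linear projection is sketched below; an AutoML or NAS loop would score several such candidate schemes (concatenation, element-wise mean, attention, and the like) and keep the best-performing one. The class name ConcatFusion and its dimensions are assumptions introduced here.

```python
# A candidate fusion scheme: concatenate the per-view eigenvectors and project.
import torch
import torch.nn as nn


class ConcatFusion(nn.Module):
    """Fuse the eigenvectors of several views into a single fused vector."""

    def __init__(self, num_views: int, dim: int = 512, fused_dim: int = 256):
        super().__init__()
        self.project = nn.Linear(num_views * dim, fused_dim)

    def forward(self, eigenvectors: list) -> torch.Tensor:
        return self.project(torch.cat(eigenvectors, dim=-1))


# An AutoML / NAS search would evaluate several candidates like this one and
# select the scheme giving the best validation accuracy.
candidate_schemes = {"concat": ConcatFusion(num_views=5)}
```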
S14: The first fused vector is input into a classifier to train the classifier. In an exemplary embodiment, the classifier includes a classifier based on deep learning.
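A minimal training sketch for step S14, assuming PyTorch: a small deep-learning classifier is fitted to first fused vectors labelled with the object classes (for example, workpiece types). The architecture and hyper-parameters are illustrative assumptions.

```python
# Train a deep-learning classifier on first fused vectors (step S14).
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()


def train_step(first_fused_vector: torch.Tensor, label: torch.Tensor) -> float:
    """One gradient step on a single (fused vector, class label) pair."""
    optimizer.zero_grad()
    loss = loss_fn(classifier(first_fused_vector.unsqueeze(0)), label.unsqueeze(0))
    loss.backward()
    optimizer.step()
    return loss.item()
```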
S21: A plurality of pictures of the object (for example, a workpiece) are acquired by the cameras. The plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images. That is, the quantity of pictures is not greater than the quantity of synthesized images. If the quantity of synthesized images is 5, the quantity of pictures may be, for example, 5 or 3. The plurality of synthesized images include synthesized images having the same view angles as the pictures.
S22: Eigenvectors of the plurality of pictures are respectively extracted. In an exemplary embodiment, for example, the eigenvectors of the plurality of pictures are respectively extracted by a CNN.
S23: A second fused vector is generated by fusing the eigenvectors of the plurality of pictures. In an exemplary embodiment, for example, the scheme of the fusion is determined based on the AutoML technology or the neural architecture search technology. The scheme of the fusion used in this step is the same as the scheme of the fusion used in step S13.
S30: The second fused vector is input into the classifier trained in step S14, to obtain a classification result of the object.
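Putting steps S21 to S30 together at identification time, and reusing the illustrative extract_eigenvector, ConcatFusion and classifier sketches above (all assumptions, not the disclosed implementation), the inference path might look as follows; the same fusion scheme as in step S13 is applied to the pictures of the real object.

```python
# End-to-end inference: pictures -> eigenvectors -> second fused vector -> class.
import torch


def identify(pictures, extractor, fusion, classifier) -> int:
    eigenvectors = [extractor(p) for p in pictures]           # step S22
    second_fused_vector = fusion(eigenvectors)                 # step S23
    logits = classifier(second_fused_vector.unsqueeze(0))      # step S30
    return int(torch.argmax(logits, dim=-1))                   # classification result
```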
In an exemplary embodiment, for example, the plurality of synthesized images may generally be generated first, and then the camera parameters (for example, the positions and the angles) for acquiring the plurality of pictures may be determined according to the view angles of the plurality of synthesized images, such that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images. Alternatively, the plurality of pictures of the object may be acquired first, and then the software parameters for generating the plurality of synthesized images may be determined according to the view angles of the plurality of pictures.
In an exemplary embodiment, in the case that the plurality of pictures respectively have the same view angles as all the plurality of synthesized images (that is, the quantity of pictures is the same as the quantity of synthesized images, and the view angles thereof are in a one-to-one correspondence), in step S13, the first fused vector is generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and in step S23, the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures.
Nevertheless, it is likely that the plurality of pictures have the same view angles as only a portion of the plurality of synthesized images. That is, the quantity of pictures is less than the quantity of synthesized images. This case occurs, for example, in the scenario where the classifier has been trained by using 5 synthesized images (for example, a front view, a rear view, a plan view, a bottom view, and a three-dimensional view), but during photographing of the object, the same quantity of pictures having the same view angles cannot be acquired due to, for example, space restrictions, and instead only a portion of the pictures having the same view angles are acquired, for example, 3 pictures (for example, a front view, a rear view, and a three-dimensional view). Then, in step S23, the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures in combination with the auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of pictures and the auxiliary vectors is equal to the quantity of the synthesized images. In step S13, the first fused vector may be generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures in combination with the auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures and the auxiliary vectors is equal to the quantity of the synthesized images. For example, if the quantity of synthesized images is 5 and the quantity of pictures is 3, the quantity of auxiliary vectors needed in the above two steps is 2. In this method, the scheme of the fusion does not need to be re-determined; the first fused vector only needs to be re-generated according to the original fusion scheme, and the classifier needs to be re-trained according to the re-generated first fused vector. In this exemplary embodiment, the auxiliary vector is, for example, a unit vector (that is, a vector with a modulus equal to 1) or a zero vector.
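A minimal sketch of this auxiliary-vector padding, assuming PyTorch: when only 3 of 5 view angles can be photographed, the two missing slots are filled with unit vectors or zero vectors so that the number of vectors entering the fusion still equals the quantity of synthesized images and the original fusion scheme can be reused. The function name and dimensions are assumptions introduced here.

```python
# Pad the list of picture eigenvectors with auxiliary vectors before fusion.
import torch


def pad_with_auxiliary(eigenvectors, num_synthesized: int, dim: int = 512,
                       kind: str = "zero"):
    missing = num_synthesized - len(eigenvectors)
    if kind == "zero":
        aux = torch.zeros(dim)                                   # a zero vector
    else:
        aux = torch.ones(dim) / torch.sqrt(torch.tensor(float(dim)))  # modulus equal to 1
    return list(eigenvectors) + [aux.clone() for _ in range(missing)]


# Example: 3 pictures, 5 synthesized images -> 2 auxiliary vectors are appended.
padded = pad_with_auxiliary([torch.randn(512) for _ in range(3)], num_synthesized=5)
print(len(padded))  # 5
```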
However, in other exemplary embodiments, in step S13 the first fused vector may be generated by fusing only the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and in step S23 the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures, without auxiliary vectors. Since the quantity of vectors input during the fusion is changed, the scheme of the fusion needs to be re-determined, the first fused vector needs to be re-generated according to the new fusion scheme, and the classifier needs to be re-trained according to the re-generated first fused vector.
In the method for identifying the object, a plurality of synthesized images have different view angles, and correspondingly, a plurality of pictures also have different view angles. In this way, more characteristics may be embodied. The plurality of pictures respectively have the same view angles as at least a portion of the plurality of synthesized images. In this way, interference caused by different view angles is reduced. The method thereby achieves a high identification accuracy.
FIG. 3 is a flowchart of a method for identifying an object according to another exemplary embodiment of the present disclosure. The common points between the method for identifying the object according to this exemplary embodiment and the method for identifying the object as illustrated in FIG. 1 are not described herein any further, and the differences between these two methods are described hereinafter. In an exemplary embodiment, upon completion of step S11, step S15 is first performed to domain-randomize the plurality of synthesized images, and then step S12 is performed. Upon completion of step S21, step S24 is first performed to domain-randomize the plurality of pictures, and then step S22 is performed. By domain randomization, known characteristics that are not useful for differentiating objects in practice (for example, the environment of the object, the color of the object, and the like) may be excluded. In this way, the accuracy and efficiency of the method for identifying the object are improved.
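Domain randomization may, for example, take the form of random perturbations of color, brightness, and sharpness applied to the images before feature extraction. The sketch below is only one possible realization, assuming the torchvision transforms are available; the specific perturbations and probabilities are illustrative.

```python
from PIL import Image
from torchvision import transforms

# Illustrative domain randomization: perturb color and sharpness so that the
# characteristic extractor cannot rely on hue or environment-specific cues.
randomize = transforms.Compose([
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.2),
    transforms.RandomGrayscale(p=0.3),
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=5)], p=0.3),
])

def domain_randomize(image: Image.Image, n_variants: int = 4):
    # Return several randomized variants of the same synthesized image or picture.
    return [randomize(image) for _ in range(n_variants)]
```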
The present disclosure further provides a system for identifying an object. FIG. 4 is a schematic structural diagram of a system for identifying an object according to an exemplary embodiment of the present disclosure. As illustrated in FIG. 4, the system for identifying the object includes a processor 20 and a photographing mechanism 40. The processor 20 includes an image generating module 21, a characteristic extracting module 22, a fusing module 23, and a classifier module 24.
The image generating module 21 is capable of generating a plurality of synthesized images according to a three-dimensional digital model. The plurality of synthesized images have different view angles. In an exemplary embodiment, the image generating module 21 generates the plurality of synthesized images, for example, by computer aided design (CAD) software according to the three-dimensional digital model.
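As a sketch of how such view-dependent synthesized images could be produced, the snippet below renders a 3D model from several view angles using the trimesh library as a stand-in for the CAD software; the model path, view angles, and image resolution are placeholders, and offscreen rendering support depends on the environment.

```python
import numpy as np
import trimesh

# Render a few synthesized views of a three-dimensional digital model (placeholder path).
scene = trimesh.load("model.stl").scene()
view_angles = {                      # (rotation about x, rotation about y) in radians
    "front": (0.0, 0.0),
    "rear": (0.0, np.pi),
    "plan": (-np.pi / 2, 0.0),
    "iso": (-np.pi / 4, np.pi / 4),
}
for name, (rx, ry) in view_angles.items():
    scene.set_camera(angles=(rx, ry, 0.0), distance=2.5)
    png_bytes = scene.save_image(resolution=(224, 224))  # requires an OpenGL context
    with open(f"synthesized_{name}.png", "wb") as f:
        f.write(png_bytes)
```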
The characteristic extracting module 22 is configured to respectively extract eigenvectors of the plurality of synthesized images. In an exemplary embodiment, the characteristic extracting module 22, for example, respectively extracts the eigenvectors of the plurality of synthesized images by a CNN. However, in other exemplary embodiments, the characteristic extracting module 22 may also extract the eigenvectors of the plurality of synthesized images by using other algorithms.
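A minimal sketch of such a CNN-based extractor is given below, assuming a pretrained ResNet-18 backbone from a recent torchvision with its classification head removed; the disclosure does not prescribe a particular network, and the 512-dimensional output is specific to this backbone.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# A pretrained CNN with its classification head removed acts as the
# characteristic extractor; its pooled output serves as the eigenvector.
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def extract_eigenvector(path: str) -> torch.Tensor:
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return backbone(image).squeeze(0)   # shape: (512,)
```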
The fusing module 23 is capable of generating a first fused vector by fusing the eigenvectors of the plurality of synthesized images. In an exemplary embodiment, the fusing module 23, for example, determines a scheme of the fusion based on the AutoML technology or the neural architecture search technology, which facilitates determination of an optimal scheme of the fusion. However, the determination of the scheme of the fusion is not limited herein.
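For illustration only, the simplest possible fusion scheme is concatenation of the per-view eigenvectors, as sketched below; an AutoML or neural architecture search step could instead select a learned fusion (for example, a weighted sum or an attention mechanism), and nothing in this sketch should be read as the disclosed scheme.

```python
import torch

def fuse(eigenvectors):
    """Baseline fusion scheme: concatenate the per-view eigenvectors.
    A searched scheme could replace this with a learned combination."""
    return torch.cat(list(eigenvectors), dim=-1)

first_fused = fuse([torch.randn(512) for _ in range(3)])  # shape: (1536,)
```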
The classifier module 24 is capable of being trained according to the first fused vector input. In an exemplary embodiment, the classifier module 24, for example, includes a classifier based on deep learning, which is not limited herein.
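Such a deep-learning classifier could, for instance, be a small fully connected network trained with a cross-entropy loss on the first fused vectors; the layer sizes, number of classes, and optimizer below are assumptions for the sketch only.

```python
import torch
from torch import nn

# Minimal deep-learning classifier over fused vectors (all dimensions assumed;
# here three 512-dimensional eigenvectors fused by concatenation, 10 classes).
classifier = nn.Sequential(
    nn.Linear(3 * 512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(fused_batch: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = loss_fn(classifier(fused_batch), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```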
The photographing mechanism 40 is capable of acquiring a plurality of pictures of an object 80. In an exemplary embodiment, the photographing mechanism 40 includes a camera 41 and a stand 42. The camera 41 is movably connected to the stand 42. The system further includes a driving mechanism 50, capable of driving the camera 41 to move relative to the stand 42. The processor 20 is capable of outputting a set of control signals according to the view angles of the plurality of synthesized images. The driving mechanism 50 is capable of controlling movements of the camera 41 according to the control signals to acquire the plurality of pictures respectively having the same view angles as the at least a portion of the plurality of synthesized images. Accordingly, photographing positions and angles of the camera 41 may be controlled according to the view angles of the synthesized images, which saves manpower. In this case, one camera 41 needs to capture the plurality of pictures by changing positions and angles. However, in other exemplary embodiments, a plurality of cameras 41 may be deployed. In this way, the time for acquiring the pictures may be saved.
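The control signals mentioned above can be derived geometrically: a desired view angle (for example, an azimuth and an elevation around the object) and a working distance determine a camera position and viewing direction that the driving mechanism can move to. The helper below is a sketch of that conversion; the angle parameterization and the name camera_pose are assumptions.

```python
import numpy as np

def camera_pose(azimuth, elevation, distance, target=np.zeros(3)):
    """Position and viewing direction of a camera that looks at `target` from
    the given azimuth/elevation (radians) and distance; these values could be
    issued to the driving mechanism as set-points."""
    position = target + distance * np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])
    direction = target - position
    direction /= np.linalg.norm(direction)
    return position, direction
```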
The characteristic extracting module 22 is capable of respectively extracting eigenvectors of the plurality of pictures. The fusing module 23 is capable of generating a second fused vector by fusing the eigenvectors of the plurality of pictures. The trained classifier module 24 is capable of obtaining a classification result of the object according to the second fused vector input.
In an exemplary embodiment, in the case that the plurality of pictures respectively have the same view angles as all of the plurality of synthesized images (that is, the quantity of pictures is the same as the quantity of synthesized images, and the view angles thereof are in a one-to-one correspondence), the fusing module 23 is capable of generating the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and generating the second fused vector by fusing the extracted eigenvectors of the plurality of pictures.
Nevertheless, it is also possible that the plurality of pictures have the same view angles as only a portion of the plurality of synthesized images, that is, the quantity of pictures is less than the quantity of synthesized images. This case occurs, for example, where the classifier has been trained by using 5 synthesized images (for example, a front view, a rear view, a plan view, a bottom view, and a three-dimensional view), but during photographing of the object the same quantity of pictures having the same view angles cannot be acquired because the cameras cannot be deployed due to, for example, restriction of space, and instead only a portion of the pictures having the same view angles are acquired, for example, 3 pictures (for example, a front view, a rear view, and a three-dimensional view). In this case, the fusing module 23 is capable of generating the second fused vector by fusing the extracted eigenvectors of the plurality of pictures in combination with auxiliary vectors (for example, unit vectors or zero vectors), wherein the total quantity of the eigenvectors of the plurality of pictures and the auxiliary vectors is equal to the quantity of the synthesized images; and generating the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures in combination with the auxiliary vectors, wherein the total quantity of the eigenvectors of these synthesized images and the auxiliary vectors is equal to the quantity of the synthesized images. For example, if the quantity of synthesized images is 5 and the quantity of pictures is 3, the quantity of auxiliary vectors required in each of the above two steps is 2. Accordingly, the scheme of the fusion does not need to be re-determined; the first fused vector only needs to be re-generated by re-fusion according to the original fusion scheme, and the classifier needs to be re-trained according to the re-generated first fused vector.
However, in other exemplary embodiments, the fusing module 23 is capable of generating the first fused vector by fusing only the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and generating the second fused vector by fusing the extracted eigenvectors of the plurality of pictures, without auxiliary vectors. Since the quantity of vectors input during the fusion is changed, the fusing module 23 needs to re-determine the scheme of the fusion and re-generate the first fused vector according to the new fusion scheme, and the classifier module 24 needs to be re-trained according to the re-generated first fused vector.
In an exemplary embodiment, the characteristic extracting module 22 is capable of domain-randomizing the plurality of synthesized images, and respectively extracting the eigenvectors of the plurality of synthesized images. The characteristic extracting module 22 is likewise capable of domain-randomizing the plurality of pictures, and respectively extracting the eigenvectors of the plurality of pictures. By domain randomization, known characteristics that are not useful for differentiating objects in practice (for example, the environment of the object, the color of the object, and the like) may be excluded. In this way, the accuracy and efficiency of the system for identifying the object are improved.
FIG. 5 schematically illustrates an operating process of the system for identifying the object as illustrated in FIG. 4, which is not intended to limit the present disclosure. As illustrated in FIG. 5, a three-dimensional digital model M is input into the image generating module 21, and the image generating module 21 generates a synthesized image S1, a synthesized image S2, and a synthesized image S3 according to the three-dimensional digital model M. The synthesized image S1, the synthesized image S2, and the synthesized image S3 are input into the characteristic extracting module 22, and the characteristic extracting module 22 extracts an eigenvector Sv1, an eigenvector Sv2, and an eigenvector Sv3. The eigenvector Sv1, the eigenvector Sv2, and the eigenvector Sv3 are input into the fusing module 23, and the fusing module 23 generates a first fused vector Fv1 by fusing the eigenvector Sv1, the eigenvector Sv2, and the eigenvector Sv3. The first fused vector Fv1 is input into the classifier module 24 for training.
As illustrated in FIG. 5, the photographing mechanism 40 acquires a picture P1, a picture P2, and a picture P3 by photographing the object 80. The picture P1 has the same view angle as the synthesized image S1, the picture P2 has the same view angle as the synthesized image S2, and the picture P3 has the same view angle as the synthesized image S3. The picture P1, the picture P2, and the picture P3 are input into the characteristic extracting module 22, and the characteristic extracting module 22 extracts an eigenvector Pv1, an eigenvector Pv2, and an eigenvector Pv3. The eigenvector Pv1, the eigenvector Pv2, and the eigenvector Pv3 are input into the fusing module 23, and the fusing module 23 generates a second fused vector Fv2 by  fusing the eigenvector Pv1, the eigenvector Pv2, and the eigenvector Pv3. The second fused vector Fv2 is input into the classifier module 24 to obtain a classification result R.
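Tying the earlier sketches together, the picture-side inference path of FIG. 5 could look as follows; extract_eigenvector, fuse, and classifier refer to the illustrative helpers above, the picture file names are placeholders, and the classifier is assumed to have been trained on three-view fused vectors of matching dimension.

```python
import torch

# Picture side of FIG. 5: extract Pv1..Pv3, fuse into Fv2, classify to obtain R.
paths = ["picture_front.png", "picture_rear.png", "picture_iso.png"]
picture_vectors = [extract_eigenvector(p) for p in paths]          # Pv1, Pv2, Pv3
second_fused = fuse(picture_vectors)                                # Fv2
with torch.no_grad():
    result = classifier(second_fused.unsqueeze(0)).argmax(dim=1)    # classification R
```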
As illustrated in FIG. 6, in an exemplary embodiment, the characteristic extracting module 22, for example, includes a plurality of convolutional neural networks, that is, a CNN 1, a CNN 2, and a CNN 3, which are configured to respectively process different synthesized images to obtain the corresponding eigenvectors. The plurality of CNNs may have the same parameters or different parameters. The fusing module 23, for example, implements the fusion by fusing the outputs of these networks.
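A sketch of such a multi-branch extractor is shown below, with a switch between shared and independent parameters across the CNN branches; the choice of ResNet-18 backbones and concatenation at the end are assumptions of the sketch, not of the disclosure.

```python
import torch
from torch import nn
from torchvision import models

class MultiViewExtractor(nn.Module):
    """One CNN branch per view angle; share_weights toggles between a single
    shared backbone and independent per-view backbones."""
    def __init__(self, n_views: int = 3, share_weights: bool = False):
        super().__init__()
        def make_branch():
            net = models.resnet18(weights=None)
            net.fc = nn.Identity()     # keep the 512-dimensional pooled features
            return net
        if share_weights:
            branch = make_branch()
            self.branches = nn.ModuleList([branch] * n_views)      # same parameters
        else:
            self.branches = nn.ModuleList([make_branch() for _ in range(n_views)])

    def forward(self, views):          # views: list of (B, 3, H, W) tensors, one per view
        eigenvectors = [branch(view) for branch, view in zip(self.branches, views)]
        return torch.cat(eigenvectors, dim=1)   # fused by concatenation
```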
In the system for identifying the object, a plurality of synthesized images have different view angles, and correspondingly, a plurality of pictures also have different view angles. In this way, more characteristics may be embodied. The processor is capable of controlling the photographing mechanism or the image generating module such that the plurality of pictures respectively have the same view angles as at least a portion of the plurality of synthesized images. In this way, interference caused by different view angles is reduced. The system thereby achieves a high identification accuracy.
FIG. 7 is a schematic structural diagram of a system for identifying an object according to another exemplary embodiment of the present disclosure. The common points between the system for identifying the object as illustrated in FIG. 7 and the system for identifying the object as illustrated in FIG. 4 are not described herein any further, and the differences between these two systems are described hereinafter. In an exemplary embodiment, the photographing mechanism 40 includes a plurality of cameras 41. The quantity of cameras is consistent with the quantity of pictures to be acquired. The system further includes a position sensing unit 60. The position sensing unit 60 is capable of detecting spatial positions and photographing angles of the plurality of cameras 41 and generating a set of view angle signals according to the spatial positions and the photographing angles of the plurality of cameras 41. The processor 20 is capable of determining parameters for generating the plurality of synthesized images according to the view angle signals, such that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images. In this way, the parameters for generating the plurality of synthesized images may be automatically determined according to the spatial positions and the photographing angles of the cameras, which saves manpower.
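Conversely to the pose computation sketched earlier, the view angle signals can be recovered from the detected camera positions, for example as an azimuth/elevation pair relative to the object center; the function below is an illustrative sketch whose name and parameterization are assumptions.

```python
import numpy as np

def view_angle_signal(camera_position, target=np.zeros(3)):
    """Azimuth/elevation (radians) and distance of a camera as seen from the
    object center; such signals can select the parameters used to generate the
    synthesized images with matching view angles."""
    offset = np.asarray(camera_position, dtype=float) - target
    distance = np.linalg.norm(offset)
    azimuth = np.arctan2(offset[1], offset[0])
    elevation = np.arcsin(offset[2] / distance)
    return azimuth, elevation, distance

signals = [view_angle_signal(p) for p in ([1.0, 0.0, 0.5], [0.0, 1.2, 0.3])]
```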
It should be understood that, although this specification is described based on the embodiments, not each of the embodiments discloses an independent technical solution. This manner of description is adopted only for the sake of clarity. A person skilled in the art should consider the specification as a whole, and the technical solutions according to the embodiments may also be suitably combined to derive other embodiments that may be understood by a person skilled in the art.
The detailed descriptions given in this specification are merely intended to illustrate feasible embodiments of the present disclosure, and do not limit the protection scope of the present disclosure. Any equivalent embodiments or modifications, for example, combinations, segmentations, or repetition of features, derived without departing from the spirit of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (18)

  1. A method for identifying an object, comprising:
    generating a plurality of synthesized images according to a three-dimensional digital model, the plurality of synthesized images having different view angles;
    respectively extracting eigenvectors of the plurality of synthesized images;
    generating a first fused vector by fusing the eigenvectors of the plurality of synthesized images;
    inputting the first fused vector into a classifier to train the classifier;
    acquiring a plurality of pictures of the object, the plurality of pictures respectively having same view angles as at least a portion of the plurality of synthesized images;
    respectively extracting eigenvectors of the plurality of pictures;
    generating a second fused vector by fusing the eigenvectors of the plurality of pictures; and
    inputting the second fused vector into the trained classifier to obtain a classification result of the object.
  2. The method according to claim 1, wherein camera parameters for acquiring the plurality of pictures are determined according to the view angles of the plurality of synthesized images, or software parameters for generating the plurality of synthesized images are determined according to the plurality of pictures, such that the plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images.
  3. The method according to claim 1, wherein in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images, the first fused vector is generated by fusing the extracted eigenvectors of the plurality of synthesized images, and the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures.
  4. The method according to claim 1, wherein in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images, the second fused vector is generated by fusing the extracted eigenvectors of the plurality  of pictures in combination with auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of pictures and the auxiliary vectors is equal to a quantity of the synthesized images; and the first fused vector is generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures in combination with the auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures and the auxiliary vectors is equal to the quantity of the synthesized images; or
    in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images, the first fused vector is generated by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and the second fused vector is generated by fusing the extracted eigenvectors of the plurality of pictures.
  5. The method according to claim 1, wherein the plurality of synthesized images are generated by CAD software according to the three-dimensional digital model.
  6. The method according to claim 1, wherein the eigenvectors of the plurality of synthesized images and the eigenvectors of the plurality of pictures are respectively extracted by a CNN, and the classifier comprises a classifier based on deep learning.
  7. The method according to claim 1, wherein a scheme of the fusion is determined based on an AutoML technology or a neural architecture search technology.
  8. The method according to claim 1, wherein the plurality of synthesized images are domain-randomized, and the eigenvectors of the plurality of synthesized images are respectively extracted; and the plurality of pictures are domain-randomized, and the eigenvectors of the plurality of pictures are respectively extracted.
  9. A system for identifying an object, comprising:
    a processor (20) , comprising:
    an image generating module (21) , configured to generate a plurality of synthesized images according to a three-dimensional digital model, the plurality of synthesized images having different view angles;
    a characteristic extracting module (22) , configured to respectively extract eigenvectors of the plurality of synthesized images;
    a fusing module (23) , configured to generate a first fused vector by fusing the eigenvectors of the plurality of synthesized images; and
    a classifier module (24) , configured to be trained according to the first fused vector input; and
    a photographing mechanism (40) , configured to acquire a plurality of pictures of the object; wherein the processor (20) is configured to control the photographing mechanism (40) or the image generating module (21) such that the plurality of pictures respectively have same view angles as at least a portion of the plurality of synthesized images, the characteristic extracting module (22) is further configured to respectively extract eigenvectors of the plurality of pictures, the fusing module (23) is further configured to generate a second fused vector by fusing the eigenvectors of the plurality of pictures, and the trained classifier module (24) is configured to obtain a classification result of the object according to the second fused vector input.
  10. The system according to claim 9, wherein the photographing mechanism (40) comprises a camera (41) and a stand (42) , the camera (41) being movably connected to the stand (42) ; and the system further comprises a driving mechanism (50) , configured to drive the camera (41) to move relative to the stand (42) , wherein the processor (20) is further configured to output a set of control signals according to the view angles of the plurality of synthesized images, and the driving mechanism (50) is further configured to control movements of the camera (41) according to the control signals to acquire the plurality of pictures respectively having the same view angles as the at least a portion of the plurality of synthesized images.
  11. The system according to claim 9, wherein the photographing mechanism (40) comprises a plurality of cameras (41) ; and the system further comprises a position sensing unit (60) , the position sensing unit (60) being configured to detect spatial positions and photographing angles of the plurality of cameras (41) and generate a set of view angle signals according to the spatial positions and the photographing angles of the plurality of cameras (41) , wherein the processor (20) is further configured to determine parameters for generating the plurality of synthesized images according to the view angle signals, such that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images.
  12. The system according to claim 9, wherein in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized  images, the fusing module (23) is further configured to generate the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images, and generate the second fused vector by fusing the extracted eigenvectors of the plurality of pictures.
  13. The system according to claim 9, wherein in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images, the fusing module (23) is further configured to generate the second fused vector by fusing the extracted eigenvectors of the plurality of pictures in combination with auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of pictures and the auxiliary vectors is equal to a quantity of the synthesized images; and generate the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures in combination with the auxiliary vectors, wherein a total quantity of the eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures and the auxiliary vectors is equal to the quantity of the synthesized images; or
    in the case that the plurality of pictures respectively have the same view angles as the at least a portion of the plurality of synthesized images, the fusing module (23) is further configured to generate the first fused vector by fusing the extracted eigenvectors of the plurality of synthesized images having the same view angles as the plurality of pictures, and generate the second fused vector by fusing the extracted eigenvectors of the plurality of pictures.
  14. The system according to claim 9, wherein the image generating module (21) is further configured to generate the plurality of synthesized images by CAD software according to the three-dimensional digital model.
  15. The system according to claim 9, wherein the characteristic extracting module (22) is further configured to respectively extract the eigenvectors of the plurality of synthesized images and the eigenvectors of the plurality of pictures by a CNN, and the classifier module (24) comprises a classifier (24) based on deep learning.
  16. The system according to claim 9, wherein the fusing module (23) is further configured to determine a scheme of the fusion based on an AutoML technology or a neural architecture search technology.
  17. The system according to claim 9, wherein the characteristic extracting module (22) is further configured to domain-randomize the plurality of synthesized images, and respectively extract the eigenvectors of the plurality of synthesized images; and domain-randomize the plurality of pictures, and respectively extract the eigenvectors of the plurality of pictures.
  18. A computer-readable storage medium, wherein the computer-readable storage medium stores code thereon for use by a system; the system performs the method according to any one of claims 1 to 8 when the code is executed by a processor.