WO2019062619A1 - Method, device and system for automatically labeling a target object in an image (对图像内目标物体进行自动标注的方法、装置及系统) - Google Patents

Method, device and system for automatically labeling a target object in an image

Info

Publication number
WO2019062619A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target object
coordinate system
dimensional
information
Prior art date
Application number
PCT/CN2018/106493
Other languages
English (en)
French (fr)
Inventor
李博韧
谢宏伟
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority to EP18861929.0A (EP3690815B1)
Priority to JP2020516393A (JP7231306B2)
Priority to US16/648,285 (US11164001B2)
Publication of WO2019062619A1

Classifications

    • G06T7/75: Determining position or orientation of objects or cameras using feature-based methods involving models
    • G06T7/74: Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06F18/2413: Classification techniques relating to the classification model, based on distances to training or reference patterns
    • G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V20/20: Scenes; Scene-specific elements in augmented reality scenes
    • G06V20/64: Scenes; Type of objects; Three-dimensional objects
    • G06T2207/10004: Image acquisition modality; Still image; Photographic image
    • G06T2207/20081: Special algorithmic details; Training; Learning
    • G06T2207/20084: Special algorithmic details; Artificial neural networks [ANN]
    • G06T2207/20092: Interactive image processing based on input by user
    • G06T2207/20104: Interactive definition of region of interest [ROI]
    • G06T2207/30204: Subject of image; Marker
    • G06T2207/30244: Subject of image; Camera pose

Definitions

  • The present application relates to the field of image processing technologies, and in particular, to a method, device and system for automatically labeling target objects in an image.
  • In the prior art, the labeling of image training samples falls mainly into two types: annotation of two-dimensional images, and three-dimensional annotation based on a CAD model of the object.
  • Two-dimensional image annotation mainly refers to marking the rectangular region in which the target object is located in a two-dimensional image, a process that usually has to be completed by staff through manual labeling.
  • That is, a worker must manually box the location of the target object in each image.
  • The efficiency of manual labeling is very low; when there are many image training samples, labeling consumes a great deal of manpower and time.
  • Three-dimensional annotation based on an object CAD model mainly uses each frame of a pre-captured video as an image training sample; a CAD model of the target object must first be obtained.
  • For example, when the target object in the video is a car, the CAD model of that car must first be obtained, and the correspondence between points of the CAD model and feature points of the target object is annotated manually in one of the frames.
  • Model-based tracking can then be used to track the target object for batch labeling.
  • The tracking process specifically uses the annotated feature points of the target object to identify the location of the target object in the other frames.
  • This annotation method in 3D space is more automated than two-dimensional image annotation: labeling one frame of a video can automatically label the entire video.
  • Automatic labeling also has a uniform quantitative standard for labeling accuracy, making it more precise than manual labeling.
  • However, its shortcomings are equally obvious.
  • The CAD model of the target object is usually provided by the party that produces or designs the target object.
  • If the CAD model is not available from the production or design side, automatic labeling by the above method becomes impossible; in practice this situation is very common, that is, the CAD model of the target object is often hard to obtain, which limits the versatility of the method.
  • The present application provides a method, device and system for automatically labeling a target object in an image, which can perform automatic image annotation accurately and effectively and improve the versatility of the method.
  • A method for automatically annotating a target object within an image includes:
  • obtaining an image training sample that includes a plurality of images, each image being obtained by photographing the same target object, with the same environmental feature points existing between adjacent images;
  • using one of the images as a reference image, determining a reference coordinate system, and creating a three-dimensional space model based on the reference three-dimensional coordinate system; determining position information of the target object in the reference three-dimensional coordinate system when the three-dimensional space model is moved to the position of the target object in the reference image; and mapping the three-dimensional space model to the image plane of each image according to that position information and the respective camera attitude information determined from the environmental feature points in each image.
  • A method for establishing a target object recognition model includes:
  • obtaining an image training sample that includes a plurality of images, each image being obtained by photographing the same target object, with the same environmental feature points existing between adjacent images; each image further includes annotation information on the position of the target object,
  • the annotation information being obtained by using one of the images as a reference image, creating a three-dimensional space model based on the reference three-dimensional coordinate system, determining position information of the target object in the reference three-dimensional coordinate system according to the position to which the three-dimensional space model is moved, and mapping the three-dimensional space model to the image plane of each image according to the respective camera attitude information determined from the environmental feature points in each image; and generating a recognition model for the target object according to the annotation information.
  • An augmented reality (AR) information providing method includes: collecting a real-scene image and using a pre-established target object recognition model to identify the position information of the target object in the real-scene image, the target object recognition model being established by the method of claim 15; and determining a display position of an associated virtual image according to the position information of the target object in the real-scene image, and displaying the virtual image.
  • A device for automatically labeling a target object in an image includes:
  • a training sample obtaining unit configured to obtain an image training sample that includes a plurality of images, each image being obtained by photographing the same target object, with the same environmental feature points existing between adjacent images;
  • a three-dimensional space model creating unit configured to use one of the images as a reference image, determine a reference coordinate system, and create a three-dimensional space model based on the reference three-dimensional coordinate system;
  • a position information determining unit configured to determine position information of the target object in the reference three-dimensional coordinate system when the three-dimensional space model is moved to the position of the target object in the reference image; and
  • a mapping unit configured to map the three-dimensional space model to the image plane of each image according to the position information of the target object in the reference three-dimensional coordinate system and the respective camera attitude information determined from the environmental feature points in each image.
  • An apparatus for establishing a target object recognition model includes:
  • an image training sample obtaining unit configured to obtain an image training sample that includes a plurality of images, each image being obtained by photographing the same target object, with the same environmental feature points existing between adjacent images; each image further includes annotation information on the location of the target object, the annotation information being obtained by using one of the images as a reference image, creating a three-dimensional space model based on the reference three-dimensional coordinate system, determining position information of the target object in the reference three-dimensional coordinate system according to the position to which the three-dimensional space model is moved, and mapping the three-dimensional space model to the image plane of each image according to the respective camera attitude information determined from the environmental feature points in each image; and
  • a recognition model generating unit configured to generate a recognition model for the target object according to the annotation information on the position of the target object in the image training sample.
  • An augmented reality (AR) information providing apparatus includes:
  • a real-scene image collecting unit configured to collect a real-scene image and use a pre-established target object recognition model to identify position information of the target object in the real-scene image, the target object recognition model being established by the method of claim 15; and
  • a virtual image display unit configured to determine a display position of an associated virtual image according to the position information of the target object in the real-scene image, and to display the virtual image.
  • A computer system includes:
  • one or more processors; and
  • a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the following operations:
  • obtaining an image training sample that includes a plurality of images, each image being obtained by photographing the same target object, with the same environmental feature points existing between adjacent images;
  • using one of the images as a reference image, determining a reference coordinate system, and creating a three-dimensional space model based on the reference three-dimensional coordinate system; determining position information of the target object in the reference three-dimensional coordinate system when the three-dimensional space model is moved to the position of the target object in the reference image; and mapping the three-dimensional space model to the image plane of each image according to that position information and the respective camera attitude information determined from the environmental feature points in each image.
  • Compared with the prior art, the present application discloses the following technical effects:
  • The target object is labeled with a relatively regular three-dimensional space model which, compared with a CAD model of the target object, has the advantage of being much easier to obtain.
  • In the process of automatically labeling the other images from the manually labeled reference image, the three-dimensional space model is remapped back to the image plane of each image according to the change of camera pose of that image relative to the reference image.
  • In this process, the camera pose can be recognized as long as the feature points in the shooting environment are sufficiently distinct. That is, in the embodiments of the present application, camera pose recognition can be performed based on the feature points of the entire shooting environment, thereby achieving automatic labeling of the target object, rather than recognizing feature points of the target object itself in order to track it. Therefore, automatic labeling can be achieved even when the target object itself is of a pure color, highly reflective, or transparent.
  • FIG. 1-1 and FIG. 1-2 are schematic diagrams of labeling methods in the prior art;
  • FIG. 2 is a schematic diagram of a method for creating a reference coordinate system according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of a three-dimensional space model provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram showing the result of labeling a reference image provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram showing the display result after the mapping result is rectangularized according to an embodiment of the present application;
  • FIG. 6 is a flowchart of a first method provided by an embodiment of the present application;
  • FIG. 7 is a flowchart of a second method provided by an embodiment of the present application;
  • FIG. 8 is a flowchart of a third method provided by an embodiment of the present application;
  • FIG. 9 is a schematic diagram of a first device provided by an embodiment of the present application;
  • FIG. 10 is a schematic diagram of a second device provided by an embodiment of the present application;
  • FIG. 11 is a schematic diagram of a third device provided by an embodiment of the present application;
  • FIG. 12 is a schematic diagram of a computer system according to an embodiment of the present application.
  • In the embodiments of the present application, an automatic image annotation tool can be provided, by which a target object in an image can be abstracted into a more general regular solid (for example, a cuboid, a cylinder, etc.), a combination of regular solids, or even an arbitrary three-dimensional space. The target object labeling problem is thus transformed into labeling a three-dimensional space (volume): everything inside this three-dimensional space is marked as the target object.
  • Specifically, when automatically labeling multiple images, one of the images may first be taken as a reference image, and a three-dimensional space model (not a CAD model of the target object) is initialized in the reference image. The user can then move the three-dimensional space model and adjust its length, width and height, etc., so that the model just "snaps onto" the target object in the image; the position of the target object in the reference image can then be determined from the position of the moved three-dimensional space model.
  • In addition, the images can satisfy the following characteristics: each image is obtained by photographing the same target object in the same environment, and the same environmental feature points exist between adjacent images (in a specific implementation they may be the frames of the same video file, etc.). In this way, the camera pose of each image can be obtained using techniques such as SLAM positioning, so that after the labeled position of the target object in the reference image is acquired, the three-dimensional space model can be remapped to the image plane of each other image according to that image's camera pose change relative to the reference image, thereby automatically labeling the target object in every image.
  • Note that creating the three-dimensional space model and moving it must be done with respect to a reference three-dimensional coordinate system, and this coordinate system should be fixed with respect to every image.
  • Likewise, when determining the camera pose information in each image, a fixed three-dimensional coordinate system is needed as the reference, and the 3D rigid transformation from each image frame's camera coordinate system to this reference coordinate system is solved for separately.
  • Using sensor fusion with the IMU module in a mobile phone, vision-based SLAM can obtain the six-degree-of-freedom camera pose.
  • Therefore, in a specific implementation, before the three-dimensional space model is created, a reference coordinate system may first be determined, so that the subsequent creation and movement of the three-dimensional space model, and the determination of the camera pose in each frame, can all be carried out with this coordinate system as the benchmark.
  • There are several ways to determine the reference three-dimensional coordinate system; for example, in a more preferred manner, it may be determined by means of a preset marker.
  • Specifically, the image training sample may be an image obtained in a special manner, for example, the frames of a video obtained by capturing the target object in the manner specified in the embodiments of the present application, and so on.
  • When capturing the images, a preset marker with a planar structure may first be placed on a table or the like; for example, as shown in FIG. 2, it may be a piece of paper with a preset pattern. Before shooting the target object, the lens can first be aimed at the plane of the marker, and then moved to the target object for shooting; specifically, the target object may be shot in a 360-degree orbit around it, and so on. In this way, when labeling the target object in the video images, the reference three-dimensional coordinate system can first be created from the planar marker information captured in the first few frames of the video, and the creation of the three-dimensional space model and the determination of successive camera poses in each frame can then be based on this reference three-dimensional coordinate system.
  • When the reference three-dimensional coordinate system is created from the planar marker in the first few frames, since the planar marker can usually be a piece of paper or a thin board with a fixed area that is placed parallel to the ground, the position of the marker can first be located by recognizing the preset pattern in those frames. The center point of the plane in which the marker lies can then be taken as the origin, that plane as the x-y plane, and the reference three-dimensional coordinate system established according to the right-hand rule.
  • Because its x-y plane is parallel to the ground and its z-axis points vertically downward, a reference three-dimensional coordinate system created in this way can also be called a world coordinate system.
  • That is to say, preparation can be done when the video is shot for image collection: a marker for creating the reference three-dimensional coordinate system is included in the video, so that during subsequent automatic labeling the reference 3D coordinate system can be determined from the marker in the video file; a minimal sketch of this marker-based setup follows below.
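Purely as an illustration of the marker-based setup above, the sketch below derives a camera pose relative to a marker-centered reference frame. It assumes OpenCV, calibrated camera intrinsics, and that the four corners of the preset pattern have already been detected in the image; the detection step itself and the function name are not from the patent.

```python
# Sketch (not from the patent): building the marker-centered reference
# ("world") coordinate system, assuming OpenCV and known camera intrinsics.
import cv2
import numpy as np

def reference_pose_from_marker(corners_2d, K, dist, marker_size_m):
    """corners_2d: the 4 detected marker-pattern corners, shape (4, 2),
    ordered clockwise from top-left. Returns (R, t) such that
    x_cam = R @ x_world + t, with the world origin at the marker center
    and the x-y plane lying in the marker plane."""
    s = marker_size_m / 2.0
    # 3D corner positions in the reference frame (the z = 0 plane).
    corners_3d = np.array([[-s,  s, 0], [ s,  s, 0],
                           [ s, -s, 0], [-s, -s, 0]], dtype=np.float32)
    ok, rvec, tvec = cv2.solvePnP(corners_3d,
                                  corners_2d.astype(np.float32), K, dist)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec
```

Frames in which the marker is not visible would instead take their pose from SLAM tracking of the environmental feature points, as described later.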
  • When labeling the individual images, taking frames of the same video file as the image training samples as an example, after the reference three-dimensional coordinate system has been determined, one frame may be chosen arbitrarily as the reference frame, and a three-dimensional space model may then be initialized based on the reference three-dimensional coordinate system. Specifically, since the final labeling result is usually required to place a rectangular box around the target object, the three-dimensional space model may be a cuboid. Of course, in a specific implementation it may also be a cylinder, so that the final result is a circular frame around the target object; or it may be a combination of several cuboids, and so on.
  • In short, compared with the CAD model of the prior art, the three-dimensional space model in the embodiments of the present application has a relatively regular and simple shape; it does not have to be provided by the manufacturer of a specific target object from design drawings, and a concrete three-dimensional space model can be created easily.
  • As shown at 301 in FIG. 3, when the three-dimensional space model is created, it may be initialized on the x-y plane of the world coordinate system and appear in the camera's field of view, after which the user can move it.
  • For example, the user can move the three-dimensional space model in the x-y plane and, if necessary, along the z direction.
  • In addition, the annotation tool can provide ways of rotating the three-dimensional space model about the three coordinate axes and of adjusting its size (the length, width and height of the cuboid, etc.); the final goal is for the three-dimensional space model to accurately "enclose" the target object,
  • that is, as shown at 401 in FIG. 4, so that the target object lies inside the three-dimensional space model. On completion, this can be confirmed via a button provided by the labeling tool, etc., which completes the manual labeling of the reference frame.
  • After the manual labeling of the reference frame is completed, the position information of the target object in the reference three-dimensional coordinate system can be determined from the position to which the three-dimensional space model was finally moved and/or rotated.
  • Specifically, the position information may be expressed by: the displacement degrees of freedom and rotational degrees of freedom of the target object in the three dimensions of the reference three-dimensional coordinate system, together with the size of the three-dimensional space model in the three dimensions (one possible representation is sketched below).
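One possible container for this position information (displacement and rotation degrees of freedom plus dimensions) is sketched below; the class name and the use of SciPy for the rotation are illustrative assumptions, not part of the patent.

```python
# Sketch (illustrative only): the annotation recorded for the reference
# frame: a cuboid pose in the reference coordinate system plus its size.
from dataclasses import dataclass
import numpy as np
from scipy.spatial.transform import Rotation

@dataclass
class CuboidAnnotation:
    translation: np.ndarray  # (3,) displacement of the cuboid center
    euler_xyz: np.ndarray    # (3,) rotation about the three axes, radians
    size: np.ndarray         # (3,) length, width, height of the cuboid

    def corners_world(self) -> np.ndarray:
        """The 8 cuboid corners in the reference (world) coordinate system."""
        l, w, h = self.size / 2.0
        local = np.array([[x, y, z] for x in (-l, l)
                          for y in (-w, w)
                          for z in (-h, h)])       # (8, 3) local corners
        R = Rotation.from_euler("xyz", self.euler_xyz).as_matrix()
        return local @ R.T + self.translation      # rotate, then shift
```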
  • Note that in the embodiments of the present application, image collection may be performed with the target object fixed while the image capturing device orbits it once, producing a video file. The target object is therefore stationary with respect to the reference three-dimensional coordinate system; that is, once the position of the target object in the reference three-dimensional coordinate system has been determined from one image frame, that position information is fixed. What changes in the other frames is the camera pose, and this change determines the position, angle, size, etc. at which the target object appears in each frame.
  • Since the camera pose of every frame (i.e., the rigid transformation of its camera coordinate system relative to the reference three-dimensional coordinate system) is already known from preprocessing, the three-dimensional space model can be inversely mapped back, by computation, to the image plane of each frame, completing the automatic labeling of the target object in the other frames.
  • After the three-dimensional space model is inversely mapped back to the image plane of a frame, it appears as a two-dimensional region.
  • For example, when the three-dimensional space model is a cuboid, the two-dimensional region may be a parallelogram, a rhombus, or another quadrilateral shape.
  • Since specific labeling requirements may demand a rectangular box, the shape of the quadrilateral can be further adjusted into a rectangle; the adjusted effect may be as shown at 501 in FIG. 5. A sketch of this projection and rectangularization follows below.
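The remap-and-rectangularize step can be sketched as follows, again assuming OpenCV conventions; `R, t` is the frame's world-to-camera rigid transform obtained from SLAM, and `corners_world` could come from a representation like the one sketched earlier.

```python
# Sketch (illustrative only): project the annotated cuboid into another
# frame and rectangularize the projected outline into an upright box.
import cv2
import numpy as np

def auto_label_frame(corners_world, R, t, K, dist):
    """Returns the upright bounding rectangle (x, y, w, h) of the cuboid's
    projection in this frame, used as the automatic 2D label."""
    rvec, _ = cv2.Rodrigues(R)                 # pose for projectPoints
    pts, _ = cv2.projectPoints(corners_world.astype(np.float32),
                               rvec, t, K, dist)
    pts = pts.reshape(-1, 2)
    # The projected cuboid is a general quadrilateral/polygon outline;
    # "rectangularizing" here takes the enclosing axis-aligned rectangle.
    return cv2.boundingRect(pts.astype(np.float32))
```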
  • The above description uses the case where the image training samples are taken from the frames of a single video file. In other implementations, the samples may instead be photographs of the same target object taken from different angles in the same environment. As long as the photographs are arranged in a certain order and the same environmental feature points exist between adjacent photographs, the camera pose in each photograph can be recognized; the subsequent labeling is the same as for the frames of a video file and will not be described in detail here.
  • In short, in the embodiments of the present application, the target object is labeled with a relatively regular three-dimensional space model which, compared with a CAD model of the target object, is much easier to obtain.
  • In the process of automatically labeling the other images from the manually labeled reference image, the three-dimensional space model is remapped back to the image plane of each image according to the change of camera pose of that image relative to the reference image.
  • In this process, the camera pose can be recognized as long as the feature points in the shooting environment are sufficiently distinct; camera pose recognition is based on the feature points of the entire shooting environment rather than on recognizing and tracking feature points of the target object itself. Therefore, automatic labeling can be achieved even when the target object is of a pure color, highly reflective, or transparent.
  • Referring to FIG. 6, an embodiment of the present application provides a method for automatically labeling a target object in an image, which may specifically include the following steps.
  • S601: Obtain an image training sample that includes multiple images, each obtained by photographing the same target object, with the same environmental feature points existing between adjacent images.
  • The image training samples may be obtained from a target video file, or from a number of photographs obtained in advance.
  • The target video file may be pre-recorded; that is, images of the target object may be collected in advance.
  • Each image obtained during collection serves as an image training sample; the specific target object is labeled in each sample, and machine learning is then performed.
  • The collection process can yield a corresponding video file containing multiple frames, each of which can be used as an image training sample.
  • To collect images of the target object, in a preferred embodiment, the target object may be placed in the middle of the scene and the capturing device moved around it for one full circle, generating a corresponding video file.
  • Multiple frames are then extracted from the video file as image training samples.
  • For a given target object, the specific content ultimately shown in the image plane (the angle at which the target object appears, and so on) differs mainly because of differences in camera pose during shooting.
  • Conversely, from these differences the camera pose corresponding to each image can be computed, and from it the position of the target object in the image plane of each image.
  • In short, the embodiments of the present application may select all or some of the frames of a pre-recorded video file, or multiple photographs taken in advance; in any case the following conditions must hold: each image is obtained by photographing the same target object in the same environment, and the same environmental feature points exist between adjacent images, that is, the content of adjacent images overlaps, so that the change of camera pose across the images can be recognized.
  • In a specific implementation, the image training samples may also be preprocessed. The preprocessing includes: determining a reference three-dimensional coordinate system, and determining the camera attitude information corresponding to each image according to the reference three-dimensional coordinate system and the environmental feature points.
  • That is, the preprocessing is the camera pose recognition described above. Since the camera pose is a relative concept, a reference three-dimensional coordinate system must first be determined before the computation. The camera coordinate system of the first frame of the video file can be used as this reference, or, in a more preferred embodiment, special handling can be applied during image collection as described above.
  • Specifically, the target object may be placed in the target environment together with a marker having a planar structure (for example, the paper bearing the word "alibaba" shown in FIG. 2, etc.), with the plane of the marker parallel to the ground plane.
  • During shooting, the lens is first aimed at the marker and then moved to the position of the target object.
  • When the reference coordinate system is determined, the marker plane can first be recognized in the first few frames of the video file; the center point of the plane in which the marker lies is taken as the origin, that plane as the x-y plane of the reference coordinate system, and the reference three-dimensional coordinate system is established according to the right-hand rule. Since the marker plane is parallel to the ground plane, the reference coordinate system established from it can be regarded as a world coordinate system.
  • After the reference three-dimensional coordinate system is determined, the camera attitude information corresponding to each image can be determined from it and from the environmental feature points.
  • The camera pose information can be determined using techniques such as SLAM.
  • Here, the camera attitude refers to the 3D rigid transformation from the camera coordinate system to the reference coordinate system.
  • Vision-based SLAM can obtain the six-degree-of-freedom information of the camera pose, positioning the camera in 3D physical space; this camera-pose positioning information is then used in the labeling process for automatic labeling.
  • Note that SLAM technology is used here to position the camera in three-dimensional physical space, not to track the target object: what is used for positioning are the feature points of the shooting environment, not the feature points of the target object itself. The pose convention is summarized below.
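In standard pinhole-camera notation (a summary, not quoted from the patent), each frame's six-degree-of-freedom pose is the rigid transform between the reference (world) coordinate system and the camera coordinate system; written in the world-to-camera direction used for projection:

```latex
\mathbf{x}^{\mathrm{cam}}_i = R_i\,\mathbf{x}^{\mathrm{world}} + \mathbf{t}_i,
\qquad
s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  = K\left( R_i\,\mathbf{x}^{\mathrm{world}} + \mathbf{t}_i \right),
```

where $R_i \in SO(3)$ and $\mathbf{t}_i \in \mathbb{R}^3$ carry the three rotational and three translational degrees of freedom of frame $i$, and $K$ is the camera intrinsic matrix. The transform as phrased in the patent (camera coordinate system to reference coordinate system) is simply the inverse of $(R_i, \mathbf{t}_i)$.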
  • S602: Use one of the images as a reference image, determine a reference coordinate system, and create a three-dimensional space model based on the reference three-dimensional coordinate system.
  • One of the image training samples may first be taken as the reference image; the reference image is simply the image that needs to be labeled manually. The choice can be arbitrary, and a three-dimensional space model is then created in the reference three-dimensional coordinate system.
  • This three-dimensional model is not a CAD model of the target object and does not need to be provided by the producer or designer of the target object; it is a regular solid such as a cuboid or cylinder, or a combination of several regular solids, and is therefore easy to obtain.
  • The role of the three-dimensional model is to specify the position of the target object in the reference three-dimensional coordinate system. The three-dimensional space model is therefore movable and resizable: the user can move it and adjust its length, width and height so that it just "snaps onto" the target object.
  • S603: When the three-dimensional space model has been moved to the position of the target object in the reference image, determine the position information of the target object in the reference three-dimensional coordinate system.
  • When the three-dimensional space model has been moved to the position of the target object, it "encloses" the target object, that is, the target object lies inside the three-dimensional space model; at this point the labeling of the reference image is complete.
  • The position information may include: the displacement degrees of freedom and rotational degrees of freedom of the target object in the three dimensions of the reference three-dimensional coordinate system, and the size of the three-dimensional space model in the three dimensions.
  • As noted above, this position information is fixed once determined; that is, in every image training sample, the position of the target object relative to the reference three-dimensional coordinate system is the same and unchanging.
  • After the position of the target object relative to the reference three-dimensional coordinate system has been determined, the three-dimensional space model can be mapped to the image plane of each image according to the camera attitude information corresponding to that image, automatically labeling the target object in all of the other images.
  • After the three-dimensional space model is mapped to the image plane of an image, it becomes a two-dimensional shape.
  • For example, when the three-dimensional space model is a cuboid, its mapping back to the image plane is a quadrilateral, such as a rhombus or parallelogram.
  • If a rectangular label is required, the quadrilateral obtained by mapping the three-dimensional space model can be rectangularized.
  • In this way, a rectangular box is added around the target object in each image training sample; a recognition model of the specific target object can then be trained from these rectangular boxes and used to recognize the target object in scenarios such as AR. A label-format sketch follows below.
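The patent does not prescribe a storage format for these rectangles; the sketch below assumes, purely for illustration, one YOLO-style normalized line per labeled object.

```python
# Sketch (format assumed, not specified by the patent): persist an
# automatically generated rectangle as a normalized training label.
def write_label(path, class_id, rect, image_w, image_h):
    x, y, w, h = rect                      # upright rectangle in pixels
    cx = (x + w / 2.0) / image_w           # normalized box center x
    cy = (y + h / 2.0) / image_h           # normalized box center y
    with open(path, "w") as f:
        f.write(f"{class_id} {cx:.6f} {cy:.6f} "
                f"{w / image_w:.6f} {h / image_h:.6f}\n")
```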
  • In this embodiment, the target object is labeled with a relatively regular three-dimensional space model which, compared with a CAD model of the target object, is much easier to obtain.
  • The target object in the reference image is labeled manually using the three-dimensional model, and the model is then remapped back to the image plane of each other image according to that image's camera pose change relative to the reference image. In this process, the camera pose can be recognized as long as the feature points in the shooting environment are sufficiently distinct.
  • That is, camera pose recognition is based on the feature points of the entire shooting environment, rather than on recognizing feature points of the target object to track it; automatic labeling therefore remains possible even when the target object is of a pure color, highly reflective, or transparent.
  • Embodiment 2 applies the automatic labeling method of Embodiment 1: after automatic labeling of the target object in the image training samples is completed, the results can be used in the creation of a target object recognition model.
  • Specifically, the second embodiment provides a method for establishing a target object recognition model. Referring to FIG. 7, the method may include:
  • S701: Obtain an image training sample that includes multiple images, each obtained by photographing the same target object, with the same environmental feature points existing between adjacent images; each image also includes annotation information on the position of the target object, obtained by using one of the images as a reference image, creating a three-dimensional space model based on the reference three-dimensional coordinate system, determining the position information of the target object in the reference three-dimensional coordinate system according to the position to which the three-dimensional space model is moved, and mapping the three-dimensional space model to the image plane of each image according to the respective camera attitude information determined from the environmental feature points in each image;
  • S702: Generate a recognition model for the target object according to the annotation information on the position of the target object in the image training samples.
  • This recognition model can be applied in augmented reality (AR) interaction to recognize the target object in a captured real-scene image and determine its position in that image, so that a virtual image associated with the target object can be displayed according to that position information.
  • Embodiment 3 builds on Embodiment 2 and further provides an augmented reality (AR) information providing method. Referring to FIG. 8, the method may include:
  • S801: Collect a real-scene image and use the pre-established target object recognition model to identify the position information of the target object in the real-scene image, the target object recognition model being established by the method of Embodiment 2;
  • S802: Determine the display position of the associated virtual image according to the position information of the target object in the real-scene image, and display the virtual image.
  • During display, the position of the virtual image can follow changes in the position of the real image. In the prior art this following typically lags: at first both the virtual image and the real image are at position A in the picture; when the real image moves to position B, the virtual image remains at position A and only follows to position B after a few seconds. If the user moves the terminal device frequently, or moves it left and right or up and down, the virtual image appears to "float", and the display effect is poor.
  • To avoid this, in the embodiments of the present application the position of the virtual image may be made to follow the position of the real image as follows: a first thread collects the real-scene images, while a second thread uses the target object recognition model to identify the position information of the target object in the current real-scene image and determines the display position of the associated virtual image from it;
  • only after the second thread has finished determining and rendering the display attributes of the virtual image from the real-scene image collected by the first thread does the first thread collect the next frame. In this way, the display attributes of the virtual image in the AR picture, such as its position and size, are determined strictly from the position and size of the current real-scene image and drawn in step with it.
  • Since the virtual image is drawn from the real-scene frames just collected by the camera thread, the display position and size of the virtual image change synchronously with the real image in the AR picture, avoiding the "floating" of the virtual image when the terminal device moves and improving the quality and display effect of the AR picture. A threading sketch follows below.
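The two-thread hand-off described above might look like the following; the `camera`, `model`, and `display` interfaces are hypothetical placeholders, and Python's threading primitives stand in for whatever the platform actually provides.

```python
# Sketch (interfaces hypothetical): the capture thread does not fetch the
# next frame until the recognition/rendering thread has finished drawing
# the virtual image for the current one, keeping the virtual and real
# content of each AR frame in step.
import queue
import threading

frames = queue.Queue(maxsize=1)    # at most one in-flight frame
rendered = threading.Event()

def capture_loop(camera):
    while True:
        rendered.clear()
        frames.put(camera.read())  # hypothetical camera API
        rendered.wait()            # block until this frame is rendered

def render_loop(model, display):
    while True:
        frame = frames.get()
        box = model.recognize(frame)          # target position in frame
        display.draw(frame, virtual_at=box)   # draw virtual image there
        rendered.set()                        # release the capture thread
```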
  • Corresponding to Embodiment 1, an embodiment of the present application further provides a device for automatically labeling a target object in an image.
  • Referring to FIG. 9, the device may specifically include:
  • a training sample obtaining unit 901 configured to obtain an image training sample that includes a plurality of images, each obtained by photographing the same target object, with the same environmental feature points existing between adjacent images;
  • a three-dimensional space model creating unit 902 configured to use one of the images as a reference image, determine a reference coordinate system, and create a three-dimensional space model based on the reference three-dimensional coordinate system;
  • a position information determining unit 903 configured to determine the position information of the target object in the reference three-dimensional coordinate system when the three-dimensional space model is moved to the position of the target object in the reference image; and
  • a mapping unit 904 configured to map the three-dimensional space model to the image plane of each image according to the position information of the target object in the reference three-dimensional coordinate system and the respective camera attitude information determined from the environmental feature points in each image.
  • In a specific implementation, the device may further include:
  • a preprocessing unit configured to preprocess the image training samples, the preprocessing including: determining a reference three-dimensional coordinate system, and determining the camera attitude information corresponding to each image according to the reference three-dimensional coordinate system and the environmental feature points.
  • The preprocessing unit may be specifically configured to: analyze the environmental feature point information of each image frame using vision-based simultaneous localization and mapping (SLAM) technology, and determine the camera attitude information corresponding to each image from the analysis result.
  • When the three-dimensional space model has been moved to the position of the target object, the target object lies inside the three-dimensional space model.
  • The training sample obtaining unit may be specifically configured to use the frames of a target video file as image training samples, the target video file being obtained by shooting the target object in a target environment.
  • The camera coordinate system of the first frame of the video file may be used as the reference three-dimensional coordinate system.
  • Alternatively, the target video file may be shot by placing the target object in the target environment together with a marker having a planar structure, the plane of the marker being parallel to the ground plane, first aiming the lens at the marker, and then moving the lens to the position of the target object for shooting;
  • in this case, the reference three-dimensional coordinate system is established from the plane in which the marker lies in the first few frames of the video file.
  • Specifically, the reference three-dimensional coordinate system may be established by taking the center point of the plane in which the marker lies as the origin, taking that plane as the x-y plane, and following the right-hand rule.
  • The marker with a planar structure may include a piece of paper displaying a preset pattern.
  • The video file may be obtained by fixing the position of the target object and shooting it for one full circle with a video capture device.
  • The position information determining unit may be specifically configured to determine the displacement degrees of freedom and rotational degrees of freedom of the target object in the three dimensions of the reference three-dimensional coordinate system, together with the size of the three-dimensional space model in the three dimensions.
  • The three-dimensional space model may include a cuboid model, or a combination model composed of a plurality of cuboid models.
  • The device may further include a rectangularization processing unit configured to rectangularize the quadrilateral obtained by mapping the three-dimensional space model, after the model has been mapped to the image plane of each image.
  • Corresponding to Embodiment 2, an embodiment of the present application further provides an apparatus for establishing a target object recognition model.
  • Referring to FIG. 10, the apparatus may specifically include:
  • an image training sample obtaining unit 1001 configured to obtain an image training sample that includes multiple images, each obtained by photographing the same target object, with the same environmental feature points existing between adjacent images, each image further including annotation information on the location of the target object, the annotation information being obtained by using one of the images as a reference image, creating a three-dimensional space model based on the reference three-dimensional coordinate system, determining the position information of the target object in the reference three-dimensional coordinate system according to the position to which the three-dimensional space model is moved, and mapping the three-dimensional space model to the image plane of each image according to the respective camera attitude information determined from the environmental feature points in each image; and
  • a recognition model generating unit 1002 configured to generate a recognition model for the target object according to the annotation information on the position of the target object in the image training samples.
  • The recognition model of the target object can be applied in augmented reality (AR) interaction to recognize the target object in a captured real-scene image and determine its position in the real-scene image, so that a virtual image associated with the target object is displayed according to that position information.
  • Corresponding to Embodiment 3, an embodiment of the present application further provides an augmented reality (AR) information providing apparatus.
  • Referring to FIG. 11, the apparatus may specifically include:
  • a real-scene image collecting unit 1101 configured to collect a real-scene image and use the pre-established target object recognition model to identify the position information of the target object in the real-scene image, the target object recognition model being established as in Embodiment 2; and
  • a virtual image display unit 1102 configured to determine the display position of the associated virtual image according to the position information of the target object in the real-scene image, and to display the virtual image.
  • In a specific implementation, the apparatus may further include:
  • a synchronous changing unit configured to make the position of the virtual image follow the position of the real image when the position of the target object changes in the real-scene image.
  • Specifically, the position of the virtual image can be made to follow the position of the real image as follows: a first thread collects the real-scene images, while a second thread uses the target object recognition model to identify the position information of the target object in the real-scene image and determines the display position of the associated virtual image from it; the first thread collects the next frame only after the second thread has finished rendering.
  • An embodiment of the present application further provides a computer system, including:
  • one or more processors; and
  • a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the following operations:
  • obtaining an image training sample that includes a plurality of images, each obtained by photographing the same target object, with the same environmental feature points existing between adjacent images;
  • using one of the images as a reference image, determining a reference coordinate system, and creating a three-dimensional space model based on the reference three-dimensional coordinate system; determining the position information of the target object in the reference three-dimensional coordinate system when the three-dimensional space model is moved to the position of the target object in the reference image; and mapping the three-dimensional space model to the image plane of each image according to that position information and the respective camera attitude information determined from the environmental feature points in each image.
  • FIG. 12 illustrates an exemplary architecture of such a computer system, which may specifically include a processor 1210, a video display adapter 1211, a disk drive 1212, an input/output interface 1213, a network interface 1214, and a memory 1220.
  • The processor 1210, the video display adapter 1211, the disk drive 1212, the input/output interface 1213, the network interface 1214, and the memory 1220 can be communicably connected via a communication bus 1230.
  • The processor 1210 can be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the related programs to implement the technical solutions provided by the present application.
  • The memory 1220 can be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like.
  • The memory 1220 can store an operating system 1221 for controlling the operation of the computer system 1200, and a basic input/output system (BIOS) for controlling low-level operation of the computer system 1200.
  • A web browser 1223, a data storage management system 1224, an image annotation system 1225, and the like can also be stored.
  • The image annotation system 1225 may be an application that specifically implements the steps described in the foregoing embodiments of the present application.
  • In general, when the technical solutions provided by the present application are implemented in software or firmware, the related program code is stored in the memory 1220 and called by the processor 1210 for execution.
  • The input/output interface 1213 is used to connect input/output modules for information input and output. The input/output modules can be configured as components within the device (not shown) or externally connected to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and output devices may include a display, a speaker, a vibrator, indicator lights, etc.
  • The network interface 1214 is used to connect a communication module (not shown) to implement communication interaction between this device and other devices. The communication module can communicate by wired means (such as USB or a network cable) or by wireless means (such as a mobile network, WiFi, or Bluetooth).
  • The bus 1230 includes a path for transferring information between the components of the device (for example, the processor 1210, the video display adapter 1211, the disk drive 1212, the input/output interface 1213, the network interface 1214, and the memory 1220).
  • In addition, the computer system 1200 can also obtain information on specific collection conditions from a virtual resource object collection condition information database 1241 for use in condition determination, and so on.
  • It should be noted that although the above device shows only the processor 1210, the video display adapter 1211, the disk drive 1212, the input/output interface 1213, the network interface 1214, the memory 1220, the bus 1230, and so on, in a specific implementation the device may also include other components necessary for normal operation.
  • Those skilled in the art will understand that the above device may also include only the components necessary to implement the solution of the present application, without necessarily including all the components shown in the figures.
  • From the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part contributing over the prior art, can be embodied in the form of a software product, which can be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present application or in parts of the embodiments.

Abstract

The embodiments of the present application disclose a method, device and system for automatically labeling a target object in an image. The method includes: obtaining an image training sample that includes multiple images, each obtained by photographing the same target object, with the same environmental feature points existing between adjacent images; using one of the images as a reference image, determining a reference coordinate system, and creating a three-dimensional space model based on the reference three-dimensional coordinate system; when the three-dimensional space model is moved to the position of the target object in the reference image, determining the position information of the target object in the reference three-dimensional coordinate system; and mapping the three-dimensional space model to the image plane of each image according to the respective camera attitude information determined from the environmental feature points in each image. The embodiments of the present application enable more accurate and effective automatic image annotation and improve the versatility of the method.

Description

Method, device and system for automatically labeling a target object in an image
This application claims priority to Chinese Patent Application No. 201710912283.0, filed on September 29, 2017 and entitled "Method, device and system for automatically labeling a target object in an image", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of image processing technologies, and in particular to methods, devices and systems for automatically labeling a target object in an image.
Background
In AR/VR and related businesses, machine learning methods are widely used to recognize scenes and objects in images. The machine learning process requires a large number of image training samples, and the target object in each sample must be labeled. Labeling means marking the position of the target object in the image so that, during learning, features can be extracted from the image of the target object.
In the prior art, the labeling of image training samples falls mainly into two types: annotation based on two-dimensional images, and three-dimensional annotation based on an object CAD model. Two-dimensional image annotation mainly means marking, in a two-dimensional image, the rectangular region in which the target object is located, a process that usually has to be completed by staff through manual labeling. For example, as shown in FIG. 1-1, a worker must manually box the position of the target object in each image. Manual labeling is very inefficient, however, and when there are many image training samples it consumes a great deal of manpower and time.
Three-dimensional annotation based on an object CAD model mainly uses each frame of a pre-captured video as an image training sample, and a CAD model of the target object is obtained first. For example, as shown in FIG. 1-2, when the target object in the video is a car, the CAD model of that car must first be obtained, and then, in one of the frames, the correspondence between multiple points of the CAD model and the corresponding feature points of the target object is annotated manually. Model-based tracking can then be used to track the target object for batch labeling: the tracking specifically uses the annotated feature points of the target object to identify its position in the other frames. This annotation method in 3D space is more automated than two-dimensional annotation, since labeling one frame of a video can automatically label the entire video; automatic labeling also has a uniform quantitative standard of accuracy, making it more precise than manual labeling. Its shortcomings, however, are equally obvious. The CAD model of the target object is usually provided by the party that produces or designs the object; if that party cannot provide the CAD model, automatic labeling by the above method becomes impossible, and in practice this situation is very common, that is, the CAD model of the target object is hard to obtain, which limits the versatility of the method. Moreover, even when a CAD model of the target object can be found, tracking the target object normally depends on the object having enough feature points; when the object itself is of a pure color, highly reflective, or transparent, model-based tracking cannot guarantee sufficient accuracy, which in turn degrades the automatic labeling.
Therefore, how to perform automatic image annotation more accurately and effectively, and how to improve the versatility of the method, are technical problems to be solved by those skilled in the art.
Summary
The present application provides a method, device and system for automatically labeling a target object in an image, which can perform automatic image annotation more accurately and effectively and improve the versatility of the method.
The present application provides the following solutions.
A method for automatically labeling a target object in an image, including:
obtaining an image training sample that includes multiple images, each image being obtained by photographing the same target object, with the same environmental feature points existing between adjacent images;
using one of the images as a reference image, determining a reference coordinate system, and creating a three-dimensional space model based on the reference three-dimensional coordinate system;
when the three-dimensional space model is moved to the position of the target object in the reference image, determining the position information of the target object in the reference three-dimensional coordinate system; and
mapping the three-dimensional space model to the image plane of each image according to the position information of the target object in the reference three-dimensional coordinate system and the respective camera attitude information determined from the environmental feature points in each image.
A method for establishing a target object recognition model, including:
obtaining an image training sample that includes multiple images, each image being obtained by photographing the same target object, with the same environmental feature points existing between adjacent images; each image further includes annotation information on the position of the target object, the annotation information being obtained as follows: using one of the images as a reference image, creating a three-dimensional space model based on the reference three-dimensional coordinate system, determining the position information of the target object in the reference three-dimensional coordinate system according to the position to which the three-dimensional space model is moved, and mapping the three-dimensional space model to the image plane of each image according to the respective camera attitude information determined from the environmental feature points in each image; and
generating a recognition model for the target object according to the annotation information on the position of the target object in the image training sample.
An augmented reality (AR) information providing method, including:
collecting a real-scene image and using a pre-established target object recognition model to identify the position information of the target object in the real-scene image, where the target object recognition model is established by the method of claim 15; and
determining the display position of an associated virtual image according to the position information of the target object in the real-scene image, and displaying the virtual image.
A device for automatically labeling a target object in an image, including:
a training sample obtaining unit configured to obtain an image training sample that includes multiple images, each image being obtained by photographing the same target object, with the same environmental feature points existing between adjacent images;
a three-dimensional space model creating unit configured to use one of the images as a reference image, determine a reference coordinate system, and create a three-dimensional space model based on the reference three-dimensional coordinate system;
a position information determining unit configured to determine the position information of the target object in the reference three-dimensional coordinate system when the three-dimensional space model is moved to the position of the target object in the reference image; and
a mapping unit configured to map the three-dimensional space model to the image plane of each image according to the position information of the target object in the reference three-dimensional coordinate system and the respective camera attitude information determined from the environmental feature points in each image.
An apparatus for establishing a target object recognition model, including:
an image training sample obtaining unit configured to obtain an image training sample that includes multiple images, each image being obtained by photographing the same target object, with the same environmental feature points existing between adjacent images; each image further includes annotation information on the position of the target object, the annotation information being obtained as follows: using one of the images as a reference image, creating a three-dimensional space model based on the reference three-dimensional coordinate system, determining the position information of the target object in the reference three-dimensional coordinate system according to the position to which the three-dimensional space model is moved, and mapping the three-dimensional space model to the image plane of each image according to the respective camera attitude information determined from the environmental feature points in each image; and
a recognition model generating unit configured to generate a recognition model for the target object according to the annotation information on the position of the target object in the image training sample.
An augmented reality (AR) information providing apparatus, including:
a real-scene image collecting unit configured to collect a real-scene image and use a pre-established target object recognition model to identify the position information of the target object in the real-scene image, where the target object recognition model is established by the method of claim 15; and
a virtual image display unit configured to determine the display position of an associated virtual image according to the position information of the target object in the real-scene image, and to display the virtual image.
A computer system, including:
one or more processors; and
a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the following operations:
obtaining an image training sample that includes multiple images, each image being obtained by photographing the same target object, with the same environmental feature points existing between adjacent images;
using one of the images as a reference image, determining a reference coordinate system, and creating a three-dimensional space model based on the reference three-dimensional coordinate system;
when the three-dimensional space model is moved to the position of the target object in the reference image, determining the position information of the target object in the reference three-dimensional coordinate system; and
mapping the three-dimensional space model to the image plane of each image according to the position information of the target object in the reference three-dimensional coordinate system and the respective camera attitude information determined from the environmental feature points in each image.
According to the specific embodiments provided herein, the present application discloses the following technical effects:
In the embodiments of the present application, the target object is labeled with a relatively regular three-dimensional space model which, compared with a CAD model of the target object, is much easier to obtain. In addition, in the process of automatically labeling the other images from the manually labeled reference image, the three-dimensional space model is remapped back to the image plane of each image according to the change of camera pose of that image relative to the reference image. In this process, the camera pose can be recognized as long as the feature points in the shooting environment are sufficiently distinct; that is, in the embodiments of the present application, camera pose recognition can be performed based on the feature points of the entire shooting environment, thereby achieving automatic labeling of the target object, rather than recognizing feature points of the target object to track it. Therefore, automatic labeling of the target object can be achieved even when the target object itself is of a pure color, highly reflective, or transparent.
Of course, implementing any product of the present application does not necessarily require achieving all of the above advantages at the same time.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
FIG. 1-1 and FIG. 1-2 are schematic diagrams of labeling methods in the prior art;
FIG. 2 is a schematic diagram of a method for creating a reference coordinate system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a three-dimensional space model provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the result of labeling a reference image provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the display result after the mapping result is rectangularized according to an embodiment of the present application;
FIG. 6 is a flowchart of a first method provided by an embodiment of the present application;
FIG. 7 is a flowchart of a second method provided by an embodiment of the present application;
FIG. 8 is a flowchart of a third method provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a first device provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a second device provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of a third device provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of a computer system provided by an embodiment of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application fall within the scope of protection of the present application.
In the embodiments of the present application, an automatic image annotation tool may be provided. With this tool, the target object in an image can be abstracted into a more generic regular solid (for example, a cuboid or a cylinder), a composite solid assembled from regular solids, or even an arbitrary three-dimensional volume. The annotation problem is thereby converted into annotating a three-dimensional space (volume): everything inside that volume is labeled as the target object. Specifically, when annotating multiple images automatically, one image may first be taken as the reference image and a three-dimensional space model (which is not a CAD model of the target object) initialized in it. The user may then move the model and adjust its length, width, and height so that it just "encloses" the target object in the image; the position of the target object in the reference image can then be determined from the position of the moved model. In addition, the images may satisfy the following conditions: they are captured of the same target object in the same environment, and adjacent images share common environment feature points (in a specific implementation, they may be the frames of one video file, and so on). The camera pose of each image can then be obtained with techniques such as SLAM localization. After the annotated position of the target object in the reference image is obtained, the three-dimensional space model can be re-mapped to the image plane of each of the other images according to the change of that image's camera pose relative to the reference image, thereby automatically annotating the target object in those images.
It should be noted that the creation and movement of the three-dimensional space model must be performed with respect to a reference three-dimensional coordinate system, and this coordinate system should remain fixed across all images. Likewise, determining the camera pose of each image also requires a fixed three-dimensional coordinate system as the reference, with the 3D rigid transformation from each frame's camera coordinate system to that reference coordinate system solved separately; fused with the sensors of a handset's IMU module, vision-based SLAM can recover the six degrees of freedom of the camera pose. Therefore, in a specific implementation, a reference coordinate system may first be determined before the three-dimensional space model is created, so that the subsequent creation and movement of the model, as well as the determination of the camera pose in each frame, are all performed with respect to this coordinate system.
It should also be noted that the reference three-dimensional coordinate system may be determined in multiple ways. For example, when the images are the frames of one video file, the camera coordinate system of the first frame may be taken as the reference coordinate system, and the camera pose changes of the other frames determined relative to the pose in the first frame. Alternatively, in a more preferred mode, the reference three-dimensional coordinate system may be determined by means of a preset marker. Specifically, the image training samples may be images obtained in a special way, for example the frames of a video captured in the manner specified in the embodiments of the present application. During capture, a preset marker with a planar structure may first be placed on a table or the like; as shown in FIG. 2, it may be a sheet of paper printed with a preset pattern, and so on. Before shooting the target object, the lens may first be aimed at the plane of the marker, and then moved to the target object, which may, for example, be filmed in a full 360-degree sweep around it. Later, when annotating the target object in the images of the video, the reference three-dimensional coordinate system can first be created from the planar marker captured in the first few frames, after which the three-dimensional space model can be created and the camera pose of each frame determined on the basis of that coordinate system.
Specifically, when creating the reference three-dimensional coordinate system from the planar marker in the first few frames: since the planar marker is usually a sheet of paper or a thin board of fixed area placed parallel to the ground, the marker can first be located by recognizing the preset pattern in those frames. The reference three-dimensional coordinate system can then be established with the center point of the marker plane as the origin, the marker plane as the x-y plane, and the axes following the right-hand rule. Because the x-y plane of a coordinate system created this way is parallel to the ground and its z axis points vertically downward, this reference three-dimensional coordinate system may also be called the world coordinate system.
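By way of illustration only (this sketch is not part of the original disclosure): assuming OpenCV is available, the physical size of the marker is known, and a hypothetical detect_marker_corners routine returns the four marker corners in pixel coordinates in a consistent order, the camera pose relative to the marker-defined world frame could be recovered roughly as follows:

    import cv2
    import numpy as np

    def reference_pose_from_marker(image, camera_matrix, dist_coeffs,
                                   marker_size_m=0.2):
        # detect_marker_corners is a hypothetical detector returning a 4x2
        # array of pixel coordinates, ordered to match corners_3d below.
        corners_2d = detect_marker_corners(image)
        half = marker_size_m / 2.0
        # The marker plane is the world x-y plane; origin at the marker
        # centre, axes completing a right-handed system.
        corners_3d = np.array([[-half, -half, 0.0],
                               [ half, -half, 0.0],
                               [ half,  half, 0.0],
                               [-half,  half, 0.0]], dtype=np.float32)
        ok, rvec, tvec = cv2.solvePnP(corners_3d,
                                      corners_2d.astype(np.float32),
                                      camera_matrix, dist_coeffs)
        if not ok:
            raise RuntimeError("marker pose could not be recovered")
        R, _ = cv2.Rodrigues(rvec)
        # (R, tvec) maps world (marker) coordinates into this frame's camera
        # coordinates, i.e. it is this frame's camera pose w.r.t. the
        # reference coordinate system.
        return R, tvec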
In other words, in the above solution, preparation can be done when filming the video for image capture of the target object by including in the video a marker used to create the reference three-dimensional coordinate system; during subsequent automatic annotation, the reference three-dimensional coordinate system can then be determined from the marker in the video file.
When annotating the images, taking the case where the frames of one video file serve as the image training samples as an example: after the reference three-dimensional coordinate system has been determined, one frame may first be selected arbitrarily as the reference frame, and a three-dimensional space model initialized on the basis of the reference three-dimensional coordinate system. Specifically, since the final annotation is usually required to be a rectangular box around the target object, the three-dimensional space model may be a cuboid. In a specific implementation it may also be a cylinder, in which case the final annotation may be a circular box around the target object; or it may be a composite solid assembled from several cuboids, and so on. In short, compared with the CAD models of the prior art, the three-dimensional space model in the embodiments of the present application is a relatively regular and simple shape that does not have to be supplied by the target object's manufacturer from design drawings, and can be created very easily.
As shown at 301 in FIG. 3, the three-dimensional space model may be initialized on the X-Y plane of the world coordinate system and appear in the camera's field of view. The user may then move it: for example, translate it in the X-Y plane and, if necessary, along the Z direction. In addition, the annotation tool may provide alignment operations for rotating the model about the three coordinate axes, as well as operations for adjusting the size of the volume (the length, width, and height of the cuboid, etc.). The final goal is for the model to "enclose" the target object exactly, i.e., as shown at 401 in FIG. 4, for the target object to lie inside the three-dimensional space model. Once this is done, the user can confirm via a button provided by the annotation tool, completing the manual annotation of the reference frame.
After the manual annotation of the reference frame is completed, the position information of the target object in the reference three-dimensional coordinate system can be determined from the position to which the three-dimensional space model was finally moved and/or rotated. Specifically, this position information may be expressed by the following: the translational degrees of freedom and rotational degrees of freedom of the target object along the three dimensions of the reference three-dimensional coordinate system, and the size of the three-dimensional space model in the three dimensions.
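As a rough illustration of such a nine-parameter annotation (the class name and field layout are assumptions for the sketch, not the application's own data format), the cuboid's eight corners in the reference coordinate system could be recovered from the translation, rotation, and size values as follows, using NumPy and SciPy:

    from dataclasses import dataclass
    import numpy as np
    from scipy.spatial.transform import Rotation

    @dataclass
    class CuboidAnnotation:
        translation: np.ndarray  # (3,) cuboid centre in the reference frame
        euler_xyz: np.ndarray    # (3,) rotation about the three axes, radians
        size: np.ndarray         # (3,) length, width, height

        def corners_world(self) -> np.ndarray:
            """Eight cuboid corners in the reference (world) coordinate system."""
            l, w, h = self.size / 2.0
            # Corners of an axis-aligned cuboid centred at the origin.
            local = np.array([[sx * l, sy * w, sz * h]
                              for sx in (-1, 1)
                              for sy in (-1, 1)
                              for sz in (-1, 1)])
            R = Rotation.from_euler("xyz", self.euler_xyz).as_matrix()
            # Rotate, then translate into the reference coordinate system.
            return local @ R.T + self.translation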
It should be noted that, in the embodiments of the present application, image capture may proceed with the target object held stationary while the image capture device circles it once, thereby completing the capture and generating the video file. The target object is therefore static with respect to the reference three-dimensional coordinate system; once its position in that coordinate system has been determined from one frame, this position information is fixed. What changes across the other frames is the camera pose, and it is this change that makes the position, angle, size, etc. of the target object differ from frame to frame. Since the camera pose of each frame, i.e., the rigid transformation of its camera coordinate system relative to the reference three-dimensional coordinate system, has already been obtained during preprocessing, the three-dimensional space model can be mapped back computationally onto the image plane of each frame, completing the automatic annotation of the target object in all other frames.
After the three-dimensional space model is mapped back onto the image plane of a frame, it appears as a two-dimensional region; for example, when the model is a cuboid, this region may be a quadrilateral such as a parallelogram or rhombus. Since the annotation requirements may call for rectangular boxes, the quadrilateral can further be reshaped into a rectangle; the adjusted result may be as shown at 501 in FIG. 5.
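A minimal sketch of this re-mapping and rectangularization step, assuming OpenCV, a 3x3 intrinsic matrix K, and a per-frame pose (R, t) from world to camera coordinates (the function name is illustrative):

    import cv2
    import numpy as np

    def annotate_frame(corners_world, R, t, K, dist_coeffs=None):
        # Project the eight world-space cuboid corners into this frame.
        rvec, _ = cv2.Rodrigues(R)
        dist = dist_coeffs if dist_coeffs is not None else np.zeros(5)
        pts, _ = cv2.projectPoints(corners_world.astype(np.float32),
                                   rvec, t.astype(np.float32), K, dist)
        pts = pts.reshape(-1, 2)
        # Rectangularization: keep the axis-aligned bounding rectangle of the
        # projected outline as the final 2D label.
        x, y, w, h = cv2.boundingRect(pts.astype(np.float32))
        return (x, y, w, h)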
The above description takes the case where the image training samples are the frames of one video file. In other implementations, they may instead be photographs of the same target object taken from different angles in the same environment; as long as the photographs are arranged in a certain order and adjacent photographs share common environment feature points, the camera pose of each photograph can be recognized, and the subsequent annotation proceeds in the same way as for the frames of a video file, which is not repeated here.
In short, in the embodiments of the present application, the target object is annotated with a relatively regular three-dimensional space model, which is easier to obtain than a CAD model of the target object. Moreover, when automatically annotating the other images from the manually annotated reference image, the three-dimensional space model is re-mapped onto the image plane of each image according to the change of that image's camera pose relative to the reference image. In this process, the camera pose can be recognized as long as the feature points of the shooting environment are sufficiently distinct; that is, camera pose recognition can be based on the feature points of the entire shooting environment, rather than on recognizing feature points of the target object itself in order to track it. Consequently, the target object can be annotated automatically even when it is of a solid color, highly reflective, or transparent.
Specific implementation solutions are described in detail below.
Embodiment 1
Referring to FIG. 6, Embodiment 1 of the present application provides a method for automatically annotating a target object in images, which may specifically comprise the following steps:
S601: obtain image training samples comprising a plurality of images, each image being captured by photographing the same target object, wherein adjacent images share common environment feature points.
The image training samples may be obtained from a target video file or, alternatively, from files such as a set of pre-captured photographs. For example, the target video file may be pre-recorded: in order to machine-learn the features of a target object so that it can be recognized in AR and similar scenarios, images of the object may be captured in advance, each captured picture used as an image training sample, the target image annotated in each sample, and machine learning then performed. The capture process yields a corresponding video file containing multiple frames, each of which can serve as an image training sample.
In a specific, preferred implementation, image capture may be performed by placing the target object in the middle and moving an image capture device around it in a full circle, generating a corresponding video file from which multiple frames are extracted as image training samples. Alternatively, the object may be photographed from multiple angles to obtain several photographs used as image training samples, and so on. In other words, the images in the capture result are obtained by photographing the target object from different angles in the same environment. The differences among them, in the content finally displayed in the image plane (i.e., the image plane the user actually observes), in the angle of the target object, and so on, are therefore due mainly to the different camera poses during capture. Provided a reference coordinate system can be determined and the shooting environment contains enough feature points, the camera pose of each image can be computed, and from it the position of the target object in each image plane.
In short, when selecting the image training samples, all frames of a pre-recorded video file may be chosen, or some of them, or a set of pre-captured photographs; in any case, the following conditions can be satisfied: the images are captured of the same target object in the same environment, and adjacent images share common environment feature points, i.e., the contents of adjacent images overlap, since only then can the change of camera pose across the images be recognized.
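For illustration, a minimal sketch of assembling training samples from such an orbit video with OpenCV, keeping every n-th frame so that adjacent samples still share enough common environment feature points (the sampling step is an assumed parameter, not specified by the application):

    import cv2

    def sample_frames(video_path, step=5):
        cap = cv2.VideoCapture(video_path)
        frames, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % step == 0:   # keep every n-th frame as a training sample
                frames.append(frame)
            idx += 1
        cap.release()
        return frames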
In a preferred implementation, the image training samples may further be preprocessed, the preprocessing comprising: determining a reference three-dimensional coordinate system, and determining the camera pose information of each image according to the reference three-dimensional coordinate system and the environment feature points.
That is, to automatically annotate the target object in all other images starting from one reference image, the embodiments of the present application may first preprocess the image training samples; this preprocessing is the aforementioned process of recognizing the camera pose of each image. The camera pose is in fact a relative notion, so a reference three-dimensional coordinate system is determined first for the computation. The camera coordinate system of the first frame of the video file may serve as the reference three-dimensional coordinate system, or, in a more preferred solution and as described above, special handling may be applied at capture time. Specifically, the target object may be placed in the target environment together with a marker having a planar structure (for example, the sheet of paper bearing the word "alibaba" shown in FIG. 2, and so on), with the plane of the marker parallel to the ground; during filming, the lens is first aimed at the marker and then moved to the position of the target object. When creating the reference three-dimensional coordinate system, the marker plane can first be recognized in the first few frames of the video file, and the reference three-dimensional coordinate system established with the center point of the marker plane as the origin and that plane as the x-y plane of the reference coordinate system, following the right-hand rule. Since the plane of the marker is parallel to the ground, the reference coordinate system subsequently built on it can be regarded as a world coordinate system.
After the reference coordinate system is determined, the camera pose information of each image can be determined from the reference three-dimensional coordinate system and the environment feature points, for example using techniques such as SLAM. Here the camera pose is the 3D rigid transformation from the camera coordinate system to the reference coordinate system. Fused with the sensors of the terminal device's IMU module, vision-based SLAM can recover the six degrees of freedom of the camera pose, thereby localizing the camera in 3D physical space; the subsequent annotation process can then use this camera localization information to achieve automatic annotation.
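As a small worked sketch of the 3D rigid transformation just described (illustrative only), a pose can be held as a rotation R and translation t in a 4x4 homogeneous matrix, and the relative pose between two frames obtained by composition:

    import numpy as np

    def to_homogeneous(R, t):
        # Pack rotation and translation into one rigid transform that maps
        # reference-frame (world) coordinates into camera coordinates.
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = np.ravel(t)
        return T

    def relative_pose(T_world_to_a, T_world_to_b):
        """Rigid transform taking camera-a coordinates to camera-b coordinates."""
        return T_world_to_b @ np.linalg.inv(T_world_to_a)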
It should be noted that, in the embodiments of the present application, SLAM is used to localize the camera in three-dimensional physical space, not to track the target object; the localization uses the feature points of the shooting environment, not the feature points of the target object itself.
S602: take one of the images as a reference image, determine a reference three-dimensional coordinate system, and create a three-dimensional space model based on the reference three-dimensional coordinate system.
In the annotation process, one of the image training samples may first be taken as the reference image, i.e., the image to be annotated manually. Before the manual annotation, the embodiments of the present application first create a three-dimensional space model on the basis of the reference three-dimensional coordinate system, which is the same reference coordinate system as the one used for determining the camera poses. The three-dimensional space model is not a CAD model of the target object and does not have to be supplied by the object's manufacturer or designer; it is a regular three-dimensional model such as a cuboid or a cylinder, or a composite assembled from several regular models, and so on. In other words, in the embodiments of the present application the three-dimensional space model is easy to obtain. Its purpose is to specify the position of the target object in the reference three-dimensional coordinate system; it can therefore be moved and resized, and the user can move it and adjust its length, width, and height so that it just "encloses" the target object.
S603: when the three-dimensional space model has been moved to the position of the target object in the reference image, determine position information of the target object in the reference three-dimensional coordinate system.
When the three-dimensional space model has been moved to the position of the target object, it may be in the state of "enclosing" the target object, i.e., the target object lies inside the three-dimensional space model. The manual annotation of the reference image is then complete, and the position information of the target object in the reference three-dimensional coordinate system can be determined. In a specific implementation, this position information may comprise: the translational degrees of freedom and rotational degrees of freedom of the target object along the three dimensions of the reference three-dimensional coordinate system, and the size of the three-dimensional space model in the three dimensions.
Since the position of the target object remains unchanged during image capture, this position information is fixed once determined; that is, in every image training sample, the position of the target object relative to the reference three-dimensional coordinate system is the same and unchanging.
S604: map the three-dimensional space model to the image plane of each image according to the position information of the target object in the reference three-dimensional coordinate system and the camera pose information of each image determined from the environment feature points in that image.
Once the position of the target object relative to the reference three-dimensional coordinate system has been determined, the three-dimensional space model can be mapped to the image plane of each image according to that image's camera pose information, completing the automatic annotation of the target object in all other images. After being mapped to an image plane, the model becomes a two-dimensional shape; for example, a cuboid maps to a quadrilateral, including a rhombus, a parallelogram, and so on. Since annotation requirements typically call for rectangles, in practical applications the quadrilateral obtained from the mapping may further be rectangularized. The final annotation result is that a rectangular box is added around the target object in every image training sample; the images inside the boxes can then be used for training to build a recognition model of the specific target object, for use in recognizing the target object in AR and similar scenarios.
In short, in the embodiments of the present application, the target object is annotated with a relatively regular three-dimensional space model, which is easier to obtain than a CAD model of the target object. Moreover, when automatically annotating the other images from the manually annotated reference image, the target object in the reference image is first annotated manually with the above three-dimensional space model, after which the model is re-mapped onto the image plane of each image according to the change of that image's camera pose relative to the reference image. In this process, the camera pose can be recognized as long as the feature points of the shooting environment are sufficiently distinct; that is, camera pose recognition can be based on the feature points of the entire shooting environment, rather than on recognizing feature points of the target object itself in order to track it. Consequently, the target object can be annotated automatically even when it is of a solid color, highly reflective, or transparent.
Embodiment 2
Embodiment 2 applies the automatic annotation method provided in Embodiment 1: after the target object in the image training samples has been annotated automatically, the result can be used in the process of building a recognition model of the target object. Specifically, Embodiment 2 of the present application provides a method for building a target object recognition model which, referring to FIG. 7, may specifically comprise:
S701: obtain image training samples comprising a plurality of images, each image being captured by photographing the same target object, wherein adjacent images share common environment feature points; each image further comprises annotation information on the position of the target object, obtained as follows: taking one of the images as a reference image, creating a three-dimensional space model based on a reference three-dimensional coordinate system, determining position information of the target object in the reference three-dimensional coordinate system according to the position to which the model has been moved, and mapping the model to the image plane of each image according to the camera pose information of each image determined from the environment feature points in that image.
S702: generate a recognition model of the target object according to the annotation information on the position of the target object in the image training samples.
In a specific implementation, the recognition model of the target object is applied, during augmented reality (AR) interaction, to recognize the target object in a captured live-scene image and determine its position in that image, so that a virtual image associated with the target object can be displayed according to the position information of the target object in the live-scene image.
Embodiment 3
Building on Embodiment 2, Embodiment 3 further provides a method for providing augmented reality (AR) information which, referring to FIG. 8, may specifically comprise:
S801: capture a live-scene image, and identify position information of a target object in the live-scene image using a pre-built target object recognition model, the model being built by the method of Embodiment 2.
S802: determine, according to the position information of the target object in the live-scene image, a display position for an associated virtual image, and display the virtual image.
In a specific implementation, when the position of the target object in the live-scene image changes, the position of the virtual image follows the change of position of the live-scene image.
In the prior art, however, the positions of the virtual image and the real image often fail to change in synchrony. For example, suppose that in some state both the virtual image and the real image are at position A on the screen; at some moment the user moves the terminal device so that the real image moves to position B, while the virtual image remains at position A and only follows to position B several seconds later. If the user moves the terminal device frequently, or back and forth sideways or up and down, the virtual image appears to "drift", and the display effect is poor.
To solve this problem, in the embodiments of the present application the position of the virtual image may be made to follow the position of the live-scene image as follows:
receiving one frame of live-scene image information captured by a first thread, and suspending the live-scene image capture operation of the first thread;
providing the live-scene image information to a second thread, the second thread identifying position information of the target object in the live-scene image using the target object recognition model and determining, according to the position information of the target object in the live-scene image, a display position for the associated virtual image;
instructing a third thread to compose and render the live-scene image captured by the first thread and the virtual image generated by the second thread, and instructing the first thread to capture the next frame.
In other words, by restricting when the first thread captures live-scene images, the first thread captures the next live-scene frame only after the second thread has determined the display attributes of the virtual image from the live-scene image information captured by the first thread and rendering has completed. Display attributes of the virtual image in the AR picture, such as position and size, are thus determined strictly from the current display attributes of the target live-scene image in the picture and drawn at the same time, so the virtual image is never drawn from live-scene frames captured several frames earlier by the camera thread. Display attributes such as position and size of the virtual image and the live-scene image in the AR picture therefore change in synchrony, the "drifting" of the virtual image when the terminal device moves is avoided, and the quality and display effect of the AR picture are improved.
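A minimal sketch of this frame-gating scheme using Python threads; the queue sizes, function parameters, and names are illustrative assumptions, not the application's implementation:

    import threading, queue

    frame_q = queue.Queue(maxsize=1)
    overlay_q = queue.Queue(maxsize=1)
    frame_consumed = threading.Event()
    frame_consumed.set()  # the first frame may be captured immediately

    def camera_thread(capture_fn):        # "first thread"
        while True:
            frame_consumed.wait()          # do not grab the next frame yet
            frame_consumed.clear()
            frame_q.put(capture_fn())

    def recognition_thread(detect_fn):    # "second thread"
        while True:
            frame = frame_q.get()
            bbox = detect_fn(frame)        # target object recognition model
            overlay_q.put((frame, bbox))   # virtual image placed from bbox

    def render_thread(compose_fn):        # "third thread"
        while True:
            frame, bbox = overlay_q.get()
            compose_fn(frame, bbox)        # compose real frame + virtual image
            frame_consumed.set()           # now allow the next capture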
Corresponding to Embodiment 1, an embodiment of the present application further provides an apparatus for automatically annotating a target object in images. Referring to FIG. 9, the apparatus may specifically comprise:
a training sample obtaining unit 901 configured to obtain image training samples comprising a plurality of images, each image being captured by photographing the same target object, wherein adjacent images share common environment feature points;
a three-dimensional space model creating unit 902 configured to take one of the images as a reference image, determine a reference three-dimensional coordinate system, and create a three-dimensional space model based on the reference three-dimensional coordinate system;
a position information determining unit 903 configured to determine position information of the target object in the reference three-dimensional coordinate system when the three-dimensional space model has been moved to the position of the target object in the reference image;
a mapping unit 904 configured to map the three-dimensional space model to the image plane of each image according to the position information of the target object in the reference three-dimensional coordinate system and the camera pose information of each image determined from the environment feature points in that image.
In a specific implementation, the apparatus may further comprise:
a preprocessing unit configured to preprocess the image training samples, the preprocessing comprising: determining a reference three-dimensional coordinate system, and determining the camera pose information of each image according to the reference three-dimensional coordinate system and the environment feature points.
Specifically, the preprocessing unit may be configured to:
analyze the environment feature point information of each image frame using vision-based simultaneous localization and mapping (SLAM), and determine the camera pose information of each image from the analysis result.
When the three-dimensional space model has been moved to the position of the target object in the reference image, the target object lies inside the three-dimensional space model.
In a specific implementation, the training sample obtaining unit may be configured to:
obtain a target video file and take multiple frames of the video file as the image training samples, the target video file being obtained by photographing the target object in a target environment.
The reference three-dimensional coordinate system may be created as follows:
taking the camera coordinate system of the first frame of the video file as the reference three-dimensional coordinate system.
Alternatively, the target video file is shot as follows: the target object and a marker having a planar structure are placed in the target environment, with the plane of the marker parallel to the ground; the lens is first aimed at the marker and then moved to the position of the target object for shooting.
In this case, the reference three-dimensional coordinate system may be created as follows:
establishing the reference three-dimensional coordinate system from the plane in which the marker lies in the first few frames of the video file.
More specifically, the reference three-dimensional coordinate system may be established with the center point of the marker plane as the origin and that plane as the x-y plane, following the right-hand rule.
The marker having a planar structure includes a paper sheet displaying a preset pattern.
The video file may be shot as follows: with the position of the target object fixed, a video capture device films the target object in a full circle around it.
In a specific implementation, the position information determining unit may be configured to:
determine the translational degrees of freedom and rotational degrees of freedom of the target object along the three dimensions of the reference three-dimensional coordinate system, and the size of the three-dimensional space model in the three dimensions.
The three-dimensional space model includes a cuboid model.
In addition, the apparatus may further comprise:
a rectangularization unit configured to rectangularize the quadrilateral obtained from mapping the three-dimensional space model after the model has been mapped to the image plane of each image.
When the structure of the target object is relatively complex, the three-dimensional space model may also include a composite model assembled from multiple cuboid models.
Corresponding to Embodiment 2, an embodiment of the present application further provides an apparatus for building a target object recognition model. Referring to FIG. 10, the apparatus may specifically comprise:
an image training sample obtaining unit 1001 configured to obtain image training samples comprising a plurality of images, each image being captured by photographing the same target object, wherein adjacent images share common environment feature points, each image further comprising annotation information on the position of the target object, the annotation information being obtained as follows: taking one of the images as a reference image, creating a three-dimensional space model based on a reference three-dimensional coordinate system, determining position information of the target object in the reference three-dimensional coordinate system according to the position to which the model has been moved, and mapping the model to the image plane of each image according to the camera pose information of each image determined from the environment feature points in that image;
a recognition model generating unit 1002 configured to generate a recognition model of the target object according to the annotation information on the position of the target object in the image training samples.
The recognition model of the target object is applied, during augmented reality (AR) interaction, to recognize the target object in a captured live-scene image and determine its position in that image, so that a virtual image associated with the target object can be displayed according to the position information of the target object in the live-scene image.
Corresponding to Embodiment 3, an embodiment of the present application further provides an apparatus for providing augmented reality (AR) information. Referring to FIG. 11, the apparatus may specifically comprise:
a live-scene image capture unit 1101 configured to capture a live-scene image and identify position information of a target object in the live-scene image using a pre-built target object recognition model, the model being built by the method provided in Embodiment 2;
a virtual image display unit 1102 configured to determine, according to the position information of the target object in the live-scene image, a display position for an associated virtual image, and to display the virtual image.
In a specific implementation, the apparatus may further comprise:
a synchronous change unit configured to make the position of the virtual image follow the change of position of the live-scene image when the position of the target object in the live-scene image changes.
The position of the virtual image may be made to follow the position of the live-scene image as follows:
receiving one frame of live-scene image information captured by a first thread, and suspending the live-scene image capture operation of the first thread;
providing the live-scene image information to a second thread, the second thread identifying position information of the target object in the live-scene image using the target object recognition model and determining, according to the position information of the target object in the live-scene image, a display position for the associated virtual image;
instructing a third thread to compose and render the live-scene image captured by the first thread and the virtual image generated by the second thread, and instructing the first thread to capture the next frame.
In addition, an embodiment of the present application further provides a computer system, comprising:
one or more processors; and
a memory associated with the one or more processors, the memory being configured to store program instructions which, when read and executed by the one or more processors, perform the following operations:
obtaining image training samples comprising a plurality of images, each image being captured by photographing the same target object, wherein adjacent images share common environment feature points;
taking one of the images as a reference image, determining a reference three-dimensional coordinate system, and creating a three-dimensional space model based on the reference three-dimensional coordinate system;
when the three-dimensional space model has been moved to the position of the target object in the reference image, determining position information of the target object in the reference three-dimensional coordinate system;
mapping the three-dimensional space model to the image plane of each image according to the position information of the target object in the reference three-dimensional coordinate system and the camera pose information of each image determined from the environment feature points in that image.
FIG. 12 shows an exemplary architecture of the computer system, which may specifically include a processor 1210, a video display adapter 1211, a disk drive 1212, an input/output interface 1213, a network interface 1214, and a memory 1220. The processor 1210, video display adapter 1211, disk drive 1212, input/output interface 1213, and network interface 1214 may be communicatively connected with the memory 1220 via a communication bus 1230.
The processor 1210 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), one or more integrated circuits, etc., and executes the relevant programs to realize the technical solutions provided by the present application.
The memory 1220 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, and so on. The memory 1220 may store an operating system 1221 for controlling the operation of the computer system 1200 and a basic input/output system (BIOS) for controlling low-level operation of the computer system 1200. It may also store a web browser 1223, a data storage management system 1224, an image annotation system 1225, and the like. The image annotation system 1225 may be the application that implements the operations of the foregoing steps in the embodiments of the present application. In short, when the technical solutions provided by the present application are implemented in software or firmware, the relevant program code is stored in the memory 1220 and invoked and executed by the processor 1210.
The input/output interface 1213 is used to connect input/output modules for information input and output. The input/output modules may be configured as components within the device (not shown in the figure) or attached externally to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc.; output devices may include a display, a loudspeaker, a vibrator, indicator lights, and the like.
The network interface 1214 is used to connect a communication module (not shown in the figure) for communication and interaction between this device and other devices. The communication module may communicate by wire (e.g., USB, network cable) or wirelessly (e.g., mobile network, WIFI, Bluetooth).
The bus 1230 comprises a path that carries information among the components of the device (e.g., the processor 1210, video display adapter 1211, disk drive 1212, input/output interface 1213, network interface 1214, and memory 1220).
In addition, the computer system 1200 may also obtain information on specific acquisition conditions from a virtual resource object acquisition condition information database 1241, for use in condition judgments, and so on.
It should be noted that although the above device shows only the processor 1210, video display adapter 1211, disk drive 1212, input/output interface 1213, network interface 1214, memory 1220, bus 1230, and so on, in a specific implementation the device may also include other components necessary for normal operation. Moreover, those skilled in the art will understand that the device may contain only the components necessary to implement the solutions of the present application, without necessarily containing all the components shown in the figure.
From the description of the above implementations, those skilled in the art can clearly understand that the present application may be implemented by software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system and system embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The systems and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected as actually needed to achieve the purpose of the embodiment's solution, which those of ordinary skill in the art can understand and implement without creative effort.
The method, apparatus, and system for automatically annotating a target object in images provided by the present application have been introduced in detail above. Specific examples have been used herein to expound the principles and implementations of the present application, and the description of the above embodiments serves only to aid understanding of the method of the present application and its core ideas. At the same time, for those of ordinary skill in the art, changes may be made in the specific implementations and the scope of application in accordance with the ideas of the present application. In summary, the content of this specification should not be understood as limiting the present application.

Claims (23)

  1. A method for automatically annotating a target object in images, comprising:
    obtaining image training samples comprising a plurality of images, each image being captured by photographing the same target object, wherein adjacent images share common environment feature points;
    taking one of the images as a reference image, determining a reference three-dimensional coordinate system, and creating a three-dimensional space model based on the reference three-dimensional coordinate system;
    when the three-dimensional space model has been moved to the position of the target object in the reference image, determining position information of the target object in the reference three-dimensional coordinate system;
    mapping the three-dimensional space model to the image plane of each image according to the position information of the target object in the reference three-dimensional coordinate system and the camera pose information of each image determined from the environment feature points in that image.
  2. The method according to claim 1, further comprising:
    preprocessing the image training samples, the preprocessing comprising: determining a reference three-dimensional coordinate system, and determining the camera pose information of each image according to the reference three-dimensional coordinate system and the environment feature points.
  3. The method according to claim 2, wherein the determining the camera pose information of each image according to the reference three-dimensional coordinate system comprises:
    analyzing the environment feature point information of each image frame using vision-based simultaneous localization and mapping (SLAM), and determining the camera pose information of each image from the analysis result.
  4. The method according to claim 1, wherein, when the three-dimensional space model has been moved to the position of the target object in the reference image, the target object lies inside the three-dimensional space model.
  5. The method according to claim 1, wherein the obtaining image training samples comprises:
    obtaining a target video file and taking multiple frames of the video file as the image training samples, the target video file being obtained by photographing the target object in a target environment.
  6. The method according to claim 5, wherein the determining a reference three-dimensional coordinate system comprises:
    taking the camera coordinate system of the first frame of the video file as the reference three-dimensional coordinate system.
  7. The method according to claim 5, wherein the target video file is shot as follows: the target object and a marker having a planar structure are placed in the target environment, the plane of the marker being parallel to the ground, and the lens is first aimed at the marker and then moved to the position of the target object for shooting;
    the determining a reference three-dimensional coordinate system comprising:
    establishing the reference three-dimensional coordinate system from the plane in which the marker lies in the first few frames of the video file.
  8. The method according to claim 7, wherein the establishing the reference three-dimensional coordinate system from the plane in which the marker lies comprises:
    establishing the reference three-dimensional coordinate system with the center point of the marker plane as the origin and that plane as the x-y plane, following the right-hand rule.
  9. The method according to claim 7, wherein the marker having a planar structure includes a paper sheet displaying a preset pattern.
  10. The method according to claim 5, wherein the video file is shot as follows: with the position of the target object fixed, a video capture device films the target object in a full circle around it.
  11. The method according to claim 1, wherein the determining position information of the target object in the reference three-dimensional coordinate system comprises:
    determining the translational degrees of freedom and rotational degrees of freedom of the target object along the three dimensions of the reference three-dimensional coordinate system, and the size of the three-dimensional space model in the three dimensions.
  12. The method according to claim 1, wherein the three-dimensional space model comprises a cuboid model.
  13. The method according to claim 12, further comprising, after the mapping the three-dimensional space model to the image plane of each image:
    rectangularizing the quadrilateral obtained from mapping the three-dimensional space model.
  14. The method according to claim 1, wherein the three-dimensional space model comprises a composite model assembled from multiple cuboid models.
  15. A method for building a target object recognition model, comprising:
    obtaining image training samples comprising a plurality of images, each image being captured by photographing the same target object, wherein adjacent images share common environment feature points, each image further comprising annotation information on the position of the target object, the annotation information being obtained as follows: taking one of the images as a reference image, creating a three-dimensional space model based on a reference three-dimensional coordinate system, determining position information of the target object in the reference three-dimensional coordinate system according to the position to which the three-dimensional space model has been moved, and mapping the three-dimensional space model to the image plane of each image according to the camera pose information of each image determined from the environment feature points in that image;
    generating a recognition model of the target object according to the annotation information on the position of the target object in the image training samples.
  16. The method according to claim 15, wherein the recognition model of the target object is applied, during augmented reality (AR) interaction, to recognize the target object in a captured live-scene image and determine the position of the target object in the live-scene image, so that a virtual image associated with the target object is displayed according to the position information of the target object in the live-scene image.
  17. A method for providing augmented reality (AR) information, comprising:
    capturing a live-scene image, and identifying position information of a target object in the live-scene image using a pre-built target object recognition model, wherein the target object recognition model is built by the method according to claim 15;
    determining, according to the position information of the target object in the live-scene image, a display position for an associated virtual image, and displaying the virtual image.
  18. The method according to claim 17, further comprising:
    when the position of the target object in the live-scene image changes, making the position of the virtual image follow the change of position of the live-scene image.
  19. The method according to claim 18, wherein the position of the virtual image is made to follow the position of the live-scene image as follows:
    receiving one frame of live-scene image information captured by a first thread, and suspending the live-scene image capture operation of the first thread;
    providing the live-scene image information to a second thread, the second thread identifying position information of the target object in the live-scene image using the target object recognition model and determining, according to the position information of the target object in the live-scene image, a display position for the associated virtual image;
    instructing a third thread to compose and render the live-scene image captured by the first thread and the virtual image generated by the second thread, and instructing the first thread to capture the next frame.
  20. An apparatus for automatically annotating a target object in images, comprising:
    a training sample obtaining unit configured to obtain image training samples comprising a plurality of images, each image being captured by photographing the same target object, wherein adjacent images share common environment feature points;
    a three-dimensional space model creating unit configured to take one of the images as a reference image, determine a reference three-dimensional coordinate system, and create a three-dimensional space model based on the reference three-dimensional coordinate system;
    a position information determining unit configured to determine position information of the target object in the reference three-dimensional coordinate system when the three-dimensional space model has been moved to the position of the target object in the reference image;
    a mapping unit configured to map the three-dimensional space model to the image plane of each image according to the position information of the target object in the reference three-dimensional coordinate system and the camera pose information of each image determined from the environment feature points in that image.
  21. An apparatus for building a target object recognition model, comprising:
    an image training sample obtaining unit configured to obtain image training samples comprising a plurality of images, each image being captured by photographing the same target object, wherein adjacent images share common environment feature points, each image further comprising annotation information on the position of the target object, the annotation information being obtained as follows: taking one of the images as a reference image, creating a three-dimensional space model based on a reference three-dimensional coordinate system, determining position information of the target object in the reference three-dimensional coordinate system according to the position to which the three-dimensional space model has been moved, and mapping the three-dimensional space model to the image plane of each image according to the camera pose information of each image determined from the environment feature points in that image;
    a recognition model generating unit configured to generate a recognition model of the target object according to the annotation information on the position of the target object in the image training samples.
  22. An apparatus for providing augmented reality (AR) information, comprising:
    a live-scene image capture unit configured to capture a live-scene image and identify position information of a target object in the live-scene image using a pre-built target object recognition model, wherein the target object recognition model is built by the method according to claim 15;
    a virtual image display unit configured to determine, according to the position information of the target object in the live-scene image, a display position for an associated virtual image, and to display the virtual image.
  23. A computer system, comprising:
    one or more processors; and
    a memory associated with the one or more processors, the memory being configured to store program instructions which, when read and executed by the one or more processors, perform the following operations:
    obtaining image training samples comprising a plurality of images, each image being captured by photographing the same target object, wherein adjacent images share common environment feature points;
    taking one of the images as a reference image, determining a reference three-dimensional coordinate system, and creating a three-dimensional space model based on the reference three-dimensional coordinate system;
    when the three-dimensional space model has been moved to the position of the target object in the reference image, determining position information of the target object in the reference three-dimensional coordinate system;
    mapping the three-dimensional space model to the image plane of each image according to the position information of the target object in the reference three-dimensional coordinate system and the camera pose information of each image determined from the environment feature points in that image.