CN115661493B - Method, device, equipment and storage medium for determining object pose - Google Patents

Method, device, equipment and storage medium for determining object pose

Publication number: CN115661493B (application CN202211692642.3A; earlier publication CN115661493A)
Authority: CN (China)
Inventors: 刘嘉宇, 许伟山
Assignee: Aerospace Cloud Machine Beijing Technology Co., Ltd.
Filing/priority date: 2022-12-28; publication of CN115661493A: 2023-01-31; grant of CN115661493B: 2023-07-04
Legal status: Active

Abstract

The application relates to a method, a device, equipment and a storage medium for determining the pose of an object. The method includes: acquiring a two-dimensional image containing a target object with a two-dimensional image acquisition unit, and processing the two-dimensional image to obtain two-dimensional feature points corresponding to the target object; acquiring a projection image set corresponding to the target object, the projection image set being a set of images obtained by projecting a three-dimensional point cloud model of the target object under different poses; matching the two-dimensional feature points against the projection image set to obtain a target projection image; and determining the pose of the target object based on the target projection image. In this way, the pose of the target object can be acquired with a two-dimensional image acquisition unit instead of a 3D camera, reducing hardware cost.

Description

Method, device, equipment and storage medium for determining object pose
Technical Field
The present invention relates to the field of image processing, and in particular, to a method, an apparatus, a device, and a storage medium for determining a pose of an object.
Background
At present, when a 2D camera is used to acquire real-time images, the 3D scene is recorded and stored in the form of 2D data, so depth information is lost and the real 3D scene cannot be restored. A 2D camera therefore cannot acquire the pose information of an object such as a target object, and in the prior art a 3D camera is generally used to acquire object pose information.
However, 3D cameras are expensive, which increases hardware cost.
Disclosure of Invention
According to a first aspect of the present application, there is provided a method for determining the pose of an object, the method comprising: acquiring a two-dimensional image, captured by a two-dimensional image acquisition unit, that contains the target object; determining two-dimensional feature points corresponding to the target object based on the two-dimensional image; acquiring a projection image set corresponding to the target object, the projection image set being a set of images obtained by projecting a three-dimensional point cloud model of the target object under different poses; matching the projection images in the projection image set with the two-dimensional feature points respectively to obtain a target projection image, the target projection image being the projection image with the highest matching degree with the two-dimensional feature points; and determining the pose of the target object based on the target projection image.
According to a second aspect of the present application, there is provided a device for determining the pose of an object, the device comprising: a two-dimensional image acquisition unit configured to acquire a two-dimensional image of the target object and send the acquired two-dimensional image to the processor;
a memory configured to store a three-dimensional point cloud model corresponding to the target object, the memory sending the three-dimensional point cloud model to the processor upon receiving an acquisition instruction for the three-dimensional point cloud model; and
the processor is coupled with the two-dimensional image acquisition unit and the memory and is used for processing the two-dimensional image received from the two-dimensional image acquisition unit to obtain a two-dimensional characteristic point corresponding to the target object; the three-dimensional point cloud model is used for projecting the three-dimensional point cloud model received from the memory to obtain projection image sets under different poses; the method comprises the steps of collecting the projection images, and obtaining a two-dimensional characteristic point; and for determining a pose of the target object based on the target projection image.
According to a third aspect of the present application, there is provided an apparatus for determining the pose of an object, the apparatus comprising: a first acquisition module configured to acquire a two-dimensional image, captured by the two-dimensional image acquisition unit, that contains the target object; a first determining module configured to determine two-dimensional feature points corresponding to the target object based on the two-dimensional image; a second acquisition module configured to acquire a projection image set corresponding to the target object, the projection image set being a set of images obtained by projecting a three-dimensional point cloud model of the target object under different poses; a matching module configured to match the projection images in the projection image set with the two-dimensional feature points respectively to obtain a target projection image, the target projection image being the projection image with the highest matching degree with the two-dimensional feature points; and a second determining module configured to determine the pose of the target object based on the target projection image.
According to a fourth aspect of the present application, there is provided an electronic device comprising: at least one processor; a memory for storing the at least one processor-executable instruction; wherein the at least one processor is configured to execute the instructions to implement the method according to the first aspect of the present application.
According to a fifth aspect of the present application, there is provided a computer readable storage medium, wherein instructions in the computer readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method as described in the first aspect of the present application.
According to one or more technical solutions provided by the embodiments of the present application, a two-dimensional image containing the target object is acquired by a two-dimensional image acquisition unit, and the two-dimensional image is processed to obtain two-dimensional feature points corresponding to the target object; a projection image set corresponding to the target object is acquired, the projection image set being a set of images obtained by projecting a three-dimensional point cloud model of the target object under different poses; the two-dimensional feature points are matched against the projection image set to obtain a target projection image; and the pose of the target object is determined based on the target projection image. In this way, the pose of the target object can be acquired with a two-dimensional image acquisition unit instead of a 3D camera, reducing hardware cost.
Drawings
Further details, features and advantages of the present application are disclosed in the following description of exemplary embodiments, with reference to the following drawings, wherein:
FIG. 1 is a flowchart of a method for determining a pose of an object according to an exemplary embodiment of the present application;
FIG. 2 is a flowchart of a method for acquiring a projection image set according to an exemplary embodiment of the present application;
FIG. 3 is a flowchart of matching two-dimensional feature points with a projection image set to obtain a target projection image according to an exemplary embodiment of the present application;
FIG. 4 is a schematic block diagram of functional modules of an apparatus for determining a pose of an object according to an exemplary embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application;
FIG. 6 is a block diagram of a computer system according to an exemplary embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present application will be understood more thoroughly and completely. It should be understood that the drawings and examples of the present application are for illustrative purposes only and are not intended to limit the scope of the present application.
It should be understood that the various steps recited in the method embodiments of the present application may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present application is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like herein are merely used for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by such devices, modules, or units.
It should be noted that references to "a" or "a plurality" in this application are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that such references should be interpreted as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present application are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The embodiment of the application provides a method for determining the pose of an object, which can be executed by an image processing device or a device comprising the image processing device, such as a robot, and is not limited in this application.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings of the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Fig. 1 is a flowchart of a method for determining a pose of an object according to an exemplary embodiment of the present application, as shown in fig. 1, including the following steps:
s101, acquiring a two-dimensional image containing a target object acquired by a two-dimensional image acquisition unit.
The embodiments of the present application do not limit the specific type of the target object, which may be, for example, an automobile, a person, or an animal.
The two-dimensional image acquisition unit may specifically be a monocular two-dimensional image acquisition unit.
S102, determining two-dimensional feature points corresponding to the target object based on the two-dimensional image.
The embodiment of the application does not limit the specific manner of determining the two-dimensional feature point corresponding to the target object based on the two-dimensional image. In a possible implementation manner, based on the two-dimensional image, determining the two-dimensional feature point corresponding to the target object includes:
carrying out semantic segmentation on the two-dimensional image to obtain an area where the target object is located;
and carrying out threshold segmentation on the region where the target object is located to obtain a two-dimensional feature point corresponding to the target object.
In this embodiment, a semantic segmentation model may be used to perform semantic segmentation on the two-dimensional image to obtain the region where the target object is located. The semantic segmentation model is trained on images in which the region of the target object has been annotated: for example, the region where the target object is located is manually labeled in each image, and the model is trained with the labeled images, so that the trained semantic segmentation model can accurately segment out the region where the target object is located.
In this embodiment, the two-dimensional feature points are points on the edge of the region where the target object is located. Since the pixel values of edge points differ from those of other regions, a threshold segmentation method can be used to obtain the two-dimensional feature points corresponding to the target object.
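The patent gives no code for this step; purely as an illustration, the following Python sketch shows one way the two-stage extraction could look, where `segment_fn` is a hypothetical stand-in for the trained semantic segmentation model and OpenCV's Otsu threshold plays the role of the threshold segmentation:

```python
# Illustrative sketch of S102 only; segment_fn is a hypothetical stand-in
# for the trained semantic segmentation model described above.
import cv2
import numpy as np

def extract_feature_points(image_bgr, segment_fn):
    """Return (N, 2) pixel coordinates of edge points of the target region."""
    mask = segment_fn(image_bgr)          # uint8 mask, 255 = target object region
    region = cv2.bitwise_and(image_bgr, image_bgr, mask=mask)
    gray = cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)
    # Threshold segmentation: edge pixels differ in value from other regions,
    # so Otsu's threshold isolates the region and its contour gives the edge.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return np.empty((0, 2), dtype=int)
    return np.vstack([c.reshape(-1, 2) for c in contours])
```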
S103, acquiring a projection image set corresponding to the target object, wherein the projection image set is a set of images obtained by projecting the three-dimensional point cloud model of the target object under different poses.
The embodiment of the present application does not limit the manner of acquiring the projection image set corresponding to the target object, and in a possible implementation manner, as shown in fig. 2, the acquiring the projection image set corresponding to the target object includes:
s201, acquiring a three-dimensional point cloud model of the target object.
The three-dimensional point cloud model of the target object may be stored in advance in a certain area, such as a storage module, and acquired from that area when needed. The three-dimensional point cloud model of the target object can be obtained by shooting with a 3D camera.
S202, determining a projection observation point.
In this step, the projection observation points are determined; there may be one or more of them. When there are multiple projection observation points, the state of the three-dimensional point cloud model can be kept fixed; that is, projection images of the three-dimensional point cloud model under different poses are obtained by changing the projection observation point. Alternatively, several projection observation points may be determined, and at each projection observation point the three-dimensional point cloud model is translated and/or rotated to obtain a set of projection images. When there is a single projection observation point, the three-dimensional point cloud model must be translated and/or rotated into different poses so as to obtain projection images under the different poses.
And S203, based on the projection observation points, carrying out translation and/or rotation on the three-dimensional point cloud model, and then carrying out projection to obtain a projection image set.
In this step, after the projection observation point is determined, the three-dimensional point cloud model can be translated and/or rotated into different poses, and each pose is projected to obtain the projection image set.
In one embodiment, a reference point is set in the three-dimensional point cloud model; the reference point is the origin of a reference coordinate system, and the reference coordinate system is the coordinate system in which the three-dimensional point cloud model is located. When obtaining the projection image set, the method further comprises:
determining the position of the reference point corresponding to each projection image in the projection image set, and the feature distance corresponding to each projection image, where the feature distance characterizes the distance between the projection observation point and the reference point. The position of the reference point can represent the pose information corresponding to a projection image, and the feature distance can represent the observation distance, so the projection image set consists of projections of the three-dimensional point cloud model under different poses and different observation distances.
It will be appreciated that the three-dimensional point cloud model is a series of point cloud data, and each projection image is likewise a set of point data.
In one embodiment, before the three-dimensional point cloud model is projected, it undergoes a coordinate system conversion: the reference coordinate system in which the model is located is converted into a target coordinate system whose origin is the projection observation point.
For example, suppose there is a point P in the three-dimensional point cloud model whose coordinates in the reference coordinate system are (x, y, z); its coordinates in the target coordinate system are (x', y', z') = T(x, y, z), where T is a transformation matrix whose elements include parameters related to the rotation angle and the translation amount. When the three-dimensional point cloud model rotates, the rotation-related parameters of the transformation matrix change; when it translates, the translation-related parameters change.
It will be appreciated that the projection described above follows the principle of perspective, i.e. the "near large, far small" rule, and that projection images are obtained by projecting the three-dimensional point cloud model onto the XOY plane.
In practical applications, the coordinates of each point in a projection image can be computed from the coordinates of the corresponding point in the three-dimensional point cloud model and the projection matrix.
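As a rough, non-authoritative sketch of S202–S203 and the projection just described: the pose sampling grid, the pinhole intrinsics K, and all function names below are assumptions for this example, not taken from the patent. Each sampled rotation/translation plays the role of the transformation T, converting the point cloud from the reference coordinate system into the target (viewpoint) coordinate system before perspective projection; the projected reference-point position and the feature distance are stored alongside each projection.

```python
# Illustrative sketch only: the pose sampling, intrinsics K, and helper names
# are assumptions for this example, not the patent's implementation.
import numpy as np
from scipy.spatial.transform import Rotation

def project(points_view, K):
    """Perspective ("near large, far small") projection onto the image plane."""
    uv = (K @ points_view.T).T
    return uv[:, :2] / uv[:, 2:3]        # (N, 2) pixel coordinates

def build_projection_set(cloud, ref_point, K, euler_grid, translations):
    """cloud: (N, 3) points in the reference frame; ref_point: its origin."""
    projection_set = []
    for angles in euler_grid:            # sampled rotation angles, in degrees
        R = Rotation.from_euler("xyz", angles, degrees=True).as_matrix()
        for t in translations:           # sampled translation vectors
            pts_view = cloud @ R.T + t   # transformation T: reference -> viewpoint frame
            ref_view = R @ np.asarray(ref_point) + t
            projection_set.append({
                "image_points": project(pts_view, K),          # projection image (point set)
                "ref_point": ref_view,                         # encodes this pose
                "feature_distance": np.linalg.norm(ref_view),  # viewpoint sits at the origin
            })
    return projection_set
```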
And S104, respectively matching the projection images in the projection image set with the two-dimensional feature points to obtain a target projection image, wherein the target projection image is the projection image with the highest matching degree with the two-dimensional feature points.
In this step, each projection image in the projection image set is matched against the two-dimensional feature points in an iterative manner, and the projection image with the highest matching degree with the two-dimensional feature points is determined as the target projection image.
In an embodiment, as shown in fig. 3, the matching the two-dimensional feature points with the projection image set to obtain the target projection image includes:
and S301, fitting operation is carried out on the projection image set and the two-dimensional feature points respectively, so that a plurality of fitting results are obtained.
The embodiment of the application is not limited to a specific mode of fitting operation, for example, fitting operation can be performed on the projection image and the two-dimensional feature points by using least square fitting operation, or fitting operation can be performed on the projection image and the two-dimensional feature points by using fourier series fitting operation.
In the fitting operation, for each projection image, calculating the minimum variance of the pixel distance between each point in the two-dimensional feature points and each corresponding point in the projection image, and comparing the calculated minimum variance with a preset threshold value to obtain a fitting result, wherein the fitting result can be a specific score.
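A minimal sketch of one plausible reading of this fitting operation, assuming "corresponding point" means the projected point nearest to each two-dimensional feature point (the patent does not pin this down) and using a simple variance-vs-threshold scoring rule:

```python
# Hedged sketch: the nearest-neighbour correspondence and the scoring rule
# are assumptions; the patent only requires comparing a variance of pixel
# distances against a preset threshold.
import numpy as np
from scipy.spatial import cKDTree

def fit_score(feature_pts, projected_pts, threshold):
    """Score one projection image against the two-dimensional feature points."""
    dists, _ = cKDTree(projected_pts).query(feature_pts)  # pixel distances
    variance = np.var(dists)                              # spread of the residuals
    # Compare with the preset threshold; a tighter fit earns a higher score.
    return 1.0 / (1.0 + variance) if variance < threshold else 0.0
```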
S302, determining a projection image corresponding to a fitting result meeting preset conditions as the target projection image.
In this step, the preset condition includes that the score is highest, or that the score satisfies a matching threshold.
S105, determining the pose of the target object based on the target projection image.
The embodiments of the present application do not limit the specific manner of determining the pose of the target object based on the target projection image. In one possible implementation, the method includes: determining the pose of the target object based on the position of the reference point corresponding to the target projection image. In this embodiment, the position of the reference point can characterize the pose of the target object, since the reference point occupies a different position for each pose of the target object.
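Tying the sketches above together (all names remain hypothetical helpers from the earlier examples, not the patent's API), the target projection image is simply the best-scoring candidate, and its stored reference-point position is read off as the pose:

```python
# Hypothetical glue code reusing fit_score() and build_projection_set() above.
def determine_pose(feature_pts, projection_set, threshold):
    best = max(projection_set,
               key=lambda p: fit_score(feature_pts, p["image_points"], threshold))
    return best["ref_point"]   # reference-point position characterizes the pose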
In the embodiments of the present application, a two-dimensional image acquisition unit is used to acquire a two-dimensional image containing the target object, and the two-dimensional image is processed to obtain the two-dimensional feature points corresponding to the target object; a projection image set corresponding to the target object is acquired, the projection image set being a set of images obtained by projecting a three-dimensional point cloud model of the target object under different poses; the two-dimensional feature points are matched against the projection image set to obtain a target projection image; and the pose of the target object is determined based on the target projection image. In this way, the pose of the target object can be acquired with a two-dimensional image acquisition unit instead of a 3D camera, reducing hardware cost.
The embodiments of the present application also disclose a device for determining the pose of an object, the device comprising: a two-dimensional image acquisition unit configured to acquire a two-dimensional image of the target object and transmit the acquired two-dimensional image to the processor;
a memory configured to store a three-dimensional point cloud model corresponding to the target object, the memory sending the three-dimensional point cloud model to the processor upon receiving an acquisition instruction for the three-dimensional point cloud model; and
a processor, coupled to the two-dimensional image acquisition unit and the memory, for processing the two-dimensional image received from the two-dimensional image acquisition unit to obtain a two-dimensional feature point corresponding to the target object; the three-dimensional point cloud model is used for projecting the three-dimensional point cloud model received from the memory to obtain projection image sets under different poses; the method comprises the steps of collecting the projection images, and matching the projection images in the projection image set with the two-dimensional feature points to obtain target projection images; and determining the pose of the target object based on the target projection image.
The two-dimensional image acquisition unit may be, for example, a 2D camera; embodiments of the present application are not limited to a particular type of target object, and may be, for example, an automobile or a human, animal, etc.
The embodiments of the present application are not limited to the specific type of the above memory, and the above memory and the processor may be located in a host computer.
In the case where functional modules are divided according to their respective functions, an embodiment of the present application provides an apparatus for determining the pose of an object. Fig. 4 is a schematic block diagram of the functional modules of an apparatus for determining the pose of an object according to an exemplary embodiment of the present application. As shown in fig. 4, the apparatus 400 for determining the pose of an object includes:
a first acquisition module 401 configured to acquire a two-dimensional image including a target object acquired by the two-dimensional image acquisition unit;
a first determining module 402 configured to determine, based on the two-dimensional image, a two-dimensional feature point corresponding to the target object;
a second obtaining module 403, configured to obtain a set of projection images corresponding to the target object, where the set of projection images is an image obtained by projecting a three-dimensional point cloud model of the target object under different poses;
a matching module 404, configured to match the two-dimensional feature points with the set of projection images to obtain a target projection image, where the target projection image is a projection image with the highest matching degree with the two-dimensional feature points;
the second determining module 405 determines a pose of the target object based on the target projection image.
In a possible implementation manner, the second obtaining module 403 is configured to obtain a three-dimensional point cloud model of the target object; determining a projection observation point; and based on the projection observation point, carrying out translation and/or rotation on the three-dimensional point cloud model, and then carrying out projection to obtain the projection image set.
In one possible embodiment, the three-dimensional point cloud model has a reference point therein, the reference point being an origin of a reference coordinate system, the reference coordinate system being a coordinate system in which the three-dimensional point cloud model is located; the apparatus 400 for determining a pose of an object further includes: and a third determining module configured to determine positions of reference points respectively corresponding to the projection image sets, and feature distances respectively corresponding to the projection image sets, the feature distances being used for characterizing distances between the projection observation points and the reference points.
In a possible embodiment, the second determining module 405 is configured to determine a pose of the target object based on a position of the reference point corresponding to the target projection image.
In a possible implementation manner, the first determining module 402 is configured to perform semantic segmentation on the two-dimensional image to obtain an area where the target object is located;
and carrying out threshold segmentation on the region where the target object is located to obtain two-dimensional feature points corresponding to the target object.
In a possible implementation manner, the matching module 404 is configured to perform a fitting operation on the projection image set and the two-dimensional feature points, so as to obtain a plurality of fitting results;
and determining the projection image corresponding to the fitting result meeting the preset condition as the target projection image.
The embodiment of the application also provides electronic equipment, which comprises: at least one processor; a memory for storing the at least one processor-executable instruction; wherein the at least one processor is configured to execute the instructions to implement the method disclosed in the embodiments of the present application.
Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. As shown in fig. 5, the electronic device 1800 includes at least one processor 1801 and a memory 1802 coupled to the processor 1801, the processor 1801 may perform corresponding steps in the above-described methods disclosed in embodiments of the present application.
The processor 1801 may also be referred to as a central processing unit (CPU), and may be an integrated circuit chip with signal processing capabilities. The steps of the methods disclosed in the embodiments of the present application may be completed by integrated logic circuits of hardware in the processor 1801 or by instructions in the form of software. The processor 1801 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in hardware, executed by a decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may reside in a memory 1802 such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers, as is well known in the art. The processor 1801 reads the information in the memory 1802 and, in combination with its hardware, performs the steps of the method described above.
In addition, when implemented by software and/or firmware, the programs constituting the various operations/processes according to the present application may be installed from a storage medium or a network onto a computer system having a dedicated hardware structure, for example the computer system 1900 shown in fig. 6; once the various programs are installed, the system is capable of performing various functions, including those described above. Fig. 6 is a block diagram of a computer system according to an exemplary embodiment of the present application.
Computer system 1900 is intended to represent various forms of digital electronic computing devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 6, the computer system 1900 includes a computing unit 1901, and the computing unit 1901 may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1902 or a computer program loaded from a storage unit 1908 into a Random Access Memory (RAM) 1903. In the RAM 1903, various programs and data required for the operation of the computer system 1900 may also be stored. The computing unit 1901, ROM 1902, and RAM 1903 are connected to each other via a bus 1904. An input/output (I/O) interface 1905 is also connected to bus 1904.
Various components in computer system 1900 are connected to the I/O interface 1905, including: an input unit 1906, an output unit 1907, a storage unit 1908, and a communication unit 1909. The input unit 1906 may be any type of device capable of inputting information to the computer system 1900; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 1907 may be any type of device capable of presenting information, and may include, but is not limited to, a display, speakers, video/audio output terminals, a vibrator, and/or a printer. The storage unit 1908 may include, but is not limited to, magnetic disks and optical disks. The communication unit 1909 allows the computer system 1900 to exchange information/data with other devices over a network such as the Internet, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 1901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1901 performs the various methods and processes described above. For example, in some embodiments, the above-described methods disclosed in embodiments of the present application may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1908. In some embodiments, some or all of the computer programs may be loaded and/or installed onto computer system 1900 via ROM 1902 and/or communication unit 1909. In some embodiments, the computing unit 1901 may be configured to perform the above-described methods disclosed by embodiments of the present application in any other suitable manner (e.g., by means of firmware).
The embodiment of the application also provides a computer readable storage medium, wherein when the instructions in the computer readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the method disclosed in the embodiment of the application.
A computer readable storage medium in embodiments of the present application may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium described above can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specifically, the computer-readable storage medium described above may include one or more wire-based electrical connections, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program is executed by a processor to realize the method disclosed in the embodiment of the application.
In the embodiments of the present application, computer program code for performing the operations of the present application may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules, components or units described in the embodiments of the present application may be implemented by software or hardware. The name of a module, component or unit does not, in some cases, constitute a limitation of that module, component or unit itself.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The above description is merely an illustration of some embodiments of the present application and of the technical principles applied. Persons skilled in the art will appreciate that the scope of the disclosure involved in this application is not limited to the specific combinations of the features described above, but is intended to cover other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, embodiments in which the above features are interchanged with technical features having similar functions disclosed (but not limited to those disclosed) in the present application.
Although some specific embodiments of the present application have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present application. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present application. The scope of the application is defined by the appended claims.

Claims (8)

1. A method for determining the pose of an object, the method comprising:
acquiring a two-dimensional image containing the target object acquired by a two-dimensional image acquisition unit;
determining two-dimensional feature points corresponding to the target object based on the two-dimensional image;
acquiring a projection image set corresponding to the target object, wherein the projection image set is an image set obtained by projecting a three-dimensional point cloud model of the target object under different poses, and the three-dimensional point cloud model is point cloud data formed by a series of points;
determining the positions of reference points corresponding to the projection images in the projection image set respectively; the reference point is a point of the three-dimensional point cloud model and is an origin of a reference coordinate system, and the reference coordinate system is a coordinate system where the three-dimensional point cloud model is located;
performing fitting operations on the projection image set and the two-dimensional feature points respectively, so as to obtain a plurality of fitting results; wherein, during the fitting operation, for each projection image, the minimum variance of the pixel distances between each of the two-dimensional feature points and its corresponding point in the projection image is calculated, and the calculated minimum variance is compared with a preset threshold to obtain a fitting result;
determining a projection image corresponding to a fitting result meeting a preset condition as a target projection image, wherein the target projection image is a projection image with the highest matching degree with the two-dimensional feature points;
determining the pose of the target object based on the position of the reference point corresponding to the target projection image; wherein the position of the reference point may characterize the pose of the target object.
2. The method of claim 1, wherein acquiring the set of projection images corresponding to the target object comprises: acquiring a three-dimensional point cloud model of the target object;
determining a projection observation point;
and based on the projection observation point, carrying out translation and/or rotation on the three-dimensional point cloud model, and then carrying out projection to obtain the projection image set.
3. The method according to claim 2, wherein the method further comprises:
and determining characteristic distances corresponding to the projection images in the projection image set respectively, wherein the characteristic distances are used for representing the distances between the projection observation point and the reference point.
4. The method of claim 1, wherein determining the two-dimensional feature point corresponding to the target object based on the two-dimensional image comprises:
carrying out semantic segmentation on the two-dimensional image to obtain an area where a target object is located;
and carrying out threshold segmentation on the region where the target object is located to obtain two-dimensional feature points corresponding to the target object.
5. An apparatus for determining a pose of an object, the apparatus comprising:
the two-dimensional image acquisition unit is used for acquiring a two-dimensional image of the target object and sending the acquired two-dimensional image to the processor;
the memory is used for storing a three-dimensional point cloud model corresponding to the target object; the memory sends the three-dimensional point cloud model to the processor when receiving an acquisition instruction of the three-dimensional point cloud model; wherein the three-dimensional point cloud model is point cloud data composed of a series of points;
the processor, coupled to the two-dimensional image acquisition unit and the memory, is configured to process the two-dimensional image received from the two-dimensional image acquisition unit to obtain two-dimensional feature points corresponding to the target object; to project the three-dimensional point cloud model received from the memory to obtain a projection image set under different poses; to determine the positions of the reference point corresponding to each projection image in the projection image set; to perform fitting operations on the projection image set and the two-dimensional feature points respectively to obtain a plurality of fitting results, and to determine the projection image corresponding to a fitting result meeting a preset condition as a target projection image; and to determine the pose of the target object based on the position of the reference point corresponding to the target projection image;
during fitting operation, for each projection image, calculating the minimum variance of pixel distances between each point in the two-dimensional feature points and each corresponding point in the projection image, and comparing the calculated minimum variance with a preset threshold value to obtain a fitting result; the reference point is a point of the three-dimensional point cloud model and is an origin of a reference coordinate system, the reference coordinate system is a coordinate system where the three-dimensional point cloud model is located, and the position of the reference point can represent the pose of the target object.
6. An apparatus for determining a pose of an object, the apparatus comprising:
a first acquisition module configured to acquire a two-dimensional image including the target object acquired by the two-dimensional image acquisition unit;
the first determining module is configured to determine two-dimensional feature points corresponding to the target object based on the two-dimensional image;
the second acquisition module is configured to acquire a projection image set corresponding to the target object, wherein the projection image set is an image set obtained by projecting a three-dimensional point cloud model of the target object under different poses, and the three-dimensional point cloud model is point cloud data formed by a series of points;
a third determining module configured to determine positions of reference points corresponding to the projection images in the projection image set, respectively; the reference point is a point of the three-dimensional point cloud model and is an origin of a reference coordinate system, and the reference coordinate system is a coordinate system where the three-dimensional point cloud model is located;
the matching module is configured to perform fitting operation on the projection image set and the two-dimensional feature points respectively to obtain a plurality of fitting results; determining a projection image corresponding to a fitting result meeting preset conditions as a target projection image; during fitting operation, for each projection image, calculating the minimum variance of pixel distances between each point in the two-dimensional feature points and each corresponding point in the projection image, and comparing the calculated minimum variance with a preset threshold value to obtain a fitting result; the target projection image is the projection image with the highest matching degree with the two-dimensional characteristic points;
a second determining module configured to determine the pose of the target object based on the position of the reference point corresponding to the target projection image; wherein the position of the reference point can characterize the pose of the target object.
7. An electronic device, comprising:
at least one processor;
a memory for storing the at least one processor-executable instruction;
wherein the at least one processor is configured to execute the instructions to implement the method of any of claims 1-4.
8. A computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method according to any of claims 1-4.