CN117853581A - Target object visual positioning method and device for mobile intelligent platform - Google Patents

Target object visual positioning method and device for mobile intelligent platform

Info

Publication number
CN117853581A
Authority
CN
China
Prior art keywords: target, intelligent platform, mobile intelligent, classification, classification area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410039914.2A
Other languages
Chinese (zh)
Inventor
陈桪
程健聪
叶宪法
黄诗泽
姚敬松
孙剑峰
劳斯德
谢丛楷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202410039914.2A priority Critical patent/CN117853581A/en
Publication of CN117853581A publication Critical patent/CN117853581A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

In the technical solution provided by the application, an environment image of a large target scene is first collected and the large target scene is divided into a plurality of classification areas. Target object information is then acquired, and the target classification area corresponding to the target object is determined by combining the classification attribute of each classification area, which completes the preliminary positioning of the target object. The position information of the target classification area is then issued to the mobile intelligent platform, so that the mobile intelligent platform moves to the target classification area and scans for the target object with the camera equipment it carries, thereby completing the accurate positioning of the target object. This solves the technical problems of heavy data-processing load and slow response time faced by existing mobile intelligent platforms in complex large scenes.

Description

Target object visual positioning method and device for mobile intelligent platform
Technical Field
The application relates to the technical field of robots, in particular to a target object visual positioning method and device for a mobile intelligent platform.
Background
With the progress of mobile intelligent platform technologies such as unmanned aerial vehicles and robots, many industries have begun to put mobile intelligent platforms into daily production and life: the logistics and warehousing industry uses robots to pick and place articles, and the power industry uses robots and unmanned aerial vehicles for equipment inspection. To further improve the usefulness of mobile intelligent platforms in daily production and life, the visual positioning of the mobile intelligent platform is one of the key problems.
Most existing visual positioning methods for mobile intelligent platforms rely on a camera mounted on the platform and identify the target object from the images that camera captures. When facing a complex large scene, however, the positioning effect of this approach is limited by factors such as hardware capability, field environment and shooting angle, and it suffers from the technical problems of heavy data-processing load and slow response time.
Disclosure of Invention
The application provides a target object visual positioning method and device for a mobile intelligent platform, which are used for solving the technical problems of heavy data-processing load and slow response time that existing mobile intelligent platforms face in complex large scenes.
In order to solve the above technical problems, a first aspect of the present application provides a method for visually locating a target object for a mobile intelligent platform, including:
acquiring an environment image of the target large scene, and dividing the target large scene into a plurality of classification areas based on the environment image;
acquiring target object information, and determining a target classification area corresponding to a target object based on morphological attributes and classification attributes in the target object information and combining the classification attributes of each classification area;
according to the preset conversion relation between the pixel coordinates and the world coordinates of each fixed camera device, determining first positioning information of the target classification area under the world coordinates, sending the first positioning information to a mobile intelligent platform so that the mobile intelligent platform moves towards the target classification area, and scanning the target classification area through scanning equipment in the mobile intelligent platform so as to determine second positioning information of the target object according to the scanning result.
Preferably, the dividing the target large scene into a plurality of classification areas based on the environmental image specifically includes:
performing binary conversion based on the environment image, and detecting the edges of the photo through a Canny edge detection algorithm;
dividing the detected object region or edge into a plurality of potential regions of interest by a watershed algorithm;
acquiring an edge pixel coordinate point set of the region of interest according to the connected domain, and determining a classification label and centroid coordinates of the region of interest based on objects contained in the region of interest to obtain a classification region.
Preferably, after the mobile intelligent platform moves to the target classification area, the method further comprises:
after the mobile intelligent platform reaches the upper-level classification area of the target classification area, the Euclidean distance between the centroid of the mobile intelligent platform and the centroid of the target classification area is calculated, the two centroid coordinates are used for solving through a trigonometric function theorem to obtain an intersection point of a vector between the two centroid coordinates and the boundary of the target classification area, a mapping point based on the intersection point on a target object coordinate axis is used as a destination of the mobile intelligent platform, and coordinate information of the mapping point is sent to the mobile intelligent platform.
Preferably, determining the second positioning information of the target object according to the scanning result specifically includes:
when the scanning result identifies the target object, calculating a centroid three-dimensional coordinate position point of the target object through the color photo and the depth information of the object, and then combining the relation of the pixel coordinate of the image pickup device to the camera coordinate to obtain the coordinate of the centroid of the target object under the camera coordinate system of the image pickup device as the second positioning information.
Preferably, the method further comprises:
when the target object is not scanned in the target classification area, determining an upper classification area of the target classification area according to the hierarchical relationship of each classification area;
and determining a second classification area except the target classification area in the upper classification area, and scanning the second classification area through scanning equipment in the mobile intelligent platform to obtain a scanning result.
Meanwhile, the second aspect of the present application further provides a target object visual positioning device for a mobile intelligent platform, including:
the scene area dividing unit is used for collecting an environment image of the target large scene and dividing the target large scene into a plurality of classification areas based on the environment image;
the target object initial positioning unit is used for acquiring target object information, and determining a target classification area corresponding to a target object based on morphological attributes and classification attributes in the target object information and combining the classification attributes of each classification area;
the intelligent platform linkage control unit is used for determining first positioning information of the target classification area under the world coordinates according to the preset conversion relation between the pixel coordinates and the world coordinates of each fixed camera device, sending the first positioning information to the mobile intelligent platform so that the mobile intelligent platform moves towards the target classification area, and scanning the target classification area through scanning equipment in the mobile intelligent platform so as to determine second positioning information of the target object according to the scanning result.
Preferably, the dividing the target large scene into a plurality of classification areas based on the environmental image in the scene area dividing unit specifically includes:
performing binary conversion based on the environment image, and detecting the edges of the photo through a Canny edge detection algorithm;
dividing the detected object region or edge into a plurality of potential regions of interest by a watershed algorithm;
acquiring an edge pixel coordinate point set of the region of interest according to the connected domain, and determining a classification label and centroid coordinates of the region of interest based on objects contained in the region of interest to obtain a classification region.
Preferably, the device further comprises: a centroid distance calculation unit, configured to:
After the mobile intelligent platform reaches the upper-level classification area of the target classification area, the Euclidean distance between the centroid of the mobile intelligent platform and the centroid of the target classification area is calculated, the two centroid coordinates are used for solving through a trigonometric function theorem to obtain an intersection point of a vector between the two centroid coordinates and the boundary of the target classification area, a mapping point based on the intersection point on a target object coordinate axis is used as a destination of the mobile intelligent platform, and coordinate information of the mapping point is sent to the mobile intelligent platform.
Preferably, the determining, by the mobile intelligent platform, the second positioning information of the target object according to the scanning result specifically includes:
when the scanning result identifies the target object, calculating a centroid three-dimensional coordinate position point of the target object through the color photo and the depth information of the object, and then combining the relation of the pixel coordinate of the image pickup device to the camera coordinate to obtain the coordinate of the centroid of the target object under the camera coordinate system of the image pickup device as the second positioning information.
Preferably, the device further comprises: a rescan control unit, configured to:
when the target object is not scanned in the target classification area, determining an upper classification area of the target classification area according to the hierarchical relationship of each classification area;
and determining a second classification area except the target classification area in the upper classification area, and scanning the second classification area through scanning equipment in the mobile intelligent platform to obtain a scanning result.
From the above technical scheme, the application has the following advantages:
according to the technical scheme, the environment images of the large target scene are acquired through the fixed camera equipment arranged in the large target scene, the large target scene is divided into the classified areas, then the target object information is obtained slowly, the target classified areas corresponding to the target object are determined based on the morphological attribute and the classification attribute in the target object information and combined with the classification attribute of each classified area, the preliminary positioning of the target object is completed, then the position information of the target classified areas is issued to the mobile intelligent platform, the mobile intelligent platform is enabled to move to the target classified areas and then conduct target object scanning on the target classified areas by combining the camera equipment in the mobile intelligent platform, and therefore accurate positioning of the target object is completed.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive faculty for a person skilled in the art.
Fig. 1 is a schematic flow chart of an embodiment of a method for visual positioning of a target object for a mobile intelligent platform.
Fig. 2 is a schematic diagram of an environment of a large target scene and an installation mode of a fixed camera device according to an embodiment of a target object visual positioning method for a mobile intelligent platform.
Fig. 3 is a schematic view of region positioning of a kitchen scene of a visual positioning method of an object for a mobile intelligent platform.
Fig. 4 is a simplified schematic diagram of region positioning of a kitchen scene of a method for visual positioning of an object of a mobile intelligent platform according to the present application.
Fig. 5 is a schematic structural diagram of an embodiment of a visual positioning device for a target object of a mobile intelligent platform provided in the present application.
Detailed Description
The embodiment of the application provides a target object visual positioning method and device for a mobile intelligent platform, which are used for solving the technical problems of heavy data-processing load and slow response time that existing mobile intelligent platforms face in complex large scenes.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the embodiments described below are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Firstly, a detailed description of an embodiment of a method for visually locating a target object for a mobile intelligent platform is provided in the present application, which specifically includes:
referring to fig. 1 to 2, a method for visually positioning a target object for a mobile intelligent platform according to the present embodiment includes:
step 101, acquiring an environment image of a target large scene, and dividing the target large scene into a plurality of classification areas based on the environment image;
it should be noted that, first, how many macro cameras are needed to be used can be planned according to the size of the complex scene. The macro camera is installed at a good position, the macro camera is required to be calibrated, the internal and external parameters of the camera can be obtained through a Zhang Zhengyou calibration method, and after parameter calibration is completed, the macro camera can acquire an environment image of a large scene as an initial template. The obtained initial template is sent to an upper computer, and the automatic interested region extraction operation can be carried out on the target object in the scene through the control system of the upper computer, so that the large target scene is divided into a plurality of classified regions, meanwhile, the shot image can be shot well, some noise on the image can be further removed through Gaussian filtering of the shot image, and the problems of blurring, burrs, noise and the like of the picture due to environmental factors or unavoidable factors inside the camera are avoided, so that the picture is clearer.
In addition to the above-described preferred manner of capturing images with a plurality of fixed image capturing apparatuses preset in the large target scene, in some embodiments the environment image of the large target scene may instead be captured in advance by an unmanned aerial vehicle equipped with an image capturing apparatus (for example, when the large target scene is an outdoor scene), or a satellite image may be used as the environment image of the large target scene.
Step 102, acquiring target object information, and determining a target classification area corresponding to a target object based on morphological attributes and classification attributes in the target object information and combining the classification attributes of each classification area;
it should be noted that, based on the classification area divided in the previous step, when an object needs to be clamped by the mobile intelligent platform, the object information corresponding to the object can be obtained first, based on the morphological attribute and the classification attribute in the object information, and the classification attribute of each classification area is combined, the object classification area corresponding to the object is determined, for example, soy sauce in a kitchen area is taken as a target, firstly, the upper computer extracts the opposite category from the small to the large to locate the classification area (soy sauce- > sauce area- > kitchen table 1- > kitchen) corresponding to the object in the room, wherein the sauce area can be regarded as the object classification area where soy sauce is located, and kitchen table 1 and kitchen can be the directly upper classification area and the indirectly upper classification area of the sauce area.
Step 103, according to the preset conversion relation between the pixel coordinates and the world coordinates of each fixed camera device, determining the first positioning information of the target classification area under the world coordinates, sending the first positioning information to the mobile intelligent platform so that the mobile intelligent platform moves towards the target classification area, and scanning the target classification area through the scanning device in the mobile intelligent platform to determine the second positioning information of the target object according to the scanning result.
After the target classification area where the target object is located is determined, the position of the target classification area can be issued to the mobile intelligent platform, so that the mobile intelligent platform moves towards the target classification area according to the received instruction. When the mobile intelligent platform reaches the vicinity of the target classification area, the scanning equipment in the mobile intelligent platform scans the target classification area, and the second positioning information of the target object, i.e. the positioning information of the target object under the view angle of the image pickup equipment on the mobile intelligent platform, is determined from the scanning result. After the target object is determined, the area relationship and position point of the object are recorded, and the subsequent grasping operation can then be performed.
The scanning device in the mobile intelligent platform is preferably an image pickup device, but laser radar, millimeter-wave radar and other devices may also be used instead.
Compared with the prior art, the visual positioning method provided by this embodiment is more flexible, lower in cost, lower in computation and simpler in processing when facing a complex large environment; meanwhile, the macro (fixed image pickup equipment) and micro (image pickup equipment carried on the mobile intelligent platform) composite positioning system of this embodiment can maintain a certain accuracy while achieving fast positioning.
Based on the above, the method for visually positioning a target object for a mobile intelligent platform according to the present embodiment may further include the following:
further, in step 101 of the present embodiment, based on the environmental image, dividing the large target scene into a plurality of classification areas specifically includes:
performing binary conversion based on the environment image, and detecting the edges of the photo through a Canny edge detection algorithm;
dividing the detected object region or edge into a plurality of potential regions of interest by a watershed algorithm;
acquiring an edge pixel coordinate point set of the region of interest according to the connected region, and determining a classification label and centroid coordinates of the region of interest based on objects contained in the region of interest to obtain a classification region.
It should be noted that the automatic region-of-interest division of this embodiment may be implemented as in the following example: the image is converted to binary form, its edges are detected with the Canny edge detection algorithm, the detected object areas or edges are divided into potential regions of interest with a watershed algorithm, the pixel coordinate points of each region of interest are obtained from its connected domain, the regions are labeled manually (i.e. which place each region represents), and the data points are saved by class. After the regions are divided and labeled, the centroid coordinates of each region (the center point of the pixel positions of the region of interest in the image) can be calculated as follows:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
where $(\bar{x}, \bar{y})$ is the coordinate point representing the centroid, $(x_i, y_i)$ are the pixel points of the region, and n is the number of pixel points in the region. The data is then saved and the initialization of the macroscopic camera is ended.
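A rough sketch of this automatic division pipeline, assuming OpenCV, is given below; the thresholds and the way the watershed markers are seeded from connected components are assumptions, not parameters specified by this application.

```python
# Rough sketch (assumed parameters) of the automatic region-of-interest division:
# binary conversion, Canny edges, watershed segmentation, then per-region
# pixel sets and centroids.
import cv2
import numpy as np

img = cv2.GaussianBlur(cv2.imread("scene_template.jpg"), (5, 5), 0)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 1. Binary conversion and Canny edge detection
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
edges = cv2.Canny(gray, 50, 150)

# 2. Watershed segmentation into potential regions of interest,
#    with markers seeded from the connected components of the binary image
num_labels, markers = cv2.connectedComponents(binary)
markers = cv2.watershed(img, markers.astype(np.int32))

# 3. For each region, collect its pixel coordinates and compute the centroid
regions = {}
for label in range(1, num_labels):
    ys, xs = np.where(markers == label)
    if xs.size == 0:
        continue
    centroid = (xs.mean(), ys.mean())  # (x-bar, y-bar) as in the formula above
    regions[label] = {"pixels": np.column_stack([xs, ys]), "centroid": centroid}
# Classification labels ("doorway", "kitchen table 1", ...) would then be assigned manually.
```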
In addition to the above automatic division, a manual division manner may also be adopted: the regions of interest are delimited manually (such as the position of the doorway, of the kitchen table and of the sauce placement in the picture), and the edge pixel coordinate point sets of the divided regions of interest are then stored according to their nested classification (large region -> middle region -> small region where the target is located).
Further, after the mobile intelligent platform moves to the target classification area in step 103, the method further includes:
after the mobile intelligent platform reaches the upper-level classification area of the target classification area, the Euclidean distance between the centroid of the mobile intelligent platform and the centroid of the target classification area is calculated, the two centroid coordinates are solved through the trigonometric function theorem, so that the intersection point of the vector between the two centroid coordinates and the boundary of the target classification area is obtained, the mapping point of the intersection point on the target object coordinate axis is used as the destination of the mobile intelligent platform, and the coordinate information of the mapping point is sent to the mobile intelligent platform.
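One possible reading of this destination-point computation is sketched below; the axis-aligned region boundary and the mapping of the intersection point onto the horizontal axis through the target are interpretation assumptions, not geometry fixed by this application.

```python
# Hedged sketch of the destination-point computation: Euclidean distance CO,
# heading angle, intersection A of the centroid-to-centroid vector with the
# target region boundary, and a mapping point used as the destination.
# The axis-aligned boundary and the mapping rule are assumptions.
import math


def destination_point(platform_c, target_c, target_box):
    """platform_c, target_c: (x, y) centroids; target_box: (x_min, y_min, x_max, y_max)."""
    dx, dy = target_c[0] - platform_c[0], target_c[1] - platform_c[1]
    dist = math.hypot(dx, dy)   # Euclidean distance CO
    theta = math.atan2(dy, dx)  # heading angle of the CO vector

    x_min, y_min, x_max, y_max = target_box

    def on_box(px, py):
        return x_min - 1e-9 <= px <= x_max + 1e-9 and y_min - 1e-9 <= py <= y_max + 1e-9

    # Intersection A of the segment platform_c -> target_c with the region boundary:
    # walk the parametric point p(t) = platform_c + t * (dx, dy) and keep the
    # earliest boundary crossing that actually lies on the box.
    candidates = []
    for t in ((x_min - platform_c[0]) / dx if dx else None,
              (x_max - platform_c[0]) / dx if dx else None,
              (y_min - platform_c[1]) / dy if dy else None,
              (y_max - platform_c[1]) / dy if dy else None):
        if t is not None and 0.0 < t <= 1.0:
            px, py = platform_c[0] + t * dx, platform_c[1] + t * dy
            if on_box(px, py):
                candidates.append((t, px, py))
    _, ax, ay = min(candidates)

    # Map the intersection onto the axis through the target's centroid so the
    # destination stays clear of the blocked area (interpretation assumption).
    destination = (ax, target_c[1])
    return destination, dist, theta
```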
Further, the determining the second positioning information of the target object according to the scanning result in step 103 specifically includes:
when the scanning result identifies the target object, calculating a centroid three-dimensional coordinate position point of the target object through the color photo and the depth information of the object, and combining the relation of the pixel coordinate of the image pickup device to the camera coordinate to obtain the coordinate of the centroid of the target object under the camera coordinate system of the image pickup device as second positioning information.
Further, the process of determining the second positioning information of the target object according to the scanning result further includes:
when no object is scanned in the object classification area, determining an upper classification area of the object classification area according to the hierarchical relationship of each classification area;
and determining a second classification area except the target classification area in the upper classification area, and scanning the second classification area through a scanning device in the mobile intelligent platform to obtain a scanning result.
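The fallback search order just described can be sketched as follows; the parent and children maps, and the sibling region names, are illustrative assumptions consistent with the hierarchy example used earlier.

```python
# Sketch of the fallback when the target object is not found in its target
# classification area: step up to the upper-level area and visit the sibling
# (second) classification areas. The dictionaries are illustrative assumptions.
def fallback_regions(target_region, parent_of, children_of):
    parent = parent_of.get(target_region)
    if parent is None:
        return []  # already at the top level; nothing left to search
    return [r for r in children_of.get(parent, []) if r != target_region]


children = {"kitchen table 1": ["sauce area", "cutlery area", "spice area"]}
parents = {"sauce area": "kitchen table 1", "cutlery area": "kitchen table 1",
           "spice area": "kitchen table 1"}
print(fallback_regions("sauce area", parents, children))  # ['cutlery area', 'spice area']
```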
It should be noted that, based on the planar region-of-interest information, three-dimensional coordinate information is assigned to the positions on the motion channel along which the mobile intelligent platform travels to the region of interest.
Taking the grasping of soy sauce in the kitchen area as an example, the upper computer first resolves, from small to large, the chain of areas that locate the object in the room (soy sauce -> sauce area -> kitchen table 1 -> kitchen), and sends the kitchen doorway coordinate point to the mobile intelligent platform through communication, so that the platform, starting from another area, travels to the kitchen doorway according to the predetermined map coordinates. When the intelligent platform reaches the kitchen doorway coordinates, it notifies the upper computer, which controls the macro camera to start ROI (region of interest) extraction of the doorway pixel region and stores it as a template. As shown in fig. 3 and fig. 4, the intelligent platform is in the kitchen area; 21 and 31 denote the larger area kitchen table 1 and the sauce area respectively, and 22 is the next small-window search direction used for tracking the travel direction of the mobile intelligent platform. Two coordinate systems are constructed at the centroids of the mobile intelligent platform and of the sauce area respectively, with their x-axes and y-axes parallel to the x-axis and y-axis of the pixel coordinate system. When the mobile intelligent platform reaches the kitchen doorway area, the upper computer calculates the Euclidean distance CO between the centroid of the mobile intelligent platform and the centroid of the sauce area (the area where the target is located), and a right triangle is constructed from the two centroid coordinates. The included angle θ is obtained from the trigonometric functions; if θ is smaller than 90 degrees or larger than 270 degrees, the shortest distance to the boundary of the target area on the right of the large area is taken, using the y-axis of the coordinate system of the area where the target is located as the dividing line; otherwise the shortest distance on the left side is taken. The length CD is then obtained from the trigonometric functions, where point C represents the soy sauce, i.e. the target object, and point D, obtained by mapping the intersection point A of CO with the boundary of area 21 into the coordinate system of the target object, is a suitable destination coordinate point for the mobile intelligent platform; this prevents the destination coordinate point from overlapping with the blocked area. The pixel coordinates of point D are then converted into the world coordinate system and further into the corresponding coordinate point on the channel, which the upper computer communicates to the intelligent platform.
While the intelligent platform moves to the target coordinate point, its movement can be tracked with a sliding small window: the sensor gives the heading of the intelligent platform relative to the map, the movement direction vector is obtained from the difference between the angle θ and the heading, and the small window searches nearby pixel points along that direction; if the average pixel value inside the small window is close to the average pixel value of the captured template, that direction is taken as the movement direction of the intelligent platform, giving the tracking effect shown by the relation between the dashed box 22 and the dashed box of the intelligent platform in fig. 4. When the intelligent platform reaches the sauce area (the area where the target is located), it reports to the upper computer that the target point has been reached, and the micro camera (a depth camera) on the intelligent platform starts a finer search for objects (such as the target object, soy sauce) in the sauce area. First, the micro camera (depth camera) takes a picture of the area where the target is located; the data collected by the depth camera mainly comprise a color image and a depth image (the depth image feeds back the distance between the object and the depth camera). The target object (soy sauce) is identified from the captured color picture by deep learning; models such as SSD, R-CNN or YOLO can be used for identification. If no target object is identified in the area, this is fed back to the upper computer, which returns the next target area to search or enlarges the current area so that the intelligent platform searches within it (for example, if nothing is found in the sauce area, the platform returns to the other small areas of kitchen table 1 or searches the whole area of kitchen table 1). If the target object is identified in the area, the three-dimensional coordinate position of the centroid of the target object is calculated from the color photo and the depth information of the object, where the conversion from the pixel coordinates of the depth camera to the camera coordinates is:
$$Z_c = d, \qquad X_c = \frac{(u - u_0)\,Z_c}{f_x}, \qquad Y_c = \frac{(v - v_0)\,Z_c}{f_y}$$
where d is the distance fed back by the depth map, $f_x$ and $f_y$ are the focal-length parameters of the depth camera, $(u_0, v_0)$ is the principal point of the image (a camera parameter), $(X_c, Y_c, Z_c)$ are the coordinates of the point in the camera coordinate system, and u and v are the pixel coordinates of the point on the object. After the camera-coordinate point corresponding to the pixel coordinates is obtained, it can be converted into the corresponding point in the world coordinate system through the rotation and translation transformation matrix T, so that the accurate position of the target object is obtained. After the target is determined, the area relationship and position point of the object are saved, and the next grasping operation is carried out.
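A minimal sketch of this pixel-to-camera-to-world conversion is given below, assuming the standard pinhole back-projection; the principal point (u0, v0) and the 4x4 camera-to-world transform T are assumed to come from the calibration step and are not values given by this application.

```python
# Minimal sketch of the pixel -> camera -> world conversion described above
# (standard pinhole back-projection; parameters are assumed calibration outputs).
import numpy as np


def pixel_to_world(u, v, d, fx, fy, u0, v0, T):
    """u, v: pixel coordinates; d: depth fed back by the depth map;
    fx, fy, u0, v0: depth camera intrinsics; T: 4x4 camera-to-world transform."""
    Zc = d
    Xc = (u - u0) * Zc / fx
    Yc = (v - v0) * Zc / fy
    cam_point = np.array([Xc, Yc, Zc, 1.0])
    return (T @ cam_point)[:3]  # world-frame coordinates of the target centroid
```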
The above is a detailed description of an embodiment of a method for positioning a target object of a mobile intelligent platform, and the following is a detailed description of an embodiment of a device for positioning a target object of a mobile intelligent platform.
Referring to fig. 5, the present embodiment further provides a visual positioning device for a target object of a mobile intelligent platform, including:
a scene area dividing unit 201, configured to collect an environmental image of a large target scene through a plurality of fixed image capturing devices preset in the large target scene, and divide the large target scene into a plurality of classification areas based on the environmental image;
the target object initial positioning unit 202 is configured to obtain target object information, and determine a target classification area corresponding to the target object based on morphological attributes and classification attributes in the target object information and in combination with classification attributes of each classification area;
the intelligent platform linkage control unit 203 is configured to determine, according to a preset conversion relationship between pixel coordinates and world coordinates of each fixed image capturing device, first positioning information of the target classification area under the world coordinates, send the first positioning information to the mobile intelligent platform, so that the mobile intelligent platform moves towards the target classification area, and scan the target classification area through a scanning device in the mobile intelligent platform, so as to determine second positioning information of the target object according to a scanning result.
Further, the division of the target large scene into a plurality of classification areas based on the environmental image in the scene area division unit 201 specifically includes:
performing binary conversion based on the environment image, and detecting the edges of the photo through a Canny edge detection algorithm;
dividing the detected object region or edge into a plurality of potential regions of interest by a watershed algorithm;
acquiring an edge pixel coordinate point set of the region of interest according to the connected region, and determining a classification label and centroid coordinates of the region of interest based on objects contained in the region of interest to obtain a classification region.
Further, the device further comprises: a centroid distance calculation unit 2031, configured to:
after the mobile intelligent platform reaches the upper-level classification area of the target classification area, the Euclidean distance between the centroid of the mobile intelligent platform and the centroid of the target classification area is calculated, the two centroid coordinates are solved through the trigonometric function theorem, so that the intersection point of the vector between the two centroid coordinates and the boundary of the target classification area is obtained, the mapping point of the intersection point on the target object coordinate axis is used as the destination of the mobile intelligent platform, and the coordinate information of the mapping point is sent to the mobile intelligent platform.
Further, the determining, by the mobile intelligent platform, the second positioning information of the target object according to the scanning result specifically includes:
when the scanning result identifies the target object, calculating a centroid three-dimensional coordinate position point of the target object through the color photo and the depth information of the object, and combining the relation of the pixel coordinate of the image pickup device to the camera coordinate to obtain the coordinate of the centroid of the target object under the camera coordinate system of the image pickup device as second positioning information.
Further, the device further comprises: a rescan control unit 204, configured to:
when no object is scanned in the object classification area, determining an upper classification area of the object classification area according to the hierarchical relationship of each classification area;
and determining a second classification area except the target classification area in the upper classification area, and scanning the second classification area through a scanning device in the mobile intelligent platform to obtain a scanning result.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and units described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented, for example, in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "and/or" for describing the association relationship of the association object, the representation may have three relationships, for example, "a and/or B" may represent: only a, only B and both a and B are present, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The above embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A method for visually locating a target object for a mobile intelligent platform, comprising:
acquiring an environment image of the target large scene, and dividing the target large scene into a plurality of classification areas based on the environment image;
acquiring target object information, and determining a target classification area corresponding to a target object based on morphological attributes and classification attributes in the target object information and combining the classification attributes of each classification area;
according to the preset conversion relation between the pixel coordinates and the world coordinates of each fixed camera device, determining first positioning information of the target classification area under the world coordinates, sending the first positioning information to a mobile intelligent platform so that the mobile intelligent platform moves towards the target classification area, and scanning the target classification area through scanning equipment in the mobile intelligent platform so as to determine second positioning information of the target object according to the scanning result.
2. The method for visual positioning of an object for a mobile intelligent platform according to claim 1, wherein the dividing the large scene of the object into a plurality of classification areas based on the environmental image specifically comprises:
performing binary conversion based on the environment image, and detecting the edges of the photo through a Canny edge detection algorithm;
dividing the detected object region or edge into a plurality of potential regions of interest by a watershed algorithm;
acquiring an edge pixel coordinate point set of the region of interest according to the connected domain, and determining a classification label and centroid coordinates of the region of interest based on objects contained in the region of interest to obtain a classification region.
3. The method for visual localization of an object for a mobile intelligent platform of claim 1, wherein the moving the mobile intelligent platform to the object classification area further comprises:
after the mobile intelligent platform reaches the upper-level classification area of the target classification area, the Euclidean distance between the centroid of the mobile intelligent platform and the centroid of the target classification area is calculated, the two centroid coordinates are used for solving through a trigonometric function theorem to obtain an intersection point of a vector between the two centroid coordinates and the boundary of the target classification area, a mapping point based on the intersection point on a target object coordinate axis is used as a destination of the mobile intelligent platform, and coordinate information of the mapping point is sent to the mobile intelligent platform.
4. The method for visually locating a target object on a mobile intelligent platform according to claim 1, wherein determining the second locating information of the target object according to the scan result specifically comprises:
when the scanning result identifies the target object, calculating a centroid three-dimensional coordinate position point of the target object through the color photo and the depth information of the object, and then combining the relation of the pixel coordinate of the image pickup device to the camera coordinate to obtain the coordinate of the centroid of the target object under the camera coordinate system of the image pickup device as the second positioning information.
5. The method for visual localization of an object for a mobile intelligent platform of claim 1, further comprising:
when the target object is not scanned in the target classification area, determining an upper classification area of the target classification area according to the hierarchical relationship of each classification area;
and determining a second classification area except the target classification area in the upper classification area, and scanning the second classification area through scanning equipment in the mobile intelligent platform to obtain a scanning result.
6. A target visual positioning device for a mobile intelligent platform, comprising:
the scene area dividing unit is used for collecting an environment image of the target large scene and dividing the target large scene into a plurality of classification areas based on the environment image;
the target object initial positioning unit is used for acquiring target object information, and determining a target classification area corresponding to a target object based on morphological attributes and classification attributes in the target object information and combining the classification attributes of each classification area;
the intelligent platform linkage control unit is used for determining first positioning information of the target classification area under the world coordinates according to the preset conversion relation between the pixel coordinates and the world coordinates of each fixed camera device, sending the first positioning information to the mobile intelligent platform so that the mobile intelligent platform moves towards the target classification area, and scanning the target classification area through scanning equipment in the mobile intelligent platform so as to determine second positioning information of the target object according to the scanning result.
7. The visual positioning device for an object of a mobile intelligent platform according to claim 6, wherein the dividing the large object scene into a plurality of classification areas based on the environmental image in the scene area dividing unit specifically comprises:
performing binary conversion based on the environment image, and detecting the edges of the photo through a Canny edge detection algorithm;
dividing the detected object region or edge into a plurality of potential regions of interest by a watershed algorithm;
acquiring an edge pixel coordinate point set of the region of interest according to the connected domain, and determining a classification label and centroid coordinates of the region of interest based on objects contained in the region of interest to obtain a classification region.
8. The visual target positioning device for a mobile intelligent platform according to claim 6, further comprising: centroid distance calculation unit
After the mobile intelligent platform reaches the upper-level classification area of the target classification area, the Euclidean distance between the centroid of the mobile intelligent platform and the centroid of the target classification area is calculated, the two centroid coordinates are used for solving through a trigonometric function theorem to obtain an intersection point of a vector between the two centroid coordinates and the boundary of the target classification area, a mapping point based on the intersection point on a target object coordinate axis is used as a destination of the mobile intelligent platform, and coordinate information of the mapping point is sent to the mobile intelligent platform.
9. The visual positioning device for a mobile intelligent platform according to claim 6, wherein the determining, by the mobile intelligent platform, the second positioning information of the target according to the scan result specifically comprises:
when the scanning result identifies the target object, calculating a centroid three-dimensional coordinate position point of the target object through the color photo and the depth information of the object, and then combining the relation of the pixel coordinate of the image pickup device to the camera coordinate to obtain the coordinate of the centroid of the target object under the camera coordinate system of the image pickup device as the second positioning information.
10. The visual target positioning device for a mobile intelligent platform according to claim 6, wherein the mobile intelligent platform further comprises: a rescan control unit for:
when the target object is not scanned in the target classification area, determining an upper classification area of the target classification area according to the hierarchical relationship of each classification area;
and determining a second classification area except the target classification area in the upper classification area, and scanning the second classification area through scanning equipment in the mobile intelligent platform to obtain a scanning result.
CN202410039914.2A 2024-01-10 2024-01-10 Target object visual positioning method and device for mobile intelligent platform Pending CN117853581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410039914.2A CN117853581A (en) 2024-01-10 2024-01-10 Target object visual positioning method and device for mobile intelligent platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410039914.2A CN117853581A (en) 2024-01-10 2024-01-10 Target object visual positioning method and device for mobile intelligent platform

Publications (1)

Publication Number Publication Date
CN117853581A true CN117853581A (en) 2024-04-09

Family

ID=90529976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410039914.2A Pending CN117853581A (en) 2024-01-10 2024-01-10 Target object visual positioning method and device for mobile intelligent platform

Country Status (1)

Country Link
CN (1) CN117853581A (en)


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination