WO2023202062A1 - Target docking method based on image recognition and terminal device and medium thereof - Google Patents

Target docking method based on image recognition and terminal device and medium thereof

Info

Publication number
WO2023202062A1
WO2023202062A1 PCT/CN2022/132656 CN2022132656W WO2023202062A1 WO 2023202062 A1 WO2023202062 A1 WO 2023202062A1 CN 2022132656 W CN2022132656 W CN 2022132656W WO 2023202062 A1 WO2023202062 A1 WO 2023202062A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
target
candidate
mobile robot
convolution
Prior art date
Application number
PCT/CN2022/132656
Other languages
French (fr)
Chinese (zh)
Inventor
王雷
陈熙
Original Assignee
深圳市正浩创新科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市正浩创新科技股份有限公司
Publication of WO2023202062A1 publication Critical patent/WO2023202062A1/en

Links

Images

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present application belongs to the field of image recognition technology, and in particular relates to a target docking method based on image recognition, a terminal device and its medium.
  • When the mobile robot is working, it is usually driven to reach the designated destination before it starts to perform a series of related operations. Generally, the mobile robot determines the destination by identifying the target position of the target object. Since targets are usually small, detecting and recognizing the target object often involves a large amount of computation, which in turn makes position recognition of the target object slow, inefficient, and inaccurate. Due to inaccurate positioning of the target, the mobile robot is prone to yaw and may be unable to reach the target's location accurately.
  • a target docking method based on image recognition, a terminal device and a medium thereof are provided.
  • the embodiment of this application provides a target docking method based on image recognition, including:
  • the initial feature map is subjected to a preset number of cross-convolution fusions to extract multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter.
  • Cross-convolution fusion includes performing different convolution residual processing on the initial feature map and fusing the results obtained from the different convolution residual processing;
  • the embodiment of the present application provides a target docking device based on image recognition, including:
  • the acquisition module is used to acquire the environment image of the mobile robot while it is moving toward the target;
  • the processing module is used to extract the initial feature map about the target object from the environment image
  • The processing module is also used to perform a preset number of cross-convolution fusions on the initial feature map to extract multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter, where cross-convolution fusion includes performing different convolution residual processing on the initial feature map and fusing the results obtained by the different convolution residual processing;
  • the processing module is also used to determine the target position information of the target object in the environment image based on the multiple candidate position parameters and the confidence corresponding to each candidate position parameter;
  • the processing module is also used to control the motion state of the mobile robot according to the target position information, so that the mobile robot can dock with the target object.
  • Embodiments of the present application provide a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the computer program, the method of the first aspect is implemented.
  • Embodiments of the present application provide a computer-readable storage medium that stores a computer program.
  • When the computer program is executed by a processor, the method of the first aspect is implemented.
  • Embodiments of the present application provide a computer program product, which when the computer program product is run on a terminal device, causes the terminal device to execute the method described in any one of the above first aspects.
  • Figure 1 is a schematic diagram of a mobile robot in an embodiment of the present application.
  • Figure 2 is a flow chart of a target docking method based on image recognition provided by an embodiment of the present application.
  • Figure 3 is an example of extracting an initial feature map in the embodiment of the present application.
  • Figure 4 is a schematic diagram of performing cross-convolution fusion with a preset number of times in an embodiment of the present application.
  • Figure 5 is a detailed flow chart of step 203 when only one cross-convolution fusion is performed in the embodiment of the present application.
  • Figure 6 is an example diagram of the network structure of the first cross-convolution fusion in the embodiment of this application.
  • Figure 7 is a processing flow chart of the i-th cross-convolution fusion and candidate position information extraction steps in the embodiment of the present application.
  • Figure 8 is an example diagram of the network structure of the i-th cross-convolution fusion according to this embodiment of the present application.
  • Figure 9 is a structural example diagram of multiple convolution layers and related functions used to obtain candidate location information in the embodiment of the present application.
  • Figure 10 is a schematic diagram of the sigmoid function in the embodiment of the present application.
  • Figure 11 is an example diagram of feature map 1 in the embodiment of the present application.
  • Figure 12 is a schematic diagram of candidate location information in an embodiment of the present application.
  • FIG. 13 is a flow chart of an implementation method for determining target position information of a target object in an environmental image in an embodiment of the present application.
  • Figure 14 is a flow chart for controlling the motion state of the mobile robot in the embodiment of the present application.
  • Figure 15 is one of the schematic diagrams of a target object in an environmental image in an embodiment of the present application.
  • Figure 16 is the second schematic diagram of the target object in the environment image in the embodiment of the present application.
  • Figure 17 is the third schematic diagram of the target object in the environment image in the embodiment of the present application.
  • FIG. 18 is a schematic diagram of the internal modules of a control unit 102 provided by an embodiment of the present application.
  • Figure 19 is a schematic diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a mobile robot in an embodiment of the present application.
  • the mobile robot 100 can be various types of sweeping robots, mopping robots, food delivery robots, transport robots, lawn mowing robots, etc.
  • the embodiment of the present application does not limit the specific type and function of the mobile robot 100. It can be understood that the mobile robot in this embodiment may also include other devices with self-moving functions.
  • the mobile robot 100 is provided with a camera 101 .
  • the camera 101 is used to capture images of the environment around the mobile robot 100 .
  • the camera 101 may be fixed, or may be non-fixed and rotatable, which is not limited in the embodiments of the present application.
  • the environmental images captured by the camera 101 may be color images, black and white images, infrared images, etc., which are not limited in the embodiments of the present application.
  • the camera 101 is connected to the control unit 102 inside the mobile robot 100 .
  • the control unit 102 is also connected to the driving components of the mobile robot 100, such as the steering shaft, steering wheel, motor, etc. of the mobile robot 100, and is used to control the movement, steering, etc. of the mobile robot 100.
  • the control unit 102 can receive the environmental image captured by the camera 101, process the environmental image according to the target docking method based on image recognition provided in the embodiment of the present application, and adjust the forward direction of the mobile robot 100 so that the mobile robot 100 advances towards the target and docks.
  • the target object in the embodiment of the present application may refer to a target shelf, a target charging stand, a target location, etc., which is not limited in the embodiment of the present application.
  • the target object may be a target charging stand.
  • the mobile robot 100 moves toward the target charging base and docks with it to achieve charging.
  • the target docking method based on image recognition provided by the embodiment of the present application will be described in detail below.
  • the target docking method can be implemented by the control unit 102 inside the mobile robot 100 or a cloud platform used to control the mobile robot 100.
  • the embodiment of the present application does not limit the implementation subject of the target docking method. A detailed description will be given below with the control unit 102 as the execution subject.
  • Figure 2 is a flow chart of a target docking method based on image recognition provided by an embodiment of the present application. The process includes steps:
  • the control unit 102 can obtain an environment image through the camera 101 on the mobile robot 100 .
  • the environment image may be a color image, a black and white image, or an infrared image, which is not limited in the embodiments of the present application.
  • the control unit 102 receives a return to home charging instruction.
  • The return-to-home charging instruction may be return-to-home information automatically generated by the control unit 102 when the remaining power of the mobile robot 100 is lower than the preset minimum power threshold, or it may be return-to-home information issued by the mobile phone terminal or cloud platform based on user operations.
  • the return-to-home charging command can control the mobile robot 100 to move to the target charging base and obtain an image of the environment through the camera 101 on the mobile robot 100.
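  • As a rough illustration of the trigger logic just described, the sketch below generates a return-to-home charging instruction when the remaining power falls below a preset minimum power threshold or when a user-initiated request arrives; the threshold value, class name, and function name are illustrative assumptions, not anything specified by this application.

```python
from dataclasses import dataclass

@dataclass
class ReturnToHomeInstruction:
    reason: str  # e.g. "low_battery" or "user_request"

MIN_POWER_THRESHOLD = 0.15  # assumed preset minimum power threshold (15%)

def maybe_generate_return_instruction(remaining_power: float,
                                      user_requested: bool = False):
    """Return a return-to-home charging instruction when the remaining power is
    below the preset minimum power threshold, or when the mobile phone
    terminal / cloud platform issues a return request based on user operations."""
    if remaining_power < MIN_POWER_THRESHOLD:
        return ReturnToHomeInstruction(reason="low_battery")
    if user_requested:
        return ReturnToHomeInstruction(reason="user_request")
    return None

print(maybe_generate_return_instruction(0.1))  # -> ReturnToHomeInstruction(reason='low_battery')
```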
  • the control unit 102 may extract an initial feature map about the target object from the environment image through at least one convolution layer.
  • Figure 3 is an example of extracting an initial feature map in the embodiment of the present application.
  • the first item in the convolution layer specification parameter W represents the number of convolution kernels
  • the second item represents the number of channels
  • the third and fourth items represent the size of the convolution kernel
  • the bias parameter B represents the bias value.
  • The bias value is randomly initialized the first time and is subsequently corrected in reverse through the gradient.
  • The specification parameter W of the convolution layer 301 is <32×12×3×3>, which means that the convolution layer 301 uses 32 convolution kernels of size 3×3 to perform a convolution operation on an image with 12 input channels.
  • The specification parameter W of the convolution layer 302 is <64×32×1×1>, which means that 64 convolution kernels of size 1×1 are used to perform a convolution operation on an image with 32 input channels.
  • Relu refers to the linear rectification function (rectified linear unit, ReLU).
  • the control unit 102 performs a convolution operation on the environment image through the convolution layer 301 and the convolution layer 302 to obtain an initial feature map of the target object.
  • control unit 102 may perform a convolution operation on the environment image through one or more convolution layers to obtain an initial feature map of the target object.
  • the embodiments of this application do not limit the number of convolutional layers.
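  • A minimal sketch of the Figure 3 structure, assuming PyTorch: two convolution layers with the specification parameters described above (<32×12×3×3> followed by <64×32×1×1>) and Relu activations. The padding, the 128×128 input size, and the layer names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class InitialFeatureExtractor(nn.Module):
    """Sketch of Figure 3: two convolution layers producing the initial feature map."""
    def __init__(self):
        super().__init__()
        # Layer 301: 32 convolution kernels of size 3x3 applied to a 12-channel input.
        self.conv301 = nn.Conv2d(in_channels=12, out_channels=32, kernel_size=3, padding=1)
        # Layer 302: 64 convolution kernels of size 1x1 applied to the 32-channel output.
        self.conv302 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.conv301(x))
        return self.relu(self.conv302(x))

# Usage sketch: one 12-channel, 128x128 input yields a 64-channel initial feature map.
initial_feature_map = InitialFeatureExtractor()(torch.randn(1, 12, 128, 128))
```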
  • the candidate position information includes multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter.
  • Figure 4 is a schematic diagram of performing cross-convolution fusion with a preset number of times in an embodiment of the present application.
  • the control unit 102 can perform a preset number of cross-convolution fusions on the initial feature map to obtain candidate position information of the target object.
  • Cross-convolution fusion specifically fuses the convolution recognition result obtained by convolution processing with the residual recognition result obtained by residual processing to obtain a fusion result. The more convolutions applied, that is, the deeper the convolution layer sits in the structure, the higher-level the semantic information extracted; the fewer the convolutions, that is, the shallower the convolution layer, the lower-level the semantic information extracted.
  • In this way, the convolution recognition result, which contains the low-level semantic information obtained by the convolution processing, and the residual recognition result, which contains the high-level semantic information obtained by the residual processing, are fused, making the feature information richer without adding extra convolution layers and achieving high performance and improved recognition accuracy with a lightweight network.
  • control unit 102 may perform only one cross-convolution fusion on the initial feature map.
  • Figure 5 is a detailed flow chart of step S203 when only one cross-convolution fusion is performed in the embodiment of the present application. As shown in Figure 5, step S203 may include the following steps:
  • Figure 6 is an example diagram of the network structure of the first cross-convolution fusion in the embodiment of this application.
  • the control unit 102 can perform convolution processing on the initial feature map through the convolution layer 605 to obtain a convolution recognition result.
  • The control unit 102 can perform residual processing on the initial feature map through a residual network composed of convolution layer 601, convolution layer 602, convolution layer 603, and convolution layer 604 to obtain the residual recognition result.
  • Add in Figure 6 represents the identity mapping, which is the same as the identity mapping in the traditional residual network, and will not be described in detail in the embodiment of this application.
  • In the residual network composed of convolution layer 601, convolution layer 602, convolution layer 603, convolution layer 604, and the corresponding identity mapping, convolution layer 602, convolution layer 603, and the identity mapping form a residual block.
  • the residual network includes one residual block.
  • The residual network may include more than one residual block according to actual needs.
  • the embodiment of the present application does not limit the number of residual blocks.
  • The control unit 102 can fuse the convolution recognition result output by convolution layer 605 and the residual recognition result output by convolution layer 604 through a merged-array operation (i.e., a concat operation) to obtain the fusion result.
  • When the control unit 102 performs only one cross-convolution fusion on the initial feature map, after that single cross-convolution fusion produces the fusion result, the candidate position information of the target object can be extracted from the fusion result, where the candidate position information includes multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter.
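  • The sketch below illustrates the cross-convolution fusion pattern of Figure 6, assuming PyTorch: a convolution branch (layer 605), a residual branch with one residual block (layers 601-604 plus the identity mapping), and a concat of the two outputs. The channel counts, kernel sizes, and activation placement are illustrative assumptions; only the overall branch-and-concat pattern follows the description above.

```python
import torch
import torch.nn as nn

class CrossConvFusion(nn.Module):
    """Sketch of one cross-convolution fusion: convolution branch + residual branch + concat."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Convolution branch (roughly layer 605 in Figure 6).
        self.conv_branch = nn.Conv2d(channels, channels, kernel_size=1)
        # Residual branch (roughly layers 601-604): entry convolution, one residual
        # block (two convolutions plus an identity shortcut), exit convolution.
        self.res_in = nn.Conv2d(channels, channels, kernel_size=1)
        self.res_block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.res_out = nn.Conv2d(channels, channels, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        conv_result = self.conv_branch(x)                    # convolution recognition result
        r = self.relu(self.res_in(x))
        r = self.relu(self.res_block(r) + r)                 # residual block with identity mapping (Add)
        res_result = self.res_out(r)                         # residual recognition result
        return torch.cat([conv_result, res_result], dim=1)   # concat fusion

# Usage sketch: fusing a 64-channel initial feature map doubles the channel count.
fusion_result = CrossConvFusion()(torch.randn(1, 64, 128, 128))
```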
  • control unit 102 may perform multiple cross-convolution fusions on the initial feature maps.
  • the processing flow of the first cross-convolution fusion is similar to the above-mentioned step S2031, step S2032, and step S2033, which will not be described in detail in this embodiment of the present application.
  • Figure 7 shows the i-th cross-convolution fusion and candidate position information extraction steps in the embodiment of the present application.
  • FIG. 8 is an example diagram of the network structure of the i-th cross-convolution fusion according to this embodiment of the present application.
  • The control unit 102 can use the (i-1)-th fusion result as the i-th input feature and perform convolution processing on it to obtain the i-th convolution recognition result.
  • The (i-1)-th fusion result is also subjected to residual processing to obtain the i-th residual recognition result.
  • The control unit 102 can perform residual processing on the (i-1)-th fusion result through the residual network composed of convolution layer 801, convolution layer 802, convolution layer 803, convolution layer 804, convolution layer 805, and convolution layer 806, combined with the Relu functions and identity mappings (i.e., the Add operations shown in Figure 8), to obtain the i-th residual recognition result.
  • the convolutional layer 802, the convolutional layer 803 and the corresponding identity mapping form a residual block
  • the convolutional layer 804, the convolutional layer 805 and the corresponding identity mapping form another residual block.
  • the residual network in the example of Figure 8 includes two residual blocks.
  • the number of residual blocks in the residual network may be one or more according to actual needs.
  • the embodiment of the present application does not limit the number of residual blocks.
  • control unit 102 can fuse the convolution recognition result output by the convolution layer 807 and the residual recognition result output by the convolution layer 806 through a concat operation to obtain a fusion result.
  • The control unit 102 can obtain the candidate position information of the target object from any (i.e., the i-th) fusion result. Preferably, more accurate candidate position information of the target object can be obtained from the K-th fusion result. Therefore, in some embodiments, after K cross-convolution fusions, the control unit 102 may obtain the candidate position information of the target object from the last (K-th) fusion result.
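  • A schematic loop for chaining the K fusions described above: the (i-1)-th fusion result feeds the i-th fusion, and the last (K-th) result is kept for candidate position extraction. The steps are stand-in 1×1 convolutions purely so the example runs on its own; real steps would be modules like the CrossConvFusion sketch above, with channel counts adjusted because each concat changes the channel dimension.

```python
import torch
import torch.nn as nn

def run_cross_conv_fusions(initial_feature_map: torch.Tensor,
                           fusion_steps: nn.ModuleList) -> torch.Tensor:
    """Apply K cross-convolution fusion steps in sequence and return the K-th result."""
    x = initial_feature_map
    for step in fusion_steps:   # i = 1 .. K
        x = step(x)             # the i-th fusion result becomes the next input feature
    return x                    # the last (K-th) fusion result

# Stand-in steps: K = 3 placeholder modules with matching channel counts.
placeholder_steps = nn.ModuleList([nn.Conv2d(64, 64, kernel_size=1) for _ in range(3)])
final_fusion_result = run_cross_conv_fusions(torch.randn(1, 64, 128, 128), placeholder_steps)
```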
  • How the control unit 102 obtains the candidate position information of the target object from the fusion result is described in detail below.
  • The control unit 102 can extract multiple fused feature maps from the fusion result through multiple convolution layers and related activation functions, such as the Relu function (linear rectification function) and the Sigmoid function (S-shaped growth curve function), to obtain the candidate position information of the target object.
  • the fused feature map includes candidate position information.
  • the candidate position information includes multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter.
  • The control unit 102 extracts multiple fused feature maps from the fusion result through multiple convolution layers and related functions. Through the fused feature maps, the multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter can be represented clearly and accurately.
  • Figure 9 is a structural example diagram of the multiple convolution layers and related functions used to obtain candidate position information in the embodiment of the present application. As shown in Figure 9, the fusion result, after being processed by the convolution layer 901, is processed in three different ways.
  • The first processing method is: processing through the convolution layer 902, the Relu function, and the convolution layer 903, and normalizing the result to between 0 and 1 through the sigmoid function, thereby obtaining feature map 1.
  • Feature map 1 shows the confidence level corresponding to the target object.
  • Figure 10 is a schematic diagram of the sigmoid function in the embodiment of the present application.
  • the abscissa axis x represents the pixel value
  • the ordinate axis y represents probability (confidence).
  • Feature map 1 can be represented by 1×1×128×128, where the first 1 represents one image and the second 1 represents one parameter, that is, the probability of whether each pixel contains the target object, which can be represented by the confidence obj_value.
  • 128×128 represents the size of the feature map, and feature map 1 has 128×128 pixels.
  • Figure 11 is an example diagram of feature map 1 in the embodiment of the present application. As shown in Figure 11, feature map 1 has 128×128 pixels, and the parameter at each pixel represents the probability of whether that pixel contains the target object.
  • The second processing method is to process through the convolution layer 904, the Relu function, and the convolution layer 905 to obtain feature map 2 and feature map 3.
  • The output of the convolution layer 905 can be represented by 1×2×128×128.
  • 1 represents one image.
  • 2 represents two sets of 128×128 feature map outputs.
  • The values of these feature maps represent x and y, denoted x_value and y_value; that is, there are 128×128 x_value parameters and 128×128 y_value parameters.
  • The third processing method is to process through the convolution layer 906, the Relu function, and the convolution layer 907 to obtain feature map 4 and feature map 5.
  • The output of the convolution layer 907 can be represented by 1×2×128×128.
  • 1 represents one image.
  • 2 represents two sets of 128×128 feature map outputs.
  • The values of these feature maps represent w and h, denoted w_value and h_value; that is, there are 128×128 w_value parameters and 128×128 h_value parameters.
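  • The three processing branches of Figure 9 can be sketched as follows, assuming PyTorch: one branch produces the 1×1×128×128 confidence map (squashed to between 0 and 1 by the sigmoid function), one produces the 1×2×128×128 (x_value, y_value) maps, and one produces the 1×2×128×128 (w_value, h_value) maps. The intermediate channel count of 128 and the kernel sizes are illustrative assumptions; only the output shapes follow the description above.

```python
import torch
import torch.nn as nn

class CandidateHead(nn.Module):
    """Sketch of the three branches in Figure 9 extracting candidate position information."""
    def __init__(self, in_channels: int = 128):
        super().__init__()
        self.shared = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)  # roughly layer 901
        self.obj_branch = nn.Sequential(   # roughly layers 902-903 -> feature map 1
            nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(in_channels, 1, kernel_size=1),
        )
        self.xy_branch = nn.Sequential(    # roughly layers 904-905 -> feature maps 2 and 3
            nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(in_channels, 2, kernel_size=1),
        )
        self.wh_branch = nn.Sequential(    # roughly layers 906-907 -> feature maps 4 and 5
            nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(in_channels, 2, kernel_size=1),
        )

    def forward(self, fusion_result: torch.Tensor):
        f = self.shared(fusion_result)
        obj = torch.sigmoid(self.obj_branch(f))   # 1 x 1 x 128 x 128 confidence map (obj_value)
        xy = self.xy_branch(f)                    # 1 x 2 x 128 x 128 (x_value, y_value)
        wh = self.wh_branch(f)                    # 1 x 2 x 128 x 128 (w_value, h_value)
        return obj, xy, wh

# Usage sketch with a 128-channel, 128x128 fusion result.
obj_map, xy_map, wh_map = CandidateHead()(torch.randn(1, 128, 128, 128))
```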
  • FIG. 12 is a schematic diagram of candidate location information in an embodiment of the present application.
  • the fusion feature map corresponding to obj_value at the bottom is feature map 1 in Figure 9
  • the fusion feature map corresponding to x_value at the top is feature map 2 in Figure 9
  • The fusion feature map corresponding to y_value at the top is feature map 3 in Figure 9.
  • the fusion feature map corresponding to the middle w_value is feature map 4 in Figure 9
  • the fusion feature map corresponding to the middle h_value is feature map 5 in Figure 9.
  • The candidate position information includes the multiple candidate position parameters of the target object in the environment image, that is, the candidate position parameters in feature map 2, feature map 3, feature map 4, and feature map 5, where x_value represents the x coordinate of the target object in the environment image, y_value represents the y coordinate of the target object in the environment image, w_value represents the width of the target object in the environment image, and h_value represents the height of the target object in the environment image.
  • the candidate location information also includes the confidence corresponding to each candidate location parameter, that is, the confidence parameter obj_value in feature map 1.
  • the embodiment of the present application provides an example as shown in Figure 9 for obtaining candidate position information about the target object from the fusion result.
  • the control unit 102 may also obtain candidate position information about the target object from the fusion result through other methods, which is not limited in the embodiments of the present application.
  • The x coordinate of the above-mentioned target object in the environment image may refer to the x coordinate of the upper-left corner vertex of the target object, the x coordinate of the center point of the target object, or the coordinate of a specified point associated with the target object, which is not limited in the embodiments of this application.
  • Likewise, the y coordinate of the target object in the environment image may refer to the y coordinate of the upper-left corner vertex of the target object, the y coordinate of the center point of the target object, or the coordinate of a specified point associated with the target object, which is not limited in the embodiments of this application.
  • The control unit 102 may screen the multiple candidate position parameters for a suitable candidate position parameter to serve as the target position information of the target object in the environment image, based on the confidence corresponding to each candidate position parameter. For example, as shown in Figure 12, among all the confidence parameters obj_value in the fusion feature map corresponding to obj_value, assume the largest confidence parameter is the one at coordinates (1,1); then the candidate position parameters corresponding to coordinates (1,1) can be extracted from the remaining four fusion feature maps as the target position information of the target object in the environment image. Selecting the candidate position parameters corresponding to the maximum confidence parameter as the target position information of the target object in the environment image is therefore one implementation. This application also provides another implementation, as follows:
  • FIG. 13 is a flow chart of an implementation method for determining target position information of a target object in an environmental image in an embodiment of the present application. The process includes:
  • The control unit 102 can extract the candidate position parameters corresponding to the same position, together with the confidence corresponding to those candidate position parameters. For example, for the position at coordinates (1,1), each of the five fused feature maps shown in Figure 12 contributes the parameter at the (1,1) position, namely x(1,1), y(1,1), w(1,1), h(1,1), and obj(1,1), so the candidate parameter set (x(1,1), y(1,1), w(1,1), h(1,1), obj(1,1)) corresponding to the (1,1) position can be obtained. Therefore, in this embodiment of the present application, one position corresponds to one candidate parameter set; for example, the candidate parameter set corresponding to the position at coordinates (1,1) is (x(1,1), y(1,1), w(1,1), h(1,1), obj(1,1)).
  • In this way, 128×128 candidate parameter sets (x_value, y_value, w_value, h_value, obj_value) can be extracted.
  • The amount of data is 128×128×5 values, which can be expressed as the collection of candidate parameter sets from (x(0,0), y(0,0), w(0,0), h(0,0), obj(0,0)) through (x(127,127), y(127,127), w(127,127), h(127,127), obj(127,127)).
  • Here x(0,0) represents the x_value corresponding to the coordinate (0,0) position, and so on for the other parameters, which will not be described one by one here.
  • The control unit 102 can extract the corresponding confidence levels from all candidate parameter sets and compare each confidence level with a preset confidence threshold, thereby obtaining the comparison results between the confidence levels and the preset confidence threshold. It can be understood that the comparison results can determine the set of confidence levels greater than the preset confidence threshold, the set of confidence levels smaller than the preset confidence threshold, and the set of confidence levels equal to the preset confidence threshold.
  • The control unit 102 may select the candidate parameter sets whose confidence levels are greater than the preset confidence threshold as the target candidate parameter set. For example, if the preset confidence threshold is set to 0.7, then the candidate parameter sets whose confidence is greater than 0.7 are used as the target candidate parameter set.
  • In other words, the target candidate parameter set can be expressed as the collection of candidate parameter sets (x_value, y_value, w_value, h_value, obj_value) whose obj_value is greater than the preset confidence threshold.
  • If the target candidate parameter set is an empty set, it means that no target object is found in the current environment image; the control unit 102 can then control the mobile robot 100 to rotate left or right until a target object is detected in the environment image.
  • the target position information of the target object can be determined through step 2043.
  • If the target candidate parameter set includes only one candidate parameter set, the control unit 102 can directly use that candidate parameter set as the target position information of the target object.
  • The control unit 102 can also select, according to preset conditions, the candidate position parameters that meet the preset conditions from the target candidate parameter set as a new target candidate parameter set, so as to achieve further screening.
  • the control unit 102 determines the candidate parameter set corresponding to the maximum confidence as the target candidate parameter set.
  • For example, suppose the target candidate parameter set includes three candidate parameter sets, namely (x(1,1), y(1,1), w(1,1), h(1,1), obj(1,1)), (x(2,2), y(2,2), w(2,2), h(2,2), obj(2,2)), and (x(3,3), y(3,3), w(3,3), h(3,3), obj(3,3)).
  • The control unit 102 detects that, among obj(1,1), obj(2,2), and obj(3,3), obj(1,1) has the largest value. The control unit 102 can then determine the candidate parameter set (x(1,1), y(1,1), w(1,1), h(1,1), obj(1,1)) corresponding to obj(1,1) as the target candidate parameter set. In this embodiment, determining the candidate parameter set corresponding to the maximum confidence as the target candidate parameter set achieves better recognition accuracy.
  • The control unit 102 may determine the target position information of the target object based on the candidate position parameters in the target candidate parameter set. For example, if the target candidate parameter set is (x(1,1), y(1,1), w(1,1), h(1,1), obj(1,1)), then x(1,1) can be determined as the x coordinate of the target object in the environment image, y(1,1) as the y coordinate of the target object in the environment image, w(1,1) as the width of the target object in the environment image, and h(1,1) as the height of the target object in the environment image.
  • suitable candidate location parameters are screened based on the confidence level corresponding to the candidate location parameters, thereby improving the recognition accuracy.
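  • The screening procedure described above can be sketched as below, assuming PyTorch tensors with the 1×1×128×128 and 1×2×128×128 layouts from Figure 9: gather the per-position candidate parameter sets, keep those whose confidence exceeds the preset threshold (0.7 in the example above), and return the one with maximum confidence, or None when the target candidate parameter set is empty. The function name and the exact tensor layout are illustrative assumptions.

```python
import torch

def select_target_position(obj: torch.Tensor, xy: torch.Tensor, wh: torch.Tensor,
                           conf_threshold: float = 0.7):
    """Return (x, y, w, h, obj) for the best candidate position, or None if no
    candidate exceeds the preset confidence threshold."""
    obj = obj[0, 0]                         # 128 x 128 confidence map
    x, y = xy[0, 0], xy[0, 1]               # 128 x 128 x_value and y_value maps
    w, h = wh[0, 0], wh[0, 1]               # 128 x 128 w_value and h_value maps

    mask = obj > conf_threshold             # comparison with the preset confidence threshold
    if not mask.any():
        return None                         # empty target candidate parameter set: no target found

    # Among the remaining candidate parameter sets, keep the one with maximum confidence.
    masked_obj = torch.where(mask, obj, torch.zeros_like(obj))
    idx = torch.argmax(masked_obj)
    r, c = divmod(int(idx), obj.shape[1])
    return (x[r, c].item(), y[r, c].item(), w[r, c].item(), h[r, c].item(), obj[r, c].item())

# Usage sketch with random maps shaped like the outputs of the three branches above.
print(select_target_position(torch.rand(1, 1, 128, 128),
                             torch.rand(1, 2, 128, 128),
                             torch.rand(1, 2, 128, 128)))
```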
  • S205 Control the motion state of the mobile robot according to the target position information so that the mobile robot can dock with the target object.
  • After the control unit 102 determines the target position information of the target object in the environment image, it can control the motion state of the mobile robot according to that position information, so that the mobile robot docks with the target object.
  • the motion state may include but is not limited to the direction of the motion posture and the speed of the forward motion.
  • the direction of the motion posture may be to the left, right, forward, or offset by a certain angle.
  • FIG. 14 is a flow chart for controlling the motion state of the mobile robot in the embodiment of the present application. The process includes:
  • The control unit 102 first determines the center point position of the environment image. It can be understood that the control unit 102 can establish a coordinate system using a vertex of the environment image as the origin, for example using the upper-left vertex of the environment image as the origin, so as to describe the center point position of the environment image through the coordinate system. For example, as shown in Figure 15, the upper-left corner vertex of the environment image is used as the origin of the coordinate system.
  • If each frame of the environment image is 640×480, then the upper-right corner vertex of the environment image is (640,0), the lower-left corner vertex is (0,480), and the lower-right corner vertex is (640,480); the center point position of the environment image is (640/2, 480/2), that is, (320, 240).
  • The control unit 102 can identify the upper-left corner coordinates (x0, y0) and the lower-right corner coordinates (x1, y1) of the target object.
  • If the x coordinate and y coordinate in the target position information are the coordinates of the upper-left corner vertex of the target object, then x0 is that x coordinate and y0 is that y coordinate.
  • The coordinate x1 of the lower-right corner is equal to the x coordinate plus the width w, and y1 is equal to the y coordinate plus the height h.
  • After the control unit 102 identifies the upper-left corner coordinates (x0, y0) and the lower-right corner coordinates (x1, y1) of the target object, it can calculate the center point position (Xc, Yc) of the target object.
  • The center point position (Xc, Yc) of the target object can be expressed as Xc = (x0 + x1)/2 and Yc = (y0 + y1)/2.
  • the control unit 102 can adjust the motion state of the mobile robot according to the offset between the center point position of the target object and the center point position of the environment image.
  • the positive and negative values of the offset may indicate the direction of the offset. For example, if the offset corresponding to the x-coordinate is positive, it means that the offset direction is to the right; if the offset corresponding to the x-coordinate is negative, it means the offset direction is to the left.
  • the size of the offset can indicate the degree of offset. It can be understood that an offset of 0 means there is no offset.
  • Based on the offset between the center point position of the target object and the center point position of the environment image, the control unit 102 can determine the direction and magnitude of the target object's offset from the center of the environment image, and thereby control the mobile robot to move in a motion state that corrects the deviation, allowing the mobile robot to dock with the target object.
  • In this way, the offset between the mobile robot and the target object can be accurately identified, and the motion state of the mobile robot can be adjusted according to the offset, allowing the mobile robot to dock accurately with the target object.
  • The embodiment of the present application provides a specific implementation that includes the following steps: determine the forward direction and forward distance of the mobile robot based on the offset; adjust the movement posture direction of the mobile robot according to the forward direction; and control the moving distance of the mobile robot along the movement posture direction so that the mobile robot docks with the target object. If the offset direction is determined to be to the left according to the offset, the forward direction of the mobile robot is determined to be forward-left; if the offset direction is determined to be to the right, the forward direction of the mobile robot is determined to be forward-right.
  • the forward distance can be positively related to the size of the offset.
  • the forward direction and forward distance of the mobile robot are determined based on the offset, which can more accurately control the movement attitude direction and forward distance of the mobile robot, and improve the docking effect between the mobile robot and the target object.
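  • A small sketch of the offset-based control described above: compute the image center and the target center, use the sign of the offset to pick forward-left, forward-right, or straight ahead, and make the forward distance positively related to the offset magnitude. The pixel-to-distance gain and the return format are illustrative assumptions.

```python
def plan_motion(target_center_x: float, image_width: int = 640,
                gain: float = 0.01) -> tuple:
    """Decide the forward direction and forward distance from the horizontal offset."""
    image_center_x = image_width / 2            # 320 for a 640x480 frame
    offset = target_center_x - image_center_x   # positive: target is to the right of center
    if offset > 0:
        direction = "forward-right"
    elif offset < 0:
        direction = "forward-left"
    else:
        direction = "forward"
    forward_distance = abs(offset) * gain       # positively related to the offset magnitude
    return direction, forward_distance

# Usage sketch with the center-point formula above: Xc = (x0 + x1) / 2.
x0, x1 = 350.0, 450.0
print(plan_motion(target_center_x=(x0 + x1) / 2))  # e.g. ('forward-right', 0.8)
```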
  • The mobile robot may be a four-wheeled mobile robot.
  • The four-wheeled robot includes two driving wheels and two steering wheels located on the left and right sides of the robot.
  • the driving wheel is the rear wheel with driving force
  • the steering wheel is the front wheel without independent power, which can also be called the driven wheel.
  • The height of the charging base in the image is determined by its actual physical position and should be within the field of view of the camera; for example, the range of the height of the charging base can be (0-480). In this example, the height of the charging base is set to 200.
  • If the x coordinate of the center point of the charging base is equal to 320, that is, the charging base is at the center of the image, the four-wheeled robot is driven straight forward until it touches the charging base.
  • If the center point position of the charging base is less than 320, that is, the charging base is located to the left of the center point of the image, the four-wheeled robot is driven to move to the left.
  • The specific driving method includes: the right driving wheel is driven while the left driving wheel is not, so that the robot turns toward the left, as sketched below.
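  • A minimal differential-drive sketch of the charging-base example above: when the base center x coordinate equals the image center (320), drive both wheels forward; when it is less than 320 (base to the left), drive only the right wheel so the robot turns left. The symmetric right-of-center case, the speed value of 1.0, and the tolerance parameter are assumptions; the text only spells out the centered and left cases.

```python
def drive_toward_base(base_center_x: float, image_center_x: float = 320.0,
                      tolerance: float = 0.0) -> tuple:
    """Return (left_wheel, right_wheel) speed commands in arbitrary units."""
    if abs(base_center_x - image_center_x) <= tolerance:
        return 1.0, 1.0   # base centered: drive straight forward until it touches the base
    if base_center_x < image_center_x:
        return 0.0, 1.0   # base left of center: drive only the right wheel (turn left)
    return 1.0, 0.0       # base right of center: drive only the left wheel (turn right, assumed)

print(drive_toward_base(200.0))  # base left of image center -> (0.0, 1.0)
```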
  • Figure 15 is one of the schematic diagrams of a target object in an environmental image in an embodiment of the present application.
  • the origin coordinate is (0,0)
  • the vertex coordinate of the environment image is (640,0).
  • the center point 1502 of the target object is on the right side of the center point 1501 of the environment image. Then the control unit 102 may determine that the forward direction of the mobile robot is forward to the right, and the right arrow shown in FIG. 15 is the forward direction to the right of the mobile robot.
  • Figure 16 is the second schematic diagram of the target object in the environment image in the embodiment of the present application.
  • the origin coordinate is (0,0)
  • the vertex coordinate of the environment image is (640,0).
  • If the position of the center point 1602 of the target object coincides with the position of the center point 1601 of the environment image, the control unit 102 may determine that the forward direction of the mobile robot is straight ahead, and the forward arrow shown in Figure 16 is the forward direction of the mobile robot.
  • Figure 17 is the third schematic diagram of the target object in the environment image in the embodiment of the present application.
  • the origin coordinate is (0,0)
  • the vertex coordinate of the environment image is (640,0).
  • the center point 1702 of the target object is to the left of the center point 1701 of the environment image. Then the control unit 102 may determine that the forward direction of the mobile robot is forward left, and the left arrow shown in FIG. 17 is the forward left direction of the mobile robot.
  • the control unit 102 can adjust the movement posture direction of the mobile robot according to the forward direction. For example, if the forward direction is forward to the right, the control unit 102 can adjust the steering wheel of the mobile robot to rotate to the right, thereby adjusting the direction of the mobile robot's movement posture. If the forward direction is forward to the left, the control unit 102 can adjust the steering wheel of the mobile robot to rotate to the left, thereby adjusting the direction of the movement attitude of the mobile robot.
  • the steering wheels of the mobile robot may be the two front wheels of the mobile robot or the two rear wheels of the mobile robot, which is not limited in the embodiments of the present application.
  • the mobile robot 100 can be moved toward the target object, thereby allowing the mobile robot 100 to dock with the target object.
  • the target object may be a target charging stand, and the mobile robot 100 moves toward and docks with the target charging stand during the return-to-home charging process to achieve charging.
  • FIG. 18 is a schematic diagram of the internal modules of a control unit 102 provided by an embodiment of the present application.
  • the internal modules of the control unit 102 include:
  • the acquisition module 1801 is used to execute or implement step 201 in the various embodiments corresponding to Figure 2 mentioned above.
  • the processing module 1802 is used to execute or implement steps 202, 203, 204, and 205 in the respective embodiments corresponding to FIG. 2 .
  • FIG 19 is a schematic diagram of a terminal device provided by an embodiment of the present application.
  • the terminal device 1900 may be the mobile robot 100 in the above embodiment.
  • the terminal device 1900 includes a memory 1902, a processor 1901, and a computer program 1903 stored in the memory 1902 and executable on the processor 1901.
  • When the processor 1901 executes the computer program 1903, the methods of the embodiments corresponding to Figures 2, 5, 7, 13, or 14 are implemented.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • When the computer program is executed by a processor, the steps in each of the above method embodiments can be implemented.
  • Embodiments of the present application provide a computer program product.
  • When the computer program product is run on a mobile terminal, the steps in each of the above method embodiments can be implemented.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • this application can implement all or part of the processes in the methods of the above embodiments by instructing relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium.
  • When the computer program is executed by a processor, the steps of each of the above method embodiments may be implemented.
  • the computer program includes computer program code, which may be in the form of source code, object code, executable file or some intermediate form.
  • the computer-readable medium may at least include: any entity or device capable of carrying computer program code to the camera device/terminal device, recording media, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunications signals, and software distribution media.
  • The software distribution media include, for example, a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.
  • In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

A target docking method based on image recognition, comprising: acquiring an environment image during the movement of a mobile robot to a target object, extracting an initial feature map related to the target object from the environment image, performing cross-convolution fusion on the initial feature map for a preset number of times, so as to extract multiple candidate position parameters of the target object in the environment image and a confidence corresponding to each candidate position parameter, determining target position information of the target object in the environment image, and controlling a motion state of the mobile robot according to the target position information, such that the mobile robot docks with the target object.

Description

Target docking method based on image recognition, terminal device and medium thereof
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202210437107.7, filed with the China Patent Office on April 22, 2022 and entitled "Target docking method, device, equipment and medium based on image recognition", the entire contents of which are incorporated herein by reference.
Technical Field
The present application belongs to the field of image recognition technology, and in particular relates to a target docking method based on image recognition, a terminal device, and a medium thereof.
Background
The statements herein merely provide background information relevant to the present application and do not necessarily constitute exemplary techniques.
With the development of modern science and technology, various types of small mobile robots have raised society's level of production. For example, household devices such as sweeping robots, mopping robots, and lawn mowing robots bring convenience to people's home lives, and various types of transport robots bring higher efficiency to factory transportation.
When a mobile robot is working, it is usually driven to reach the designated destination before it starts to perform a series of related operations; generally, the mobile robot determines the destination by identifying the target position of the target object. Since targets are usually small, detecting and recognizing the target object often involves a large amount of computation, which in turn makes position recognition of the target object slow, inefficient, and inaccurate. Due to inaccurate positioning of the target, the mobile robot is prone to yaw and may be unable to reach the target's location accurately.
Summary
According to various embodiments of the present application, a target docking method based on image recognition, a terminal device, and a medium thereof are provided.
An embodiment of this application provides a target docking method based on image recognition, including:
acquiring an environment image while the mobile robot moves toward a target object;
extracting an initial feature map about the target object from the environment image;
performing a preset number of cross-convolution fusions on the initial feature map to extract multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter, where cross-convolution fusion includes performing different convolution residual processing on the initial feature map and fusing the results obtained from the different convolution residual processing;
determining target position information of the target object in the environment image based on the multiple candidate position parameters and the confidence corresponding to each candidate position parameter; and
controlling the motion state of the mobile robot according to the target position information, so that the mobile robot docks with the target object.
An embodiment of the present application provides a target docking device based on image recognition, including:
an acquisition module, used to acquire an environment image while the mobile robot moves toward the target object;
a processing module, used to extract an initial feature map about the target object from the environment image;
the processing module is also used to perform a preset number of cross-convolution fusions on the initial feature map to extract multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter, where cross-convolution fusion includes performing different convolution residual processing on the initial feature map and fusing the results obtained by the different convolution residual processing;
the processing module is also used to determine target position information of the target object in the environment image based on the multiple candidate position parameters and the confidence corresponding to each candidate position parameter;
the processing module is also used to control the motion state of the mobile robot according to the target position information, so that the mobile robot docks with the target object.
An embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the method of the first aspect is implemented.
An embodiment of the present application provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, the method of the first aspect is implemented.
An embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the method described in any one of the above first aspects.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the application will become apparent from the description, the drawings, and the claims.
Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments or exemplary technologies of the present application, the drawings required for describing the embodiments or exemplary technologies are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, drawings of other embodiments can be obtained from these drawings without creative effort.
Figure 1 is a schematic diagram of a mobile robot in an embodiment of the present application.
Figure 2 is a flow chart of a target docking method based on image recognition provided by an embodiment of the present application.
Figure 3 is an example diagram of extracting an initial feature map in the embodiment of the present application.
Figure 4 is a schematic diagram of performing a preset number of cross-convolution fusions in an embodiment of the present application.
Figure 5 is a detailed flow chart of step 203 when only one cross-convolution fusion is performed in the embodiment of the present application.
Figure 6 is an example diagram of the network structure of the first cross-convolution fusion in the embodiment of this application.
Figure 7 is a processing flow chart of the i-th cross-convolution fusion and candidate position information extraction steps in the embodiment of the present application.
Figure 8 is an example diagram of the network structure of the i-th cross-convolution fusion in the embodiment of the present application.
Figure 9 is a structural example diagram of the multiple convolution layers and related functions used to obtain candidate position information in the embodiment of the present application.
Figure 10 is a schematic diagram of the sigmoid function in the embodiment of the present application.
Figure 11 is an example diagram of feature map 1 in the embodiment of the present application.
Figure 12 is a schematic diagram of candidate position information in an embodiment of the present application.
Figure 13 is a flow chart of an implementation for determining the target position information of the target object in the environment image in an embodiment of the present application.
Figure 14 is a flow chart for controlling the motion state of the mobile robot in the embodiment of the present application.
Figure 15 is the first schematic diagram of the target object in the environment image in an embodiment of the present application.
Figure 16 is the second schematic diagram of the target object in the environment image in the embodiment of the present application.
Figure 17 is the third schematic diagram of the target object in the environment image in the embodiment of the present application.
Figure 18 is a schematic diagram of the internal modules of a control unit 102 provided by an embodiment of the present application.
Figure 19 is a schematic diagram of a terminal device provided by an embodiment of the present application.
The realization of the purposes, functional features, and advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description of the Embodiments
In order to make the purposes, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
Figure 1 is a schematic diagram of a mobile robot in an embodiment of the present application. The mobile robot 100 may be any of various sweeping robots, mopping robots, food delivery robots, transport robots, lawn mowing robots, and the like. The embodiments of the present application do not limit the specific type or function of the mobile robot 100. It can be understood that the mobile robot in this embodiment may also include other devices with a self-moving function.
The mobile robot 100 is provided with a camera 101. The camera 101 is used to capture images of the environment around the mobile robot 100. The camera 101 may be fixed, or may be non-fixed and rotatable, which is not limited in the embodiments of the present application. The environment images captured by the camera 101 may be color images, black-and-white images, infrared images, and the like, which is likewise not limited in the embodiments of the present application.
The camera 101 is connected to the control unit 102 inside the mobile robot 100. The control unit 102 is also connected to the driving components of the mobile robot 100, such as the steering shaft, steering wheels, and motor of the mobile robot 100, and is used to control the movement, steering, and the like of the mobile robot 100.
In the embodiments of the present application, the control unit 102 can receive the environment image captured by the camera 101, process the environment image according to the target docking method based on image recognition provided by the embodiments of the present application, and adjust the forward direction of the mobile robot 100 so that the mobile robot 100 advances toward the target object and docks with it. The target object in the embodiments of the present application may be a target shelf, a target charging base, a target location, and so on, which is not limited in the embodiments of the present application. For example, when the mobile robot 100 is in a return-to-base charging scenario, the target object may be a target charging base. During the return-to-base charging process, the mobile robot 100 moves toward the target charging base and docks with it in order to charge.
The target docking method based on image recognition provided by the embodiments of the present application is described in detail below. The target docking method may be implemented by the control unit 102 inside the mobile robot 100 or by a cloud platform used to control the mobile robot 100; the embodiments of the present application do not limit the implementation subject of the target docking method. The following detailed description takes the control unit 102 as the execution subject.
Figure 2 is a flowchart of the target docking method based on image recognition provided by an embodiment of the present application. The process includes the following steps:
S201. Obtain an environment image while the mobile robot is moving toward the target object.
In the embodiments of the present application, while the mobile robot 100 is moving toward the target object, the control unit 102 can obtain the environment image through the camera 101 on the mobile robot 100. It can be understood that the environment image may be a color image, a black-and-white image, or an infrared image, which is not limited in the embodiments of the present application.
In the scenario where the mobile robot 100 returns to base for charging, the control unit 102 receives a return-to-base charging instruction. The return-to-base charging instruction may be return information automatically generated by the control unit 102 when the remaining power of the mobile robot 100 falls below a preset minimum power threshold, or information issued by a mobile phone terminal or cloud platform, in response to a user operation, instructing the robot to return to its destination. The return-to-base charging instruction can control the mobile robot 100 to move toward the target charging base and obtain environment images through the camera 101 on the mobile robot 100.
S202. Extract an initial feature map of the target object from the environment image.
In the embodiments of the present application, the control unit 102 may extract the initial feature map of the target object from the environment image through at least one convolution layer. Figure 3 is an example diagram of extracting an initial feature map in an embodiment of the present application. In the embodiments of the present application, the first item in the specification parameter W of a convolution layer represents the number of convolution kernels, the second item represents the number of channels, and the third and fourth items represent the size of the convolution kernel; the bias parameter B represents the bias value, which is randomly generated initially and subsequently corrected through gradient backpropagation. As shown in Figure 3, the specification parameter W of the convolution layer 301 is <32×12×3×3>, meaning that the convolution layer 301 uses 32 convolution kernels of size 3×3 to perform a convolution operation on an input image with 12 channels. The specification parameter W of the convolution layer 302 is <64×32×1×1>, meaning that 64 convolution kernels of size 1×1 are used to perform a convolution operation on an input with 32 channels. In addition, ReLU (linear rectification function) is an activation function, which is not described in detail in the embodiments of the present application. In the example of Figure 3, the control unit 102 performs convolution operations on the environment image through the convolution layer 301 and the convolution layer 302 to obtain the initial feature map of the target object.
In practical applications, the control unit 102 may perform convolution operations on the environment image through one or more convolution layers to obtain the initial feature map of the target object. The embodiments of the present application do not limit the number of convolution layers.
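As an illustration of S202, the following is a minimal sketch of the two convolution layers of Figure 3, assuming a PyTorch implementation; the framework, the padding, and the 512×512 input size are assumptions made for the example, while the kernel counts, kernel sizes, channel counts, and ReLU activations follow the description above.

```python
import torch
import torch.nn as nn

class InitialFeatureExtractor(nn.Module):
    """Sketch of the stem in Figure 3: W = <32x12x3x3> followed by W = <64x32x1x1>."""
    def __init__(self):
        super().__init__()
        # Convolution layer 301: 32 kernels of size 3x3 over a 12-channel input
        self.conv301 = nn.Conv2d(in_channels=12, out_channels=32, kernel_size=3, padding=1)
        # Convolution layer 302: 64 kernels of size 1x1 over the 32-channel result
        self.conv302 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.conv301(x))
        return self.relu(self.conv302(x))  # initial feature map of the target object

# Example: a 12-channel input frame yields a 64-channel initial feature map.
initial_map = InitialFeatureExtractor()(torch.randn(1, 12, 512, 512))
print(initial_map.shape)  # torch.Size([1, 64, 512, 512])
```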
S203. Perform a preset number of cross-convolution fusions on the initial feature map to extract multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter.
Cross-convolution fusion is used to perform different convolution-residual processing on the initial feature map and to fuse the results obtained from the different convolution-residual processing. The candidate position information includes multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter.
Figure 4 is a schematic diagram of performing a preset number of cross-convolution fusions in an embodiment of the present application. As shown in Figure 4, the control unit 102 can perform a preset number of cross-convolution fusions on the initial feature map to obtain the candidate position information of the target object. In this embodiment, cross-convolution fusion specifically fuses the convolution recognition result obtained by convolution processing with the residual recognition result obtained by residual processing to obtain a fusion result. The more convolutions are applied, the deeper the structure in which the convolution layer sits and the more high-level the semantic information extracted at that point; the fewer convolutions, the shallower the structure and the more low-level the extracted semantic information. Meanwhile, since the residual processing is implemented through multiple convolution layers in a residual network, the semantic information it yields is deeper. Therefore, fusing the convolution recognition result, which contains low-level semantic information from the convolution processing, with the residual recognition result, which contains high-level semantic information from the residual processing, enriches the feature information without adding extra convolution layers, achieving high performance and improved recognition accuracy with a lightweight network.
In some embodiments, the control unit 102 may perform only one cross-convolution fusion on the initial feature map. Figure 5 is a detailed flowchart of step S203 when only one cross-convolution fusion is performed in an embodiment of the present application. As shown in Figure 5, step S203 may include the following steps:
S2031. Use the initial feature map as the input feature of the first cross-convolution fusion and perform convolution processing to obtain a convolution recognition result.
Figure 6 is an example diagram of the network structure of the first cross-convolution fusion in an embodiment of the present application. In the embodiments of the present application, as shown in Figure 6, the control unit 102 can perform convolution processing on the initial feature map through the convolution layer 605 to obtain the convolution recognition result.
S2032. Perform residual processing on the initial feature map to obtain a residual recognition result.
In the embodiments of the present application, as shown in Figure 6, the control unit 102 can perform residual processing on the initial feature map through a residual network composed of the convolution layer 601, the convolution layer 602, the convolution layer 603, and the convolution layer 604 to obtain the residual recognition result. It can be understood that Add in Figure 6 represents an identity mapping, which is the same as the identity mapping in a conventional residual network and is not described in detail in the embodiments of the present application.
In the embodiments of the present application, in the residual network composed of the convolution layer 601, the convolution layer 602, the convolution layer 603, the convolution layer 604, and the corresponding identity mapping, the convolution layer 602, the convolution layer 603, and the identity mapping form one residual block. In the example of Figure 6, the residual network includes one residual block. In practical applications, the residual network may include more than one residual block as needed; the embodiments of the present application do not limit the number of residual blocks.
S2033. Fuse the convolution recognition result and the residual recognition result to obtain a fusion result.
In the embodiments of the present application, as shown in Figure 6, the control unit 102 can fuse the convolution recognition result output by the convolution layer 605 and the residual recognition result output by the convolution layer 604 through an array concatenation operation (i.e., a concat operation) to obtain the fusion result.
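To make steps S2031 to S2033 concrete, the following is a hedged sketch of one cross-convolution fusion, again assuming PyTorch. The channel widths and kernel sizes of the individual layers are illustrative and are not taken from Figure 6, but the structure, a plain convolution branch, a residual branch with an identity mapping, and a concat fusion, follows the text above.

```python
import torch
import torch.nn as nn

class CrossConvFusion(nn.Module):
    """Sketch of one cross-convolution fusion (steps S2031 to S2033):
    a convolution branch and a residual branch whose outputs are concatenated."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Convolution branch (plays the role of a layer such as 605 in Figure 6)
        self.conv_branch = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Residual branch: entry convolution, one residual block, exit convolution
        self.entry_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.res_conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.res_conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.exit_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        conv_result = self.relu(self.conv_branch(x))             # convolution recognition result
        h = self.relu(self.entry_conv(x))
        h = h + self.res_conv2(self.relu(self.res_conv1(h)))     # residual block with identity mapping (Add)
        residual_result = self.relu(self.exit_conv(h))           # residual recognition result
        return torch.cat([conv_result, residual_result], dim=1)  # concat fusion -> fusion result
```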
S2034. Obtain, from the fusion result, multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter.
In some embodiments, the control unit 102 performs only one cross-convolution fusion on the initial feature map. After the one cross-convolution fusion has produced the fusion result, the candidate position information of the target object can be extracted from that fusion result, where the candidate position information includes multiple candidate position parameters in the environment image and the confidence corresponding to each candidate position parameter.
In some embodiments, the control unit 102 may perform multiple cross-convolution fusions on the initial feature map. In embodiments where multiple cross-convolution fusions are performed, the processing flow of the first cross-convolution fusion is similar to the above steps S2031, S2032, and S2033 and is not repeated here.
In embodiments where multiple cross-convolution fusions are performed, the processing flow of the i-th cross-convolution fusion is shown in Figure 7. Figure 7 is a processing flowchart of the i-th cross-convolution fusion and the candidate position information extraction step in an embodiment of the present application, where 2 ≤ i ≤ K and K is the preset number of fusions. The process includes:
S2035. Use the (i-1)-th fusion result as the i-th input feature and perform convolution processing to obtain the i-th convolution recognition result.
Figure 8 is an example diagram of the network structure of the i-th cross-convolution fusion in an embodiment of the present application. As shown in Figure 8, the control unit 102 can use the (i-1)-th fusion result as the i-th input feature and perform convolution processing to obtain the i-th convolution recognition result.
S2036. Perform residual processing on the (i-1)-th fusion result to obtain the i-th residual recognition result.
As shown in Figure 8, the control unit 102 can perform residual processing on the (i-1)-th fusion result through a residual network composed of the convolution layer 801, the convolution layer 802, the convolution layer 803, the convolution layer 804, the convolution layer 805, and the convolution layer 806, combined with the ReLU function and the identity mapping (i.e., the Add function, an accumulation function, shown in Figure 8), to obtain the i-th residual recognition result.
In the residual network in Figure 8, the convolution layer 802, the convolution layer 803, and the corresponding identity mapping form one residual block, and the convolution layer 804, the convolution layer 805, and the corresponding identity mapping form another residual block. It can be seen that the residual network in the example of Figure 8 includes two residual blocks. In practical applications, the number of residual blocks in the residual network may be one or more as needed; the embodiments of the present application do not limit the number of residual blocks.
S2037. Fuse the i-th convolution recognition result and the i-th residual recognition result to obtain the i-th fusion result.
In the embodiments of the present application, as shown in Figure 8, the control unit 102 can fuse the convolution recognition result output by the convolution layer 807 and the residual recognition result output by the convolution layer 806 through a concat operation to obtain the fusion result.
S2038. Obtain, from the i-th fusion result, multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter.
In the embodiments of the present application, depending on actual needs, the control unit 102 can obtain the candidate position information of the target object from the fusion result of any iteration (i.e., the i-th). Preferably, the control unit 102 can obtain more accurate candidate position information of the target object from the K-th fusion result. Therefore, in some embodiments, after K cross-convolution fusions, the control unit 102 may obtain the candidate position information of the target object from the last (K-th) fusion result.
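The chaining of fusions described in S2035 to S2038 could be sketched as follows, reusing the CrossConvFusion module from the previous sketch. The 1×1 projection that restores the channel count after each concat is an assumption made only so the blocks can be stacked; the patent does not specify how channel widths evolve between fusions.

```python
import torch
import torch.nn as nn

class FusionStack(nn.Module):
    """Sketch of K cross-convolution fusions chained as in steps S2035 to S2037."""
    def __init__(self, channels: int = 64, k: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList([CrossConvFusion(channels) for _ in range(k)])
        # Assumed 1x1 projections that bring each concatenated output back to `channels`
        self.projections = nn.ModuleList(
            [nn.Conv2d(2 * channels, channels, kernel_size=1) for _ in range(k)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The (i-1)-th fusion result is the input feature of the i-th cross-convolution fusion.
        for block, proj in zip(self.blocks, self.projections):
            x = proj(block(x))
        return x  # the K-th fusion result, from which the candidate position information is read
```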
The process by which the control unit 102 obtains the candidate position information of the target object from the fusion result is described in detail below.
In the embodiments of the present application, the control unit 102 can extract multiple fused feature maps from the fusion result through multiple convolution layers and related activation functions, such as the ReLU function (linear rectification function) and the sigmoid function (S-shaped growth curve function), and thereby obtain the candidate position information of the target object. The fused feature maps contain the candidate position information, and the candidate position information includes multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter. Because the fused feature maps extracted through the multiple convolution layers and related functions can clearly and accurately represent the multiple candidate position parameters of the target object in the environment image and the confidence corresponding to each candidate position parameter, analyzing these fused feature maps allows the candidate position information of the target object to be determined more quickly and more accurately. Figure 9 is an example structural diagram of the multiple convolution layers and related functions used to obtain the candidate position information in an embodiment of the present application. As shown in Figure 9, the output of the convolution layer 901 applied to the fusion result is processed in three ways.
The first processing path is: processing through the convolution layer 902, the ReLU function, and the convolution layer 903, then normalizing to the range 0 to 1 through the sigmoid function, thereby obtaining feature map 1, which shows the confidence corresponding to the target object. Figure 10 is a schematic diagram of the sigmoid function in an embodiment of the present application, in which the horizontal axis x represents the pixel value and the vertical axis y represents the probability (confidence). Feature map 1 can be expressed as 1×1×128×128, where the first 1 represents one image and the second 1 represents one parameter, namely the probability that each pixel contains the target object, denoted by the confidence obj_value. 128×128 is the size of the feature map; feature map 1 has 128×128 pixels. Figure 11 is an example diagram of feature map 1 in an embodiment of the present application. As shown in Figure 11, feature map 1 has 128×128 pixels, and the parameter at each pixel represents the probability that the pixel contains the target object.
The second processing path is: processing through the convolution layer 904, the ReLU function, and the convolution layer 905, thereby obtaining feature map 2 and feature map 3. The output of the convolution layer 905 can be expressed as 1×2×128×128, where 1 represents one image and 2 represents two 128×128 feature map outputs whose values represent the x and y magnitudes, denoted x_value and y_value; that is, there are 128×128 values each for x_value and y_value.
The third processing path is: processing through the convolution layer 906, the ReLU function, and the convolution layer 907, thereby obtaining feature map 4 and feature map 5. The output of the convolution layer 907 can be expressed as 1×2×128×128, where 1 represents one image and 2 represents two 128×128 feature map outputs whose values represent the w and h magnitudes, denoted w_value and h_value; that is, there are 128×128 values each for w_value and h_value.
According to the above processing, five fused feature maps can be obtained: feature map 1, feature map 2, feature map 3, feature map 4, and feature map 5. These five fused feature maps are used to represent the candidate position information. Figure 12 is a schematic diagram of the candidate position information in an embodiment of the present application. In Figure 12, the fused feature map corresponding to obj_value at the bottom is feature map 1 in Figure 9, the fused feature map corresponding to x_value at the top is feature map 2 in Figure 9, the fused feature map corresponding to y_value at the top is feature map 3 in Figure 9, the fused feature map corresponding to w_value in the middle is feature map 4 in Figure 9, and the fused feature map corresponding to h_value in the middle is feature map 5 in Figure 9. As shown in Figure 12, the candidate position information includes multiple candidate position parameters of the target object in the environment image, namely the candidate position parameters in feature map 2, feature map 3, feature map 4, and feature map 5: x_value represents the x coordinate of the target object in the environment image, y_value represents the y coordinate of the target object in the environment image, w_value represents the width of the target object in the environment image, and h_value represents the height of the target object in the environment image. The candidate position information also includes the confidence corresponding to each candidate position parameter, namely the confidence parameter obj_value in feature map 1.
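A hedged sketch of the three processing paths of Figure 9 is given below (PyTorch assumed). The intermediate channel widths are illustrative; only the branch structure and the output shapes, 1×1×128×128 for the confidence map and 1×2×128×128 for the coordinate and size maps, follow the description above.

```python
import torch
import torch.nn as nn

class CandidateHead(nn.Module):
    """Sketch of the three processing paths of Figure 9 that produce feature maps 1 to 5."""
    def __init__(self, in_channels: int = 64, mid_channels: int = 64):
        super().__init__()
        self.shared = nn.Conv2d(in_channels, mid_channels, 3, padding=1)  # convolution layer 901
        self.obj_head = nn.Sequential(  # path 1: conv 902, ReLU, conv 903, sigmoid
            nn.Conv2d(mid_channels, mid_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(mid_channels, 1, 1), nn.Sigmoid())
        self.xy_head = nn.Sequential(   # path 2: conv 904, ReLU, conv 905
            nn.Conv2d(mid_channels, mid_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(mid_channels, 2, 1))
        self.wh_head = nn.Sequential(   # path 3: conv 906, ReLU, conv 907
            nn.Conv2d(mid_channels, mid_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(mid_channels, 2, 1))

    def forward(self, fusion_result: torch.Tensor):
        f = self.shared(fusion_result)
        obj = self.obj_head(f)  # 1x1x128x128: obj_value per pixel (feature map 1)
        xy = self.xy_head(f)    # 1x2x128x128: x_value and y_value (feature maps 2 and 3)
        wh = self.wh_head(f)    # 1x2x128x128: w_value and h_value (feature maps 4 and 5)
        return obj, xy, wh

obj, xy, wh = CandidateHead()(torch.randn(1, 64, 128, 128))
print(obj.shape, xy.shape, wh.shape)  # [1,1,128,128], [1,2,128,128], [1,2,128,128]
```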
The embodiments of the present application provide the example shown in Figure 9 for obtaining the candidate position information of the target object from the fusion result. In practical applications, the control unit 102 may also obtain the candidate position information of the target object from the fusion result in other ways, which is not limited in the embodiments of the present application.
It can be understood that the x coordinate of the target object in the environment image may refer to the x coordinate of the upper-left vertex of the target object, the x coordinate of the center point of the target object, or the coordinate of a designated point associated with the target object, which is not limited in the embodiments of the present application. Similarly, the y coordinate of the target object in the environment image may refer to the y coordinate of the upper-left vertex of the target object, the y coordinate of the center point of the target object, or the coordinate of a designated point associated with the target object, which is likewise not limited in the embodiments of the present application.
S204. Determine the target position information of the target object in the environment image according to the multiple candidate position parameters and the confidence corresponding to each candidate position parameter.
In the embodiments of the present application, the control unit 102 can select a suitable candidate position parameter from the multiple candidate position parameters as the target position information of the target object in the environment image according to the confidence corresponding to each candidate position parameter. For example, in Figure 12, among all the confidence parameters obj_value in the fused feature map corresponding to obj_value, assume the largest confidence parameter is the one at coordinate (1,1); then the candidate position parameters corresponding to coordinate (1,1) can be extracted from the remaining four fused feature maps as the target position information of the target object in the environment image. Selecting the candidate position parameters corresponding to the maximum confidence parameter as the target position information of the target object in the environment image is therefore one implementation. The present application also provides another implementation, as follows:
Figure 13 is a flowchart of an implementation for determining the target position information of the target object in the environment image in an embodiment of the present application. The process includes:
S2041. Extract the candidate position parameters corresponding to the same position in the multiple fused feature maps and the confidence corresponding to those candidate position parameters, and form a candidate parameter set for the target object.
In the embodiments of the present application, the control unit 102 can extract the candidate position parameters corresponding to the same position and the confidence corresponding to those candidate position parameters. For example, for the position at coordinate (1,1), the parameter at coordinate (1,1) is selected from each of the five fused feature maps of Figure 12, namely x_{1,1}, y_{1,1}, w_{1,1}, h_{1,1}, and obj_{1,1}, giving the candidate parameter set (x_{1,1}, y_{1,1}, w_{1,1}, h_{1,1}, obj_{1,1}) corresponding to the position (1,1). Therefore, in the embodiments of the present application, one position corresponds to one candidate parameter set. For example, the candidate parameter set corresponding to the position (1,1) is (x_{1,1}, y_{1,1}, w_{1,1}, h_{1,1}, obj_{1,1}).
By analogy, for the candidate position information shown in Figure 12, 128×128 candidate parameter sets (x_value, y_value, w_value, h_value, obj_value) can be extracted, amounting to 128*128*5 values in total, which can be expressed by the following formula:

$$\begin{pmatrix}
(x_{0,0}, y_{0,0}, w_{0,0}, h_{0,0}, obj_{0,0}) & \cdots & (x_{0,127}, y_{0,127}, w_{0,127}, h_{0,127}, obj_{0,127}) \\
\vdots & \ddots & \vdots \\
(x_{127,0}, y_{127,0}, w_{127,0}, h_{127,0}, obj_{127,0}) & \cdots & (x_{127,127}, y_{127,127}, w_{127,127}, h_{127,127}, obj_{127,127})
\end{pmatrix}$$

where x_{0,0} represents the x_value corresponding to the coordinate (0,0), and the other parameters follow by analogy and are not described one by one here.
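For illustration, the per-position candidate parameter sets of step S2041 can be formed by stacking the five fused feature maps; the sketch below assumes NumPy arrays and fills the maps with random values purely as placeholders.

```python
import numpy as np

# The five 128x128 fused feature maps (random placeholders for illustration).
obj_map, x_map, y_map, w_map, h_map = np.random.rand(5, 128, 128)

# candidate_sets[i, j] = (x_{i,j}, y_{i,j}, w_{i,j}, h_{i,j}, obj_{i,j})
candidate_sets = np.stack([x_map, y_map, w_map, h_map, obj_map], axis=-1)
print(candidate_sets.shape)  # (128, 128, 5), i.e. 128*128*5 values in total
```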
S2042. Select a target candidate parameter set from the candidate parameter sets according to the result of comparing the confidence with a preset confidence threshold.
In the embodiments of the present application, the control unit 102 can extract the corresponding confidence from every candidate parameter set and compare each confidence with the preset confidence threshold, thereby obtaining the comparison results between the confidences and the preset confidence threshold. It can be understood that these comparison results identify the set of confidences greater than the preset confidence threshold, the set of confidences smaller than the preset confidence threshold, and the set of confidences equal to the preset confidence threshold.
In a preferred embodiment, the control unit 102 may take the candidate parameter sets corresponding to the confidences greater than the preset confidence threshold as the target candidate parameter sets. For example, if the preset confidence threshold is set to 0.7, the candidate parameter sets whose confidence is greater than 0.7 are taken as the target candidate parameter sets. The target candidate parameter sets can be expressed by the following formula:

$$\left\{\, (x_{i,j},\, y_{i,j},\, w_{i,j},\, h_{i,j},\, obj_{i,j}) \;\middle|\; obj_{i,j} > 0.7,\ 0 \le i, j \le 127 \,\right\}$$

In one case, the target candidate parameter set is empty, which indicates that no target object has been found in the current environment image. The control unit 102 can then control the mobile robot 100 to rotate left or right until a target object is detected in the environment image.
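A minimal sketch of the screening in step S2042, using a candidate_sets array shaped like the one built in the previous sketch and the example threshold of 0.7; the handling of the empty case is indicated only by a comment, since the rotation command itself depends on the robot platform.

```python
import numpy as np

CONF_THRESHOLD = 0.7  # preset confidence threshold from the example above

# candidate_sets as built in the previous sketch: shape (128, 128, 5), obj_value last.
candidate_sets = np.random.rand(128, 128, 5)

mask = candidate_sets[..., 4] > CONF_THRESHOLD
target_candidates = candidate_sets[mask]  # target candidate parameter sets, shape (N, 5)

if target_candidates.size == 0:
    # Empty set: no target object in the current environment image; the control unit
    # would rotate the mobile robot left or right and run detection again.
    pass
```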
In other cases, there is at least one target candidate parameter set, and the target position information of the target object can then be determined through step S2043.
S2043. Determine the target position information of the target object according to the candidate position parameters in the target candidate parameter set.
In one case, there is exactly one target candidate parameter set; the control unit 102 can then directly use that candidate parameter set as the target position information of the target object.
In another case, there are multiple target candidate parameter sets; the control unit 102 can then, according to a preset condition, select the candidate position parameters that satisfy the preset condition from the target candidate parameter sets as a new target candidate parameter set, achieving further screening. A preferred implementation provided by the embodiments of the present application is as follows:
When a confidence in the candidate parameter sets is the maximum, the control unit 102 determines the candidate parameter set corresponding to the maximum confidence as the target candidate parameter set. For example, suppose the target candidate parameter sets include three candidate parameter sets, namely (x_{1,1}, y_{1,1}, w_{1,1}, h_{1,1}, obj_{1,1}), (x_{2,2}, y_{2,2}, w_{2,2}, h_{2,2}, obj_{2,2}), and (x_{3,3}, y_{3,3}, w_{3,3}, h_{3,3}, obj_{3,3}). The control unit 102 detects that, among obj_{1,1}, obj_{2,2}, and obj_{3,3}, the value of obj_{1,1} is the largest. The control unit 102 can then determine the candidate parameter set (x_{1,1}, y_{1,1}, w_{1,1}, h_{1,1}, obj_{1,1}) corresponding to obj_{1,1} as the target candidate parameter set. In this implementation, determining the candidate parameter set corresponding to the maximum confidence as the target candidate parameter set achieves better recognition accuracy.
Finally, the control unit 102 can determine the target position information of the target object according to the candidate position parameters in the target candidate parameter set. For example, if the target candidate parameter set is (x_{1,1}, y_{1,1}, w_{1,1}, h_{1,1}, obj_{1,1}), then x_{1,1} can be determined as the x coordinate of the target object in the environment image, y_{1,1} as the y coordinate of the target object in the environment image, w_{1,1} as the width of the target object in the environment image, and h_{1,1} as the height of the target object in the environment image.
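The maximum-confidence selection of step S2043 could then be sketched as follows, operating on a target_candidates array shaped like the one produced by the previous sketch.

```python
import numpy as np

# target_candidates as produced by the previous sketch, shape (N, 5).
target_candidates = np.random.rand(3, 5)

best = target_candidates[np.argmax(target_candidates[:, 4])]  # largest obj_value
x, y, w, h, obj = best
# (x, y) is the target object's position and (w, h) its width and height in the environment image.
```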
In this embodiment, suitable candidate position parameters are screened through the confidence corresponding to the candidate position parameters, thereby improving the recognition accuracy.
S205. Control the motion state of the mobile robot according to the target position information so that the mobile robot docks with the target object.
In the embodiments of the present application, after determining the target position information of the target object in the environment image, the control unit 102 can control the motion state of the mobile robot according to that position information so that the mobile robot docks with the target object. The motion state may include, but is not limited to, the motion attitude direction and the forward speed; the motion attitude direction may be to the left, to the right, straight ahead, or offset by a certain angle. Specifically, Figure 14 is a flowchart of controlling the motion state of the mobile robot in an embodiment of the present application. The process includes:
S2051. Obtain the center point position of the environment image.
In the embodiments of the present application, the control unit 102 first determines the center point position of the environment image. It can be understood that the control unit 102 can establish a coordinate system with a vertex of the environment image as the origin, for example the top-left vertex of the environment image, and describe the center point position of the environment image in that coordinate system. For example, as shown in Figure 15, the top-left vertex of the environment image is taken as the origin of the coordinate system. If the width and height of each frame of the environment image are 640*480, then the top-right vertex of the environment image is (640, 0), the bottom-left vertex is (0, 480), the bottom-right vertex is (640, 480), and the center point position of the environment image is (640/2, 480/2), i.e., (320, 240).
S2052. Calculate the center point position of the target object according to the target position information.
In the embodiments of the present application, after the control unit 102 has determined the x coordinate, y coordinate, width, and height of the target object, it can identify the top-left corner coordinates (x_0, y_0) and the bottom-right corner coordinates (x_1, y_1) of the target object. For example, if in the preset position information the x and y coordinates are the coordinates of the top-left vertex of the target object, then x_0 is the x coordinate and y_0 is the y coordinate; the bottom-right corner coordinate x_1 equals the x coordinate plus the width w, and y_1 equals the y coordinate minus the height h.
After identifying the top-left corner coordinates (x_0, y_0) and the bottom-right corner coordinates (x_1, y_1) of the target object, the control unit 102 can calculate the center point position (X_c, Y_c) of the target object. The center point position (X_c, Y_c) of the target object can be expressed by the following formulas:
X_c = (x_0 + x_1) / 2
Y_c = (y_0 + y_1) / 2
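A small sketch of steps S2051 and S2052 under the 640*480 example (plain Python). The bottom-right corner is taken here as (x + w, y + h), which assumes a top-left origin with y increasing downward; the text above computes y_1 as the y coordinate minus the height h, so the sign should be adapted to the coordinate convention actually used in Figure 15.

```python
def image_center(width: int = 640, height: int = 480) -> tuple[float, float]:
    """Center point of the environment image (S2051)."""
    return width / 2, height / 2

def target_center(x: float, y: float, w: float, h: float) -> tuple[float, float]:
    """Center point of the target object (S2052) from its top-left corner and size."""
    x0, y0 = x, y
    x1, y1 = x + w, y + h  # assumed bottom-right corner; see the note on y_1 above
    return (x0 + x1) / 2, (y0 + y1) / 2

print(image_center())                   # (320.0, 240.0)
print(target_center(300, 180, 40, 40))  # (320.0, 200.0)
```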
S2053. Adjust the motion state of the mobile robot according to the offset between the center point position of the target object and the center point position of the environment image, so that the mobile robot docks with the target object.
In the embodiments of the present application, the control unit 102 can adjust the motion state of the mobile robot according to the offset between the center point position of the target object and the center point position of the environment image. In the embodiments of the present application, the sign of the offset can indicate the direction of the offset. For example, a positive offset in the x coordinate indicates an offset to the right, and a negative offset in the x coordinate indicates an offset to the left. The magnitude of the offset indicates the degree of the offset, and it can be understood that an offset of 0 means there is no offset. Therefore, according to the offset between the center point position of the target object and the center point position of the environment image, the control unit 102 can determine the direction and magnitude of the target object's offset from the center of the environment image, and thereby drive the mobile robot toward a state with no offset, so that the mobile robot docks with the target object.
In this embodiment, by comparing the center point position of the target object with the center point position of the environment image, the offset between the mobile robot and the target object can be accurately identified, and the motion state of the mobile robot can be adjusted according to that offset so that the mobile robot docks with the target object accurately.
The embodiments of the present application provide a specific implementation that includes the following steps: determine the forward direction and forward distance of the mobile robot according to the offset; adjust the motion attitude direction of the mobile robot according to the forward direction; and control the mobile robot to move the forward distance along the motion attitude direction so that the mobile robot docks with the target object. If the offset indicates an offset to the left, the forward direction of the mobile robot can be determined as forward-left; if the offset indicates an offset to the right, the forward direction can be determined as forward-right. The forward distance can be positively correlated with the magnitude of the offset. In this embodiment, determining the forward direction and forward distance of the mobile robot according to the offset enables more precise control of the mobile robot's motion attitude direction and forward distance, improving the docking between the mobile robot and the target object.
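As a hedged sketch of this offset-based decision, the mapping from the horizontal offset to a forward direction and a forward distance might look as follows; the image center of 320 comes from the 640*480 example used in this embodiment, while the proportional gain is purely an illustrative assumption.

```python
def steering_command(target_cx: float, image_cx: float = 320.0, gain: float = 0.01):
    """Map the horizontal offset between the target center and the image center
    to a forward direction and a forward distance positively correlated with the offset."""
    offset = target_cx - image_cx
    if offset > 0:
        direction = "forward-right"
    elif offset < 0:
        direction = "forward-left"
    else:
        direction = "straight ahead"
    distance = gain * abs(offset)  # illustrative proportional relation
    return direction, distance

print(steering_command(400.0))  # ('forward-right', 0.8)
```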
As an example, consider a mobile robot docking with a charging base. In this example, the mobile robot may be a quadruped mobile robot whose four wheels comprise two driving wheels and two steering wheels located on the left and right sides of the robot. The driving wheels are rear wheels that provide driving force, and the steering wheels are front wheels without independent power, which may also be called driven wheels. For example, an image read by a camera with a specification of 640*480 has a width and height of 640*480, and the center point position of the image can be set to 640/2 = 320. The height of the charging base depends on its actual physical position and should be within the camera's field of view; for example, the height of the charging base may range from 0 to 480, and in this example it is set to 200. When the center point position of the charging base equals 320, that is, the charging base is exactly at the center of the image, the quadruped robot is driven straight ahead until it touches the charging base. When the center point position of the charging base is less than 320, that is, the charging base is to the left of the image center, the quadruped robot is driven to move to the left. The specific driving method includes: the right driving wheel starts to drive while the left driving wheel is given no power, so that the two steering wheels turn to the left and the robot turns left; the robot keeps turning left until the center point position of the charging base equals the center point position of the image, at which point it stops turning and goes straight ahead. If the center point position of the charging base is greater than 320, that is, the charging base is to the right of the image center, the quadruped robot is controlled to start moving to the right; the steering method is the same as for turning left and is not repeated here.
Figure 15 is the first schematic diagram of the target object in the environment image in an embodiment of the present application. In Figure 15, the origin coordinate is (0, 0) and the top corner coordinate of the environment image is (640, 0). As shown in Figure 15, the center point 1502 of the target object is to the right of the center point 1501 of the environment image, so the control unit 102 can determine that the forward direction of the mobile robot is forward-right; the rightward arrow shown in Figure 15 indicates the mobile robot's rightward forward direction.
Figure 16 is the second schematic diagram of the target object in the environment image in an embodiment of the present application. In Figure 16, the origin coordinate is (0, 0) and the top corner coordinate of the environment image is (640, 0). As shown in Figure 16, the center point 1602 of the target object coincides with the center point 1601 of the environment image, so the control unit 102 can determine that the forward direction of the mobile robot is straight ahead; the forward arrow shown in Figure 16 indicates the mobile robot's straight-ahead direction.
Figure 17 is the third schematic diagram of the target object in the environment image in an embodiment of the present application. In Figure 17, the origin coordinate is (0, 0) and the top corner coordinate of the environment image is (640, 0). As shown in Figure 17, the center point 1702 of the target object is to the left of the center point 1701 of the environment image, so the control unit 102 can determine that the forward direction of the mobile robot is forward-left; the leftward arrow shown in Figure 17 indicates the mobile robot's leftward forward direction.
The control unit 102 can then adjust the motion attitude direction of the mobile robot according to the forward direction. For example, if the forward direction is forward-right, the control unit 102 can make the steering wheels of the mobile robot turn to the right, thereby adjusting the mobile robot's motion attitude direction; if the forward direction is forward-left, the control unit 102 can make the steering wheels of the mobile robot turn to the left, thereby adjusting the mobile robot's motion attitude direction.
It can be understood that the steering wheels of the mobile robot may be the two front wheels of the mobile robot or the two rear wheels of the mobile robot, which is not limited in the embodiments of the present application.
It can be understood that, by adjusting the motion state of the mobile robot 100 in the above manner, the mobile robot can be made to move toward the target object, so that the mobile robot 100 docks with the target object. In the return-to-base charging scenario, the target object may be a target charging base, and the mobile robot 100 moves toward and docks with the target charging base during the return-to-base charging process to achieve charging.
Figure 18 is a schematic diagram of the internal modules of a control unit 102 provided by an embodiment of the present application. The internal modules of the control unit 102 include:
an acquisition module 1801, configured to execute or implement step S201 in the embodiments corresponding to Figure 2 above; and
a processing module 1802, configured to execute or implement steps S202, S203, S204, and S205 in the embodiments corresponding to Figure 2 above.
Figure 19 is a schematic diagram of a terminal device provided by an embodiment of the present application. The terminal device 1900 may be the mobile robot 100 in the above embodiments. The terminal device 1900 includes a memory 1902, a processor 1901, and a computer program 1903 stored in the memory 1902 and executable on the processor 1901. When the processor 1901 executes the computer program 1903, the methods of the embodiments corresponding to Figure 2, Figure 5, Figure 7, Figure 13, or Figure 14 are implemented.
It should be noted that, since the information exchange and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiments section and are not repeated here.
The embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in each of the above method embodiments.
The embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps in each of the above method embodiments.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes in the methods of the above embodiments by instructing relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the steps of each of the above method embodiments can be implemented. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or described in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

Claims (10)

  1. A target docking method based on image recognition, comprising:
    obtaining an environment image while a mobile robot is moving toward a target object;
    extracting an initial feature map of the target object from the environment image;
    performing a preset number of cross-convolution fusions on the initial feature map to extract multiple candidate position parameters of the target object in the environment image and a confidence corresponding to each of the candidate position parameters, wherein the cross-convolution fusion comprises performing different convolution-residual processing on the initial feature map and fusing the results obtained from the different convolution-residual processing;
    determining target position information of the target object in the environment image according to the multiple candidate position parameters and the confidence corresponding to each of the candidate position parameters; and
    controlling a motion state of the mobile robot according to the target position information, so that the mobile robot docks with the target object.
  2. The method of claim 1, wherein performing the preset number of cross-convolution fusions on the initial feature map to extract the plurality of candidate position parameters of the target object in the environment image and the confidence corresponding to each of the candidate position parameters comprises:
    using the initial feature map as an input feature of the first cross-convolution fusion and performing convolution processing to obtain a convolution recognition result;
    performing residual processing on the initial feature map to obtain a residual recognition result;
    fusing the convolution recognition result and the residual recognition result to obtain a fusion result; and
    obtaining, from the fusion result, the plurality of candidate position parameters of the target object in the environment image and the confidence corresponding to each of the candidate position parameters.
  3. The method of claim 2, wherein performing the preset number of cross-convolution fusions on the initial feature map to extract the plurality of candidate position parameters of the target object in the environment image and the confidence corresponding to each of the candidate position parameters further comprises:
    using the (i-1)-th fusion result as an i-th input feature and performing convolution processing to obtain an i-th convolution recognition result;
    performing residual processing on the (i-1)-th fusion result to obtain an i-th residual recognition result;
    fusing the i-th convolution recognition result and the i-th residual recognition result to obtain an i-th fusion result; and
    obtaining, from the i-th fusion result, the plurality of candidate position parameters of the target object in the environment image and the confidence corresponding to each of the candidate position parameters;
    wherein 2 ≤ i ≤ K, and K is the preset number of times.
  4. The method of claim 1, wherein the fusion result comprises a plurality of fused feature maps corresponding to the environment image, and determining the target position information of the target object in the environment image according to the plurality of candidate position parameters and the confidence corresponding to each of the candidate position parameters comprises:
    extracting the candidate position parameters corresponding to a same position in the plurality of fused feature maps and the confidence corresponding to those candidate position parameters, and forming a candidate parameter set for the target object, wherein each same position corresponds to one candidate parameter set;
    selecting a target candidate parameter set from the candidate parameter sets according to a comparison result between the confidence and a preset confidence threshold; and
    determining the target position information of the target object according to the candidate position parameters in the target candidate parameter set.
  5. The method of claim 4, wherein selecting the target candidate parameter set from the candidate parameter sets according to the comparison result between the confidence and the preset confidence threshold comprises:
    when the confidence in a candidate parameter set of the comparison result is the maximum, determining the candidate parameter set corresponding to the maximum confidence as the target candidate parameter set.
  6. The method of any one of claims 1 to 5, wherein controlling the motion state of the mobile robot according to the target position information so that the mobile robot docks with the target object comprises:
    acquiring a center point position of the environment image;
    calculating a center point position of the target object according to the target position information; and
    adjusting the motion state of the mobile robot according to an offset between the center point position of the target object and the center point position of the environment image, so that the mobile robot docks with the target object.
  7. The method of claim 6, wherein the motion state comprises a motion posture direction, and adjusting the motion state of the mobile robot according to the offset between the center point position of the target object and the center point position of the environment image so that the mobile robot docks with the target object comprises:
    determining a forward direction and a forward distance of the mobile robot according to the offset;
    adjusting the motion posture direction of the mobile robot according to the forward direction; and
    controlling the mobile robot to move the forward distance along the motion posture direction, so that the mobile robot docks with the target object.
  8. The method of claim 7, wherein the forward distance is positively correlated with the offset.
  9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of any one of claims 1 to 8.
  10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1 to 8.
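
The cross-convolution fusion of claims 1 to 3 can be pictured as a stack of K rounds, each running a convolution branch and a residual branch over the incoming feature map and fusing the two results, with round i consuming the fusion result of round i-1. The sketch below is a minimal PyTorch illustration of that structure only; the layer sizes, the element-wise-addition fusion, and the 5-channel head (four box parameters plus one confidence per spatial location) are assumptions made for illustration, not the patented network.

```python
import torch
import torch.nn as nn

class CrossConvFusion(nn.Module):
    """One fusion round: a convolution branch and a residual branch run in
    parallel over the same input, then fused (here by element-wise addition)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv_branch = nn.Sequential(             # produces the "convolution recognition result"
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.res_branch = nn.Sequential(               # produces the "residual recognition result"
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        conv_out = self.conv_branch(x)
        res_out = x + self.res_branch(x)
        return conv_out + res_out                      # fusion result

class CandidateHead(nn.Module):
    """K fusion rounds followed by a 1x1 head that predicts, per spatial
    location, four candidate position parameters and one confidence."""
    def __init__(self, channels: int, k_rounds: int):
        super().__init__()
        self.rounds = nn.ModuleList([CrossConvFusion(channels) for _ in range(k_rounds)])
        self.head = nn.Conv2d(channels, 5, kernel_size=1)

    def forward(self, feat: torch.Tensor):
        for fusion_round in self.rounds:               # round i takes the (i-1)-th fusion result
            feat = fusion_round(feat)
        out = self.head(feat)
        boxes, conf = out[:, :4], out[:, 4:].sigmoid()
        return boxes, conf                             # candidate position parameters, confidences
```

For example, `CandidateHead(channels=64, k_rounds=3)` applied to a 64-channel initial feature map returns an (N, 4, H, W) tensor of candidate position parameters and an (N, 1, H, W) tensor of confidences, one candidate per spatial location.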
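Claims 4 and 5 then keep, among the per-location candidate parameter sets, the one whose confidence both clears a preset threshold and is the maximum. A minimal NumPy sketch of that selection step, assuming the candidates are already laid out as (H, W, 4) box parameters with an (H, W) confidence map; the 0.5 threshold is a placeholder, not a value taken from the application:

```python
import numpy as np

def select_target_candidate(boxes: np.ndarray, conf: np.ndarray, conf_threshold: float = 0.5):
    """boxes: (H, W, 4) candidate position parameters per location.
    conf:  (H, W) confidence per location.
    Returns the candidate position parameters of the highest-confidence
    location that clears the preset confidence threshold, or None."""
    mask = conf >= conf_threshold                  # comparison with the preset confidence threshold
    if not mask.any():
        return None                                # no candidate is confident enough
    conf_masked = np.where(mask, conf, -np.inf)    # suppress below-threshold locations
    y, x = np.unravel_index(np.argmax(conf_masked), conf.shape)
    return boxes[y, x]                             # target candidate parameter set
```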
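Claims 6 to 8 dock the robot by comparing the centre point of the detected target with the centre point of the environment image and turning the offset between them into a heading correction and a forward distance that grows with the offset. The sketch below is one possible proportional realisation under stated assumptions: the (cx, cy, w, h) box convention and the gain value are illustrative, since the claims only require the forward distance to be positively correlated with the offset.

```python
import math

def docking_command(image_size, target_box, gain=0.005):
    """image_size: (width, height) of the environment image in pixels.
    target_box: (cx, cy, w, h) target position information in image coordinates.
    Returns (heading_offset, forward_distance) for the motion controller."""
    img_cx, img_cy = image_size[0] / 2.0, image_size[1] / 2.0
    dx = target_box[0] - img_cx                    # horizontal offset between the two centre points
    dy = target_box[1] - img_cy                    # vertical offset between the two centre points
    heading_offset = gain * dx                     # steer so the target centre moves toward the image centre
    forward_distance = gain * math.hypot(dx, dy)   # positively correlated with the offset (claim 8)
    return heading_offset, forward_distance
```

In a typical loop, the robot would move by the returned heading and distance, re-detect the target, and repeat until the two centre points coincide and docking is complete.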
PCT/CN2022/132656 2022-04-22 2022-11-17 Target docking method based on image recognition and terminal device and medium thereof WO2023202062A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210437107.7 2022-04-22
CN202210437107.7A CN114789440B (en) 2022-04-22 2022-04-22 Target docking method, device, equipment and medium based on image recognition

Publications (1)

Publication Number Publication Date
WO2023202062A1 true WO2023202062A1 (en) 2023-10-26

Family

ID=82460837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/132656 WO2023202062A1 (en) 2022-04-22 2022-11-17 Target docking method based on image recognition and terminal device and medium thereof

Country Status (2)

Country Link
CN (1) CN114789440B (en)
WO (1) WO2023202062A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114789440B (en) * 2022-04-22 2024-02-20 深圳市正浩创新科技股份有限公司 Target docking method, device, equipment and medium based on image recognition

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163197A (en) * 2018-08-24 2019-08-23 腾讯科技(深圳)有限公司 Object detection method, device, computer readable storage medium and computer equipment
CN110751134A (en) * 2019-12-23 2020-02-04 长沙智能驾驶研究院有限公司 Target detection method, storage medium and computer device
CN111136648A (en) * 2019-12-27 2020-05-12 深圳市优必选科技股份有限公司 Mobile robot positioning method and device and mobile robot
CN111694358A (en) * 2020-06-19 2020-09-22 北京海益同展信息科技有限公司 Method and device for controlling transfer robot, and storage medium
CN112307853A (en) * 2019-08-02 2021-02-02 成都天府新区光启未来技术研究院 Detection method of aerial image, storage medium and electronic device
CN113989616A (en) * 2021-10-26 2022-01-28 北京锐安科技有限公司 Target detection method, device, equipment and storage medium
CN114789440A (en) * 2022-04-22 2022-07-26 深圳市正浩创新科技股份有限公司 Target docking method, device, equipment and medium based on image recognition

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866526A (en) * 2018-08-28 2020-03-06 北京三星通信技术研究有限公司 Image segmentation method, electronic device and computer-readable storage medium
CN109506628A (en) * 2018-11-29 2019-03-22 东北大学 Object distance measuring method under a kind of truck environment based on deep learning
CN109740463A (en) * 2018-12-21 2019-05-10 沈阳建筑大学 A kind of object detection method under vehicle environment
CN110032196B (en) * 2019-05-06 2022-03-29 北京云迹科技股份有限公司 Robot recharging method and device
CN110660082B (en) * 2019-09-25 2022-03-08 西南交通大学 Target tracking method based on graph convolution and trajectory convolution network learning
CN111598950A (en) * 2020-04-23 2020-08-28 四川省客车制造有限责任公司 Automatic passenger train hinging method and system based on machine vision
CN113989753A (en) * 2020-07-09 2022-01-28 浙江大华技术股份有限公司 Multi-target detection processing method and device
CN112767443A (en) * 2021-01-18 2021-05-07 深圳市华尊科技股份有限公司 Target tracking method, electronic equipment and related product

Also Published As

Publication number Publication date
CN114789440B (en) 2024-02-20
CN114789440A (en) 2022-07-26

Similar Documents

Publication Publication Date Title
US20230154015A1 (en) Virtual teach and repeat mobile manipulation system
CN113108771B (en) Movement pose estimation method based on closed-loop direct sparse visual odometer
Zhang et al. Robotic grasp detection based on image processing and random forest
US11887363B2 (en) Training a deep neural network model to generate rich object-centric embeddings of robotic vision data
CN111368759B (en) Monocular vision-based mobile robot semantic map construction system
CN112507918B (en) Gesture recognition method
CN111310631A (en) Target tracking method and system for rotor operation flying robot
CN110146080B (en) SLAM loop detection method and device based on mobile robot
WO2023202062A1 (en) Target docking method based on image recognition and terminal device and medium thereof
JP2022553356A (en) Data processing method and related device
WO2022127814A1 (en) Method and apparatus for detecting salient object in image, and device and storage medium
WO2023173950A1 (en) Obstacle detection method, mobile robot, and machine readable storage medium
CN112639874A (en) Object following method, object following apparatus, removable device, and storage medium
CN114283294A (en) Neural network point cloud feature extraction method, system, equipment and storage medium
Liu et al. A deep Q-learning network based active object detection model with a novel training algorithm for service robots
WO2021203368A1 (en) Image processing method and apparatus, electronic device and storage medium
CN112529917A (en) Three-dimensional target segmentation method, device, equipment and storage medium
CN112614161A (en) Three-dimensional object tracking method based on edge confidence
KR20220055072A (en) Method for indoor localization using deep learning
CN115272275A (en) Tray, obstacle detection positioning system and method based on RGB-D camera and neural network model
Wong et al. Ant Colony Optimization and image model-based robot manipulator system for pick-and-place tasks
CN112699800A (en) Vehicle searching method and device, storage medium and terminal
CN112036466A (en) Mixed terrain classification method
Guo et al. 3D Lidar SLAM Based on Ground Segmentation and Scan Context Loop Detection
Nakashima et al. Sir-net: scene-independent end-to-end trainable visual relocalizer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22938281

Country of ref document: EP

Kind code of ref document: A1