WO2023165161A1 - Multi-task convolution-based object grasping and positioning identification algorithm and system, and robot - Google Patents

Multi-task convolution-based object grasping and positioning identification algorithm and system, and robot Download PDF

Info

Publication number
WO2023165161A1
Authority
WO
WIPO (PCT)
Prior art keywords
exclusive
layer
task
branch
parameters
Prior art date
Application number
PCT/CN2022/131276
Other languages
French (fr)
Chinese (zh)
Inventor
赵景波
国珍
房桐
杜保帅
张田
张晓寒
Original Assignee
青岛理工大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 青岛理工大学 (Qingdao University of Technology)
Publication of WO2023165161A1 publication Critical patent/WO2023165161A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 - Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Definitions

  • The invention relates to the technical field of robot control, and in particular to an object grasping and positioning recognition algorithm, system, and device based on multi-task convolution.
  • Robotic arm grasping is a fundamental robot operation with wide application in fields such as industrial parts sorting, assembly, and service robotics.
  • Traditional industrial robot grasping can only meet fixed task requirements: fixed control parameters are set for each joint and link of the robot through offline programming or teach programming in order to complete a fixed sequence of actions. This requires the position and pose of the grasping target to remain consistent; otherwise the grasping result is severely degraded.
  • This traditional grasping mode is extremely inflexible and can only complete simple tasks.
  • When the target's size or pose changes, the parameters must be reset, and the mode cannot handle multiple grasping targets or targets whose positions are not fixed, so it cannot meet ever-increasing industrial production demands.
  • The YOLO detection network is widely used for item classification.
  • By feeding the image of the object to be grasped into two separate networks, the grasping box and the classification result can be obtained at the same time; however, this approach increases the computer's computation and storage burden, greatly lengthens the time needed to complete the task, and severely affects the real-time performance and working efficiency of industrial robot grasping.
  • To address these deficiencies, the present invention provides an object grasping and positioning recognition algorithm, system, and robot based on multi-task convolution.
  • On the basis of guaranteeing stability and real-time performance, grasping-box detection and grasped-object recognition and classification detection are completed simultaneously within a single model.
  • An object grasping and positioning recognition algorithm based on multi-task convolution includes the following steps: acquire the image and position information of objects within the field of view, and output the acquired image information to the user; input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; map the extracted multi-scale feature information to the Multi-task branch exclusive layer for grasped-object recognition and detection and to the ED-YOLO branch exclusive layer for grasped-object positioning; have the two branch exclusive layers independently compute loss values according to their loss functions, and use the backpropagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters according to the loss values; alternately train the shared layer's multi-scale features, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and apply non-maximum suppression to obtain the final object image and position information.
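The shared-backbone, two-head design described in the steps above can be sketched structurally as follows. This is not the patented network itself: the backbone and the two heads are placeholder callables standing in for Darknet-53, the Multi-task branch, and the ED-YOLO branch.

```python
# Structural sketch of the shared-layer / two-branch design described above.
# The backbone and head functions are stand-ins, not the patented networks.

class MultiTaskGraspNet:
    def __init__(self, backbone, multitask_head, edyolo_head):
        self.backbone = backbone              # shared layer (Darknet-53 in the text)
        self.multitask_head = multitask_head  # recognition/detection branch
        self.edyolo_head = edyolo_head        # grasp-positioning branch

    def forward(self, image):
        # One backbone pass serves both branches, which is the efficiency
        # gain over running two separate networks on the same image.
        feats = self.backbone(image)
        return {
            "classification": self.multitask_head(feats),
            "grasp_box": self.edyolo_head(feats),
        }

# Toy usage with placeholder callables (all values are illustrative):
net = MultiTaskGraspNet(
    backbone=lambda img: [f * 2 for f in img],    # fake multi-scale features
    multitask_head=lambda f: ("mug", 0.9),        # fake (class, confidence)
    edyolo_head=lambda f: (10, 20, 50, 60, 0.8),  # fake grasp box + confidence
)
out = net.forward([1, 2, 3])
```

Both outputs come from a single `forward` call, mirroring the claim that box detection and classification are completed in one model.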
  • The alternating training includes early-stage full-parameter update training of the shared layer and the two exclusive layers, and later-stage exclusive-layer parameter update training.
  • The early-stage full-parameter update training of the shared layer and the two exclusive layers is as follows: first, the parameters of the ED-YOLO branch are fixed, the Multi-task exclusive layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated with the learning-rate decay strategy; then, the Multi-task exclusive layer parameters are fixed, the ED-YOLO exclusive layer network parameters are trained, the ED-YOLO prediction loss is computed, and the shared layer and ED-YOLO exclusive layer network parameters are updated with the learning-rate decay strategy.
  • The later-stage exclusive-layer parameter update training is as follows: first, the Multi-task exclusive layer parameters are fixed, the ED-YOLO exclusive layer is trained, the ED-YOLO prediction loss is computed, and the ED-YOLO exclusive layer network parameters are updated with the learning-rate decay strategy; then, the ED-YOLO exclusive layer parameters are fixed, the Multi-task exclusive layer parameters are trained, the Multi-task prediction loss is computed, and the Multi-task exclusive layer network parameters are updated with the learning-rate decay strategy.
  • After the last ED-YOLO exclusive layer update is completed, the Multi-task exclusive layer is no longer updated, and the trained Multi-task exclusive layer parameters are fine-tuned.
  • The training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
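The cosine-annealing schedule named here can be written out directly. The initial rate of 1e-3 matches the text; the floor `lr_min = 0` is an assumption, since no minimum rate is stated.

```python
import math

def cosine_annealed_lr(step, total_steps, lr0=1e-3, lr_min=0.0):
    """Cosine annealing from lr0 down to lr_min over total_steps.

    lr0 = 1e-3 matches the initial learning rate stated in the text;
    lr_min = 0 is an assumption (the text does not give a floor).
    """
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * step / total_steps))

# The schedule starts at lr0, passes through lr0/2 at the midpoint,
# and decays to lr_min at the final step.
start = cosine_annealed_lr(0, 100)
mid = cosine_annealed_lr(50, 100)
end = cosine_annealed_lr(100, 100)
```

The smooth decay is what lets the later-stage fine-tuning take small steps without a sudden learning-rate drop.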
  • An object grasping and positioning recognition system based on multi-task convolution is characterized in that it includes: an image acquisition and output module, configured to acquire the image and position information of objects within the field of view and output the acquired image information to the user; an image input module, configured to input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; an information mapping module, configured to map the extracted multi-scale feature information to the Multi-task branch exclusive layer for grasped-object recognition and detection and to the ED-YOLO branch exclusive layer for grasped-object positioning; a parameter update module, configured so that the two branch exclusive layers independently compute loss values according to their loss functions and use the backpropagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters according to the loss values; a training module, configured to alternately train the shared layer's multi-scale features, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and a result acquisition module, configured to apply non-maximum suppression to obtain the final object image and position information.
  • The training module includes early-stage full-parameter update training of the shared layer and the two exclusive layers, and later-stage exclusive-layer parameter update training.
  • A device comprises: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform any of the methods described above.
  • A computer-readable storage medium stores a computer program which, when executed by a processor, implements the method described in any one of the above items.
  • The beneficial effects of the present invention are as follows: the invention takes the Darknet-53 network as the core and combines multi-scale adaptive feature fusion with atrous convolution to form the complete grasping and positioning network ED-YOLO; it also uses multi-scale feature fusion to form the complete grasped-object recognition and detection branch Multi-task, which substantially enlarges the receptive field within the branch and improves its recognition ability; and it divides the network structure into a feature-extraction shared layer and two branch exclusive layers, using a strategy of alternately training exclusive parameters and shared parameters to solve the training problem of multi-task networks.
  • Figure 1 is a schematic diagram of the network structure of the present invention.
  • Figure 2 is a schematic diagram of updating the multi-task network parameters.
  • Figure 3 is a flow chart of multi-task joint detection.
  • Figure 4 is the loss-value variation curve in multi-task joint detection.
  • Figure 5 is the accuracy variation curve in multi-task joint detection.
  • Figure 6 is a schematic structural diagram of the device of the present invention.
  • The same dataset images can be used for both the positioning and the recognition tasks of the grasped object, but the dataset plays different roles in the two neural networks and the corresponding label forms also differ, so the exclusive parameters of the two network branches cannot be trained at the same time.
  • The feature-extraction network is trained whenever either task is trained; its internal parameters are the shared parameters of the two branches.
  • As shown in FIG. 1, the network mainly includes the Darknet-53 feature extraction module, the Multi-task branch, and the ED-YOLO branch. The Darknet-53 feature extraction module and the Multi-task branch constitute the complete grasped-object recognition and detection network; the Darknet-53 feature extraction module and the ED-YOLO branch form the complete grasping and positioning network.
  • The specific method is as follows: acquire the image and position information of objects within the field of view and output the acquired image information to the user; input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; map the extracted multi-scale feature information to the Multi-task branch exclusive layer for grasped-object recognition and detection and to the ED-YOLO branch exclusive layer for grasped-object positioning. The Multi-task branch exclusive layer uses the deeper two layers of convolutional feature maps to predict the object-category confidence, the object-frame coordinates, and the target category.
  • The ED-YOLO branch exclusive layer uses all three layers of convolutional feature maps to predict the confidence and position of the grasping frame.
  • Confidence indicates the probability that the true result falls near the predicted result.
  • Frame regression and grasping-frame regression refer to predicting the object's bounding frame and grasping frame through regression analysis.
  • Regression analysis is an analysis method for determining the quantitative relationship between two or more interdependent variables. In this invention, it refers to the relationship between the bounding frame, the grasping frame, and the object to be detected; analyzing this relationship makes it possible to produce corresponding predictions for objects not seen in training.
  • Target classification refers to identifying the target object and judging which category it belongs to.
  • The two branch exclusive layers independently compute loss values according to their loss functions, and use the backpropagation algorithm to iteratively back-propagate the exclusive parameters of the corresponding branch and the shared parameters according to the loss values; the shared layer's multi-scale feature information, the Multi-task branch exclusive layer for grasped-object recognition and detection, and the ED-YOLO branch exclusive layer for grasped-object positioning are trained alternately.
  • The training strategy of alternately updating shared and exclusive parameters divides the training process into two stages: an early stage and a later stage.
  • The early stage is the full-parameter update stage, in which all parameters of the network are updated and trained in sequence. Because the two branches cannot be updated at the same time, the parameters of one branch must first be fixed.
  • First, the parameters of the ED-YOLO branch are fixed, the Multi-task exclusive layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated in conjunction with the learning-rate decay strategy; then, the Multi-task exclusive layer parameters are fixed, the ED-YOLO exclusive layer network parameters are trained, the ED-YOLO prediction loss is computed, and the shared layer and ED-YOLO exclusive layer network parameters are updated with the learning-rate decay strategy, improving the network's grasping-box detection capability.
  • The training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
  • the later stage is the exclusive parameter update stage.
  • In this stage, the parameters of the shared layer are fixed, and only the parameters of the two exclusive layers are trained and updated in sequence.
  • The update training of the exclusive-layer parameters at this stage is as follows: first, fix the Multi-task exclusive layer parameters, train the ED-YOLO exclusive layer, compute the ED-YOLO prediction loss, and update the ED-YOLO exclusive layer network parameters in conjunction with the learning-rate decay strategy; then, fix the ED-YOLO exclusive layer parameters, train the Multi-task exclusive layer parameters, compute the Multi-task prediction loss, and update the Multi-task exclusive layer network parameters with the learning-rate decay strategy.
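The two-stage freeze/train alternation can be condensed into a small scheduling helper. The strict one-to-one alternation per step is a simplifying assumption (elsewhere the text suggests the grasp-box branch may be trained more often than the classification branch), and the group names are illustrative labels, not identifiers from the patent.

```python
def trainable_groups(stage, step):
    """Return which parameter groups are unfrozen at a given training step.

    stage: "early" (full-parameter stage) or "late" (exclusive-parameter stage).
    Assumes a simple one-to-one alternation between the two branches per step;
    "shared", "multitask", and "edyolo" are illustrative labels for the shared
    layer and the two branch exclusive layers.
    """
    if stage == "early":
        # Early stage starts with the Multi-task branch; the shared layer
        # is updated together with whichever branch is active.
        branch = "multitask" if step % 2 == 0 else "edyolo"
        return {"shared", branch}
    # Later stage starts with the ED-YOLO branch; the shared layer stays frozen
    # and only the active branch's exclusive parameters are updated.
    branch = "edyolo" if step % 2 == 0 else "multitask"
    return {branch}
```

A training loop would consult this helper each step and zero out (or skip) gradients for any group not in the returned set.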
  • The training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
  • After the last ED-YOLO exclusive layer update, the Multi-task exclusive layer is no longer updated, and the trained Multi-task exclusive layer parameters are fine-tuned to ensure that both branches achieve good convergence.
  • Parameter fine-tuning means manually adjusting the parameters by experience according to the observed effect, adding or subtracting a few tenths at a time until the desired effect is achieved.
  • Non-maximum suppression is then applied to obtain the final object image and position information.
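The non-maximum suppression step can be sketched in plain Python. The greedy, score-ordered form below is the standard algorithm; boxes are taken as (x1, y1, x2, y2) corners, and the 0.5 IoU threshold is an illustrative default since the text does not specify one.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining boxes that overlap the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two heavily overlapping candidates and one distant one:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with IoU of about 0.68, so it is suppressed and only the first and third boxes survive.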
  • Training and experiments were carried out on the ROCm 4.2.0 platform driven by PyTorch 1.9 under CentOS 7.6; the Docker version is 20.10.7 and the Python version is 3.7. The computer is a Huawei MateBook 14, and the dataset is the Cornell dataset commonly used in the field of grasping detection.
  • Figure 4 shows how the loss value and accuracy change with the number of iteration steps in the grasping-box detection and grasped-object recognition tasks. The loss curves in Figure 4 show that the loss of the multi-task joint detection network converges effectively as the number of iterations increases. Because one grasped-item recognition and detection iteration is performed for every two grasping-box positioning iterations, grasping-box detection converges noticeably faster than grasped-object classification detection. In the full-parameter training phase, oscillations inevitably occur because the shared layer is trained for the two tasks in succession; when only the exclusive parameters are updated in the later stage of training, the oscillation is visibly suppressed, which verifies the effectiveness of the training strategy of the present invention.
  • This embodiment provides an object grasping and positioning recognition system based on multi-task convolution, comprising: an image acquisition and output module, configured to acquire the image and position information of objects within the field of view and output the acquired image information to the user; an image input module, configured to input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; an information mapping module, configured to map the extracted multi-scale feature information to the Multi-task branch exclusive layer for grasped-object recognition and detection and to the ED-YOLO branch exclusive layer for grasped-object positioning; a parameter update module, configured so that the two branch exclusive layers independently compute loss values according to their loss functions and use the backpropagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters according to the loss values; a training module, configured to alternately train the shared layer's multi-scale features, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and a result acquisition module, configured to apply non-maximum suppression to obtain the final object image and position information.
  • The training module includes early-stage full-parameter update training of the shared layer and the two exclusive layers, and later-stage exclusive-layer parameter update training.
  • The early-stage full-parameter update training of the shared layer and the two exclusive layers is as follows: first, fix the parameters of the ED-YOLO branch, train the Multi-task exclusive layer parameters, compute the Multi-task prediction loss, and update the network parameters of the shared layer and the Multi-task exclusive layer with the learning-rate decay strategy; then, fix the Multi-task exclusive layer parameters, train the ED-YOLO exclusive layer network parameters, compute the ED-YOLO prediction loss, and update the shared layer and ED-YOLO exclusive layer network parameters in conjunction with the learning-rate decay strategy.
  • The later-stage exclusive-layer parameter update training is as follows: first, fix the Multi-task exclusive layer parameters, train the ED-YOLO exclusive layer, compute the ED-YOLO prediction loss, and update the ED-YOLO exclusive layer network parameters with the learning-rate decay strategy; then, fix the ED-YOLO exclusive layer parameters, train the Multi-task exclusive layer parameters, compute the Multi-task prediction loss, and update the Multi-task exclusive layer network parameters with the learning-rate decay strategy. The training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing. After the last ED-YOLO exclusive layer update is completed, the Multi-task exclusive layer is no longer updated, and the trained Multi-task exclusive layer parameters are fine-tuned.
  • The robot includes: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform any of the methods described above. With the Darknet-53 network as the core, multi-scale adaptive feature fusion is combined with atrous convolution to form the complete grasping and positioning network ED-YOLO; multi-scale feature fusion is also used to form the complete grasped-object recognition and detection branch Multi-task, which substantially enlarges the receptive field within the branch and improves its recognition ability; the network structure is divided into a feature-extraction shared layer and two branch exclusive layers, and a strategy of alternately training exclusive and shared parameters solves the training problem of multi-task networks.
  • This embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the methods described above; the stored program is the object grasping and positioning recognition algorithm based on multi-task convolution.
  • the network structure is divided into a feature extraction shared layer and two branch exclusive layers.
  • a strategy of alternating training of exclusive parameters and shared parameters is used to solve the training problem of multi-task networks.
  • The computer system includes a central processing unit (CPU) 101, which can execute various appropriate actions and processes.
  • Various programs and data required for system operation are also stored in the RAM 103.
  • the CPU 101 , ROM 102 , and RAM 103 are connected to each other via a bus 104 .
  • An input/output (I/O) interface 105 is also connected to the bus 104 .
  • The following components are connected to the I/O interface 105: an input section 106 including a keyboard, a mouse, and the like; an output section including a cathode-ray tube (CRT), a liquid-crystal display (LCD), and the like, and a speaker; a storage section 108 including a hard disk and the like; and a communication section 109 including a network interface card such as a LAN card or a modem.
  • the communication section 109 performs communication processing via a network such as the Internet.
  • Drives are also connected to the I/O interface 105 as needed.
  • A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage section 108 as needed.
  • Embodiment 1 of the present invention includes a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • The computer program can be downloaded and installed from a network via the communication section, and/or installed from a removable medium.
  • the computer-readable medium shown in the present invention may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program codes are carried. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • The block diagrams in the accompanying drawings illustrate the architecture, functions, and operation of possible implementations of the system, method, and computer program product according to various embodiments of the present invention.
  • Each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block in the block diagrams or flowchart illustrations, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • The units described in the embodiments of the present invention may be implemented in software or in hardware, and the described units or modules may also be provided in a processor. For example, they can be described as: a multi-task convolution-based object grasping and positioning recognition system including an image acquisition and output module, an image input module, an information mapping module, a parameter update module, a training module, and a result acquisition module. In certain circumstances, the names of these units do not limit the units themselves.
  • For example, the image acquisition and output module may also be described as an "image and location information acquisition and output module".
  • Although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps must be performed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution.

Abstract

The invention discloses a multi-task convolution-based object grasping and positioning identification algorithm and system, and a robot. The method comprises: obtaining the image and position information of an object in a field of view range; inputting the acquired image information into a Darknet-53 network, and extracting multi-scale feature information of a shared layer; mapping the extracted multi-scale feature information to a captured object recognition and detection Multi-Task branch exclusive layer and a captured object positioning ED-YOLO branch exclusive layer; separately calculating a loss value of each of the two branch exclusive layers according to a loss function thereof, and according to the loss values, separately carrying out, on the network, back propagation iteration of a branch exclusive parameter and a shared parameter; alternately training the shared layer and the branch exclusive layers; and using non-maximum suppression processing to obtain the final object image and position information. The method has the beneficial effect that, on the basis of ensuring stability and real-time action, grasping box detection and grasping object recognition and classification detection are completed in one model at the same time.

Description

基于多任务卷积的物体抓取定位识别算法、系统和机器人Object grasping and positioning recognition algorithm, system and robot based on multi-task convolution 技术领域technical field
本发明涉及机器人控制技术领域,具体是一种基于多任务卷积的物体抓取定位识别算法、系统和设备。The invention relates to the technical field of robot control, in particular to an object grasping and positioning recognition algorithm, system and equipment based on multi-task convolution.
背景技术Background technique
机械臂抓取是机器人重要的基础操作,在工业零部件的分拣、装配和服务型机器人等众多领域有着广泛的应用。传统的工业机器人抓取只能针对固定的任务需求,对机器人的各个关节和连杆采用离线编程或者示教编程的方式设定好固定的控制参数,以完成固定的一系列动作。这就要求抓取目标的位置和位姿需要保持一致,否则会大大影响抓取效果。这种传统的抓取模式灵活性极差,只能完成一些简单的任务,当目标大小变化、位姿变化时需要重新设定参数,且无法面对多个抓取目标或者抓取目标位置不固定的情况,无法满足越来越高的工业化生产需求。Robot arm grasping is an important basic operation of robots, and it has a wide range of applications in many fields such as sorting, assembly and service robots of industrial parts. Traditional industrial robot grasping can only meet fixed task requirements, and set fixed control parameters for each joint and connecting rod of the robot by offline programming or teaching programming to complete a series of fixed actions. This requires that the position and pose of the grasping target need to be consistent, otherwise the grasping effect will be greatly affected. This traditional grasping mode is extremely inflexible and can only complete some simple tasks. When the size and pose of the target change, the parameters need to be reset, and it cannot face multiple grasping targets or the position of the grasping target is different. The fixed situation cannot meet the increasing demand for industrial production.
在基于图像处理的工业机器人实际抓取任务中,经常不止要求能通过图像处理单元对待抓取物进行有效定位,而且需要对于待抓取物的物品类别进行识别分类,以完成对不同产品的分类搬运、筛选、统计。In the actual grasping tasks of industrial robots based on image processing, it is often not only required to be able to effectively locate the object to be grasped through the image processing unit, but also to identify and classify the category of the object to be grasped to complete the classification of different products Handling, screening, statistics.
YOLO检测网络本身被大范围的应用到物品分类任务中,将待抓取物的图像信息分别输入到两个网络中,便可以同时得到物品抓取框和物品的分类结果,但无疑这种方式都会增加计算机的计算和存储负担,大大增加完成任务耗时,对于工业机器人抓取任务的实时性要求和工作效率都会产生巨大影响。The YOLO detection network itself is widely used in the task of item classification. The image information of the object to be captured is input into the two networks respectively, and the item capture frame and the classification result of the item can be obtained at the same time, but there is no doubt that this method It will increase the computing and storage burden of the computer, greatly increase the time-consuming to complete the task, and have a huge impact on the real-time requirements and work efficiency of the industrial robot's grasping task.
Technical solution
To overcome the deficiencies of the prior art, the present invention provides an object grasping and positioning recognition algorithm, system, and robot based on multi-task convolution, which complete grasping-box detection and grasped-object recognition and classification simultaneously within a single model while guaranteeing stability and real-time performance.
To achieve the above object, the present invention is realized through the following technical solution. According to one aspect of the present invention, an object grasping and positioning recognition algorithm based on multi-task convolution is provided, comprising the following steps: acquiring the image and position information of objects within the field of view, and outputting the acquired image information to the user; inputting the acquired image information into the Darknet-53 network and extracting the multi-scale feature information of the shared layer; mapping the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning; the two branch exclusive layers independently computing their loss values according to their respective loss functions and, according to these loss values, using the back-propagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters; alternately training the shared-layer multi-scale feature extraction, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and applying non-maximum suppression to obtain the final object image and position information.
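The shared-backbone/two-branch structure in the steps above can be sketched schematically. The following is a minimal pure-Python toy, not the actual convolutional network: the function names and the numeric "feature maps" are hypothetical stand-ins for the Darknet-53 backbone and the two exclusive branch heads, kept only to show how one shared feature pass feeds both tasks.

```python
def shared_backbone(image):
    # Toy stand-in for the Darknet-53 shared layer: produces three
    # "feature maps" at decreasing resolution (multi-scale features).
    f1 = [p * 0.5 for p in image]         # shallow, high-resolution scale
    f2 = [p * 0.25 for p in image[::2]]   # middle scale
    f3 = [p * 0.125 for p in image[::4]]  # deep, low-resolution scale
    return f1, f2, f3

def multitask_branch(feats):
    # Toy stand-in for the Multi-task recognition head
    # (the disclosure says it uses the two deeper feature maps).
    _, f2, f3 = feats
    return {"class_score": sum(f2) + sum(f3)}

def edyolo_branch(feats):
    # Toy stand-in for the ED-YOLO grasp-positioning head (all three scales).
    return {"grasp_score": sum(sum(f) for f in feats)}

def joint_forward(image):
    # One forward pass: shared features computed once, consumed by both
    # exclusive branches, so the backbone cost is not paid twice.
    feats = shared_backbone(image)
    return multitask_branch(feats), edyolo_branch(feats)
```

The point of the structure is visible even in the toy: `shared_backbone` runs once per image, which is what distinguishes this design from feeding the image into two independent networks.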
Further, the alternate training comprises an early stage in which all parameters of the shared layer and the two exclusive layers are updated, and a later stage in which only the exclusive-layer parameters are updated.
Further, the early full-parameter update training of the shared layer and the two exclusive layers is as follows: first, the parameters of the ED-YOLO branch are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated in accordance with the learning-rate decay strategy; then, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive-layer network parameters are trained, the ED-YOLO prediction loss is computed, and the network parameters of the shared layer and the ED-YOLO exclusive layer are updated in accordance with the learning-rate decay strategy.
Further, the later exclusive-layer parameter update training is as follows: first, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive layer is trained, the ED-YOLO prediction loss is computed, and the ED-YOLO exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy; then, the ED-YOLO exclusive-layer parameters are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the Multi-task exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy.
Further, after the last ED-YOLO exclusive-layer update is completed, the Multi-task exclusive layer is no longer updated, and the parameters of the trained Multi-task exclusive layer are fine-tuned.
Further, the training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
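The cosine-annealing schedule named here has a standard closed form. A minimal sketch, assuming annealing from the stated initial rate of 1e-3 down to a minimum of 0 over a hypothetical total step count (the disclosure does not state the step count or minimum rate):

```python
import math

def cosine_annealing_lr(step, total_steps, lr_init=1e-3, lr_min=0.0):
    # Standard cosine annealing: the learning rate follows half a cosine
    # period, falling from lr_init at step 0 to lr_min at total_steps.
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_init - lr_min) * cos_factor
```

In practice this schedule is typically applied per epoch or per iteration on top of the Adam optimizer, decaying smoothly rather than in discrete steps.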
According to another aspect of the present invention, an object grasping and positioning recognition system based on multi-task convolution is provided, characterized in that it comprises: an image acquisition and output module, configured to acquire the image and position information of objects within the field of view and to output the acquired image information to the user; an image input module, configured to input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; an information mapping module, configured to map the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning; a parameter update module, configured so that the two branch exclusive layers independently compute their loss values according to their respective loss functions and, according to these loss values, use the back-propagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters; a training module, configured to alternately train the shared-layer multi-scale feature extraction, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and a result acquisition module, configured to apply non-maximum suppression to obtain the final object image and position information.
Further, the training module comprises the early full-parameter update training of the shared layer and the two exclusive layers, and the later exclusive-layer parameter update training.
According to another aspect of the present invention, a device is provided, comprising: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform any of the methods described above.
According to another aspect of the present invention, a computer-readable storage medium storing a computer program is provided; when the program is executed by a processor, any of the methods described above is implemented.
Beneficial effects
Compared with the prior art, the beneficial effects of the present invention are as follows: the present invention takes the Darknet-53 network as its core and combines dilated (atrous) convolution with multi-scale adaptive feature fusion to form the complete grasp positioning network ED-YOLO; multi-scale feature fusion is likewise used to form the complete grasped-object recognition and detection branch Multi-task, which substantially enlarges the receptive field within the branch and improves its recognition ability; and the network structure is divided into a shared feature-extraction layer and two branch exclusive layers, with a strategy of alternately training exclusive and shared parameters solving the training difficulty of the multi-task network.
Description of the drawings
Figure 1 is a schematic diagram of the network structure of the present invention; Figure 2 is a schematic diagram of multi-task network parameter updating; Figure 3 is a flow chart of multi-task joint detection; Figure 4 shows the loss curves during multi-task joint detection; Figure 5 shows the accuracy curves during multi-task joint detection; Figure 6 is a schematic structural diagram of the device of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to specific embodiments. It should be understood that these embodiments are only intended to illustrate the present invention and not to limit its scope. Furthermore, it should be understood that, after reading the content taught by the present invention, those skilled in the art may make various changes or modifications to it, and such equivalent forms likewise fall within the scope defined by the present application.
The positioning and recognition tasks for grasped objects can use the same dataset images, but the dataset serves different functions in the two neural networks and the corresponding label formats differ, so the exclusive parameters of the two network branches cannot be trained at the same time. However, the network structure shows that the feature extraction network is trained whenever either task is trained, and its internal parameters are parameters shared by the two branches. Based on these characteristics, the embodiments of the present application provide an object grasping and positioning recognition algorithm, system, and device based on multi-task convolution.
The object grasping and positioning recognition algorithm based on multi-task convolution provided by the embodiments of the present application is first introduced with reference to Figures 1-3. As shown in Figure 1, it mainly comprises a Darknet-53 feature extraction module, a Multi-task branch, and an ED-YOLO branch. The Darknet-53 feature extraction module and the Multi-task branch form the complete grasped-object recognition and detection network, while the Darknet-53 feature extraction module and the ED-YOLO branch form the complete grasp positioning network. The specific method is as follows: acquire the image and position information of objects within the field of view, and output the acquired image information to the user; input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; and map the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning. The Multi-task branch exclusive layer uses the two deeper convolutional feature maps to predict the confidence of the object category, the object bounding-box coordinates, and the target category. The ED-YOLO branch exclusive layer uses all three convolutional feature maps to predict the confidence and position of the grasping box. Confidence denotes the probability that the true result falls near the predicted result. Bounding-box regression and grasping-box regression refer to predicting an object's bounding box and its grasping box through regression analysis. Regression analysis is an analytical method for determining the quantitative interdependence between two or more variables; in the present invention it refers to the relationship between the bounding box, the grasping box, and the object to be detected, and analyzing this relationship allows corresponding predictions to be made for objects that were not trained on. Target classification refers to recognizing the target object and determining which category it belongs to.
The two branch exclusive layers independently compute their loss values according to their loss functions and, according to these loss values, use the back-propagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters; the shared-layer multi-scale feature extraction, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer are trained alternately.
As shown in Figure 2, the training strategy of alternately updating shared and exclusive parameters divides the training process into an early stage and a later stage.
The early stage is the full-parameter update stage, in which all parameters of the network are updated and trained in turn. Because the two branches cannot be updated at the same time, the parameters of one branch must first be fixed. In the present invention, the parameters of the ED-YOLO branch are fixed first: the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated in accordance with the learning-rate decay strategy. Then the Multi-task exclusive-layer parameters are fixed: the ED-YOLO exclusive-layer network parameters are trained, the ED-YOLO prediction loss is computed, and the network parameters of the shared layer and the ED-YOLO exclusive layer are updated in accordance with the learning-rate decay strategy, improving the network's ability to detect grasping boxes. In this stage the training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
The later stage is the exclusive-parameter update stage, in which the shared-layer parameters are fixed and only the parameters of the two exclusive layers are trained and updated in turn. Unlike the early stage, the order is reversed: first, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive layer is trained, the ED-YOLO prediction loss is computed, and the ED-YOLO exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy; then, the ED-YOLO exclusive-layer parameters are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the Multi-task exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy. In this stage the training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
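The two-stage alternating schedule can be illustrated with a toy numerical example. This is a heavily simplified pure-Python sketch: scalar "parameters" and quadratic "losses" stand in for the real network and its detection losses, and the iteration counts and learning rate are arbitrary. It demonstrates only the freeze/update pattern, not the actual model.

```python
def sgd_step(p, target, branch, update_shared, lr=0.1):
    # Toy loss for one branch: (shared + branch_param - target)^2.
    # Its gradient w.r.t. both 'shared' and the branch parameter is 2*err.
    err = p["shared"] + p[branch] - target
    p[branch] -= lr * 2 * err          # exclusive parameter always trains
    if update_shared:                  # shared parameters train only early
        p["shared"] -= lr * 2 * err

params = {"shared": 0.0, "mt": 0.0, "ed": 0.0}

# Early stage: full-parameter updates, branches alternate.
# While one branch trains, the other branch's exclusive parameters are frozen.
for _ in range(50):
    sgd_step(params, 3.0, "mt", update_shared=True)   # ED-YOLO branch frozen
    sgd_step(params, 5.0, "ed", update_shared=True)   # Multi-task branch frozen

# Later stage: shared layer frozen; only the exclusive layers update,
# ED-YOLO first within each round, as described above.
for _ in range(50):
    sgd_step(params, 5.0, "ed", update_shared=False)
    sgd_step(params, 3.0, "mt", update_shared=False)
```

In an actual PyTorch implementation the same freezing would typically be expressed by toggling `requires_grad` on the frozen parameter groups, or by passing only the active groups to the optimizer; the early-stage oscillation mentioned below arises because the shared parameters are pulled back and forth by the two losses.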
After the last ED-YOLO exclusive-layer update is completed, the Multi-task exclusive layer is no longer updated; the parameters of the trained Multi-task exclusive layer are then fine-tuned to ensure that both branches achieve good convergence. This fine-tuning is performed manually according to the observed effect and empirical values, adjusting a parameter by a few hundredths up or down each time until the desired effect is achieved.
Finally, non-maximum suppression is applied to obtain the final object image and position information.
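Non-maximum suppression itself is a standard post-processing step. A minimal pure-Python sketch of greedy NMS over axis-aligned boxes follows; the `(x1, y1, x2, y2)` box format and the 0.5 IoU threshold are illustrative assumptions, not values stated in the disclosure.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring remaining box and
    # discard all boxes overlapping it by more than iou_thresh.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) < iou_thresh]
    return keep
```

Applied to the raw predictions of either branch, this leaves one box per detected object (or grasp candidate), which is what the final output stage reports.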
To verify the performance of the multi-task joint grasp detection network and the effect of the training strategy, in this embodiment training and experiments were carried out with PyTorch 1.9 on the ROCm-4.2.0 platform under CentOS 7.6, with Docker version 20.10.7 and Python version 3.7, on a Huawei MateBook 14; the dataset is the Cornell dataset commonly used in the grasp detection field.
Figure 4 shows how the loss value and accuracy change with the number of iteration steps in the grasping-box detection and grasped-object recognition tasks. The loss curves in Figure 4 show that the loss values of the multi-task joint detection network converge effectively as the number of iterations increases. Since one grasped-object recognition iteration is performed for every two grasping-box positioning iterations, grasping-box detection converges noticeably faster than grasped-object classification. In the full-parameter training stage, because the shared layer must be trained for the two tasks in turn, oscillation inevitably occurs during training; when only the exclusive parameters are updated in the later stage of training, this oscillation is visibly suppressed, verifying the effectiveness of the training strategy of the present invention.
The accuracy of each task in the multi-task and single-task networks is compared in Table 1 below. From the above experimental results, the single-task ED-YOLO network detects the grasping box with an accuracy of 96.79%, while in the multi-task network MT-ED-YOLO the detection accuracy of the grasping box is 95.85%; the detection accuracy of the same task is comparable in the two models, and both show good accuracy and robustness. Meanwhile, the accuracy on the object classification task reaches 93.34%, a good result that meets the task requirements of multi-task joint detection.
This embodiment provides an object grasping and positioning recognition system based on multi-task convolution, comprising: an image acquisition and output module, configured to acquire the image and position information of objects within the field of view and to output the acquired image information to the user; an image input module, configured to input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; an information mapping module, configured to map the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning; a parameter update module, configured so that the two branch exclusive layers independently compute their loss values according to their respective loss functions and, according to these loss values, use the back-propagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters; a training module, configured to alternately train the shared-layer multi-scale feature extraction, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and a result acquisition module, configured to apply non-maximum suppression to obtain the final object image and position information.
Further, the training module comprises the early full-parameter update training of the shared layer and the two exclusive layers, and the later exclusive-layer parameter update training. Specifically, the early full-parameter update training is as follows: first, the parameters of the ED-YOLO branch are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated in accordance with the learning-rate decay strategy; then, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive-layer network parameters are trained, the ED-YOLO prediction loss is computed, and the network parameters of the shared layer and the ED-YOLO exclusive layer are updated in accordance with the learning-rate decay strategy. The later exclusive-layer parameter update training is as follows: first, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive layer is trained, the ED-YOLO prediction loss is computed, and the ED-YOLO exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy; then, the ED-YOLO exclusive-layer parameters are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the Multi-task exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy. The training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing. After the last ED-YOLO exclusive-layer update is completed, the Multi-task exclusive layer is no longer updated, and the parameters of the trained Multi-task exclusive layer are fine-tuned.
A robot of this embodiment comprises: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform any of the methods described above. The robot takes the Darknet-53 network as its core and combines dilated convolution with multi-scale adaptive feature fusion to form the complete grasp positioning network ED-YOLO; multi-scale feature fusion is likewise used to form the complete grasped-object recognition and detection branch Multi-task, which substantially enlarges the receptive field within the branch and improves its recognition ability; and the network structure is divided into a shared feature-extraction layer and two branch exclusive layers, with a strategy of alternately training exclusive and shared parameters solving the training difficulty of the multi-task network.
A computer-readable storage medium of this embodiment stores a computer program which, when executed by a processor, implements any of the methods described above. It stores the object grasping and positioning recognition algorithm based on multi-task convolution, which divides the network structure into a shared feature-extraction layer and two branch exclusive layers and uses a strategy of alternately training exclusive and shared parameters to solve the training difficulty of the multi-task network.
A further description follows. The computer system includes a central processing unit (CPU) 101, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 102 or a program loaded from a storage section into a random access memory (RAM) 103. The RAM 103 also stores various programs and data required for system operation. The CPU 101, ROM 102, and RAM 103 are connected to one another via a bus 104. An input/output (I/O) interface 105 is also connected to the bus 104.
The following components are connected to the I/O interface 105: an input section 106 including a keyboard, a mouse, and the like; an output section including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 108 including a hard disk and the like; and a communication section 109 including a network interface card such as a LAN card or a modem. The communication section 109 performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface 105 as needed. A removable medium 511, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 108 as needed.
In particular, according to an embodiment of the present invention, the process described above with reference to the flow chart of Figure 3 may be implemented as a computer software program. For example, Embodiment 1 of the present invention includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from the removable medium. When the computer program is executed by the central processing unit (CPU) 101, the above-described functions defined in the system of the present application are performed.
It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in combination with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
The block diagram of Figure 6 illustrates possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flow chart or block diagram may represent a module, program segment, or portion of code containing one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagram or flow chart, and combinations of blocks in the block diagram or flow chart, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of these units do not, in some cases, limit the units themselves. The described units or modules may, for example, be characterized as: a multi-task convolution-based object grasping and positioning recognition system, comprising an image acquisition and output module, an image input module, an information mapping module, a parameter update module, a training module, and a result acquisition module. Again, the names of these units do not, in some cases, limit the units themselves; for example, the image acquisition and output module may also be described as a "module for acquiring and outputting images and position information of objects within the field of view".
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
In addition, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all of the illustrated steps must be performed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
The above description is only a preferred embodiment of the present application and an illustration of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, solutions in which the above features are replaced by (but not limited to) features with similar functions disclosed in this application.
In the description of this specification, reference to the terms "one embodiment", "example", "specific example", and the like means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to help illustrate the invention. The preferred embodiments neither describe all the details exhaustively nor limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of the contents of this specification. These embodiments were selected and specifically described in order to better explain the principles and practical applications of the invention, so that those skilled in the art can well understand and utilize it. The invention is limited only by the claims, together with their full scope and equivalents.

Claims (10)

  1. A multi-task convolution-based object grasping and positioning recognition algorithm, characterized in that it comprises the following steps: acquiring images and position information of objects within the field of view, and outputting the acquired image information to the user; inputting the acquired image information into a Darknet-53 network and extracting multi-scale feature information from the shared layer; mapping the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning; the two branch exclusive layers independently computing loss values according to their respective loss functions and, based on the loss values, using the back-propagation algorithm to iterate, via back-propagation through the network, the exclusive parameters of the respective branch and the shared parameters; alternately training the shared-layer multi-scale feature information, the Multi-task branch exclusive layer for grasped-object recognition and detection, and the ED-YOLO branch exclusive layer for grasped-object positioning; and applying non-maximum suppression to obtain the final object image and position information.
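The final step of claim 1 applies non-maximum suppression to the detections. A minimal, framework-free sketch of the standard greedy NMS procedure (the corner box format `(x1, y1, x2, y2)` and the 0.5 overlap threshold are illustrative assumptions, not details taken from the application):

```python
def iou(a, b):
    # intersection-over-union of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    # greedily keep the highest-scoring box, discard boxes that
    # overlap it beyond the threshold, and repeat on the remainder
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

For example, two heavily overlapping candidate boxes of the same grasped object collapse to the single higher-scoring one, while a distant box survives.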
  2. The multi-task convolution-based object grasping and positioning recognition algorithm according to claim 1, characterized in that the alternate training comprises an early stage of full-parameter update training of the shared layer and the two exclusive layers, and a later stage of exclusive-layer parameter update training.
  3. The multi-task convolution-based object grasping and positioning recognition algorithm according to claim 2, characterized in that the early-stage full-parameter update training of the shared layer and the two exclusive layers is as follows: first, the parameters of the ED-YOLO branch are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated in accordance with the learning-rate decay strategy; then, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive-layer network parameters are trained, the ED-YOLO prediction loss is computed, and the network parameters of the shared layer and the ED-YOLO exclusive layer are updated in accordance with the learning-rate decay strategy.
  4. The multi-task convolution-based object grasping and positioning recognition algorithm according to claim 2 or 3, characterized in that the later-stage exclusive-layer parameter update training is as follows: first, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive layer undergoes parameter training, the ED-YOLO prediction loss is computed, and the ED-YOLO exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy; then, the ED-YOLO exclusive-layer parameters are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the Multi-task exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy.
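The alternating schedules of claims 3 and 4 both reduce to the same pattern: freeze one branch, update the other (together with the shared backbone in the early stage, without it in the later stage). A toy, framework-free sketch of that parameter-freezing pattern (the layer names, scalar parameters, and plain gradient-descent updates are illustrative assumptions; the application itself trains a full network with Adam and a learning-rate decay strategy):

```python
def make_net():
    # shared Darknet-53-style backbone plus the two branch heads (toy scalars)
    return {
        "shared":     {"params": [0.5, -0.2], "trainable": True},
        "multi_task": {"params": [0.1],       "trainable": True},
        "ed_yolo":    {"params": [0.3],       "trainable": True},
    }

def set_trainable(net, layers, flag):
    for name in layers:
        net[name]["trainable"] = flag

def sgd_step(net, grads, lr):
    # update only the layers whose trainable flag is set
    for name, layer in net.items():
        if layer["trainable"]:
            layer["params"] = [p - lr * g for p, g in zip(layer["params"], grads[name])]

def early_phase_step(net, mt_grads, ey_grads, lr):
    # claim 3: freeze ED-YOLO, update shared + Multi-task on the Multi-task loss...
    set_trainable(net, ["ed_yolo"], False)
    set_trainable(net, ["shared", "multi_task"], True)
    sgd_step(net, mt_grads, lr)
    # ...then freeze Multi-task, update shared + ED-YOLO on the ED-YOLO loss
    set_trainable(net, ["multi_task"], False)
    set_trainable(net, ["shared", "ed_yolo"], True)
    sgd_step(net, ey_grads, lr)

def late_phase_step(net, mt_grads, ey_grads, lr):
    # claim 4: the shared backbone stays frozen; only the branches alternate
    set_trainable(net, ["shared", "multi_task"], False)
    set_trainable(net, ["ed_yolo"], True)
    sgd_step(net, ey_grads, lr)
    set_trainable(net, ["ed_yolo"], False)
    set_trainable(net, ["multi_task"], True)
    sgd_step(net, mt_grads, lr)
```

After one `early_phase_step`, the shared parameters have received two updates (one per sub-step) while each branch head has received exactly one, which is the intended behavior of the schedule.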
  5. The multi-task convolution-based object grasping and positioning recognition algorithm according to claim 4, characterized in that after the last update of the ED-YOLO exclusive layer is completed, the Multi-task exclusive layer is no longer updated, and the parameters of the trained Multi-task exclusive layer are fine-tuned.
  6. The multi-task convolution-based object grasping and positioning recognition algorithm according to claim 4, characterized in that the training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
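Claim 6 fixes the optimizer hyper-parameters: Adam with an initial learning rate of 1e-3, decayed by cosine annealing. The schedule itself is a closed-form half-cosine from the initial rate down to a floor; a sketch (the floor of 0 and the single-cycle, non-restarting form are assumptions, since the claim does not spell them out):

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    # half-cosine decay: lr_max at step 0, lr_min at step total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * step / total_steps))
```

The rate starts at 1e-3, passes through 5e-4 at the midpoint, and reaches the floor at the end of training, giving large early updates and fine late adjustments.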
  7. A multi-task convolution-based object grasping and positioning recognition system, characterized in that it comprises: an image acquisition and output module, configured to acquire images and position information of objects within the field of view and output the acquired image information to the user; an image input module, configured to input the acquired image information into a Darknet-53 network and extract multi-scale feature information from the shared layer; an information mapping module, configured to map the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning; a parameter update module, configured such that the two branch exclusive layers independently compute loss values according to their respective loss functions and, based on the loss values, use the back-propagation algorithm to iterate, via back-propagation through the network, the exclusive parameters of the respective branch and the shared parameters; a training module, configured to alternately train the shared-layer multi-scale feature information, the Multi-task branch exclusive layer for grasped-object recognition and detection, and the ED-YOLO branch exclusive layer for grasped-object positioning; and a result acquisition module, configured to apply non-maximum suppression to obtain the final object image and position information.
  8. The multi-task convolution-based object grasping and positioning recognition system according to claim 7, characterized in that the training module performs an early stage of full-parameter update training of the shared layer and the two exclusive layers, and a later stage of exclusive-layer parameter update training.
  9. A robot, comprising: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the method according to any one of claims 1-6.
  10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1-6 is implemented.
PCT/CN2022/131276 2022-05-09 2022-11-11 Multi-task convolution-based object grasping and positioning identification algorithm and system, and robot WO2023165161A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210499055.6 2022-05-09
CN202210499055.6A CN114764831A (en) 2022-05-09 2022-05-09 Object grabbing, positioning and identifying algorithm and system based on multitask convolution and robot

Publications (1)

Publication Number Publication Date
WO2023165161A1 true WO2023165161A1 (en) 2023-09-07

Family

ID=82364822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131276 WO2023165161A1 (en) 2022-05-09 2022-11-11 Multi-task convolution-based object grasping and positioning identification algorithm and system, and robot

Country Status (2)

Country Link
CN (1) CN114764831A (en)
WO (1) WO2023165161A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114764831A (en) * 2022-05-09 2022-07-19 青岛理工大学 Object grabbing, positioning and identifying algorithm and system based on multitask convolution and robot

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205643A1 (en) * 2017-12-29 2019-07-04 RetailNext, Inc. Simultaneous Object Localization And Attribute Classification Using Multitask Deep Neural Networks
CN110222604A (en) * 2019-05-23 2019-09-10 复钧智能科技(苏州)有限公司 Target identification method and device based on shared convolutional neural networks
CN111080693A (en) * 2019-11-22 2020-04-28 天津大学 Robot autonomous classification grabbing method based on YOLOv3
CN111862067A (en) * 2020-07-28 2020-10-30 中山佳维电子有限公司 Welding defect detection method and device, electronic equipment and storage medium
US20210012146A1 (en) * 2019-07-12 2021-01-14 Wuyi University Method and apparatus for multi-scale sar image recognition based on attention mechanism
CN114764831A (en) * 2022-05-09 2022-07-19 青岛理工大学 Object grabbing, positioning and identifying algorithm and system based on multitask convolution and robot

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280488B (en) * 2018-02-09 2021-05-07 哈尔滨工业大学 Grippable object identification method based on shared neural network
CN109523532B (en) * 2018-11-13 2022-05-03 腾讯医疗健康(深圳)有限公司 Image processing method, image processing device, computer readable medium and electronic equipment
CN112949452B (en) * 2021-02-25 2022-05-31 山西大学 Robot low-light environment grabbing detection method based on multitask shared network

Also Published As

Publication number Publication date
CN114764831A (en) 2022-07-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22929589

Country of ref document: EP

Kind code of ref document: A1