WO2023165161A1 - Multi-task convolution-based object grasping and positioning identification algorithm and system, and robot - Google Patents

Multi-task convolution-based object grasping and positioning identification algorithm and system, and robot Download PDF

Info

Publication number
WO2023165161A1
Authority
WO
WIPO (PCT)
Prior art keywords
exclusive
layer
task
branch
parameters
Prior art date
Application number
PCT/CN2022/131276
Other languages
French (fr)
Chinese (zh)
Inventor
赵景波
国珍
房桐
杜保帅
张田
张晓寒
Original Assignee
青岛理工大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 青岛理工大学 (Qingdao University of Technology)
Publication of WO2023165161A1 publication Critical patent/WO2023165161A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 - Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Definitions

  • The invention relates to the technical field of robot control, and in particular to an object grasping and positioning recognition algorithm, system, and device based on multi-task convolution.
  • Robotic arm grasping is a fundamental robot operation with wide application in fields such as industrial parts sorting, assembly, and service robotics.
  • Traditional industrial robot grasping can only meet fixed task requirements: fixed control parameters are set for each joint and link of the robot through offline programming or teach programming in order to complete a fixed sequence of actions. This requires the position and pose of the grasping target to remain consistent; otherwise the grasping result is severely degraded.
  • This traditional grasping mode is extremely inflexible and can only complete simple tasks.
  • When the target's size or pose changes, the parameters must be reset, and the mode cannot handle multiple grasping targets or targets whose positions are not fixed, so it cannot meet ever-increasing industrial production demands.
  • The YOLO detection network is widely used for item classification.
  • By feeding the image of the object to be grasped into two separate networks, the grasping box and the classification result can be obtained at the same time; however, this approach increases the computer's computation and storage burden, greatly lengthens the time needed to complete the task, and severely affects the real-time performance and working efficiency of industrial robot grasping.
  • To address these deficiencies, the present invention provides an object grasping and positioning recognition algorithm, system, and robot based on multi-task convolution.
  • On the basis of guaranteeing stability and real-time performance, grasping-box detection and grasped-object recognition and classification detection are completed simultaneously within a single model.
  • An object grasping and positioning recognition algorithm based on multi-task convolution includes the following steps: acquire the image and position information of objects within the field of view, and output the acquired image information to the user; input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; map the extracted multi-scale feature information to the Multi-task branch exclusive layer for grasped-object recognition and detection and to the ED-YOLO branch exclusive layer for grasped-object positioning; have the two branch exclusive layers independently compute loss values according to their loss functions, and use the backpropagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters according to the loss values; alternately train the shared layer's multi-scale features, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and apply non-maximum suppression to obtain the final object image and position information.
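The shared-backbone, two-head design described in the steps above can be sketched structurally as follows. This is not the patented network itself: the backbone and the two heads are placeholder callables standing in for Darknet-53, the Multi-task branch, and the ED-YOLO branch.

```python
# Structural sketch of the shared-layer / two-branch design described above.
# The backbone and head functions are stand-ins, not the patented networks.

class MultiTaskGraspNet:
    def __init__(self, backbone, multitask_head, edyolo_head):
        self.backbone = backbone              # shared layer (Darknet-53 in the text)
        self.multitask_head = multitask_head  # recognition/detection branch
        self.edyolo_head = edyolo_head        # grasp-positioning branch

    def forward(self, image):
        # One backbone pass serves both branches, which is the efficiency
        # gain over running two separate networks on the same image.
        feats = self.backbone(image)
        return {
            "classification": self.multitask_head(feats),
            "grasp_box": self.edyolo_head(feats),
        }

# Toy usage with placeholder callables (all values are illustrative):
net = MultiTaskGraspNet(
    backbone=lambda img: [f * 2 for f in img],    # fake multi-scale features
    multitask_head=lambda f: ("mug", 0.9),        # fake (class, confidence)
    edyolo_head=lambda f: (10, 20, 50, 60, 0.8),  # fake grasp box + confidence
)
out = net.forward([1, 2, 3])
```

Both outputs come from a single `forward` call, mirroring the claim that box detection and classification are completed in one model.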
  • The alternating training includes early-stage full-parameter update training of the shared layer and the two exclusive layers, and later-stage exclusive-layer parameter update training.
  • The early-stage full-parameter update training of the shared layer and the two exclusive layers is as follows: first, the parameters of the ED-YOLO branch are fixed, the Multi-task exclusive layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated with the learning-rate decay strategy; then, the Multi-task exclusive layer parameters are fixed, the ED-YOLO exclusive layer network parameters are trained, the ED-YOLO prediction loss is computed, and the shared layer and ED-YOLO exclusive layer network parameters are updated with the learning-rate decay strategy.
  • The later-stage exclusive-layer parameter update training is as follows: first, the Multi-task exclusive layer parameters are fixed, the ED-YOLO exclusive layer is trained, the ED-YOLO prediction loss is computed, and the ED-YOLO exclusive layer network parameters are updated with the learning-rate decay strategy; then, the ED-YOLO exclusive layer parameters are fixed, the Multi-task exclusive layer parameters are trained, the Multi-task prediction loss is computed, and the Multi-task exclusive layer network parameters are updated with the learning-rate decay strategy.
  • After the last ED-YOLO exclusive layer update is completed, the Multi-task exclusive layer is no longer updated, and the trained Multi-task exclusive layer parameters are fine-tuned.
  • The training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
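The cosine-annealing schedule named here can be written out directly. The initial rate of 1e-3 matches the text; the floor `lr_min = 0` is an assumption, since no minimum rate is stated.

```python
import math

def cosine_annealed_lr(step, total_steps, lr0=1e-3, lr_min=0.0):
    """Cosine annealing from lr0 down to lr_min over total_steps.

    lr0 = 1e-3 matches the initial learning rate stated in the text;
    lr_min = 0 is an assumption (the text does not give a floor).
    """
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * step / total_steps))

# The schedule starts at lr0, passes through lr0/2 at the midpoint,
# and decays to lr_min at the final step.
start = cosine_annealed_lr(0, 100)
mid = cosine_annealed_lr(50, 100)
end = cosine_annealed_lr(100, 100)
```

The smooth decay is what lets the later-stage fine-tuning take small steps without a sudden learning-rate drop.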
  • An object grasping and positioning recognition system based on multi-task convolution is characterized in that it includes: an image acquisition and output module, configured to acquire the image and position information of objects within the field of view and output the acquired image information to the user; an image input module, configured to input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; an information mapping module, configured to map the extracted multi-scale feature information to the Multi-task branch exclusive layer for grasped-object recognition and detection and to the ED-YOLO branch exclusive layer for grasped-object positioning; a parameter update module, configured so that the two branch exclusive layers independently compute loss values according to their loss functions and use the backpropagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters according to the loss values; a training module, configured to alternately train the shared layer's multi-scale features, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and a result acquisition module, configured to apply non-maximum suppression to obtain the final object image and position information.
  • The training module includes early-stage full-parameter update training of the shared layer and the two exclusive layers, and later-stage exclusive-layer parameter update training.
  • A device comprises: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform any of the methods described above.
  • A computer-readable storage medium stores a computer program which, when executed by a processor, implements the method described in any one of the above items.
  • The beneficial effects of the present invention are as follows: the invention takes the Darknet-53 network as the core and combines multi-scale adaptive feature fusion with atrous convolution to form the complete grasping and positioning network ED-YOLO; it also uses multi-scale feature fusion to form the complete grasped-object recognition and detection branch Multi-task, which substantially enlarges the receptive field within the branch and improves its recognition ability; and it divides the network structure into a feature-extraction shared layer and two branch exclusive layers, using a strategy of alternately training exclusive parameters and shared parameters to solve the training problem of multi-task networks.
  • Figure 1 is a schematic diagram of the network structure of the present invention.
  • Figure 2 is a schematic diagram of updating the multi-task network parameters.
  • Figure 3 is a flow chart of multi-task joint detection.
  • Figure 4 is the loss-value variation curve in multi-task joint detection.
  • Figure 5 is the accuracy variation curve in multi-task joint detection.
  • Figure 6 is a schematic structural diagram of the device of the present invention.
  • The same dataset images can be used for both the positioning and the recognition tasks of the grasped object, but the dataset plays different roles in the two neural networks and the corresponding label forms also differ, so the exclusive parameters of the two network branches cannot be trained at the same time.
  • The feature-extraction network is trained whenever either task is trained; its internal parameters are the shared parameters of the two branches.
  • As shown in FIG. 1, the network mainly includes the Darknet-53 feature extraction module, the Multi-task branch, and the ED-YOLO branch. The Darknet-53 feature extraction module and the Multi-task branch constitute the complete grasped-object recognition and detection network; the Darknet-53 feature extraction module and the ED-YOLO branch form the complete grasping and positioning network.
  • The specific method is as follows: acquire the image and position information of objects within the field of view and output the acquired image information to the user; input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; map the extracted multi-scale feature information to the Multi-task branch exclusive layer for grasped-object recognition and detection and to the ED-YOLO branch exclusive layer for grasped-object positioning. The Multi-task branch exclusive layer uses the deeper two layers of convolutional feature maps to predict the object-category confidence, the object-frame coordinates, and the target category.
  • The ED-YOLO branch exclusive layer uses all three layers of convolutional feature maps to predict the confidence and position of the grasping frame.
  • Confidence indicates the probability that the true result falls near the predicted result.
  • Frame regression and grasping-frame regression refer to predicting the object's bounding frame and grasping frame through regression analysis.
  • Regression analysis is an analysis method for determining the quantitative relationship between two or more interdependent variables. In this invention, it refers to the relationship between the bounding frame, the grasping frame, and the object to be detected; analyzing this relationship makes it possible to produce corresponding predictions for objects not seen in training.
  • Target classification refers to identifying the target object and judging which category it belongs to.
  • The two branch exclusive layers independently compute loss values according to their loss functions, and use the backpropagation algorithm to iteratively back-propagate the exclusive parameters of the corresponding branch and the shared parameters according to the loss values; the shared layer's multi-scale feature information, the Multi-task branch exclusive layer for grasped-object recognition and detection, and the ED-YOLO branch exclusive layer for grasped-object positioning are trained alternately.
  • The training strategy of alternately updating shared and exclusive parameters divides the training process into two stages: an early stage and a later stage.
  • The early stage is the full-parameter update stage, in which all parameters of the network are updated and trained in sequence. Because the two branches cannot be updated at the same time, the parameters of one branch must first be fixed.
  • First, the parameters of the ED-YOLO branch are fixed, the Multi-task exclusive layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated in conjunction with the learning-rate decay strategy; then, the Multi-task exclusive layer parameters are fixed, the ED-YOLO exclusive layer network parameters are trained, the ED-YOLO prediction loss is computed, and the shared layer and ED-YOLO exclusive layer network parameters are updated with the learning-rate decay strategy, improving the network's grasping-box detection capability.
  • The training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
  • the later stage is the exclusive parameter update stage.
  • In this stage, the parameters of the shared layer are fixed, and only the parameters of the two exclusive layers are trained and updated in sequence.
  • The update training of the exclusive-layer parameters at this stage is as follows: first, fix the Multi-task exclusive layer parameters, train the ED-YOLO exclusive layer, compute the ED-YOLO prediction loss, and update the ED-YOLO exclusive layer network parameters in conjunction with the learning-rate decay strategy; then, fix the ED-YOLO exclusive layer parameters, train the Multi-task exclusive layer parameters, compute the Multi-task prediction loss, and update the Multi-task exclusive layer network parameters with the learning-rate decay strategy.
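The two-stage freeze/train alternation can be condensed into a small scheduling helper. The strict one-to-one alternation per step is a simplifying assumption (elsewhere the text suggests the grasp-box branch may be trained more often than the classification branch), and the group names are illustrative labels, not identifiers from the patent.

```python
def trainable_groups(stage, step):
    """Return which parameter groups are unfrozen at a given training step.

    stage: "early" (full-parameter stage) or "late" (exclusive-parameter stage).
    Assumes a simple one-to-one alternation between the two branches per step;
    "shared", "multitask", and "edyolo" are illustrative labels for the shared
    layer and the two branch exclusive layers.
    """
    if stage == "early":
        # Early stage starts with the Multi-task branch; the shared layer
        # is updated together with whichever branch is active.
        branch = "multitask" if step % 2 == 0 else "edyolo"
        return {"shared", branch}
    # Later stage starts with the ED-YOLO branch; the shared layer stays frozen
    # and only the active branch's exclusive parameters are updated.
    branch = "edyolo" if step % 2 == 0 else "multitask"
    return {branch}
```

A training loop would consult this helper each step and zero out (or skip) gradients for any group not in the returned set.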
  • The training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
  • After the last ED-YOLO exclusive layer update, the Multi-task exclusive layer is no longer updated, and the trained Multi-task exclusive layer parameters are fine-tuned to ensure that both branches achieve good convergence.
  • Parameter fine-tuning means manually adjusting the parameters by experience according to the observed effect, adding or subtracting a few tenths at a time until the desired effect is achieved.
  • Non-maximum suppression is then applied to obtain the final object image and position information.
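The non-maximum suppression step can be sketched in plain Python. The greedy, score-ordered form below is the standard algorithm; boxes are taken as (x1, y1, x2, y2) corners, and the 0.5 IoU threshold is an illustrative default since the text does not specify one.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining boxes that overlap the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two heavily overlapping candidates and one distant one:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with IoU of about 0.68, so it is suppressed and only the first and third boxes survive.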
  • Training and experiments were carried out on the ROCm 4.2.0 platform driven by PyTorch 1.9 under CentOS 7.6; the Docker version is 20.10.7 and the Python version is 3.7. The computer is a Huawei MateBook 14, and the dataset is the Cornell dataset commonly used in the field of grasping detection.
  • Figure 4 shows how the loss value and accuracy change with the number of iteration steps in the grasping-box detection and grasped-object recognition tasks. The loss curves in Figure 4 show that the loss of the multi-task joint detection network converges effectively as the number of iterations increases. Because one grasped-item recognition and detection iteration is performed for every two grasping-box positioning iterations, grasping-box detection converges noticeably faster than grasped-object classification detection. In the full-parameter training phase, oscillations inevitably occur because the shared layer is trained for the two tasks in succession; when only the exclusive parameters are updated in the later stage of training, the oscillation is visibly suppressed, which verifies the effectiveness of the training strategy of the present invention.
  • This embodiment provides an object grasping and positioning recognition system based on multi-task convolution, comprising: an image acquisition and output module, configured to acquire the image and position information of objects within the field of view and output the acquired image information to the user; an image input module, configured to input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; an information mapping module, configured to map the extracted multi-scale feature information to the Multi-task branch exclusive layer for grasped-object recognition and detection and to the ED-YOLO branch exclusive layer for grasped-object positioning; a parameter update module, configured so that the two branch exclusive layers independently compute loss values according to their loss functions and use the backpropagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters according to the loss values; a training module, configured to alternately train the shared layer's multi-scale features, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and a result acquisition module, configured to apply non-maximum suppression to obtain the final object image and position information.
  • The training module includes early-stage full-parameter update training of the shared layer and the two exclusive layers, and later-stage exclusive-layer parameter update training.
  • The early-stage full-parameter update training of the shared layer and the two exclusive layers is as follows: first, fix the parameters of the ED-YOLO branch, train the Multi-task exclusive layer parameters, compute the Multi-task prediction loss, and update the network parameters of the shared layer and the Multi-task exclusive layer with the learning-rate decay strategy; then, fix the Multi-task exclusive layer parameters, train the ED-YOLO exclusive layer network parameters, compute the ED-YOLO prediction loss, and update the shared layer and ED-YOLO exclusive layer network parameters in conjunction with the learning-rate decay strategy.
  • The later-stage exclusive-layer parameter update training is as follows: first, fix the Multi-task exclusive layer parameters, train the ED-YOLO exclusive layer, compute the ED-YOLO prediction loss, and update the ED-YOLO exclusive layer network parameters with the learning-rate decay strategy; then, fix the ED-YOLO exclusive layer parameters, train the Multi-task exclusive layer parameters, compute the Multi-task prediction loss, and update the Multi-task exclusive layer network parameters with the learning-rate decay strategy. The training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing. After the last ED-YOLO exclusive layer update is completed, the Multi-task exclusive layer is no longer updated, and the trained Multi-task exclusive layer parameters are fine-tuned.
  • The robot includes: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform any of the methods described above. With the Darknet-53 network as the core, multi-scale adaptive feature fusion is combined with atrous convolution to form the complete grasping and positioning network ED-YOLO; multi-scale feature fusion is also used to form the complete grasped-object recognition and detection branch Multi-task, which substantially enlarges the receptive field within the branch and improves its recognition ability; the network structure is divided into a feature-extraction shared layer and two branch exclusive layers, and a strategy of alternately training exclusive and shared parameters solves the training problem of multi-task networks.
  • This embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the methods described above; the stored program is the object grasping and positioning recognition algorithm based on multi-task convolution.
  • the network structure is divided into a feature extraction shared layer and two branch exclusive layers.
  • a strategy of alternating training of exclusive parameters and shared parameters is used to solve the training problem of multi-task networks.
  • The computer system includes a central processing unit (CPU) 101, which can execute various appropriate actions and processes.
  • Various programs and data required for system operation are also stored in the RAM 103.
  • the CPU 101 , ROM 102 , and RAM 103 are connected to each other via a bus 104 .
  • An input/output (I/O) interface 105 is also connected to the bus 104 .
  • The following components are connected to the I/O interface 105: an input section 106 including a keyboard, a mouse, and the like; an output section including a cathode-ray tube (CRT), a liquid-crystal display (LCD), and the like, and a speaker; a storage section 108 including a hard disk and the like; and a communication section 109 including a network interface card such as a LAN card or a modem.
  • the communication section 109 performs communication processing via a network such as the Internet.
  • Drives are also connected to the I/O interface 105 as needed.
  • A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage section 108 as needed.
  • Embodiment 1 of the present invention includes a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • The computer program can be downloaded and installed from a network via the communication section, and/or installed from a removable medium.
  • the computer-readable medium shown in the present invention may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program codes are carried. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • The block diagrams in the accompanying drawings illustrate the architecture, functions, and operation of possible implementations of the system, method, and computer program product according to various embodiments of the present invention.
  • Each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block in the block diagrams or flowchart illustrations, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • The units described in the embodiments of the present invention may be implemented in software or in hardware, and the described units or modules may also be provided in a processor. For example, they can be described as: a multi-task convolution-based object grasping and positioning recognition system including an image acquisition and output module, an image input module, an information mapping module, a parameter update module, a training module, and a result acquisition module. In certain circumstances, the names of these units do not limit the units themselves.
  • For example, the image acquisition and output module may also be described as an "image and location information acquisition and output module".
  • Although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps must be performed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution.

Abstract

The invention discloses a multi-task convolution-based object grasping and positioning identification algorithm and system, and a robot. The method comprises: obtaining the image and position information of an object in a field of view range; inputting the acquired image information into a Darknet-53 network, and extracting multi-scale feature information of a shared layer; mapping the extracted multi-scale feature information to a captured object recognition and detection Multi-Task branch exclusive layer and a captured object positioning ED-YOLO branch exclusive layer; separately calculating a loss value of each of the two branch exclusive layers according to a loss function thereof, and according to the loss values, separately carrying out, on the network, back propagation iteration of a branch exclusive parameter and a shared parameter; alternately training the shared layer and the branch exclusive layers; and using non-maximum suppression processing to obtain the final object image and position information. The method has the beneficial effect that, on the basis of ensuring stability and real-time action, grasping box detection and grasping object recognition and classification detection are completed in one model at the same time.

Description

基于多任务卷积的物体抓取定位识别算法、系统和机器人Object grasping and positioning recognition algorithm, system and robot based on multi-task convolution 技术领域technical field
本发明涉及机器人控制技术领域,具体是一种基于多任务卷积的物体抓取定位识别算法、系统和设备。The invention relates to the technical field of robot control, in particular to an object grasping and positioning recognition algorithm, system and equipment based on multi-task convolution.
背景技术Background technique
机械臂抓取是机器人重要的基础操作,在工业零部件的分拣、装配和服务型机器人等众多领域有着广泛的应用。传统的工业机器人抓取只能针对固定的任务需求,对机器人的各个关节和连杆采用离线编程或者示教编程的方式设定好固定的控制参数,以完成固定的一系列动作。这就要求抓取目标的位置和位姿需要保持一致,否则会大大影响抓取效果。这种传统的抓取模式灵活性极差,只能完成一些简单的任务,当目标大小变化、位姿变化时需要重新设定参数,且无法面对多个抓取目标或者抓取目标位置不固定的情况,无法满足越来越高的工业化生产需求。Robot arm grasping is an important basic operation of robots, and it has a wide range of applications in many fields such as sorting, assembly and service robots of industrial parts. Traditional industrial robot grasping can only meet fixed task requirements, and set fixed control parameters for each joint and connecting rod of the robot by offline programming or teaching programming to complete a series of fixed actions. This requires that the position and pose of the grasping target need to be consistent, otherwise the grasping effect will be greatly affected. This traditional grasping mode is extremely inflexible and can only complete some simple tasks. When the size and pose of the target change, the parameters need to be reset, and it cannot face multiple grasping targets or the position of the grasping target is different. The fixed situation cannot meet the increasing demand for industrial production.
在基于图像处理的工业机器人实际抓取任务中,经常不止要求能通过图像处理单元对待抓取物进行有效定位,而且需要对于待抓取物的物品类别进行识别分类,以完成对不同产品的分类搬运、筛选、统计。In the actual grasping tasks of industrial robots based on image processing, it is often not only required to be able to effectively locate the object to be grasped through the image processing unit, but also to identify and classify the category of the object to be grasped to complete the classification of different products Handling, screening, statistics.
YOLO检测网络本身被大范围的应用到物品分类任务中,将待抓取物的图像信息分别输入到两个网络中,便可以同时得到物品抓取框和物品的分类结果,但无疑这种方式都会增加计算机的计算和存储负担,大大增加完成任务耗时,对于工业机器人抓取任务的实时性要求和工作效率都会产生巨大影响。The YOLO detection network itself is widely used in the task of item classification. The image information of the object to be captured is input into the two networks respectively, and the item capture frame and the classification result of the item can be obtained at the same time, but there is no doubt that this method It will increase the computing and storage burden of the computer, greatly increase the time-consuming to complete the task, and have a huge impact on the real-time requirements and work efficiency of the industrial robot's grasping task.
Technical solution
To overcome the deficiencies of the prior art, the present invention provides an object grasping and positioning recognition algorithm, system, and robot based on multi-task convolution, which complete grasping-box detection and grasped-object recognition and classification simultaneously within a single model while guaranteeing stability and real-time performance.
To achieve the above object, the present invention is realized through the following technical solution. According to one aspect of the present invention, an object grasping and positioning recognition algorithm based on multi-task convolution is provided, comprising the following steps: acquiring the image and position information of objects within the field of view, and outputting the acquired image information to the user; inputting the acquired image information into the Darknet-53 network and extracting the multi-scale feature information of the shared layer; mapping the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning; the two branch exclusive layers independently computing their loss values according to their respective loss functions and, according to these loss values, using the back-propagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters; alternately training the shared-layer multi-scale feature extraction, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and applying non-maximum suppression to obtain the final object image and position information.
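The shared-backbone/two-branch structure in the steps above can be sketched schematically. The following is a minimal pure-Python toy, not the actual convolutional network: the function names and the numeric "feature maps" are hypothetical stand-ins for the Darknet-53 backbone and the two exclusive branch heads, kept only to show how one shared feature pass feeds both tasks.

```python
def shared_backbone(image):
    # Toy stand-in for the Darknet-53 shared layer: produces three
    # "feature maps" at decreasing resolution (multi-scale features).
    f1 = [p * 0.5 for p in image]         # shallow, high-resolution scale
    f2 = [p * 0.25 for p in image[::2]]   # middle scale
    f3 = [p * 0.125 for p in image[::4]]  # deep, low-resolution scale
    return f1, f2, f3

def multitask_branch(feats):
    # Toy stand-in for the Multi-task recognition head
    # (the disclosure says it uses the two deeper feature maps).
    _, f2, f3 = feats
    return {"class_score": sum(f2) + sum(f3)}

def edyolo_branch(feats):
    # Toy stand-in for the ED-YOLO grasp-positioning head (all three scales).
    return {"grasp_score": sum(sum(f) for f in feats)}

def joint_forward(image):
    # One forward pass: shared features computed once, consumed by both
    # exclusive branches, so the backbone cost is not paid twice.
    feats = shared_backbone(image)
    return multitask_branch(feats), edyolo_branch(feats)
```

The point of the structure is visible even in the toy: `shared_backbone` runs once per image, which is what distinguishes this design from feeding the image into two independent networks.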
Further, the alternate training comprises an early stage in which all parameters of the shared layer and the two exclusive layers are updated, and a later stage in which only the exclusive-layer parameters are updated.
Further, the early full-parameter update training of the shared layer and the two exclusive layers is as follows: first, the parameters of the ED-YOLO branch are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated in accordance with the learning-rate decay strategy; then, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive-layer network parameters are trained, the ED-YOLO prediction loss is computed, and the network parameters of the shared layer and the ED-YOLO exclusive layer are updated in accordance with the learning-rate decay strategy.
Further, the later exclusive-layer parameter update training is as follows: first, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive layer is trained, the ED-YOLO prediction loss is computed, and the ED-YOLO exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy; then, the ED-YOLO exclusive-layer parameters are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the Multi-task exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy.
Further, after the last ED-YOLO exclusive-layer update is completed, the Multi-task exclusive layer is no longer updated, and the parameters of the trained Multi-task exclusive layer are fine-tuned.
Further, the training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
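The cosine-annealing schedule named here has a standard closed form. A minimal sketch, assuming annealing from the stated initial rate of 1e-3 down to a minimum of 0 over a hypothetical total step count (the disclosure does not state the step count or minimum rate):

```python
import math

def cosine_annealing_lr(step, total_steps, lr_init=1e-3, lr_min=0.0):
    # Standard cosine annealing: the learning rate follows half a cosine
    # period, falling from lr_init at step 0 to lr_min at total_steps.
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_init - lr_min) * cos_factor
```

In practice this schedule is typically applied per epoch or per iteration on top of the Adam optimizer, decaying smoothly rather than in discrete steps.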
According to another aspect of the present invention, an object grasping and positioning recognition system based on multi-task convolution is provided, characterized in that it comprises: an image acquisition and output module, configured to acquire the image and position information of objects within the field of view and to output the acquired image information to the user; an image input module, configured to input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; an information mapping module, configured to map the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning; a parameter update module, configured so that the two branch exclusive layers independently compute their loss values according to their respective loss functions and, according to these loss values, use the back-propagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters; a training module, configured to alternately train the shared-layer multi-scale feature extraction, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and a result acquisition module, configured to apply non-maximum suppression to obtain the final object image and position information.
Further, the training module comprises the early full-parameter update training of the shared layer and the two exclusive layers, and the later exclusive-layer parameter update training.
According to another aspect of the present invention, a device is provided, comprising: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform any of the methods described above.
According to another aspect of the present invention, a computer-readable storage medium storing a computer program is provided; when the program is executed by a processor, any of the methods described above is implemented.
Beneficial effects
Compared with the prior art, the beneficial effects of the present invention are as follows: the present invention takes the Darknet-53 network as its core and combines dilated (atrous) convolution with multi-scale adaptive feature fusion to form the complete grasp positioning network ED-YOLO; multi-scale feature fusion is likewise used to form the complete grasped-object recognition and detection branch Multi-task, which substantially enlarges the receptive field within the branch and improves its recognition ability; and the network structure is divided into a shared feature-extraction layer and two branch exclusive layers, with a strategy of alternately training exclusive and shared parameters solving the training difficulty of the multi-task network.
Description of the drawings
Figure 1 is a schematic diagram of the network structure of the present invention; Figure 2 is a schematic diagram of multi-task network parameter updating; Figure 3 is a flow chart of multi-task joint detection; Figure 4 shows the loss curves during multi-task joint detection; Figure 5 shows the accuracy curves during multi-task joint detection; Figure 6 is a schematic structural diagram of the device of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to specific embodiments. It should be understood that these embodiments are only intended to illustrate the present invention and not to limit its scope. Furthermore, it should be understood that, after reading the content taught by the present invention, those skilled in the art may make various changes or modifications to it, and such equivalent forms likewise fall within the scope defined by the present application.
The positioning and recognition tasks for grasped objects can use the same dataset images, but the dataset serves different functions in the two neural networks and the corresponding label formats differ, so the exclusive parameters of the two network branches cannot be trained at the same time. However, the network structure shows that the feature extraction network is trained whenever either task is trained, and its internal parameters are parameters shared by the two branches. Based on these characteristics, the embodiments of the present application provide an object grasping and positioning recognition algorithm, system, and device based on multi-task convolution.
The object grasping and positioning recognition algorithm based on multi-task convolution provided by the embodiments of the present application is first introduced with reference to Figures 1-3. As shown in Figure 1, it mainly comprises a Darknet-53 feature extraction module, a Multi-task branch, and an ED-YOLO branch. The Darknet-53 feature extraction module and the Multi-task branch form the complete grasped-object recognition and detection network, while the Darknet-53 feature extraction module and the ED-YOLO branch form the complete grasp positioning network. The specific method is as follows: acquire the image and position information of objects within the field of view, and output the acquired image information to the user; input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; and map the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning. The Multi-task branch exclusive layer uses the two deeper convolutional feature maps to predict the confidence of the object category, the object bounding-box coordinates, and the target category. The ED-YOLO branch exclusive layer uses all three convolutional feature maps to predict the confidence and position of the grasping box. Confidence denotes the probability that the true result falls near the predicted result. Bounding-box regression and grasping-box regression refer to predicting an object's bounding box and its grasping box through regression analysis. Regression analysis is an analytical method for determining the quantitative interdependence between two or more variables; in the present invention it refers to the relationship between the bounding box, the grasping box, and the object to be detected, and analyzing this relationship allows corresponding predictions to be made for objects that were not trained on. Target classification refers to recognizing the target object and determining which category it belongs to.
The two branch exclusive layers independently compute their loss values according to their loss functions and, according to these loss values, use the back-propagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters; the shared-layer multi-scale feature extraction, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer are trained alternately.
As shown in Figure 2, the training strategy of alternately updating shared and exclusive parameters divides the training process into an early stage and a later stage.
The early stage is the full-parameter update stage, in which all parameters of the network are updated and trained in turn. Because the two branches cannot be updated at the same time, the parameters of one branch must first be fixed. In the present invention, the parameters of the ED-YOLO branch are fixed first: the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated in accordance with the learning-rate decay strategy. Then the Multi-task exclusive-layer parameters are fixed: the ED-YOLO exclusive-layer network parameters are trained, the ED-YOLO prediction loss is computed, and the network parameters of the shared layer and the ED-YOLO exclusive layer are updated in accordance with the learning-rate decay strategy, improving the network's ability to detect grasping boxes. In this stage the training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
The later stage is the exclusive-parameter update stage, in which the shared-layer parameters are fixed and only the parameters of the two exclusive layers are trained and updated in turn. Unlike the early stage, the order is reversed: first, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive layer is trained, the ED-YOLO prediction loss is computed, and the ED-YOLO exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy; then, the ED-YOLO exclusive-layer parameters are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the Multi-task exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy. In this stage the training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
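The two-stage alternating schedule can be illustrated with a toy numerical example. This is a heavily simplified pure-Python sketch: scalar "parameters" and quadratic "losses" stand in for the real network and its detection losses, and the iteration counts and learning rate are arbitrary. It demonstrates only the freeze/update pattern, not the actual model.

```python
def sgd_step(p, target, branch, update_shared, lr=0.1):
    # Toy loss for one branch: (shared + branch_param - target)^2.
    # Its gradient w.r.t. both 'shared' and the branch parameter is 2*err.
    err = p["shared"] + p[branch] - target
    p[branch] -= lr * 2 * err          # exclusive parameter always trains
    if update_shared:                  # shared parameters train only early
        p["shared"] -= lr * 2 * err

params = {"shared": 0.0, "mt": 0.0, "ed": 0.0}

# Early stage: full-parameter updates, branches alternate.
# While one branch trains, the other branch's exclusive parameters are frozen.
for _ in range(50):
    sgd_step(params, 3.0, "mt", update_shared=True)   # ED-YOLO branch frozen
    sgd_step(params, 5.0, "ed", update_shared=True)   # Multi-task branch frozen

# Later stage: shared layer frozen; only the exclusive layers update,
# ED-YOLO first within each round, as described above.
for _ in range(50):
    sgd_step(params, 5.0, "ed", update_shared=False)
    sgd_step(params, 3.0, "mt", update_shared=False)
```

In an actual PyTorch implementation the same freezing would typically be expressed by toggling `requires_grad` on the frozen parameter groups, or by passing only the active groups to the optimizer; the early-stage oscillation mentioned below arises because the shared parameters are pulled back and forth by the two losses.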
After the last ED-YOLO exclusive-layer update is completed, the Multi-task exclusive layer is no longer updated; the parameters of the trained Multi-task exclusive layer are then fine-tuned to ensure that both branches achieve good convergence. This fine-tuning is performed manually according to the observed effect and empirical values, adjusting a parameter by a few hundredths up or down each time until the desired effect is achieved.
Finally, non-maximum suppression is applied to obtain the final object image and position information.
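Non-maximum suppression itself is a standard post-processing step. A minimal pure-Python sketch of greedy NMS over axis-aligned boxes follows; the `(x1, y1, x2, y2)` box format and the 0.5 IoU threshold are illustrative assumptions, not values stated in the disclosure.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring remaining box and
    # discard all boxes overlapping it by more than iou_thresh.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) < iou_thresh]
    return keep
```

Applied to the raw predictions of either branch, this leaves one box per detected object (or grasp candidate), which is what the final output stage reports.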
To verify the performance of the multi-task joint grasp detection network and the effect of the training strategy, in this embodiment training and experiments were carried out with PyTorch 1.9 on the ROCm-4.2.0 platform under CentOS 7.6, with Docker version 20.10.7 and Python version 3.7, on a Huawei MateBook 14; the dataset is the Cornell dataset commonly used in the grasp detection field.
Figure 4 shows how the loss value and accuracy change with the number of iteration steps in the grasping-box detection and grasped-object recognition tasks. The loss curves in Figure 4 show that the loss values of the multi-task joint detection network converge effectively as the number of iterations increases. Since one grasped-object recognition iteration is performed for every two grasping-box positioning iterations, grasping-box detection converges noticeably faster than grasped-object classification. In the full-parameter training stage, because the shared layer must be trained for the two tasks in turn, oscillation inevitably occurs during training; when only the exclusive parameters are updated in the later stage of training, this oscillation is visibly suppressed, verifying the effectiveness of the training strategy of the present invention.
The accuracy of each task in the multi-task and single-task networks is compared in Table 1 below. From the above experimental results, the single-task ED-YOLO network detects the grasping box with an accuracy of 96.79%, while in the multi-task network MT-ED-YOLO the detection accuracy of the grasping box is 95.85%; the detection accuracy of the same task is comparable in the two models, and both show good accuracy and robustness. Meanwhile, the accuracy on the object classification task reaches 93.34%, a good result that meets the task requirements of multi-task joint detection.
This embodiment provides an object grasping and positioning recognition system based on multi-task convolution, comprising: an image acquisition and output module, configured to acquire the image and position information of objects within the field of view and to output the acquired image information to the user; an image input module, configured to input the acquired image information into the Darknet-53 network and extract the multi-scale feature information of the shared layer; an information mapping module, configured to map the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning; a parameter update module, configured so that the two branch exclusive layers independently compute their loss values according to their respective loss functions and, according to these loss values, use the back-propagation algorithm to iteratively update the exclusive parameters of the corresponding branch and the shared parameters; a training module, configured to alternately train the shared-layer multi-scale feature extraction, the Multi-task branch exclusive layer, and the ED-YOLO branch exclusive layer; and a result acquisition module, configured to apply non-maximum suppression to obtain the final object image and position information.
Further, the training module comprises the early full-parameter update training of the shared layer and the two exclusive layers, and the later exclusive-layer parameter update training. Specifically, the early full-parameter update training is as follows: first, the parameters of the ED-YOLO branch are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated in accordance with the learning-rate decay strategy; then, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive-layer network parameters are trained, the ED-YOLO prediction loss is computed, and the network parameters of the shared layer and the ED-YOLO exclusive layer are updated in accordance with the learning-rate decay strategy. The later exclusive-layer parameter update training is as follows: first, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive layer is trained, the ED-YOLO prediction loss is computed, and the ED-YOLO exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy; then, the ED-YOLO exclusive-layer parameters are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the Multi-task exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy. The training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing. After the last ED-YOLO exclusive-layer update is completed, the Multi-task exclusive layer is no longer updated, and the parameters of the trained Multi-task exclusive layer are fine-tuned.
A robot of this embodiment comprises: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform any of the methods described above. The robot takes the Darknet-53 network as its core and combines dilated convolution with multi-scale adaptive feature fusion to form the complete grasp positioning network ED-YOLO; multi-scale feature fusion is likewise used to form the complete grasped-object recognition and detection branch Multi-task, which substantially enlarges the receptive field within the branch and improves its recognition ability; and the network structure is divided into a shared feature-extraction layer and two branch exclusive layers, with a strategy of alternately training exclusive and shared parameters solving the training difficulty of the multi-task network.
A computer-readable storage medium of this embodiment stores a computer program which, when executed by a processor, implements any of the methods described above. It stores the object grasping and positioning recognition algorithm based on multi-task convolution, which divides the network structure into a shared feature-extraction layer and two branch exclusive layers and uses a strategy of alternately training exclusive and shared parameters to solve the training difficulty of the multi-task network.
A further description follows. The computer system includes a central processing unit (CPU) 101, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 102 or a program loaded from a storage section into a random access memory (RAM) 103. The RAM 103 also stores various programs and data required for system operation. The CPU 101, ROM 102, and RAM 103 are connected to one another via a bus 104. An input/output (I/O) interface 105 is also connected to the bus 104.
The following components are connected to the I/O interface 105: an input section 106 including a keyboard, a mouse, and the like; an output section including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 108 including a hard disk and the like; and a communication section 109 including a network interface card such as a LAN card or a modem. The communication section 109 performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface 105 as needed. A removable medium 511, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 108 as needed.
In particular, according to an embodiment of the present invention, the process described above with reference to the flow chart of Figure 3 may be implemented as a computer software program. For example, Embodiment 1 of the present invention includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from the removable medium. When the computer program is executed by the central processing unit (CPU) 101, the above-described functions defined in the system of the present application are performed.
It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in combination with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
The block diagram of Figure 6 illustrates possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flow chart or block diagram may represent a module, program segment, or portion of code containing one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagram or flow chart, and combinations of blocks in the block diagram or flow chart, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of these units do not, in some cases, limit the units themselves. The described units or modules may, for example, be characterized as: a multi-task convolution-based object grasping and positioning recognition system, comprising an image acquisition and output module, an image input module, an information mapping module, a parameter update module, a training module, and a result acquisition module. Again, the names of these units do not, in some cases, limit the units themselves; for example, the image acquisition and output module may also be described as a "module for acquiring and outputting images and position information of objects within the field of view".
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
In addition, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all of the illustrated steps must be performed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
The above description is only a preferred embodiment of the present application and an illustration of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, solutions in which the above features are replaced by (but not limited to) features with similar functions disclosed in this application.
In the description of this specification, reference to the terms "one embodiment", "example", "specific example", and the like means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to help illustrate the invention. The preferred embodiments neither describe all the details exhaustively nor limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of the contents of this specification. These embodiments were selected and specifically described in order to better explain the principles and practical applications of the invention, so that those skilled in the art can well understand and utilize it. The invention is limited only by the claims, together with their full scope and equivalents.

Claims (10)

  1. A multi-task convolution-based object grasping and positioning recognition algorithm, characterized in that it comprises the following steps: acquiring images and position information of objects within the field of view, and outputting the acquired image information to the user; inputting the acquired image information into a Darknet-53 network and extracting multi-scale feature information from the shared layer; mapping the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning; the two branch exclusive layers independently computing loss values according to their respective loss functions and, based on the loss values, using the back-propagation algorithm to iterate, via back-propagation through the network, the exclusive parameters of the respective branch and the shared parameters; alternately training the shared-layer multi-scale feature information, the Multi-task branch exclusive layer for grasped-object recognition and detection, and the ED-YOLO branch exclusive layer for grasped-object positioning; and applying non-maximum suppression to obtain the final object image and position information.
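The final step of claim 1 applies non-maximum suppression to the detections. A minimal, framework-free sketch of the standard greedy NMS procedure (the corner box format `(x1, y1, x2, y2)` and the 0.5 overlap threshold are illustrative assumptions, not details taken from the application):

```python
def iou(a, b):
    # intersection-over-union of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    # greedily keep the highest-scoring box, discard boxes that
    # overlap it beyond the threshold, and repeat on the remainder
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

For example, two heavily overlapping candidate boxes of the same grasped object collapse to the single higher-scoring one, while a distant box survives.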
  2. The multi-task convolution-based object grasping and positioning recognition algorithm according to claim 1, characterized in that the alternate training comprises an early stage of full-parameter update training of the shared layer and the two exclusive layers, and a later stage of exclusive-layer parameter update training.
  3. The multi-task convolution-based object grasping and positioning recognition algorithm according to claim 2, characterized in that the early-stage full-parameter update training of the shared layer and the two exclusive layers is as follows: first, the parameters of the ED-YOLO branch are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the network parameters of the shared layer and the Multi-task exclusive layer are updated in accordance with the learning-rate decay strategy; then, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive-layer network parameters are trained, the ED-YOLO prediction loss is computed, and the network parameters of the shared layer and the ED-YOLO exclusive layer are updated in accordance with the learning-rate decay strategy.
  4. The multi-task convolution-based object grasping and positioning recognition algorithm according to claim 2 or 3, characterized in that the later-stage exclusive-layer parameter update training is as follows: first, the Multi-task exclusive-layer parameters are fixed, the ED-YOLO exclusive layer undergoes parameter training, the ED-YOLO prediction loss is computed, and the ED-YOLO exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy; then, the ED-YOLO exclusive-layer parameters are fixed, the Multi-task exclusive-layer parameters are trained, the Multi-task prediction loss is computed, and the Multi-task exclusive-layer network parameters are updated in accordance with the learning-rate decay strategy.
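The alternating schedules of claims 3 and 4 both reduce to the same pattern: freeze one branch, update the other (together with the shared backbone in the early stage, without it in the later stage). A toy, framework-free sketch of that parameter-freezing pattern (the layer names, scalar parameters, and plain gradient-descent updates are illustrative assumptions; the application itself trains a full network with Adam and a learning-rate decay strategy):

```python
def make_net():
    # shared Darknet-53-style backbone plus the two branch heads (toy scalars)
    return {
        "shared":     {"params": [0.5, -0.2], "trainable": True},
        "multi_task": {"params": [0.1],       "trainable": True},
        "ed_yolo":    {"params": [0.3],       "trainable": True},
    }

def set_trainable(net, layers, flag):
    for name in layers:
        net[name]["trainable"] = flag

def sgd_step(net, grads, lr):
    # update only the layers whose trainable flag is set
    for name, layer in net.items():
        if layer["trainable"]:
            layer["params"] = [p - lr * g for p, g in zip(layer["params"], grads[name])]

def early_phase_step(net, mt_grads, ey_grads, lr):
    # claim 3: freeze ED-YOLO, update shared + Multi-task on the Multi-task loss...
    set_trainable(net, ["ed_yolo"], False)
    set_trainable(net, ["shared", "multi_task"], True)
    sgd_step(net, mt_grads, lr)
    # ...then freeze Multi-task, update shared + ED-YOLO on the ED-YOLO loss
    set_trainable(net, ["multi_task"], False)
    set_trainable(net, ["shared", "ed_yolo"], True)
    sgd_step(net, ey_grads, lr)

def late_phase_step(net, mt_grads, ey_grads, lr):
    # claim 4: the shared backbone stays frozen; only the branches alternate
    set_trainable(net, ["shared", "multi_task"], False)
    set_trainable(net, ["ed_yolo"], True)
    sgd_step(net, ey_grads, lr)
    set_trainable(net, ["ed_yolo"], False)
    set_trainable(net, ["multi_task"], True)
    sgd_step(net, mt_grads, lr)
```

After one `early_phase_step`, the shared parameters have received two updates (one per sub-step) while each branch head has received exactly one, which is the intended behavior of the schedule.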
  5. The multi-task convolution-based object grasping and positioning recognition algorithm according to claim 4, characterized in that after the last update of the ED-YOLO exclusive layer is completed, the Multi-task exclusive layer is no longer updated, and the parameters of the trained Multi-task exclusive layer are fine-tuned.
  6. The multi-task convolution-based object grasping and positioning recognition algorithm according to claim 4, characterized in that the training optimizer is Adam, the initial learning rate is 1e-3, and the learning-rate decay strategy is cosine annealing.
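Claim 6 fixes the optimizer hyper-parameters: Adam with an initial learning rate of 1e-3, decayed by cosine annealing. The schedule itself is a closed-form half-cosine from the initial rate down to a floor; a sketch (the floor of 0 and the single-cycle, non-restarting form are assumptions, since the claim does not spell them out):

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    # half-cosine decay: lr_max at step 0, lr_min at step total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * step / total_steps))
```

The rate starts at 1e-3, passes through 5e-4 at the midpoint, and reaches the floor at the end of training, giving large early updates and fine late adjustments.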
  7. A multi-task convolution-based object grasping and positioning recognition system, characterized in that it comprises: an image acquisition and output module, configured to acquire images and position information of objects within the field of view and output the acquired image information to the user; an image input module, configured to input the acquired image information into a Darknet-53 network and extract multi-scale feature information from the shared layer; an information mapping module, configured to map the extracted multi-scale feature information to the exclusive layer of the Multi-task branch for grasped-object recognition and detection and the exclusive layer of the ED-YOLO branch for grasped-object positioning; a parameter update module, configured such that the two branch exclusive layers independently compute loss values according to their respective loss functions and, based on the loss values, use the back-propagation algorithm to iterate, via back-propagation through the network, the exclusive parameters of the respective branch and the shared parameters; a training module, configured to alternately train the shared-layer multi-scale feature information, the Multi-task branch exclusive layer for grasped-object recognition and detection, and the ED-YOLO branch exclusive layer for grasped-object positioning; and a result acquisition module, configured to apply non-maximum suppression to obtain the final object image and position information.
  8. The multi-task convolution-based object grasping and positioning recognition system according to claim 7, characterized in that the training module performs an early stage of full-parameter update training of the shared layer and the two exclusive layers, and a later stage of exclusive-layer parameter update training.
  9. A robot, comprising: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the method according to any one of claims 1-6.
  10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1-6 is implemented.
PCT/CN2022/131276 2022-05-09 2022-11-11 Multi-task convolution-based object grasping and positioning identification algorithm and system, and robot WO2023165161A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210499055.6 2022-05-09
CN202210499055.6A CN114764831A (en) 2022-05-09 2022-05-09 Object grabbing, positioning and identifying algorithm and system based on multitask convolution and robot

Publications (1)

Publication Number Publication Date
WO2023165161A1 true WO2023165161A1 (en) 2023-09-07

Family

ID=82364822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131276 WO2023165161A1 (en) 2022-05-09 2022-11-11 Multi-task convolution-based object grasping and positioning identification algorithm and system, and robot

Country Status (2)

Country Link
CN (1) CN114764831A (en)
WO (1) WO2023165161A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114764831A (en) * 2022-05-09 2022-07-19 青岛理工大学 Object grabbing, positioning and identifying algorithm and system based on multitask convolution and robot

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205643A1 (en) * 2017-12-29 2019-07-04 RetailNext, Inc. Simultaneous Object Localization And Attribute Classification Using Multitask Deep Neural Networks
CN110222604A (en) * 2019-05-23 2019-09-10 复钧智能科技(苏州)有限公司 Target identification method and device based on shared convolutional neural networks
CN111080693A (en) * 2019-11-22 2020-04-28 天津大学 Robot autonomous classification grabbing method based on YOLOv3
CN111862067A (en) * 2020-07-28 2020-10-30 中山佳维电子有限公司 Welding defect detection method and device, electronic equipment and storage medium
US20210012146A1 (en) * 2019-07-12 2021-01-14 Wuyi University Method and apparatus for multi-scale sar image recognition based on attention mechanism
CN114764831A (en) * 2022-05-09 2022-07-19 青岛理工大学 Object grabbing, positioning and identifying algorithm and system based on multitask convolution and robot

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280488B (en) * 2018-02-09 2021-05-07 哈尔滨工业大学 Grippable object identification method based on shared neural network
CN109523532B (en) * 2018-11-13 2022-05-03 腾讯医疗健康(深圳)有限公司 Image processing method, image processing device, computer readable medium and electronic equipment
CN112949452B (en) * 2021-02-25 2022-05-31 山西大学 Robot low-light environment grabbing detection method based on multitask shared network

Also Published As

Publication number Publication date
CN114764831A (en) 2022-07-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22929589

Country of ref document: EP

Kind code of ref document: A1