CN114419051A - Method and system for adapting to multi-task scene containing pixel-level segmentation - Google Patents


Info

Publication number
CN114419051A
CN114419051A
Authority
CN
China
Prior art keywords
layer
feature
scale
convolution
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111492470.0A
Other languages
Chinese (zh)
Other versions
CN114419051B (English)
Inventor
陈浩 (Chen Hao)
Current Assignee
Xi'an Deep Computing Information Technology Co., Ltd.
Xidian University
Original Assignee
Xi'an Deep Computing Information Technology Co., Ltd.
Xidian University
Priority date
Filing date
Publication date
Application filed by Xi'an Deep Computing Information Technology Co., Ltd. and Xidian University
Priority application: CN202111492470.0A (granted as CN114419051B)
Publication of CN114419051A
Application granted
Publication of CN114419051B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 7/10: Image analysis; Segmentation; Edge detection
    • G06F 18/241: Pattern recognition; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: Neural networks; Combinations of networks
    • G06N 3/08: Neural networks; Learning methods
    • G06T 2207/20081: Indexing scheme for image analysis; Training; Learning
    • G06T 2207/20084: Indexing scheme for image analysis; Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of computer vision algorithms and discloses a method and system for adapting to multi-task scenes that include pixel-level segmentation. An upsampling layer is added after the backbone network, and its output is numerically added to the preceding feature layer; after several rounds of upsampling and numerical addition, a feature map at the original image size is obtained, after which a clone layer with the same structure as the original backbone network is appended. Each vision-task branch then computes its parameters from this original-size feature map. By providing a convolutional neural network structure that yields a feature map at the original image size, the invention lets subsequent pixel-level segmentation tasks proceed without designing extra operators, improves the feasibility of deploying algorithms for vision tasks that include pixel-level segmentation, reduces algorithm-design difficulty, and removes the need for task-specific operators, thereby lowering the demands placed on edge devices when the algorithm is applied in multi-task scenarios such as edge and server deployments.

Description

Method and system for adapting to multi-task scene containing pixel-level segmentation
Technical Field
The invention belongs to the technical field of computer vision algorithms, and particularly relates to a method and a system for adapting to a multi-task scene containing pixel-level segmentation.
Background
Current mainstream convolutional neural networks are generally designed in a waterfall style: for a fixed input, multi-level outputs are obtained by applying multi-scale convolution operations to the input image. As the number of layers increases, the scale of the feature map shrinks continuously; once the feature scale is small enough, the feature map can be extracted for vision-task processing. Common vision tasks include image classification, object detection, keypoint detection, and object pixel-level segmentation.
For image classification and object detection tasks, targets can be classified and detected directly from small-scale feature maps. For keypoint detection and object pixel-level segmentation tasks, however, the information contained in small-scale feature maps must undergo special processing before useful information can be obtained.
Taking the object pixel-level segmentation task as an example, current mainstream network designs include the two-stage method Mask R-CNN and one-stage methods such as YOLACT and SOLO. The two-stage approach represented by Mask R-CNN uses a simple network with no extra structural design in either the backbone or the task network; however, because the feature map at the junction of the backbone and the task network is small in scale, an additional processing operator, RoIAlign, must be designed to prevent pixel offset and similar problems, which hinders porting the network to embedded platforms. The one-stage approach represented by YOLACT and SOLO designs the task network specially, introducing a large number of parameters and algorithmic steps, and cannot be applied to new vision tasks such as keypoint regression and semantic segmentation.
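The pixel-offset problem that motivates RoIAlign can be seen with one line of arithmetic: RoIPool-style quantization of box coordinates onto a small-scale feature grid reintroduces a misalignment of up to one stride at image scale. A minimal sketch (the stride and coordinate values are illustrative):

```python
stride = 16                              # feature-map stride relative to the input image
x_img = 137.0                            # a box edge, in image pixels
x_quantized = int(x_img / stride)        # RoIPool-style rounding onto the coarse grid
offset = x_img - x_quantized * stride    # misalignment reintroduced at image scale
print(offset)                            # 9.0 pixels of pixel offset
```

RoIAlign avoids this by sampling with bilinear interpolation instead of rounding, which is exactly the kind of special operator the present scheme seeks to eliminate.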
Algorithms that adopt the traditional two-stage or one-stage methods run into many performance problems when ported to real product platforms. Meanwhile, a purely waterfall network design cannot satisfy all vision tasks and adds complexity when porting certain vision-task algorithms. It is therefore desirable to design a new method and system that adapt to multi-task scenes containing pixel-level segmentation.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) The special operators used by existing methods are not computation modules commonly supported for convolutional neural networks, so algorithms that adopt these methods encounter many performance problems when ported to real product platforms.
(2) A purely waterfall network design cannot satisfy all vision tasks and adds complexity when porting certain vision-task algorithms.
The difficulty and significance of solving these problems and defects are as follows: the common practice of relying on a conventional backbone network alone is abandoned, and the proposed brand-new, purely convolutional neural network structure solves the problems in the prior art. Its significance is that a convolutional neural network algorithm designed on this structure can adapt to a variety of computer vision tasks; when applied in multi-task scenarios such as edge and server deployments, it lowers the requirements on edge devices, since only basic convolution and upsampling operators need to be supported, which greatly enhances the applicability of the algorithm.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a method and a system for adapting to a multitask scene containing pixel-level segmentation.
The invention is realized as follows. A method for adapting to a multi-task scene containing pixel-level segmentation comprises: applying one convolution and one upsampling operation to a feature layer to obtain a feature map of the same size as the previous layer's feature map, so that numerical addition becomes possible; applying an add operation to the feature layer to obtain feature information at different scales; applying two convolution operations to the feature layer to fuse feature information; applying two upsampling operations with linear interpolation to the feature layer to obtain the feature map at the original image size required for pixel-level segmentation; and appending a basic backbone network after the feature layer for subsequent feature extraction by the detection network.
The feature-extraction part of a computer vision task can thus obtain both a small-scale feature map containing target feature information and an original-scale feature map that satisfies the pixel-level segmentation task. In prior schemes, realizing pixel-level segmentation requires a separately designed operator to achieve pixel alignment and avoid pixel offset between the prediction and the image; when the algorithm is deployed on edge or mobile platforms, chip-specific constraints mean that such an operator reduces the feasibility of porting. Replacing it with basic operations lowers the difficulty of deploying the algorithm on a variety of edge and mobile devices.
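The basic building block used throughout the method is "convolve, upsample to double the scale, then numerically add to the previous feature layer". A minimal NumPy sketch of that block, assuming nearest-neighbor upsampling stands in for the patent's interpolation and reducing the convolution to a 1x1 channel-mixing matrix multiply (all shapes and weights here are illustrative):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, weight):
    """1x1 convolution as a channel-mixing matmul: (Cout, Cin) applied to (Cin, H, W)."""
    c, h, w = x.shape
    return (weight @ x.reshape(c, h * w)).reshape(weight.shape[0], h, w)

rng = np.random.default_rng(0)
deep = rng.standard_normal((64, 13, 13))     # deep, small-scale feature layer
shallow = rng.standard_normal((32, 26, 26))  # previous (shallower) feature layer

w = rng.standard_normal((32, 64))            # 1x1 conv aligns the channel counts
lay1 = upsample2x(conv1x1(deep, w))          # scale doubled: (32, 26, 26)
lay2 = lay1 + shallow                        # numerical addition ("add" layer)
assert lay2.shape == shallow.shape
```

Only convolution, upsampling, and elementwise addition appear, which is the point of the scheme: these are the basic operators that edge chips commonly support.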
Further, the method for adapting to a multi-task scene containing pixel-level segmentation comprises the following steps:
Step one: the backbone network outputs multi-scale feature layers LAY1, LAY2, and LAY3. A convolution layer and an upsampling layer are added after LAY3; after convolution and upsampling, the scale of LAY3 is doubled, producing feature layer lay1. Numerical addition with the previous backbone layer gives feature layer lay2 = lay1 + LAY2, which simultaneously captures the detail information of the shallow feature map and the semantic information of the deep feature map. Two convolution layers added after lay2 output feature layer lay3.
Step two: a convolution layer and an upsampling layer are added after lay3; after convolution and upsampling, the scale of lay3 is doubled, producing feature layer lay4. Numerical addition with the previous backbone layer gives feature layer lay5 = lay4 + LAY1, again combining the detail information of the shallow feature map with the semantic information of the deep feature map. Two convolution layers added after lay5 output feature layer lay6.
Step three: an upsampling layer is added after lay6; upsampling doubles its scale, producing feature layer lay7.
Step four: an upsampling layer is added after lay7; upsampling doubles its scale, producing feature layer lay8 at the original image size. A clone layer with the same structure as the original backbone network is then appended, so that the feature information required by vision tasks such as pixel-level segmentation and object detection can be obtained simultaneously.
Step five: each vision-task branch computes its parameters from the original-size feature map, forming a feature map at the original image scale suitable for a variety of computer vision tasks including pixel-level segmentation.
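The doubling rule in the steps above can be checked with simple arithmetic. The sketch below assumes backbone strides of 4, 8, and 16 for LAY1, LAY2, and LAY3 (an assumption consistent with the 416-pixel example in embodiment 2, which is truncated in the text):

```python
H = 416                                     # original input size (as in the 416-pixel example)
LAY1, LAY2, LAY3 = H // 4, H // 8, H // 16  # multi-scale backbone output sizes (assumed strides)

lay1 = LAY3 * 2        # step one: conv + upsample doubles the scale
assert lay1 == LAY2    # so lay2 = lay1 + LAY2 is well defined
lay4 = lay1 * 2        # step two (the two convolutions in between keep the scale)
assert lay4 == LAY1    # so lay5 = lay4 + LAY1 is well defined
lay7 = lay4 * 2        # step three: plain upsampling
lay8 = lay7 * 2        # step four
assert lay8 == H       # lay8 is at the original image size
print(LAY3, lay1, lay4, lay7, lay8)   # 26 52 104 208 416
```

Four doublings bring the deepest backbone layer back to the input resolution, which is why the deepest layer is assumed to sit at 1/16 scale here.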
further, the method for adapting to a multitask scene containing pixel level segmentation further comprises the following steps:
(1) the characteristic layer LAY3 of the deepest layer obtained in the basic backbone network has the size of H3W 3; adding a feature layer lay1 output after the convolution operation and the upsampling operation, wherein the scale of the feature layer lay1 is H2W 2, H3 is 2H 2, and W3 is 2W 2;
(2) adding add layers after the feature map LAY1 obtained in step (1), and performing numerical summation with the feature layer LAY2 of the previous layer to obtain a feature layer LAY2 ═ LAY1+ LAY2, which is used for obtaining feature information of different scales, wherein the scale of LAY1 is H1 × W1, the scale of LAY2 is H2 ═ W2, the scale of H1 ═ H2, and the scale of W1 is W2;
(3) after the feature map lay2 is obtained in the step (2), adding two times of convolution output feature layer lay3 for feature information fusion, wherein the convolution operation does not change the scale of the feature layer and only changes the number of channels of the feature layer;
(4) the feature layer lay3 obtained in step (3) has a dimension h3 × w 3; adding a feature layer lay4 output after the convolution operation and the upsampling operation, wherein the scale of the feature layer lay4 is h4 w4, h4 is 2 h3, and w4 is 2 w 3;
(5) adding add layers after the feature map LAY4 obtained in the step (4), and performing numerical summation with the feature layer LAY1 of the previous layer to obtain a feature layer LAY5 ═ LAY4+ LAY1, which is used for obtaining feature information of different scales, wherein the scale of LAY4 is H4 × W4, the scale of LAY1 is H1 ═ W1, the scale of H4 ═ H1, and the scale of W4 is W1;
(6) after the feature map lay5 is obtained in the step (5), adding two times of convolution output feature layer lay6 for feature information fusion, wherein the convolution operation does not change the scale of the feature layer and only changes the number of channels of the feature layer;
(7) the feature map lay6 obtained in step (6) and having a dimension h6 × w 6; adding a feature layer lay7 output after one up-sampling operation, wherein the dimension of the feature layer lay7 is h7 w7, h7 is 2 h6, and w7 is 2 w 6;
(8) the feature map lay7 obtained in step (7) and having dimensions h7 × w 7; adding a feature layer lay8 output after one up-sampling operation, wherein the dimension of the feature layer lay8 is h8 w8, and the feature layer is the size of the original image, wherein h8 is 2 h7, and w8 is 2 w 7;
(9) and (5) adding a light-weight basic backbone network for secondary extraction of the feature information after the feature map lay8 obtained in the step (8).
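Steps (1) through (9) constrain only shapes: convolutions change channel counts but keep scale, upsampling doubles height and width, and add layers require identical shapes. A shape-level trace of the pipeline (the channel counts are illustrative assumptions; the patent fixes only the spatial rules):

```python
# (channels, height, width) bookkeeping for steps (1)-(8).
LAY1, LAY2, LAY3 = (128, 104, 104), (256, 52, 52), (512, 26, 26)

def conv(x, out_ch):      # convolution: changes channels, keeps scale
    return (out_ch, x[1], x[2])

def upsample(x):          # upsampling: keeps channels, doubles scale
    return (x[0], 2 * x[1], 2 * x[2])

def add(a, b):            # numerical addition needs identical shapes
    assert a == b, (a, b)
    return a

lay1 = upsample(conv(LAY3, 256))    # step (1): conv + upsample
lay2 = add(lay1, LAY2)              # step (2): add layer
lay3 = conv(conv(lay2, 256), 128)   # step (3): two convs, scale unchanged
lay4 = upsample(conv(lay3, 128))    # step (4)
lay5 = add(lay4, LAY1)              # step (5)
lay6 = conv(conv(lay5, 64), 64)     # step (6)
lay7 = upsample(lay6)               # step (7)
lay8 = upsample(lay7)               # step (8): original image size
assert lay8[1:] == (416, 416)
```

The two `add` calls only succeed because each conv + upsample stage is arranged to land exactly on the scale of the backbone layer it is summed with, which is the consistency the equalities in steps (2) and (5) express.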
Another object of the present invention is to provide a system for adapting to a multi-task scene containing pixel-level segmentation, which applies the above method and comprises:
an upsampling and numerical-addition module, which adds an upsampling layer after the backbone network and numerically adds its output to the previous feature layer;
a clone-layer construction module, which forms, from several such superposed layers, a clone layer with the same structure as the original backbone network;
a parameter-calculation module, through which each vision-task branch computes its parameters from the clone layer.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
obtaining a feature map of the same size as the previous layer's feature map through one convolution and one upsampling operation on the feature layer; obtaining feature information at different scales through an add operation on the feature layer; fusing feature information through two convolution operations on the feature layer; performing two upsampling operations with linear interpolation on the feature layer to obtain the feature map at the original image size required for pixel-level segmentation; and appending a basic backbone network after the feature layer for subsequent feature extraction by the detection network.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
obtaining a feature map of the same size as the previous layer's feature map through one convolution and one upsampling operation on the feature layer; obtaining feature information at different scales through an add operation on the feature layer; fusing feature information through two convolution operations on the feature layer; performing two upsampling operations with linear interpolation on the feature layer to obtain the feature map at the original image size required for pixel-level segmentation; and appending a basic backbone network after the feature layer for subsequent feature extraction by the detection network.
It is another object of the present invention to provide a computer program product stored on a computer readable medium, comprising a computer readable program for providing a user input interface for applying said system for adapting a multitasking scenario involving pixel level segmentation, when executed on an electronic device.
It is another object of the present invention to provide a computer readable storage medium storing instructions that, when executed on a computer, cause the computer to apply the system for accommodating multitask scenarios involving pixel level segmentation.
It is another object of the present invention to provide an information data processing terminal for implementing said system for adapting to a multitasking scene with pixel-level segmentation.
Another object of the present invention is to provide an application of the system for adapting to a multitask scene containing pixel-level segmentation in feature extraction of various computer vision tasks.
Combining all the above technical schemes, the advantages and positive effects of the invention are as follows. The invention provides a method for adapting to multi-task scenes containing pixel-level segmentation, together with a brand-new convolutional neural network structure for feature extraction, relating to computer image technology.
In the technical scheme provided by the invention, the specified computer vision task can be completed, by adjusting only the number of layers in the head of the convolutional neural network, without introducing new algorithm coefficients or new computation modules. In current industrial applications of deep-learning algorithms, models are ported to a variety of hardware platforms: general-purpose platforms such as Nvidia GPUs, and dedicated embedded platforms such as the Google TPU, Allwinner Technology VPUs, and Kendryte KPUs. On embedded platforms, chip-specific constraints impose requirements on neural network design in terms of model size, complexity, and structure. The characteristics of the present technical scheme therefore offer a very good path for porting deep neural networks for different vision tasks to embedded platforms: no new computation modules or algorithm coefficients are introduced, and the added structural change is only a simple modification of the basic network structure.
The invention provides a convolution neural network structure capable of providing a full-scale feature map, which can be used for acquiring feature maps with different scales including an original map in a computer vision task. The invention reduces the difficulty of algorithm design, does not need to design an operator independently to be compatible with different computer vision tasks, and reduces the requirements of the algorithm on the edge end when the algorithm is applied to multi-task scenes such as the edge end, the server end and the like.
Instance segmentation is one of the more complex tasks in computer vision: it refines object detection further by separating an object's foreground from its background, achieving pixel-level object separation. Image instance segmentation is a further enhancement built on object detection. It involves three computer vision tasks: object classification, object detection, and pixel-level semantic segmentation. The algorithm structure designed in this scheme can handle all three simultaneously without adding any extra algorithm module.
Applied to the feature-extraction part of a variety of computer vision tasks, the convolutional neural network structure providing a full-scale feature map facilitates the subsequent execution of those tasks without designing extra operators, achieving the goal of improving the feasibility of deploying their algorithms.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for adapting a multitask scenario with pixel-level segmentation, in which a full-convolution neural network structure adapts to pixel-level segmentation according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a method for adapting to a multitask scenario including pixel-level segmentation, in which a full-convolution neural network structure adapts to pixel-level segmentation according to an embodiment of the present invention.
FIG. 3 is a block diagram of a system architecture for adapting a full convolutional neural network architecture to pixel-level segmentation to a multitasking scenario with pixel-level segmentation, according to an embodiment of the present invention;
in the figure: 1. an upsampling and numerical superimposing module; 2. a clone layer building block; 3. and a parameter calculation module.
Fig. 4 is an application schematic diagram of a method for adapting to a multitask scenario containing pixel-level segmentation, in which a full convolution neural network structure adapts to pixel-level segmentation according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a method and system for adapting to a multitask scene with pixel level segmentation, which will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the method for adapting a full convolution neural network structure to a multi-task scene with pixel level segmentation provided by the embodiment of the present invention includes the following steps:
s101, adding an upper sampling layer on a backbone network, and performing numerical value superposition with an upper characteristic layer;
s102, forming a clone layer with the same structure as the original backbone network by a plurality of superposed layers;
s103, the visual task branch carries out parameter calculation from the clone layer.
A schematic diagram of the full convolution neural network structure adapting to a multi-task scene with pixel-level segmentation provided by the embodiment of the invention is shown in fig. 2.
As shown in fig. 3, the system for adapting a multi-tasking scenario of pixel level segmentation for a full convolution neural network structure provided by the embodiment of the present invention includes:
the upper sampling and numerical value superposition module 1 is used for adding an upper sampling layer behind a backbone network and carrying out numerical value superposition with the upper characteristic layer;
the clone layer construction module 2 is used for obtaining a feature map of the original image size after several times of upsampling and numerical value superposition, and then adding a clone layer with the same structure as the original main network;
and the parameter calculation module 3 is used for performing parameter calculation from the feature map of the original image size through the visual task branch.
The technical solution of the present invention is further described below with reference to specific examples.
Example 1
Aiming at the problems in the prior art, the invention provides a convolutional neural network structure capable of providing a full-scale feature map, so that feature maps with different scales including an original map can be obtained in a computer vision task.
The invention is realized by the following steps:
1. adding an upsampling layer after the backbone network and numerically adding its output to the previous feature layer;
2. obtaining a feature map at the original image size after several rounds of upsampling and numerical addition, then appending a clone layer with the same structure as the original backbone network;
3. each vision-task branch computing its parameters from the original-size feature map.
The instance segmentation task is a relatively complex task in computer vision: it refines object detection further by separating an object's foreground from its background, achieving pixel-level object separation. Image instance segmentation is a further enhancement built on object detection. It involves three computer vision tasks: object classification, object detection, and pixel-level semantic segmentation. The algorithm structure designed in this scheme can handle all three simultaneously without adding any extra algorithm module.
The technical scheme of the invention also comprises:
1. after the basic backbone network yields its deepest feature layer, one convolution and one upsampling are added to double the width and height of the feature map;
2. after the feature map obtained in step 1, an add layer is appended, performing numerical addition with the previous layer's feature map to obtain feature information at different scales;
3. after the feature map obtained in step 2, two convolutions are added for feature-information fusion;
4. the above operations are repeated (performed a second time);
5. after the feature map obtained in step 4, two upsampling operations are added to obtain the feature map at the original image size;
6. after the feature map obtained in step 5, a lightweight basic backbone network is appended for secondary extraction of the feature information.
The working principle of the invention comprises:
1. one convolution and one upsampling operation on a feature layer yield a feature map of the same size as the previous layer's feature map;
2. an add operation on the feature layer yields feature information at different scales;
3. two convolution operations on the feature layer fuse feature information;
4. two upsampling operations with linear interpolation on the feature layer yield the feature map at the original image size required for pixel-level segmentation;
5. a basic backbone network appended after the feature layer performs subsequent feature extraction for the detection network.
The embodiment of the invention provides a convolutional neural network structure that supplies a full-scale feature map. Applied to the feature-extraction part of multiple computer vision tasks, it facilitates their subsequent execution without designing extra operators, improves the feasibility of deploying their algorithms, reduces algorithm-design difficulty, removes the need to design separate operators for compatibility with different vision tasks, and lowers the requirements on edge devices when the algorithm is applied in multi-task scenarios such as edge and server deployments.
In the invention, a clone-layer structure is introduced by adjusting the backbone network, so that the backbone produces feature-map outputs at the complete range of scales rather than only small-scale outputs. This satisfies the need for feature maps at various scales in common vision tasks, without designing a separate network structure for each task. A pixel-level segmentation task can use the original-scale feature map, while detection and classification tasks can use small-scale feature maps. Compared with the two-stage approach, extra steps such as designing RoIAlign are eliminated, the overall network structure is simpler, and no special operators remain. Compared with the one-stage approach, no new parameters or algorithmic steps are introduced: the vision task is completed using only the computation of the basic vision task, multiple vision tasks can be solved with one unified idea, and neither the network nor the training process needs redesigning for a new task.
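The clone-layer idea, running a second copy of the backbone structure over the reconstructed original-size feature map, can be sketched as follows. A toy backbone of stride-2 average-pooling stages stands in for real convolution stages, and all shapes and names are illustrative:

```python
import numpy as np

def avgpool2x(x):
    """Stride-2 average pooling of a (C, H, W) map: a stand-in for a conv stage."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def backbone(x, depth=3):
    """A waterfall of stride-2 stages; returns every intermediate scale."""
    outs = []
    for _ in range(depth):
        x = avgpool2x(x)
        outs.append(x)
    return outs

img = np.random.default_rng(1).standard_normal((3, 416, 416))
LAY1, LAY2, LAY3 = backbone(img)     # original backbone: 208, 104, 52

lay8 = np.zeros((8, 416, 416))       # original-size map from the upsampling path
clone_outs = backbone(lay8)          # clone layer: same structure, reapplied
assert [o.shape[1] for o in clone_outs] == [208, 104, 52]
```

Because the clone has the same structure as the original backbone, the second pass regenerates the full ladder of scales from the original-size map, so detection and classification branches can reuse small-scale features while the segmentation branch keeps the original-scale map.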
In the above example, the feature map of the original image size is obtained by the method, and the pixel-level segmentation task is realized without designing additional operators to handle problems such as pixel alignment; meanwhile, the invention can also obtain feature maps containing object-level feature information to realize tasks such as target detection, so the invention is suitable for various computer vision tasks containing pixel-level segmentation. In the prior art, when a deep neural network algorithm is deployed to different edge and mobile devices, hardware platform compatibility must be considered; the network structure used by this method consists of basic operations that can replace more complex operators, so the technical scheme of the invention reduces the difficulty of porting deep neural network algorithms to edge and mobile devices and improves the portability of the algorithm.
Example 2
As shown in fig. 4, the input image size in the embodiment of the present invention is 416 × 416 pixels;
a feature map of 416 × 416 pixels is obtained through ordinary convolution operations on the input image obtained in the previous step;
based on the feature map obtained in the previous step, a feature map of 208 × 208 pixels is obtained through ordinary convolution operations, wherein residual structures and the like are counted among these ordinary convolution operations;
based on the feature map obtained in the previous step, a feature map of 104 × 104 pixels is obtained through ordinary convolution operations;
based on the feature map obtained in the previous step, a feature map of 52 × 52 pixels is obtained through ordinary convolution operations;
based on the feature map obtained in the previous step, a feature map of 26 × 26 pixels is obtained through ordinary convolution operations;
the five steps above constitute a conventional backbone network, from which feature maps of different scales are obtained;
on this basis, at position 3 in the figure, up-sampling and convolution operations are added to obtain a feature map of 52 × 52 pixels;
at position 2 in the figure, an add operation is added and the values are summed with the feature map obtained in the previous step, giving a feature map of 52 × 52 pixels; further, up-sampling and convolution operations are added to obtain a feature map of 104 × 104 pixels;
at position 1 in the figure, an add operation is added to the feature map obtained in the previous step, giving a feature map of 104 × 104 pixels; further, up-sampling and convolution operations are added to obtain a feature map of 208 × 208 pixels;
at position 4 in the figure, an up-sampling operation is added, giving a feature map of 416 × 416 pixels;
at position 5 in the figure, one branch of the obtained 416 × 416 pixel feature map is used for subsequent tasks such as pixel-level segmentation and key point regression, while the other branch passes through a convolution operation to obtain a feature map of 208 × 208 pixels;
based on the feature map obtained in the previous step, a feature map of 104 × 104 pixels is obtained through ordinary convolution operations, wherein residual structures and the like are counted among these ordinary convolution operations;
based on the feature map obtained in the previous step, a feature map of 52 × 52 pixels is obtained through ordinary convolution operations;
based on the feature map obtained in the previous step, a feature map of 26 × 26 pixels is obtained through ordinary convolution operations;
at position 6 in the figure, the obtained feature map of 26 × 26 pixels is used for subsequent tasks such as object classification and object detection.
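The data flow of Example 2 can be checked with array shapes alone. In the sketch below, stride-2 convolution blocks are stood in for by 2 × 2 average pooling and the up-sampling layers by nearest-neighbour repetition; both stand-ins are illustrative simplifications, not the patent's actual convolutions:

```python
import numpy as np

def down2x(x):
    # stand-in for a stride-2 convolution block: 2x2 average pooling
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def up2x(x):
    # stand-in for the up-sampling layers: nearest-neighbour repetition
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
img = rng.random((3, 416, 416))

feats = {416: img}                 # first convolution keeps 416 x 416
x = img
for scale in (208, 104, 52, 26):   # the conventional backbone steps
    x = down2x(x)
    feats[scale] = x

x = up2x(feats[26]) + feats[52]    # positions 3 and 2: fuse at 52 x 52
x = up2x(x) + feats[104]           # position 1: fuse at 104 x 104
seg = up2x(up2x(x))                # position 4: back to 416 x 416
assert seg.shape == (3, 416, 416)  # branch for pixel-level segmentation

det = seg                          # position 5: second branch, clone backbone
for _ in range(4):                 # 416 -> 208 -> 104 -> 52 -> 26
    det = down2x(det)
assert det.shape == (3, 26, 26)    # position 6: detection/classification
```

The assertions confirm that the fusion additions only combine maps of identical scale and that the clone backbone returns exactly to the 26 × 26 scale used for detection.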
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, it can take the form of a computer program product that includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only a specific embodiment of the present invention and is not intended to limit the scope of protection; all modifications, equivalents and improvements made within the spirit and principles of the invention are intended to fall within the scope of the appended claims.

Claims (10)

1. A method for adapting to a multi-task scene containing pixel-level segmentation, characterized in that a feature map with the same size as the feature map of the previous layer is obtained through one convolution and one up-sampling operation on a feature layer; an add operation is performed on the feature layers after the backbone network to obtain feature information of different scales; feature information fusion is performed through two convolution operations on the feature layer; two up-sampling operations with linear interpolation are performed on the feature layer to obtain a feature map of the original image size required by the pixel-level segmentation task; and a clone layer with the same structure as the original backbone network is added after the feature layer for feature extraction by the subsequent detection network.
2. The method for adapting to a multi-task scene containing pixel-level segmentation according to claim 1, characterized in that the method comprises the following steps:
step one, the backbone network outputs multi-scale feature layers LAY1, LAY2 and LAY3; a convolution layer and an up-sampling layer are added after LAY3, and the feature layer lay1 is output after the scale of LAY3 is doubled by convolution and up-sampling; the feature layer lay2 = lay1 + LAY2 is obtained by numerical superposition with the previous feature layer, simultaneously obtaining the detail information of the shallow feature map and the semantic information of the deep feature map; two convolution layers are added after the feature layer lay2 to output the feature layer lay3;
step two, a convolution layer and an up-sampling layer are added after the feature layer lay3, and the feature layer lay4 is obtained after the scale of lay3 is doubled by convolution and up-sampling; the feature layer lay5 = lay4 + LAY1 is obtained by numerical superposition with the previous feature layer, simultaneously obtaining the detail information of the shallow feature map and the semantic information of the deep feature map; two convolution layers are added after the feature layer lay5 to output the feature layer lay6;
step three, an up-sampling layer is added after the feature layer lay6, and the feature layer lay7 is obtained after the scale of lay6 is doubled by up-sampling;
step four, an up-sampling layer is added after the feature layer lay7, and the feature layer lay8 of the original image size is obtained after the scale of lay7 is doubled by up-sampling; a clone layer with the same structure as the original backbone network is then added, simultaneously obtaining the feature information required by computer vision tasks such as pixel-level segmentation and target detection;
step five, the parameters of the visual task branches are calculated from the feature map of the original image size; the feature map generated at the original image size is suitable for a variety of computer vision tasks containing pixel-level segmentation.
3. The method for adapting to a multi-task scene containing pixel-level segmentation according to claim 1, characterized in that the method further comprises:
(1) the feature layer LAY3 of the deepest layer obtained in the basic backbone network has a scale of H3 × W3; the feature layer lay1 is added as the output of one convolution operation and one up-sampling operation, wherein the scale of the feature layer lay1 is h1 × w1, with h1 = 2 × H3 and w1 = 2 × W3;
(2) an add layer is added after the feature map lay1 obtained in step (1), and numerical summation is performed with the feature layer LAY2 of the previous layer to obtain the feature layer lay2 = lay1 + LAY2, used for obtaining feature information of different scales, wherein the scale of lay1 is h1 × w1, the scale of LAY2 is H2 × W2, h1 = H2 and w1 = W2;
(3) after the feature map lay2 is obtained in step (2), two convolutions are added to output the feature layer lay3 for feature information fusion, wherein the convolution operations do not change the scale of the feature layer and only change the number of channels of the feature layer;
(4) the feature layer lay3 obtained in step (3) has a scale of h3 × w3; the feature layer lay4 is added as the output of one convolution operation and one up-sampling operation, wherein the scale of the feature layer lay4 is h4 × w4, with h4 = 2 × h3 and w4 = 2 × w3;
(5) an add layer is added after the feature map lay4 obtained in step (4), and numerical summation is performed with the feature layer LAY1 of the previous layer to obtain the feature layer lay5 = lay4 + LAY1, used for obtaining feature information of different scales, wherein the scale of lay4 is h4 × w4, the scale of LAY1 is H1 × W1, h4 = H1 and w4 = W1;
(6) after the feature map lay5 is obtained in step (5), two convolutions are added to output the feature layer lay6 for feature information fusion, wherein the convolution operations do not change the scale of the feature layer and only change the number of channels of the feature layer;
(7) the feature map lay6 obtained in step (6) has a scale of h6 × w6; the feature layer lay7 is added as the output of one up-sampling operation, wherein the scale of the feature layer lay7 is h7 × w7, with h7 = 2 × h6 and w7 = 2 × w6;
(8) the feature map lay7 obtained in step (7) has a scale of h7 × w7; the feature layer lay8 is added as the output of one up-sampling operation, wherein the scale of the feature layer lay8 is h8 × w8, with h8 = 2 × h7 and w8 = 2 × w7, and this feature layer is of the original image size;
(9) after the feature map lay8 is obtained in step (8), a light-weight basic backbone network is added for secondary extraction of the feature information.
4. A system for adapting to a multi-task scene containing pixel-level segmentation, based on a fully convolutional neural network structure and implementing the method for adapting to a multi-task scene containing pixel-level segmentation according to any one of claims 1 to 3, characterized in that the system comprises:
the up-sampling and numerical superposition module, used for adding an up-sampling layer after the backbone network and performing numerical superposition with the previous feature layer;
the clone layer construction module, used for adding a clone layer with the same structure as the original backbone network after the feature map of the original image size is obtained through several up-sampling and numerical superposition operations;
and the parameter calculation module, used for performing parameter calculation from the feature map of the original image size through the visual task branches.
5. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
obtaining a feature map with the same size as the feature map of the previous layer through one convolution and one up-sampling operation on the feature layer; obtaining feature information of different scales through an add operation on the feature layer; performing feature information fusion through two convolution operations on the feature layer; performing two up-sampling operations with linear interpolation on the feature layer to obtain a feature map of the original image size required by the pixel-level segmentation task; and adding a clone layer with the same structure as the original backbone network after the feature layer for feature extraction by the subsequent detection network.
6. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
obtaining a feature map with the same size as the feature map of the previous layer through one convolution and one up-sampling operation on the feature layer; obtaining feature information of different scales through an add operation on the feature layer; performing feature information fusion through two convolution operations on the feature layer; performing two up-sampling operations with linear interpolation on the feature layer to obtain a feature map of the original image size required by the pixel-level segmentation task; and adding a clone layer with the same structure as the original backbone network after the feature layer for feature extraction by the subsequent detection network.
7. A computer program product stored on a computer-readable medium, comprising a computer-readable program which, when executed on an electronic device, provides a user input interface for applying the system for adapting to a multi-task scene containing pixel-level segmentation according to claim 4.
8. A computer-readable storage medium storing instructions which, when executed on a computer, cause the computer to apply the system for adapting to a multi-task scene containing pixel-level segmentation according to claim 4.
9. An information data processing terminal, characterized in that the terminal is used for implementing the system for adapting to a multi-task scene containing pixel-level segmentation according to claim 4.
10. Use of the system for adapting to a multi-task scene containing pixel-level segmentation according to claim 4 in feature extraction for a variety of computer vision tasks.
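The scale relations stated in claims 1–3 can be verified by pure bookkeeping: every convolution-plus-up-sampling (or plain up-sampling) step doubles both spatial dimensions, while the add fusions require exact scale matches. A short sketch under the assumption of a 416-pixel input (the concrete numbers come from Example 2; the variable names follow claim 3):

```python
# backbone scales for a 416 x 416 input (LAY3 is the deepest tapped layer)
H1, W1 = 104, 104        # LAY1, shallowest of the three tapped layers
H2, W2 = 52, 52          # LAY2
H3, W3 = 26, 26          # LAY3

# claim 3, steps (1)-(2): conv + up-sample doubles LAY3; the add needs a match
h1, w1 = 2 * H3, 2 * W3
assert (h1, w1) == (H2, W2)          # lay2 = lay1 + LAY2 is well-defined

# steps (3)-(5): the two convolutions keep the scale; the next
# conv + up-sample doubles again, matching LAY1 for the second fusion
h3, w3 = h1, w1
h4, w4 = 2 * h3, 2 * w3
assert (h4, w4) == (H1, W1)          # lay5 = lay4 + LAY1 is well-defined

# steps (6)-(8): two plain up-samplings reach the original image size
h6, w6 = h4, w4
h8, w8 = 2 * (2 * h6), 2 * (2 * w6)
assert (h8, w8) == (416, 416)        # lay8 matches the input resolution
```

If any doubling step were skipped, one of the assertions would fail, since the add layers cannot sum maps of different scales.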
CN202111492470.0A 2021-12-08 2021-12-08 Method and system for adapting to multi-task scene containing pixel level segmentation Active CN114419051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111492470.0A CN114419051B (en) 2021-12-08 2021-12-08 Method and system for adapting to multi-task scene containing pixel level segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111492470.0A CN114419051B (en) 2021-12-08 2021-12-08 Method and system for adapting to multi-task scene containing pixel level segmentation

Publications (2)

Publication Number Publication Date
CN114419051A true CN114419051A (en) 2022-04-29
CN114419051B CN114419051B (en) 2024-07-23

Family

ID=81264684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111492470.0A Active CN114419051B (en) 2021-12-08 2021-12-08 Method and system for adapting to multi-task scene containing pixel level segmentation

Country Status (1)

Country Link
CN (1) CN114419051B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351356A (en) * 2023-10-20 2024-01-05 三亚中国农业科学院国家南繁研究院 Field crop and near-edge seed disease detection method under unmanned aerial vehicle visual angle

Citations (3)

Publication number Priority date Publication date Assignee Title
CN108009637A (en) * 2017-11-20 2018-05-08 天津大学 The station symbol dividing method of Pixel-level TV station symbol recognition network based on cross-layer feature extraction
CN109190707A (en) * 2018-09-12 2019-01-11 深圳市唯特视科技有限公司 A kind of domain adapting to image semantic segmentation method based on confrontation study
CN111382759A (en) * 2018-12-28 2020-07-07 广州市百果园信息技术有限公司 Pixel level classification method, device, equipment and storage medium


Non-Patent Citations (2)

Title
ZHANG Le et al.: "Research on vehicle segmentation in complex scenes based on fully convolutional neural networks", Journal of Qingdao University (Engineering & Technology Edition), no. 02, 15 May 2019 (2019-05-15), pages 5-16 *
ZHENG Baoyu; WANG Yu; WU Jinwen; ZHOU Quan: "Weakly supervised image semantic segmentation based on deep convolutional neural networks", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), no. 05, 13 November 2018 (2018-11-13), pages 17-24 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117351356A (en) * 2023-10-20 2024-01-05 三亚中国农业科学院国家南繁研究院 Field crop and near-edge seed disease detection method under unmanned aerial vehicle visual angle
CN117351356B (en) * 2023-10-20 2024-05-24 三亚中国农业科学院国家南繁研究院 Field crop and near-edge seed disease detection method under unmanned aerial vehicle visual angle

Also Published As

Publication number Publication date
CN114419051B (en) 2024-07-23

Similar Documents

Publication Publication Date Title
CN111368685B (en) Method and device for identifying key points, readable medium and electronic equipment
CN112528977A (en) Target detection method, target detection device, electronic equipment and storage medium
CN110413812B (en) Neural network model training method and device, electronic equipment and storage medium
CN112990219B (en) Method and device for image semantic segmentation
CN111062854B (en) Method, device, terminal and storage medium for detecting watermark
CN110717919A (en) Image processing method, device, medium and computing equipment
WO2020062494A1 (en) Image processing method and apparatus
CN111325704B (en) Image restoration method and device, electronic equipment and computer-readable storage medium
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN110852980A (en) Interactive image filling method and system, server, device and medium
CN114898177B (en) Defect image generation method, model training method, device, medium and product
CN112598673A (en) Panorama segmentation method, device, electronic equipment and computer readable medium
CN113780326A (en) Image processing method and device, storage medium and electronic equipment
CN110310293B (en) Human body image segmentation method and device
CN112418249A (en) Mask image generation method and device, electronic equipment and computer readable medium
CN114419051A (en) Method and system for adapting to multi-task scene containing pixel-level segmentation
WO2021218414A1 (en) Video enhancement method and apparatus, and electronic device and storage medium
CN112085733A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN116977195A (en) Method, device, equipment and storage medium for adjusting restoration model
CN110555799A (en) Method and apparatus for processing video
CN110378282A (en) Image processing method and device
CN115578261A (en) Image processing method, deep learning model training method and device
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN111696041B (en) Image processing method and device and electronic equipment
CN110119721B (en) Method and apparatus for processing information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant