CN112348828A - Instance segmentation method and device based on a neural network, and storage medium


Info

Publication number: CN112348828A
Application number: CN202011166214.8A
Authority: CN (China)
Prior art keywords: target, preset, picture, network, neural network
Legal status: Pending (the status listed is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventors: Su Hao (苏浩), Pan Wu (潘武), Zhang Xiaofeng (张小锋), Huang Peng (黄鹏), Hu Bin (胡彬), Lin Fengxiao (林封笑)
Current assignee: Zhejiang Dahua Technology Co Ltd
Original assignee: Zhejiang Dahua Technology Co Ltd
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202011166214.8A


Classifications

    • G06T 7/11 Region-based segmentation (G06T 7/00 Image analysis; G06T 7/10 Segmentation; Edge detection)
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting (G06F 18/00 Pattern recognition; G06F 18/21 Design or setup of recognition systems)
    • G06N 3/045 Combinations of networks (G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/08 Learning methods (G06N 3/02 Neural networks)
    • G06T 5/00 Image enhancement or restoration
    • G06T 2207/10016 Video; Image sequence (G06T 2207/10 Image acquisition modality)

Abstract

The invention discloses a neural-network-based instance segmentation method, device, and storage medium. The method comprises: acquiring a target picture from a video stream; inputting the target picture into a target instance segmentation neural network and outputting a first instance set, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer; the detection network is used to acquire parameters of instance bounding boxes, the feature map processing layer processes the bounding box parameters to obtain target parameters, and the mask processing layer performs instance segmentation on the target picture according to the target parameters; determining similar instances of each target instance in the first instance set according to the degree of overlap between target instances in the first instance set; and determining the instances among the similar instances that exceed a first predetermined threshold, to obtain at least one instance picture of the target instance in the target picture. This solves the technical problem of slow instance segmentation in the prior art.

Description

Instance segmentation method and device based on a neural network, and storage medium
Technical Field
The invention relates to the technical field of image processing, and in particular to an instance segmentation method and device based on a neural network, and a storage medium.
Background
When processing an image, it is often necessary to locate and distinguish the various instances contained in the picture. For example, target detection frames the different instances with bounding boxes, while semantic segmentation marks, pixel by pixel, the regions occupied by instances of different categories, thereby distinguishing those categories. If instances within the same category must be further distinguished, instance segmentation is applied to the image: instance segmentation distinguishes not only the categories present in an image but also the different instances within the same category.
Existing candidate-region-based instance segmentation architectures perform instance segmentation on a picture through a prediction network with N cascaded stages to obtain the segmentation result directly. The cascade improves the accuracy of instance segmentation but greatly reduces inference speed, so speed and accuracy are not balanced.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide an instance segmentation method and device based on a neural network, and a storage medium, which at least solve the technical problem of slow instance segmentation in the prior art.
According to an aspect of the embodiments of the present invention, there is provided a neural-network-based instance segmentation method, comprising: acquiring a target picture from a video stream; inputting the target picture into a target instance segmentation neural network and outputting a first instance set, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer; the detection network is used to acquire parameters of instance bounding boxes, the feature map processing layer processes the bounding box parameters to obtain target parameters, and the mask processing layer performs instance segmentation on the target picture according to the target parameters; determining similar instances of each target instance in the first instance set according to the degree of overlap between target instances in the first instance set; and determining the instances among the similar instances that exceed a first predetermined threshold, to obtain at least one instance picture of the target instance in the target picture.
Optionally, before the target picture is input into the target instance segmentation neural network and the first instance set is output, the method comprises: acquiring a sample picture set from a video stream; annotating the target object in each picture of the sample picture set to obtain a target data set; inputting the annotated data set into a preset instance segmentation neural network, wherein the preset neural network comprises a preset detection network, a preset feature map processing layer, a preset mask processing layer, and a target loss function; the detection network is used to acquire parameters of instance bounding boxes in the sample pictures, the feature map processing layer processes those parameters to obtain preset target parameters, the mask processing layer performs instance segmentation on the sample pictures according to the preset target parameters, and the target loss function comprises a binary cross-entropy loss function and an intersection-over-union (IoU) loss function; and determining the instance segmentation neural network when the target loss function satisfies a predetermined condition.
Optionally, annotating the target object in each picture of the sample picture set to obtain a target data set comprises: applying the standard instance-segmentation data augmentation techniques to each picture and its annotation result in the sample picture set to obtain the target data set.
Optionally, after annotating the target object in each picture of the sample picture set to obtain a target data set, the method further comprises: dividing the target data set into a training set, a validation set, and a test set according to a preset ratio, wherein the training set is used to train the preset instance segmentation neural network, the validation set is used to validate it, and the test set is used to test the preset neural network segmentation model.
Optionally, before the annotated data set is input into the preset instance segmentation neural network, the method further comprises: constructing an initialized detection network, wherein the detection network comprises a feature-extraction backbone network, a feature-enhancement network, and a detection head; the backbone network extracts features from the instances in each picture of the sample picture set to obtain feature maps, the feature-enhancement network enhances the feature maps and marks their sizes, and the feature maps marked with different sizes are input to the detection head to obtain the parameters of the sample instance bounding boxes; and constructing the preset instance segmentation neural network from the initialized detection network, a preset feature map processing layer, and a preset mask processing layer, wherein the preset feature map processing layer processes the parameters of the sample instance bounding boxes to obtain sample target parameters, and the preset mask processing layer performs instance segmentation on the sample pictures according to the sample target parameters.
According to another aspect of the embodiments of the present invention, there is also provided a neural-network-based instance segmentation apparatus, comprising: a first acquisition unit for acquiring a target picture from a video stream; an output unit for inputting the target picture into a target instance segmentation neural network and outputting a first instance set, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer, the detection network being used to acquire parameters of instance bounding boxes, the feature map processing layer processing the bounding box parameters to obtain target parameters, and the mask processing layer performing instance segmentation on the target picture according to the target parameters; a first determining unit for determining similar instances of each target instance in the first instance set according to the degree of overlap between target instances in the first instance set; and a second determining unit for determining the instances among the similar instances that exceed a first predetermined threshold, to obtain at least one instance picture of the target instance in the target picture.
Optionally, the apparatus comprises: a second acquisition unit for acquiring a sample picture set from a video stream before the target picture is input into the target instance segmentation neural network and the first instance set is output; an obtaining unit for annotating the target object in each picture of the sample picture set to obtain a target data set; an input unit for inputting the annotated data set into a preset instance segmentation neural network, wherein the preset neural network comprises a preset detection network, a preset feature map processing layer, a preset mask processing layer, and a target loss function; the detection network is used to acquire parameters of instance bounding boxes in the sample pictures, the feature map processing layer processes those parameters to obtain preset target parameters, the mask processing layer performs instance segmentation on the sample pictures according to the preset target parameters, and the target loss function comprises a binary cross-entropy loss function and an intersection-over-union (IoU) loss function; and a third determining unit for determining the instance segmentation neural network when the target loss function satisfies a predetermined condition.
Optionally, the obtaining unit comprises: an obtaining module for applying the standard instance-segmentation data augmentation techniques to each picture and its annotation result in the sample picture set to obtain the target data set.
Optionally, the apparatus further comprises: a dividing unit for, after the target object in each picture of the sample picture set has been annotated to obtain a target data set, dividing the target data set into a training set, a validation set, and a test set according to a preset ratio, wherein the training set is used to train the preset instance segmentation neural network, the validation set is used to validate it, and the test set is used to test the preset neural network segmentation model.
Optionally, the apparatus further comprises: a first construction unit for constructing an initialized detection network before the annotated data set is input into the preset instance segmentation neural network, wherein the detection network comprises a feature-extraction backbone network, a feature-enhancement network, and a detection head; the backbone network extracts features from the instances in each picture of the sample picture set to obtain feature maps, the feature-enhancement network enhances the feature maps and marks their sizes, and the feature maps marked with different sizes are input to the detection head to obtain the parameters of the sample instance bounding boxes; and a second construction unit for constructing the preset instance segmentation neural network from the initialized detection network, a preset feature map processing layer, and a preset mask processing layer, wherein the preset feature map processing layer processes the parameters of the sample instance bounding boxes to obtain sample target parameters, and the preset mask processing layer performs instance segmentation on the sample pictures according to the sample target parameters.
According to a further aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium storing a computer program, wherein the computer program is configured to execute the above neural-network-based instance segmentation method when run.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute the above neural-network-based instance segmentation method through the computer program.
In the embodiments of the invention, a target picture is acquired from a video stream; the target picture is input into a target instance segmentation neural network and a first instance set is output, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer, the detection network acquiring parameters of instance bounding boxes, the feature map processing layer processing the bounding box parameters to obtain target parameters, and the mask processing layer performing instance segmentation on the target picture according to the target parameters; similar instances of each target instance in the first instance set are determined according to the degree of overlap between target instances; and the instances among the similar instances that exceed a first predetermined threshold are determined, yielding at least one instance picture of the target instance in the target picture. Segmenting the target picture with an instance segmentation neural network that has a detection network, a feature map processing layer, and a mask processing layer, and then determining the target instances by thresholding the segmentation result, achieves fast and accurate segmentation and thereby solves the technical problem of slow instance segmentation in the prior art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative neural-network-based instance segmentation method according to an embodiment of the present invention;
FIG. 2 is a flowchart of an alternative neural-network-based instance segmentation method according to an embodiment of the present invention;
FIG. 3 is a flowchart of an alternative instance segmentation method according to an embodiment of the present invention;
FIG. 4 is a structural diagram of an alternative instance segmentation network according to an embodiment of the present invention;
FIG. 5 is a structural diagram of an alternative mask processing layer according to an embodiment of the present invention;
FIG. 6 is a structural diagram of an alternative neural-network-based instance segmentation apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, a neural-network-based instance segmentation method is provided. Optionally, as an optional implementation, the method may be applied, but is not limited, to the environment shown in FIG. 1, which includes a terminal device 102, a network 104, and a server 106.
Optionally, the neural-network-based instance segmentation method may be executed by the terminal device 102, by the server 106, or by the terminal device 102 and the server 106 together.
The following describes the method as executed by the server 106.
The server 106 acquires a target picture from the video stream; inputs the target picture into a target instance segmentation neural network and outputs a first instance set, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer, the detection network acquiring parameters of instance bounding boxes, the feature map processing layer processing the bounding box parameters to obtain target parameters, and the mask processing layer performing instance segmentation on the target picture according to the target parameters; determines similar instances of each target instance in the first instance set according to the degree of overlap between target instances; and determines the instances among the similar instances that exceed a first predetermined threshold, obtaining at least one instance picture of the target instance in the target picture. Segmenting the target picture with such a network and thresholding the segmentation result to determine the target instances is both fast and accurate and solves the technical problem of slow instance segmentation in the prior art.
Optionally, in this embodiment, the terminal device 102 may be a terminal device configured with a target client and may include, but is not limited to, at least one of: a mobile phone (such as an Android or iOS phone), a notebook computer, a tablet computer, a palmtop computer, a MID (Mobile Internet Device), a PAD, a desktop computer, a smart television, etc. The target client may be a video client, an instant messaging client, a browser client, an educational client, etc. The network 104 may include, but is not limited to, a wired network or a wireless network, where the wired network includes local area networks, metropolitan area networks, and wide area networks, and the wireless network includes Bluetooth, WiFi, and other networks that enable wireless communication. The server may be a single server, a server cluster composed of multiple servers, or a cloud server. The above is merely an example, and this embodiment is not limited thereto.
Optionally, as an optional implementation, as shown in FIG. 2, the neural-network-based instance segmentation method includes:
Step S202, acquiring a target picture from the video stream.
Step S204, inputting the target picture into a target instance segmentation neural network and outputting a first instance set, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer; the detection network is used to acquire parameters of instance bounding boxes, the feature map processing layer processes the bounding box parameters to obtain target parameters, and the mask processing layer performs instance segmentation on the target picture according to the target parameters.
Step S206, determining similar instances of each target instance in the first instance set according to the degree of overlap between target instances in the first instance set.
Step S208, determining the instances among the similar instances that exceed a first predetermined threshold, to obtain at least one instance picture of the target instance in the target picture.
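To make the claimed flow concrete, the following is a minimal Python sketch of steps S202 to S208. It is illustrative only: the model interface, the dictionary layout of its outputs, and the use of IoU as the "degree of overlap" are all assumptions, since the patent publishes no code.

```python
import cv2  # assumed available; used only for the decoding example below

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def segment_frame(frame, model, overlap_thr=0.5, score_thr=0.5):
    # S204: detection network -> feature-map processing layer -> mask layer.
    first_set = model(frame)  # assumed: list of {"box", "mask", "score"} dicts
    kept, seen = [], set()
    for target in first_set:
        # S206: similar instances = those whose boxes overlap the target instance.
        similar = [o for o in first_set if o is not target
                   and iou(o["box"], target["box"]) > overlap_thr]
        # S208: of the similar instances, keep those above the first threshold.
        for o in similar:
            if o["score"] > score_thr and id(o) not in seen:
                seen.add(id(o))
                kept.append(o)
    return kept

# S202: take a target picture from the video stream, then segment it, e.g.:
# cap = cv2.VideoCapture("stream.mp4"); ok, frame = cap.read()
# instances = segment_frame(frame, model)
```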
Optionally, in this embodiment, the above scheme may be applied, but is not limited, to scenes such as portrait photographing, video special effects, AR, automatic driving, video target tracking, and unmanned-aerial-vehicle video image processing. Tracking a target object means performing instance segmentation on the pictures of a video stream and then tracking the target across the video; fast and accurate instance segmentation is a good foundation for that next step.
According to the embodiment provided by the present application: a target picture is acquired from the video stream; the target picture is input into a target instance segmentation neural network and a first instance set is output, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer, the detection network acquiring parameters of instance bounding boxes, the feature map processing layer processing the bounding box parameters to obtain target parameters, and the mask processing layer performing instance segmentation on the target picture according to the target parameters; similar instances of each target instance in the first instance set are determined according to the degree of overlap between target instances; and the instances among the similar instances that exceed a first predetermined threshold are determined, yielding at least one instance picture of the target instance in the target picture. This achieves fast and accurate instance segmentation and solves the technical problem of slow instance segmentation in the prior art.
Optionally, before the target picture is input into the target instance segmentation neural network and the first instance set is output, the method may include: acquiring a sample picture set from a video stream; annotating the target object in each picture of the sample picture set to obtain a target data set; inputting the annotated data set into a preset instance segmentation neural network, wherein the preset neural network comprises a preset detection network, a preset feature map processing layer, a preset mask processing layer, and a target loss function; the detection network is used to acquire parameters of instance bounding boxes in the sample pictures, the feature map processing layer processes those parameters to obtain preset target parameters, the mask processing layer performs instance segmentation on the sample pictures according to the preset target parameters, and the target loss function comprises a binary cross-entropy loss function and an intersection-over-union (IoU) loss function; and determining the instance segmentation neural network when the target loss function satisfies a predetermined condition.
Optionally, annotating the target object in each picture of the sample picture set to obtain the target data set includes: applying the standard instance-segmentation data augmentation techniques to each picture and its annotation result in the sample picture set to obtain the target data set.
Optionally, after annotating the target object in each picture of the sample picture set to obtain the target data set, the method further includes: dividing the target data set into a training set, a validation set, and a test set according to a preset ratio, wherein the training set is used to train the preset instance segmentation neural network, the validation set is used to validate it, and the test set is used to test the preset neural network segmentation model.
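A minimal sketch of the ratio-based split described above; the 8:1:1 default ratio and the function names are assumptions, as the patent leaves the preset ratio open:

```python
import random

def split_dataset(samples, k1=8, k2=1, k3=1, seed=0):
    """Randomly split `samples` into train/validation/test sets in the
    ratio k1:k2:k3. The 8:1:1 default is an illustrative assumption."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    total = k1 + k2 + k3
    n_train = len(shuffled) * k1 // total
    n_valid = len(shuffled) * k2 // total
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```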
Optionally, before the annotated data set is input into the preset instance segmentation neural network, the method further includes: constructing an initialized detection network, wherein the detection network comprises a feature-extraction backbone network, a feature-enhancement network, and a detection head; the backbone network extracts features from the instances in each picture of the sample picture set to obtain feature maps, the feature-enhancement network enhances the feature maps and marks their sizes, and the feature maps marked with different sizes are input to the detection head to obtain the parameters of the sample instance bounding boxes; and constructing a preset instance segmentation neural network from the initialized detection network, a preset feature map processing layer, and a preset mask processing layer, wherein the preset feature map processing layer processes the parameters of the sample instance bounding boxes to obtain sample target parameters, and the preset mask processing layer performs instance segmentation on the sample pictures according to the sample target parameters.
As an alternative embodiment, the present application further provides an instance segmentation method, whose flowchart is shown in FIG. 3. The details are as follows.
Step 31: initialize and preprocess the video images to be detected.
Video image preprocessing comprises the following. Initialize the video images to be detected, denoted X_V (their dimensions are specified by a formula that survives only as an image in the original), and denote the number of images in X_V by K_V. Manually annotate the targets of interest in the video images X_V according to the standard instance segmentation annotation method, and denote the annotation result G_V. Apply the standard instance-segmentation data augmentation techniques to the video images X_V and the annotation result G_V to obtain the final data set, denoted Ω. Initialize the ratio of the numbers of images in the training, validation, and test subsets of the data set Ω, denoted K_1 : K_2 : K_3. Randomly divide the data set Ω into a training set, a validation set, and a test set in the ratio K_1 : K_2 : K_3, and denote them Ω_train, Ω_valid, and Ω_test respectively.
The standard instance segmentation annotation method refers to annotation for instance segmentation, which aims to predict the position and semantic mask of each instance in an image; the open-source tool labelme is used for the annotation.
The standard instance-segmentation data augmentation techniques expand the data set by flipping, rotating, scaling, and translating its images, adding Gaussian noise, and applying contrast and color transforms. Data augmentation mainly reduces the network's overfitting; transforming the training pictures yields a network with stronger generalization ability, which better suits the application scenario.
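The following Python sketch illustrates the kind of augmentation pipeline described above. The operation set comes from the text; the parameter ranges and the use of OpenCV are assumptions:

```python
import numpy as np
import cv2  # assumed; the patent names the operations, not a library

def augment(image, mask):
    """One random draw of the augmentations listed above: flip, rotation,
    scaling, translation (via an affine warp), Gaussian noise, contrast and
    color transforms. All parameter ranges are illustrative assumptions."""
    h, w = image.shape[:2]
    if np.random.rand() < 0.5:                            # random horizontal flip
        image, mask = cv2.flip(image, 1), cv2.flip(mask, 1)
    angle = np.random.uniform(-15, 15)                    # rotation (degrees)
    scale = np.random.uniform(0.8, 1.2)                   # scaling
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[:, 2] += np.random.uniform(-0.1, 0.1, 2) * (w, h)   # translation
    image = cv2.warpAffine(image, M, (w, h))
    mask = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)
    image = image.astype(np.float32)
    image += np.random.normal(0.0, 5.0, image.shape)      # Gaussian noise
    image *= np.random.uniform(0.8, 1.2)                  # contrast transform
    image[..., 0] *= np.random.uniform(0.9, 1.1)          # simple color transform
    return np.clip(image, 0, 255).astype(np.uint8), mask
```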
Step 32: construct and initialize the convolutional neural detection network.
Construct and initialize a standard convolutional neural detection network model, denoted W_D, according to the standard YOLOv4 network construction method, where W_D consists of a feature-extraction backbone network, denoted W_B, a feature-enhancement network, denoted W_N, and a detection head, denoted W_P. The result of a convolutional layer followed by a batch normalization layer and a Leaky ReLU (leaky rectified linear unit) is denoted C_BL.
W_B extracts features using the standard CSPDarknet53 network.
In the feature-enhancement network W_N, the output of W_B is processed in parallel by max pooling at 1 × 1, 5 × 5, 9 × 9, and 13 × 13, and the pooled feature maps of different scales are combined by a tensor concatenation operation; the result is denoted S_PP. S_PP is followed by a cross-stage local connection module (CSP module) whose residual components are replaced by C_BL; the result is denoted A_5. A_5 is connected to a C_BL and then upsampled twice in succession by bilinear interpolation; the results are denoted A_4 and A_3 in turn. A_5, A_4, and A_3 are assembled according to the standard feature pyramid and path enhancement structure, with the tensor additions in the path enhancement structure replaced by tensor concatenations. The results of the four tensor concatenation operations in this structure are each followed by a CSP module whose residual components are replaced by C_BL, and the 1 × 1 convolutional layers of the standard feature pyramid and path enhancement structure are replaced by standard CoordConv layers, yielding the final feature-enhancement network, denoted W_N*. The outputs of W_N* are denoted, from the largest feature-map size to the smallest, P_3, P_4, and P_5.
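As an illustration of the C_BL unit and the parallel max-pooling block S_PP described above, here is a PyTorch-style sketch; channel sizes and defaults are assumptions rather than the patent's implementation:

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Conv + BatchNorm + Leaky ReLU, the C_BL unit used throughout W_N."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1, inplace=True))

    def forward(self, x):
        return self.block(x)

class SPP(nn.Module):
    """Parallel max pooling at 1/5/9/13 followed by tensor concatenation (S_PP)."""
    def __init__(self):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13))

    def forward(self, x):
        # the 1 x 1 pooling branch is the identity; concat keeps the spatial size
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)
```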
In W_P, the maps P_3, P_4, and P_5 are each connected by convolution to a standard YOLOv3 detection head in turn; the first convolutional layer of each YOLOv3 detection head is replaced with a standard CoordConv layer. The center-point adjustment formula of the YOLOv3 detection head is widened by a coefficient α, i.e. x = s·(g_x + α·σ(p_x) − (α − 1)/2) and y = s·(g_y + α·σ(p_y) − (α − 1)/2), where x and y are the bounding-box center coordinates, σ is the Sigmoid function, s is the scale factor, σ(p_x) and σ(p_y) are the center-coordinate offsets, and g_x and g_y represent the coordinates of the center of the real bounding box. The coefficient α is initialized to 1.05, giving the final detection head, denoted W_P*. The feature-extraction backbone network W_B, the feature-enhancement network W_N*, and the detection head W_P* are connected by convolution and initialized to obtain the final detection model, denoted W_D*.
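The widened center-point decoding can be written directly from the formula above; this small sketch assumes only the symbols defined there:

```python
import math

def decode_center(gx, gy, px, py, s, alpha=1.05):
    """Decode a box center with x = s*(g_x + a*sigmoid(p_x) - (a - 1)/2).
    With alpha > 1 the sigmoid output can reach the borders of the grid cell."""
    sig = lambda t: 1.0 / (1.0 + math.exp(-t))
    x = s * (gx + alpha * sig(px) - (alpha - 1) / 2)
    y = s * (gy + alpha * sig(py) - (alpha - 1) / 2)
    return x, y
```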
The standard CSPDarknet53 is a backbone structure generated from the YOLOv3 backbone Darknet53 by drawing on the 2019 CSPNet design; it contains five CSP modules (cross-stage local connection modules). YOLOv4 improves accuracy over YOLOv3 by nearly 10 points while losing almost no speed; it is thus a detection model that is both fast and accurate, and it can be trained with only a 1080Ti or 2080Ti.
The standard feature pyramid and path enhancement structure is based on the feature pyramid framework and enhances information propagation: it adds a bottom-up enhancement path, thereby improving the propagation of low-level features. Each stage of the newly added third path takes the feature maps of the previous stage as input and processes them with a 3 × 3 convolutional layer; the convolution output is added through lateral connections to the same-stage feature maps of the top-down path, and the resulting maps are sent to the next stage.
In the standard CoordConv layer: the convolution operation in deep learning is translation-equivariant, which allows uniform convolution kernel parameters to be shared at different positions of an image, but during convolutional learning the coordinates of the current feature within the image cannot be perceived. CoordConv adds channels to the convolution's input feature map that encode the coordinates of its pixels, so that the convolution can perceive coordinates to some degree during learning and improve detection accuracy; feature extraction is thereby optimized with almost no increase in computation.
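A minimal CoordConv sketch, assuming coordinates normalized to [-1, 1] (a common choice; the patent does not specify the encoding):

```python
import torch
import torch.nn as nn

class CoordConv(nn.Module):
    """Convolution whose input is augmented with two channels holding the
    normalized x/y coordinate of every pixel (the CoordConv idea)."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in + 2, c_out, k, s, k // 2)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))
```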
The standard YOLOv3 detection head: the YOLOv3 network consists of the feature-extraction network Darknet53 and the YOLOv3 detection head. The detection head performs target detection on three feature maps of different scales, which allows finer-grained features to be detected and benefits the detection of small targets.
Step 33: construct and initialize the instance segmentation network.
To the convolutional neural detection network model W_D* obtained in step 32, add a feature-map preprocessing layer, denoted W_Pre, and a mask branch, denoted W_M, to obtain the final instance segmentation model, denoted W_IS; the structure of the instance segmentation network is shown in FIG. 4.
In the feature-map preprocessing layer W_Pre, for each target output by W_D*, compute the rectangular box area using the formula s_area = w·h, where w and h are the width and height of the detected target respectively. Each detection result is then distributed, as a mask proposal, to the corresponding feature map of the feature-enhancement network W_N* according to area-based rules: detections whose area falls in the first range are allocated to P_3 for processing, those in the second range to P_4, and those in the third range to P_5 (the exact range boundaries are given by formulas that survive only as images in the original).
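A sketch of the area-based routing rule, with stand-in thresholds: the original cut-offs and the exact small-to-P_3 ordering survive only as formula images, so both are assumptions here:

```python
def assign_level(w, h, t_small=32 ** 2, t_large=96 ** 2):
    """Route a detection of width w and height h to P3/P4/P5 by box area.
    The thresholds 32^2 and 96^2 are stand-ins for the patent's formulas."""
    s_area = w * h
    if s_area < t_small:
        return "P3"          # assumed: smallest boxes -> highest-resolution map
    if s_area < t_large:
        return "P4"
    return "P5"
```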
Based on the above allocation, the mask branch W_M takes the corresponding feature map out of the feature-enhancement network and performs instance segmentation on it. A region-of-interest alignment (RoIAlign) operation is first applied to the extracted feature map; the result then passes through n_c convolutional layers with 3 × 3 kernels, n_d deconvolution layers with 2 × 2 kernels, and a convolutional layer with 1 × 1 kernels, giving the segmentation result, denoted R_1, where the channel dimension of each convolutional layer is C_D. A shortcut path is then added: the output of the (n_c − 1)-th convolutional layer with 3 × 3 kernels is processed by a further convolutional layer with 3 × 3 kernels, its channel dimension is reduced to C_D/2 by a convolutional layer with 1 × 1 kernels, and a fully connected layer is introduced that turns it into a vector, whose spatial size is made consistent with R_1 by a matrix dimension change; the final result is denoted R_2. R_1 and R_2 are added to obtain the final mask result, denoted R_mask. The detection model W_D*, the feature-map preprocessing layer W_Pre, and the mask branch W_M are directly connected and initialized according to the structure diagram to obtain the final instance segmentation model W_IS; the structure of the mask processing layer is shown in FIG. 5.
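The mask branch described above might look as follows in PyTorch; n_c, C_D, and the RoIAlign output size are assumed values, and the layer arrangement follows the text's description rather than any published code:

```python
import torch
import torch.nn as nn

class MaskBranch(nn.Module):
    """Sketch of W_M: RoIAligned features -> n_c 3x3 convs, a 2x2 deconv, and a
    1x1 conv (R1); plus a shortcut that turns intermediate features into a
    vector via a fully connected layer and reshapes it to R1's size (R2)."""
    def __init__(self, c_d=256, n_c=4, roi=14):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c_d, c_d, 3, padding=1), nn.ReLU(True))
            for _ in range(n_c))
        self.deconv = nn.ConvTranspose2d(c_d, c_d, 2, stride=2)
        self.mask = nn.Conv2d(c_d, 1, 1)
        # shortcut: 3x3 conv, 1x1 conv halving channels, FC to a (2*roi)^2 map
        self.sc_conv3 = nn.Conv2d(c_d, c_d, 3, padding=1)
        self.sc_conv1 = nn.Conv2d(c_d, c_d // 2, 1)
        self.fc = nn.Linear((c_d // 2) * roi * roi, (2 * roi) * (2 * roi))

    def forward(self, x):                    # x: RoIAligned map, (N, c_d, roi, roi)
        h, sc = x, None
        for i, conv in enumerate(self.convs):
            h = conv(h)
            if i == len(self.convs) - 2:     # output of the (n_c - 1)-th conv
                sc = h
        r1 = self.mask(self.deconv(h))       # (N, 1, 2*roi, 2*roi)
        v = self.fc(torch.flatten(self.sc_conv1(self.sc_conv3(sc)), 1))
        r2 = v.view(-1, 1, r1.shape[2], r1.shape[3])
        return r1 + r2                       # R_mask
```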
Step 34: train and tune the instance segmentation network.
Initialize the image-processing batch size and mini-batch size, denoted BS and mini-BS respectively; initialize the learning rate, denoted LR; initialize the weight decay rate and the momentum, denoted WDR and MO respectively; initialize the training period, denoted epoch. Take values sampled from a Gaussian distribution with mean 0 and variance 1 as the initial parameters of the instance segmentation model W_IS, obtaining the initialized network parameters W_old. Using the standard instance-segmentation network training techniques, randomly shuffle the training set Ω_train from step 31 and feed it mini-batch by mini-batch into the instance segmentation model W_IS obtained in step 33. Compute the loss function value of W_IS, denoted Loss_old, where the loss function is Loss = Loss_cls + Loss_conf + Loss_box + Loss_mask, with Loss_cls, Loss_conf, and Loss_mask being binary cross-entropy loss functions and Loss_box the standard CIoU loss function. Update the network parameters W_old with the standard stochastic gradient descent method, introducing the standard exponential moving average technique, i.e. W_EMA = λ·W_EMA + (1 − λ)·W_new with the parameter λ initialized to 0.998, finally obtaining the new network parameters, denoted W_new. The final model and parameters obtained by training W_IS are denoted W_IS*.
Compared with the Dropout algorithm, the DropBlock algorithm drops a contiguous region of features rather than individual feature points, which makes it better suited to instance segmentation tasks for improving the network's generalization ability. The standard instance-segmentation network training techniques use Mosaic data augmentation, which stitches four pictures into one by random scaling, random cropping, and random arrangement, to improve performance on small and medium targets. In addition, if the loss of small objects falls below a certain threshold in one iteration, the next iteration uses a stitched picture; otherwise normal images are used for training. The images are also adaptively scaled during training, and techniques such as CmBN and SAT self-adversarial training are likewise used to train the network.
Regarding the standard CIoU loss function: DIoU conforms better to the target-box regression mechanism than GIoU, as it takes the distance, overlap rate, and scale between the target and the anchor into account, making box regression more stable; on top of DIoU, CIoU additionally considers the aspect ratio among the three box-regression factors, making the result more accurate.
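For reference, the CIoU loss mentioned above is commonly written as follows (standard literature form, not quoted from the patent):

```latex
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU}
  + \frac{\rho^{2}\!\left(b,\, b^{gt}\right)}{c^{2}} + \alpha v,
\qquad
v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2},
\qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
```

where b and b^gt are the centers of the predicted and ground-truth boxes, ρ is their Euclidean distance, and c is the diagonal length of the smallest box enclosing both.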
The standard exponential moving average technique takes the average of a parameter's past values as its new value. Compared with updating the parameters directly, the exponential moving average makes parameter learning smoother, effectively prevents abnormal values from disturbing the parameter updates, and improves the convergence of model training.
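A one-function sketch of the update with λ = 0.998, assuming parameters are stored as a name-to-tensor mapping:

```python
def ema_update(ema_params, new_params, lam=0.998):
    """Exponential moving average: W_EMA <- lam * W_EMA + (1 - lam) * W_new.
    `ema_params` and `new_params` are dicts of name -> tensor-like values."""
    for name, w in new_params.items():
        ema_params[name] = lam * ema_params[name] + (1.0 - lam) * w
    return ema_params
```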
Step 35: perform real-time instance segmentation on the video stream to be detected.
Initialize the video stream captured by the camera in real time as the video stream to be detected, denoted V. Decode the video stream V according to the FFmpeg standard using multithreading, and feed the decoded frames into the video-image instance segmentation model W_IS* obtained in step 34 to obtain the output result R_result. Suppress the overlaps in R_result with the standard Matrix NMS; the result is the final video-stream instance segmentation result.
In the Matrix NMS, for example, when computing the suppression coefficient of a prediction box B, the IoU between B and every prediction box scoring higher than B is computed in parallel in matrix form, and the suppression coefficient of B is then approximately estimated from those IoUs and the suppression probabilities of the higher-scoring prediction boxes. This parallelizes the Soft-NMS computation and improves detection accuracy while avoiding a reduction in inference speed.
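The decay computation can be sketched as follows, following the published Matrix NMS formulation (linear and Gaussian kernels); treating boxes or masks identically through a precomputed IoU matrix is an assumption of this sketch:

```python
import numpy as np

def matrix_nms(scores, ious, kernel="linear", sigma=2.0):
    """Parallel Soft-NMS decay in matrix form (after the published Matrix NMS).
    `scores` must be sorted in descending order, and `ious[i, j]` must hold
    the IoU between predictions i and j. Returns the decayed scores."""
    ious = np.triu(ious, k=1)              # keep IoU against higher scorers only
    compensate = ious.max(axis=0)          # how suppressed each suppressor was
    if kernel == "gaussian":
        decay = np.exp(-sigma * (ious ** 2 - compensate[:, None] ** 2))
    else:                                  # linear kernel
        decay = (1.0 - ious) / (1.0 - compensate[:, None])
    return scores * decay.min(axis=0)      # one decay coefficient per prediction
```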
According to this embodiment, the real-time camera stream is decoded according to the FFmpeg standard with multithreading, fed into the instance segmentation model W_IS* obtained in step 34, and the overlaps in the output R_result are suppressed by the standard Matrix NMS to give the final video-stream instance segmentation result. Instance segmentation can thus be achieved quickly and accurately.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the present invention, there is also provided a neural-network-based instance segmentation apparatus for implementing the above neural-network-based instance segmentation method. As shown in FIG. 6, the instance segmentation apparatus comprises: a first acquisition unit 61, an output unit 63, a first determining unit 65, and a second determining unit 67.
A first acquisition unit 61, configured to acquire a target picture from a video stream.
An output unit 63, configured to input the target picture into a target instance segmentation neural network and output the first instance set, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer; the detection network is used to acquire parameters of instance bounding boxes, the feature map processing layer processes the bounding box parameters to obtain target parameters, and the mask processing layer performs instance segmentation on the target picture according to the target parameters.
A first determining unit 65, configured to determine similar instances of each target instance in the first instance set according to the degree of overlap between target instances in the first instance set.
A second determining unit 67, configured to determine the instances among the similar instances that exceed the first predetermined threshold, to obtain at least one instance picture of the target instance in the target picture.
With the embodiment provided by the present application, the first acquisition unit 61 acquires a target picture from the video stream; the output unit 63 inputs the target picture into a target instance segmentation neural network and outputs a first instance set, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer as described above; the first determining unit 65 determines similar instances of each target instance in the first instance set according to the degree of overlap between target instances; and the second determining unit 67 determines the instances among the similar instances that exceed the first predetermined threshold, obtaining at least one instance picture of the target instance in the target picture. Segmenting the target picture with an instance segmentation neural network that has a detection network, a feature map processing layer, and a mask processing layer, and determining the target instances by thresholding the segmentation result, achieves fast and accurate segmentation and solves the technical problem of slow instance segmentation in the prior art.
As an alternative embodiment, the apparatus may include:
a second acquisition unit, configured to acquire a sample picture set from a video stream before the target picture is input into the target instance segmentation neural network and the first instance set is output;
an obtaining unit, configured to annotate the target object in each picture of the sample picture set to obtain a target data set;
an input unit, configured to input the annotated data set into a preset instance segmentation neural network, wherein the preset neural network comprises a preset detection network, a preset feature map processing layer, a preset mask processing layer, and a target loss function; the detection network is used to acquire parameters of instance bounding boxes in the sample pictures, the feature map processing layer processes those parameters to obtain preset target parameters, the mask processing layer performs instance segmentation on the sample pictures according to the preset target parameters, and the target loss function comprises a binary cross-entropy loss function and an intersection-over-union (IoU) loss function;
and a third determining unit, configured to determine the instance segmentation neural network when the target loss function satisfies a predetermined condition.
As an alternative embodiment, the obtaining unit may include:
an obtaining module, configured to apply the standard instance-segmentation data augmentation techniques to each picture and its annotation result in the sample picture set to obtain the target data set.
As an alternative embodiment, the apparatus may include:
a dividing unit, configured to divide the target data set, after the target object in each picture of the sample picture set has been annotated, into a training set, a validation set, and a test set according to a preset ratio, wherein the training set is used to train the preset instance segmentation neural network, the validation set is used to validate it, and the test set is used to test the preset neural network segmentation model.
As an alternative embodiment, the apparatus may include:
a first construction unit, configured to construct an initialized detection network before the annotated data set is input into the preset instance segmentation neural network, wherein the detection network comprises a feature-extraction backbone network, a feature-enhancement network, and a detection head; the backbone network extracts features from the instances in each picture of the sample picture set to obtain feature maps, the feature-enhancement network enhances the feature maps and marks their sizes, and the feature maps marked with different sizes are input to the detection head to obtain the parameters of the sample instance bounding boxes;
and a second construction unit, configured to construct the preset instance segmentation neural network from the initialized detection network, the preset feature map processing layer, and the preset mask processing layer, wherein the preset feature map processing layer processes the parameters of the sample instance bounding boxes to obtain sample target parameters, and the preset mask processing layer performs instance segmentation on the sample pictures according to the sample target parameters.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the above neural-network-based instance segmentation method; the electronic device may be the terminal device or the server shown in FIG. 1. This embodiment takes the server as an example. As shown in FIG. 7, the electronic device comprises a memory 702 and a processor 704; the memory 702 stores a computer program, and the processor 704 is arranged to perform the steps of any of the above method embodiments by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a target picture from the video stream;
S2, inputting the target picture into a target instance segmentation neural network and outputting a first instance set, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer; the detection network is used to acquire parameters of instance bounding boxes, the feature map processing layer processes the bounding box parameters to obtain target parameters, and the mask processing layer performs instance segmentation on the target picture according to the target parameters;
S3, determining similar instances of each target instance in the first instance set according to the degree of overlap between target instances in the first instance set;
and S4, determining the instances among the similar instances that exceed the first predetermined threshold, to obtain at least one instance picture of the target instance in the target picture.
Alternatively, those skilled in the art will understand that the structure shown in FIG. 7 is only illustrative; the electronic device may also be a terminal device such as a smartphone (e.g., an Android or iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, or the like. FIG. 7 does not limit the structure of the electronic device; for example, the electronic device may include more or fewer components (such as network interfaces) than shown in FIG. 7 or have a configuration different from that shown in FIG. 7.
The memory 702 may be used to store software programs and modules, such as the program instructions/modules corresponding to the neural-network-based instance segmentation method and apparatus in the embodiments of the present invention; the processor 704 executes various functional applications and data processing by running the software programs and modules stored in the memory 702, thereby implementing the above instance segmentation method. The memory 702 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 702 may further include memory located remotely from the processor 704, which may be connected to the terminal over a network; examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 702 may be used, in particular but not exclusively, to store information such as the target picture and the result of its instance segmentation. As an example, as shown in FIG. 7, the memory 702 may include, but is not limited to, the first acquisition unit 61, the output unit 63, the first determining unit 65, and the second determining unit 67 of the above instance segmentation apparatus; it may also include other module units of the apparatus, which are not described again in this example.
Optionally, the transmission device 706 is used to receive or send data via a network. Examples of the network include wired and wireless networks. In one example, the transmission device 706 includes a Network Interface Controller (NIC), which can be connected to a router and other network devices via a network cable so as to communicate with the internet or a local area network. In another example, the transmission device 706 is a Radio Frequency (RF) module, which communicates with the internet wirelessly.
In addition, the electronic device further includes: a display 708, configured to display the picture to be instance-segmented and the instance segmentation result; and a connection bus 710 for connecting the module components of the electronic device.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system formed by a plurality of nodes connected through network communication. The nodes may form a peer-to-peer (P2P) network, and any form of computing device, such as a server, a terminal, or another electronic device, may become a node of the blockchain system by joining the peer-to-peer network.
According to an aspect of the application, there is provided a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the above neural network-based instance segmentation method. The computer program is arranged to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the above computer-readable storage medium may be configured to store a computer program for executing the following steps:
S1, acquiring a target picture from a video stream;
S2, inputting the target picture into a target instance segmentation neural network, and outputting a first instance set, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer; the detection network is used for obtaining bounding-box parameters of instances, the feature map processing layer processes the bounding-box parameters to obtain target parameters, and the mask processing layer performs instance segmentation on the target picture according to the target parameters;
S3, determining similar instances of each target instance in the first instance set according to the degree of overlap between the target instances in the first instance set;
and S4, determining, among the similar instances, those whose degree of overlap exceeds the first preset threshold, to obtain at least one instance picture of the target instance in the target picture.
Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the hardware related to a terminal device, and the program may be stored in a computer-readable storage medium, which may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements should also fall within the protection scope of the present invention.

Claims (11)

1. An instance segmentation method based on a neural network, characterized by comprising:
acquiring a target picture from a video stream;
inputting the target picture into a target instance segmentation neural network, and outputting a first instance set, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer; the detection network is used for obtaining bounding-box parameters of instances, the feature map processing layer processes the bounding-box parameters to obtain target parameters, and the mask processing layer performs instance segmentation on the target picture according to the target parameters;
determining similar instances of each target instance in the first instance set according to the degree of overlap between the target instances in the first instance set; and
determining, among the similar instances, the instances whose degree of overlap is greater than a first preset threshold, to obtain at least one instance picture of the target instance in the target picture.
2. The method of claim 1, wherein before inputting the target picture into the target instance segmentation neural network and outputting the first instance set, the method comprises:
acquiring a sample picture set from a video stream;
labeling a target object in each picture in the sample picture set to obtain a target data set;
inputting the labeled data set into a preset instance segmentation neural network, wherein the preset instance segmentation neural network comprises a preset detection network, a preset feature map processing layer, a preset mask processing layer, and a target loss function; the preset detection network is used for obtaining bounding-box parameters of the instances in the sample pictures, the preset feature map processing layer processes those bounding-box parameters to obtain preset target parameters, the preset mask processing layer performs instance segmentation on the sample pictures according to the preset target parameters, and the target loss function comprises a binary cross-entropy loss function and an intersection-over-union (IoU) loss function; and
determining the preset instance segmentation neural network as the target instance segmentation neural network if the target loss function satisfies a predetermined condition.
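As a non-limiting illustration of the target loss function in claim 2, the sketch below sums a binary cross-entropy term and a soft intersection-over-union term over one predicted mask. The equal weighting of the two terms and the helper name `bce_iou_loss` are assumptions for this sketch; the claim does not fix how the two terms are combined.

```python
import numpy as np

def bce_iou_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Target loss = binary cross-entropy + IoU loss over one mask.

    `pred` holds per-pixel foreground probabilities; `target` is the
    binary ground-truth mask of the same shape.
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    bce = -np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    iou = 1.0 - inter / (union + eps)       # soft IoU loss in [0, 1]
    return float(bce + iou)
```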
3. The method of claim 2, wherein labeling the target object in each picture in the sample picture set to obtain the target data set comprises:
performing data enhancement on each picture in the sample picture set and its labeling result by using standard instance segmentation data enhancement techniques to obtain the target data set.
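One concrete example of such a standard augmentation is a horizontal flip applied jointly to a picture and its instance masks, so the labeling result stays aligned with the pixels; the function name and the `(N, H, W)` mask layout below are illustrative assumptions, and typical pipelines also add random scaling and cropping.

```python
import numpy as np

def hflip_sample(image: np.ndarray, masks: np.ndarray):
    """Flip a picture (H, W, C) and its instance masks (N, H, W) together."""
    return np.fliplr(image).copy(), np.flip(masks, axis=2).copy()
```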
4. The method of claim 2, wherein after labeling the target object in each picture in the sample picture set to obtain the target data set, the method further comprises:
dividing the target data set into a training set, a validation set, and a test set according to a preset ratio, wherein the training set is used to train the preset instance segmentation neural network, the validation set is used to validate the preset instance segmentation neural network, and the test set is used to test the preset instance segmentation neural network.
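A minimal sketch of this division follows; the 8:1:1 ratio and the fixed shuffle seed are example choices, not values fixed by the claim.

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Divide the target data set into training, validation, and test
    subsets according to a preset ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # shuffle a copy, leave input intact
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```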
5. The method of claim 2, wherein before inputting the labeled data set into the preset instance segmentation neural network, the method further comprises:
constructing an initialized detection network, wherein the detection network comprises a feature extraction backbone network, a feature enhancement network, and a detection head; the feature extraction backbone network performs feature extraction on the instances in each picture of the sample picture set to obtain feature maps, the feature enhancement network enhances the feature maps and marks their sizes, and the feature maps marked with different sizes are input into the detection head to obtain bounding-box parameters of the sample instances; and
constructing the preset instance segmentation neural network from the initialized detection network, a preset feature map processing layer, and a preset mask processing layer, wherein the preset feature map processing layer processes the bounding-box parameters of the sample instances to obtain sample target parameters, and the preset mask processing layer performs instance segmentation on the sample pictures according to the sample target parameters.
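For claim 5, a structural sketch of the initialized detection network is given below. The callables `backbone`, `enhancer`, and `head` are assumptions (e.g., a ResNet-style backbone, an FPN-style enhancement network, and a shared detection head); the claim names the three components but does not specify their internals.

```python
class DetectionNetwork:
    """Backbone -> feature enhancement (with size marking) -> detection head."""

    def __init__(self, backbone, enhancer, head):
        self.backbone = backbone    # per-picture feature extraction
        self.enhancer = enhancer    # enhances feature maps and marks their sizes
        self.head = head            # predicts bounding-box parameters per size

    def __call__(self, picture):
        feature_maps = self.backbone(picture)
        sized_maps = self.enhancer(feature_maps)          # [(size, fmap), ...]
        return [self.head(size, fmap) for size, fmap in sized_maps]
```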
6. An instance segmentation apparatus based on a neural network, comprising:
a first acquiring unit, configured to acquire a target picture from a video stream;
an output unit, configured to input the target picture into a target instance segmentation neural network and output a first instance set, wherein the instance segmentation neural network comprises a detection network, a feature map processing layer, and a mask processing layer; the detection network is used for obtaining bounding-box parameters of instances, the feature map processing layer processes the bounding-box parameters to obtain target parameters, and the mask processing layer performs instance segmentation on the target picture according to the target parameters;
a first determining unit, configured to determine similar instances of each target instance in the first instance set according to the degree of overlap between the target instances in the first instance set; and
a second determining unit, configured to determine, among the similar instances, the instances whose degree of overlap is greater than a first preset threshold, to obtain at least one instance picture of the target instance in the target picture.
7. The apparatus of claim 6, further comprising:
a second acquiring unit, configured to acquire a sample picture set from a video stream before the target picture is input into the target instance segmentation neural network and the first instance set is output;
an obtaining unit, configured to label a target object in each picture in the sample picture set to obtain a target data set;
an input unit, configured to input the labeled data set into a preset instance segmentation neural network, wherein the preset instance segmentation neural network comprises a preset detection network, a preset feature map processing layer, a preset mask processing layer, and a target loss function; the preset detection network is used for obtaining bounding-box parameters of the instances in the sample pictures, the preset feature map processing layer processes those bounding-box parameters to obtain preset target parameters, the preset mask processing layer performs instance segmentation on the sample pictures according to the preset target parameters, and the target loss function comprises a binary cross-entropy loss function and an intersection-over-union (IoU) loss function; and
a third determining unit, configured to determine the preset instance segmentation neural network as the target instance segmentation neural network if the target loss function satisfies a predetermined condition.
8. The apparatus of claim 7, wherein the obtaining unit comprises:
an obtaining module, configured to perform data enhancement on each picture in the sample picture set and its labeling result by using standard instance segmentation data enhancement techniques to obtain the target data set.
9. The apparatus of claim 7, further comprising:
a dividing unit, configured to, after the target object in each picture in the sample picture set is labeled to obtain the target data set, divide the target data set into a training set, a validation set, and a test set according to a preset ratio, wherein the training set is used to train the preset instance segmentation neural network, the validation set is used to validate the preset instance segmentation neural network, and the test set is used to test the preset instance segmentation neural network.
10. The apparatus of claim 7, further comprising:
a first construction unit, configured to construct an initialized detection network before the labeled data set is input into the preset instance segmentation neural network, wherein the detection network comprises a feature extraction backbone network, a feature enhancement network, and a detection head; the feature extraction backbone network performs feature extraction on the instances in each picture of the sample picture set to obtain feature maps, the feature enhancement network enhances the feature maps and marks their sizes, and the feature maps marked with different sizes are input into the detection head to obtain bounding-box parameters of the sample instances; and
a second construction unit, configured to construct the preset instance segmentation neural network from the initialized detection network, a preset feature map processing layer, and a preset mask processing layer, wherein the preset feature map processing layer processes the bounding-box parameters of the sample instances to obtain sample target parameters, and the preset mask processing layer performs instance segmentation on the sample pictures according to the sample target parameters.
11. A computer-readable storage medium comprising a stored program, wherein the program, when executed, performs the method of any one of claims 1 to 5.
CN202011166214.8A 2020-10-27 2020-10-27 Example segmentation method and device based on neural network and storage medium Pending CN112348828A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011166214.8A CN112348828A (en) 2020-10-27 2020-10-27 Example segmentation method and device based on neural network and storage medium

Publications (1)

Publication Number Publication Date
CN112348828A true CN112348828A (en) 2021-02-09

Family

ID=74358675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011166214.8A Pending CN112348828A (en) 2020-10-27 2020-10-27 Example segmentation method and device based on neural network and storage medium

Country Status (1)

Country Link
CN (1) CN112348828A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910334A (en) * 2018-09-15 2020-03-24 北京市商汤科技开发有限公司 Instance segmentation method, image processing device and computer readable storage medium
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network
CN110200598A (en) * 2019-06-12 2019-09-06 天津大学 A kind of large-scale plant that raises sign exception birds detection system and detection method
CN111598942A (en) * 2020-03-12 2020-08-28 中国电力科学研究院有限公司 Method and system for automatically positioning electric power facility instrument

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112835718A (en) * 2021-02-10 2021-05-25 北京灵汐科技有限公司 Method and device for processing task, many-core system and computer readable medium
WO2022171002A1 (en) * 2021-02-10 2022-08-18 北京灵汐科技有限公司 Task processing method and apparatus, many-core system, and computer-readable medium
CN113312999A (en) * 2021-05-19 2021-08-27 华南农业大学 High-precision detection method and device for diaphorina citri in natural orchard scene
CN113312999B (en) * 2021-05-19 2023-07-07 华南农业大学 High-precision detection method and device for diaphorina citri in natural orchard scene
CN113222874A (en) * 2021-06-01 2021-08-06 平安科技(深圳)有限公司 Data enhancement method, device and equipment applied to target detection and storage medium
CN113222874B (en) * 2021-06-01 2024-02-02 平安科技(深圳)有限公司 Data enhancement method, device, equipment and storage medium applied to target detection
CN113706475A (en) * 2021-08-06 2021-11-26 福建自贸试验区厦门片区Manteia数据科技有限公司 Confidence coefficient analysis method and device based on image segmentation
CN113706475B (en) * 2021-08-06 2023-07-21 福建自贸试验区厦门片区Manteia数据科技有限公司 Confidence analysis method and device based on image segmentation
CN114419322A (en) * 2022-03-30 2022-04-29 飞狐信息技术(天津)有限公司 Image instance segmentation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112348828A (en) Example segmentation method and device based on neural network and storage medium
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
US11200424B2 (en) Space-time memory network for locating target object in video content
CN108122234B (en) Convolutional neural network training and video processing method and device and electronic equipment
Fu et al. Uncertainty inspired underwater image enhancement
CN110910391B (en) Video object segmentation method for dual-module neural network structure
CN111402130B (en) Data processing method and data processing device
CN112132847A (en) Model training method, image segmentation method, device, electronic device and medium
CN114511041B (en) Model training method, image processing method, device, equipment and storage medium
Cheng et al. Learning to refine depth for robust stereo estimation
JP2023131117A (en) Joint perception model training, joint perception method, device, and medium
CN114170290A (en) Image processing method and related equipment
CN111476812A (en) Map segmentation method and device, pose estimation method and equipment terminal
CN112418256A (en) Classification, model training and information searching method, system and equipment
CN108764248B (en) Image feature point extraction method and device
CN112270748B (en) Three-dimensional reconstruction method and device based on image
CN111488887B (en) Image processing method and device based on artificial intelligence
CN114170558A (en) Method, system, device, medium and article for video processing
CN111652181B (en) Target tracking method and device and electronic equipment
CN111914809A (en) Target object positioning method, image processing method, device and computer equipment
Wang et al. A multi-scale attentive recurrent network for image dehazing
CN108460768B (en) Video attention object segmentation method and device for hierarchical time domain segmentation
CN112862840B (en) Image segmentation method, device, equipment and medium
CN108701206B (en) System and method for facial alignment
CN110610185B (en) Method, device and equipment for detecting salient object of image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination