CN112132156B - Image saliency target detection method and system based on multi-depth feature fusion - Google Patents

Image saliency target detection method and system based on multi-depth feature fusion

Info

Publication number
CN112132156B
Authority
CN
China
Prior art keywords
image
feature
convolution
information
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010832414.6A
Other languages
Chinese (zh)
Other versions
CN112132156A (en)
Inventor
陈振学
闫星合
刘成云
孙露娜
段树超
朱凯
陆梦旭
李明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202010832414.6A priority Critical patent/CN112132156B/en
Publication of CN112132156A publication Critical patent/CN112132156A/en
Application granted granted Critical
Publication of CN112132156B publication Critical patent/CN112132156B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The application discloses a method and a system for detecting image saliency targets by fusing multiple depth features. The method comprises the following steps: acquiring image information to be detected in a set scene; inputting the image information into a trained multi-depth feature fusion neural network model, wherein the model performs feature extraction by convolution in the encoding stage and restores the information of the input image in the decoding stage by combining convolution with bilinear-interpolation up-sampling, so as to output a feature map carrying saliency information; learning feature maps at different levels with a multi-level network and fusing the feature maps of the different levels; and outputting the final saliency target detection result. The application uses the multi-depth feature fusion neural network to perform saliency target detection on images of the scene, which guarantees detection accuracy while accelerating the subsequent processing flow; a contour detection branch is further added, and the contour features are used to refine the boundary details of the target to be detected.

Description

Image saliency target detection method and system based on multi-depth feature fusion
Technical Field
The application relates to the technical field of image saliency target detection, and in particular to a method and a system for detecting image saliency targets by fusing multiple depth features.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Salient object detection uses a computer to simulate the human visual attention mechanism and separate from the background the person or object in an image that most strongly attracts human visual attention. An image is composed of many pixels whose brightness, color and other attributes differ, and whose corresponding saliency values therefore also differ. Unlike conventional object detection and semantic segmentation tasks, salient object detection focuses only on the most visually noticeable part and does not classify it, and the detection result is generally at the pixel level; salient object detection is therefore often used as a preliminary step of other image processing methods to improve the accuracy of the subsequent processing flow.
At present, saliency target detection is applied in fields such as medical image segmentation, intelligent photography, image retrieval, virtual backgrounds and intelligent unmanned systems. Saliency target detection is a basic task in intelligent unmanned systems and lays the foundation for subsequent target recognition and decision making. In recent years the artificial intelligence industry has developed rapidly, unmanned operation is pursued in both intelligent living and industrial settings, and intelligent unmanned systems have become a research hotspot.
Taking an unmanned driving system as an example, autonomous driving is a complex computing task: the visual attention mechanism of a driver must be simulated to perceive changing scenes quickly and accurately, and the back-end computer must perceive the entire surrounding environment and different scenes well. Conventional target detection can only detect specific objects, its result is an imprecise bounding box, and it cannot respond accurately and quickly to unexpected scenes, so salient target detection is a key technology in unmanned driving. The vehicle-mounted camera or laser radar provides real-time road images, a binarized saliency feature map is output by the saliency target detection algorithm, and the important parts of the scene are then segmented to obtain images with semantic information, which control the advance and obstacle avoidance of the vehicle, achieving fast and accurate operation while saving computing resources.
Early saliency detection relied on features such as color, brightness, orientation and center-surround contrast, and could only detect local regions. Methods such as Markov chains and frequency-domain tuning were later developed to bring global features into the detection range from a mathematical perspective, but they still have difficulty reaching high accuracy, whereas unmanned systems require very high precision and extremely fast response to ensure safety and real-time operation. Meanwhile, problems such as very small targets, complex backgrounds and unclear target contours are encountered during unmanned driving, which affect the detection result and the accuracy of subsequent processing operations.
Disclosure of Invention
In order to solve the above problems, the application provides a multi-depth feature fusion image saliency target detection method and system, which use a multi-depth feature fusion neural network to perform saliency detection on images of an application scene and improve the speed of subsequent processing steps such as segmentation while guaranteeing detection accuracy.
In some embodiments, the following technical scheme is adopted:
A multi-depth feature fusion image saliency target detection method comprises the following steps:
acquiring image information to be detected in a set scene;
inputting the image information into a trained multi-depth feature fusion neural network model;
the multi-depth feature fusion neural network model adopts convolution to perform feature extraction in the encoding stage, and combines the convolution and a bilinear interpolation up-sampling method to restore the information of an input image in the decoding stage, so as to output a feature map with significance information;
adopting a multi-level network to learn feature maps at different levels, and fusing the feature maps of different levels;
and outputting a final saliency target detection result.
Further, a contour detection branch is added; the contour feature information of the salient target is extracted through the multi-depth feature fusion neural network model, and the contour features are used to refine the boundary details of the target to be detected; the salient feature information of the image to be detected is then fused with the salient target contour feature information.
In other embodiments, the following technical solutions are adopted:
an image saliency target detection system for multi-depth feature fusion, comprising:
means for acquiring image information to be detected in a set scene;
means for inputting the image information into a trained multi-depth feature fusion neural network model;
means for performing feature extraction by convolution in the encoding stage of the multi-depth feature fusion neural network model, restoring the information of the input image in the decoding stage by an up-sampling method combining convolution and bilinear interpolation, and outputting a feature map with saliency information;
means for learning feature maps of different levels using a multi-level network, the feature maps of different levels being fused;
means for outputting a final salient object detection result.
In other embodiments, the following technical solutions are adopted:
a terminal device comprising a processor and a computer-readable storage medium, the processor configured to implement instructions; the computer readable storage medium is for storing a plurality of instructions adapted to be loaded by a processor and to perform the image saliency object detection method of multi-depth feature fusion described above.
In other embodiments, the following technical solutions are adopted:
a computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to perform the above-described image saliency object detection method of multi-depth feature fusion.
Compared with the prior art, the application has the beneficial effects that:
The application uses the multi-depth feature fusion neural network to perform saliency target detection on images of the scene, guaranteeing detection accuracy while accelerating the subsequent processing flow;
The encoder-decoder structure in the multi-depth feature fusion neural network satisfies the accuracy requirement of saliency detection, and the multi-level, multi-task, multi-channel feature map fusion makes full use of both shallow and deep information;
The application adds contour feature detection, which can refine the detail information of salient target edges and obtain detection results with higher accuracy and clearer contours, and which clearly helps other processing tasks such as subsequent scene segmentation.
The saliency target detection algorithm provided by the application can effectively assist intelligent unmanned systems, such as unmanned driving systems, meets the requirements of accuracy and real-time performance at the same time, and can overcome the problems of very small targets, complex backgrounds, unclear target contours, large memory consumption and long training time.
Drawings
FIG. 1 is a flow chart of a salient object detection method in an embodiment of the present application;
FIG. 2 is a schematic diagram of an image preprocessing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a multi-depth feature fusion neural network framework in an embodiment of the application;
fig. 4 is a schematic diagram of a re-weighting module for network important components in an embodiment of the present application.
Detailed Description
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
The saliency target detection method can be applied to the fields of medical image segmentation, intelligent photography, image retrieval, virtual background, intelligent unmanned systems and the like.
In this embodiment, the multi-depth feature fusion neural network refers to a saliency detection neural network fused with multi-level, multi-task, multi-channel depth features.
In this embodiment, the method of the present application is described in detail by taking an unmanned scene as an example:
a method for detecting an image saliency target by multi-depth feature fusion, referring to fig. 1, includes:
acquiring image information to be detected in a set scene;
inputting the image information into a trained multi-depth feature fusion neural network model;
the multi-depth feature fusion neural network model adopts convolution to perform feature extraction in the encoding stage, and combines convolution with a bilinear interpolation up-sampling method to restore the information of the input image in the decoding stage, so as to output a feature map with saliency information; a multi-level network is adopted to learn feature maps at different levels, and the feature maps of different levels are fused;
and outputting a final saliency target detection result.
Adding a contour detection branch to assist a significance detection task, and refining boundary details of a target to be detected by using contour features;
introducing a self-adaptive channel re-weighting branch to re-calibrate the characteristic channel weight of the convolution layer;
the neural network is optimized by a cross entropy loss function.
Specifically, S1: collecting images of real-world driving, performing binarized saliency labeling on the images, and determining the labels, thereby forming a training set and a test set.
The specific process of step S1 is as follows:
s1.1: the images can be obtained by video shooting and separation, the video is extracted every 10 frames, the images are obtained, and the images are input into a neural network.
S1.2: and labeling each pixel point, wherein one class corresponds to one number, and the number of the class is 2 to distinguish the foreground from the background, so that a gray level image is obtained and is used as a true value of an output image.
S2: referring to fig. 2, on the basis of the training set, the input and the labeling images are randomly scaled, cut, boundary filled and turned, so that the training set is expanded, and the precision is improved more along with the expansion of the training set.
Because each image contains a large number of pixels, labeling every pixel is time-consuming and labor-intensive and prone to omissions or mislabeling, yet a large number of images helps improve accuracy; the application therefore preprocesses the images so that a better effect can be achieved with fewer images.
The specific process of step S2 is as follows:
s2.1: the input and labeled truth images are randomly scaled down or up in each training.
S2.2: if the image is bigger than the original image, the image is cut from random points, if the image is smaller than the original image, the boundary is filled, and finally the image is turned over randomly horizontally or vertically.
S2.3: the images of each training are different, and the training set is expanded.
S3: and establishing a background model by calculating the mean value and the variance of each pixel point in the image, normalizing the pixel points and extracting the characteristics.
The specific process of step S3 is as follows:
s3.1: and calculating the average value and variance of all the image pixel points to obtain a background model.
S3.2: the average value is subtracted from the image and divided by the variance to obtain data meeting normal distribution, the average brightness value of the image is removed, and the calculation accuracy of the network can be improved through data normalization.
S4: the training set of the preprocessed scene images is input into a convolution network shown in fig. 3 for training, and in the training process, image features in different aspects are learned by using a multi-level, multi-task and multi-channel structure, and a plurality of feature images are fused, so that the speed is kept, and meanwhile, the precision is improved.
Here, multi-task refers to a detection mode in which the saliency target detection task is primary and the saliency contour detection task is auxiliary; multi-level refers to fusing and exploiting the feature maps produced by different convolution layers of the network, so as to combine multi-scale features; and the channels are re-weighted according to how much each feature channel contributes to the saliency stimulus, so that channels with large contributions receive larger weights in the feature computation. Channels refer to the channels of an image, such as the red, green and blue channels of an RGB image, each of which contributes differently to the saliency stimulus. The application introduces an adaptive channel re-weighting branch to recalibrate the feature channel weights of the convolution layers; the contributions of the feature channels are re-weighted and the weights are continuously optimized, improving the accuracy of target detection.
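A minimal PyTorch sketch of such an adaptive channel re-weighting convolution unit follows; the squeeze-and-excitation-style design (global average pooling followed by two 1×1 convolutions and a sigmoid gate) is an assumption, and the exact module of fig. 4 may differ in its details.

```python
# Sketch of a re-weighted convolution unit: a side branch learns one weight per
# feature channel and rescales the convolution output by it.
import torch
import torch.nn as nn

class ReweightedConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, reduction=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.reweight = nn.Sequential(            # adaptive channel-weight branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // reduction, out_ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        feat = self.conv(x)
        return feat * self.reweight(feat)          # per-channel re-weighting
```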
The specific process of step S4 is:
s4.1: the encoding section samples the image to 1/2 of the original image 2048×1024 by a convolution layer with a step size of 2 and a convolution kernel of 3×3, thereby reducing the burden of computation. Two convolution filters with a step size of 1 and a kernel of 3 x 3 do not change the image size, but can capture shallow features. The size of the feature map obtained after these 3 convolution operations is 1024×512×32 pixels.
S4.2: four convolutions are performed on the input image by using the re-weighted convolution structure in fig. 4, w, h and c in fig. 4 represent the width, the height and the channel number of the feature map respectively, so as to obtain four groups of saliency feature maps with different scales.
S4.3: the high-order feature maps are fused according to the re-weighted fusion unit in fig. 4, and the up-sampling operation in 4.4 is performed in the fusion process.
S4.4: the image is up-sampled twice by bilinear interpolation. The feature size obtained was 512×256×the number of categories. The pixels of the four pixel points (i, j), (i, j+1), (i+1, j), (i+1, j+1) are known, and the pixels of the (i+u, j+v) point are obtained by a bilinear difference method:
f(i+u,j+v)=(1-u)*(1-v)*f(i,j)+(1-u)*v*f(i,j+1)+u*(1-v)*f(i+1,j)+u*v*f(i+1,j+1)
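The following is a direct NumPy transcription of this formula; in a framework such as PyTorch, the whole-map ×2 upsampling of S4.4 is usually done with a single interpolate call, as noted in the trailing comment.

```python
# Bilinear interpolation of a single point: the value at (i+u, j+v) is a
# weighted average of the four surrounding integer pixels, with weights given
# by the fractional offsets u and v.
import numpy as np

def bilinear_sample(f, i, j, u, v):
    """f: 2-D array; (i, j): integer pixel indices; (u, v): offsets in [0, 1)."""
    return ((1 - u) * (1 - v) * f[i, j]
            + (1 - u) * v * f[i, j + 1]
            + u * (1 - v) * f[i + 1, j]
            + u * v * f[i + 1, j + 1])

# For a whole feature map, e.g.:
# torch.nn.functional.interpolate(x, scale_factor=2, mode="bilinear")
```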
s4.5: and merging the upsampled feature map with the shallow features extracted from the encoder to form a multi-level feature map. The number of channels increases after fusion, so that the step length is 1 again, the convolution kernel is convolution of 1×1, and the number of channels is maintained as the category number. The feature size obtained is still 512×256×class number.
S4.6: finally, up-sampling a plurality of groups of images with different scales by corresponding times through bilinear interpolation, so that the obtained predicted output image has the same size as the original image; the feature size is 2048×1024×class number.
S4.7: each convolution network in the multi-depth feature fusion neural network optimizes the network through cross entropy loss, and the formula of the cross entropy function is as follows:
loss(x, class) = weight[class] * (-x[class] + log(Σ_j exp(x[j])))
where x denotes the vector of predicted outputs of a pixel, class denotes the true class of the pixel (foreground or background), weight[class] denotes the weighting coefficient applied to that class, x[class] denotes the predicted output of the pixel for its true class, and x[j] denotes the predicted output of the pixel for class j.
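This per-pixel weighted cross entropy corresponds to the standard weighted cross-entropy loss of deep learning frameworks applied to pixel-wise logits; a minimal PyTorch sketch follows, in which the class weights and tensor sizes are placeholder assumptions.

```python
# Weighted cross entropy over pixel-wise logits, matching the formula above.
import torch
import torch.nn as nn

num_classes = 2                                            # foreground / background
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 1.0]))  # assumed weights

logits = torch.randn(1, num_classes, 512, 1024)            # network output (N, C, H, W)
target = torch.randint(0, num_classes, (1, 512, 1024))     # ground-truth labels (N, H, W)
loss = criterion(logits, target)
```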
In addition, a second auxiliary branch for salient target contour detection is introduced; its re-weighting module is the same as that in fig. 4, and the branch only attends to low-level features, forming two groups of multi-scale feature maps.
Feature extraction and fusion are then performed in the same way as above to obtain the fused contour feature information.
The fused target contour feature information and the target saliency feature information are finally fused to obtain the final fused prediction map.
Finally, each test set image undergoes the same preprocessing as the training set, but without the random scaling, cropping, boundary padding and flipping, and the detection accuracy is calculated using quantitative indexes such as the P-R curve, the ROC curve and the MAE.
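A minimal sketch of the MAE index mentioned above follows, assuming the predicted saliency map has been scaled to [0, 1] and the ground truth is binary.

```python
# Mean absolute error between a predicted saliency map and the binary ground truth.
import numpy as np

def mae(pred, gt):
    """pred: saliency map in [0, 1]; gt: binary ground truth in {0, 1}."""
    return np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean()
```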
This embodiment solves the problem of salient target detection in intelligent unmanned systems using a multi-depth feature fusion neural network. Images are extracted from actual application scenes (such as road scenes) and randomly scaled, cropped, boundary-padded and flipped to expand the training set; the pixels of the images are normalized so that pixel values lie between 0 and 1, eliminating the influence of other transformation functions on the image transformation; feature maps of different levels are fused through an encoding-decoding structure, features are extracted by convolution in the encoding stage, and the information of the input image is restored in the decoding stage by combining convolution with bilinear interpolation, yielding an image with salient feature information. The encoder-decoder structure satisfies the detection accuracy requirement, the combined multi-level, multi-task, multi-channel feature map fusion fully mines depth information from multiple aspects and further improves accuracy, and the small 1×1 and 3×3 convolution kernels used for feature extraction improve the running speed of the network.
The saliency target detection algorithm provided by this embodiment can effectively assist unmanned driving, meets the requirements of accuracy and real-time performance, and can address the problems of very small targets, complex backgrounds, unclear target contours, large memory consumption and long training time.
Example two
In one or more embodiments, an image saliency target detection system for multi-depth feature fusion is disclosed, comprising:
means for acquiring image information to be detected in a set scene;
means for inputting the image information into a trained multi-depth feature fusion neural network model;
means for performing feature extraction by convolution in the encoding stage of the multi-depth feature fusion neural network model, restoring the information of the input image in the decoding stage by an up-sampling method combining convolution and bilinear interpolation, and outputting a feature map with saliency information;
means for learning feature maps of different levels using a multi-level network, the feature maps of different levels being fused;
means for outputting a final salient object detection result.
It should be noted that, the specific working mode of the above device is implemented by using the method disclosed in the first embodiment, which is not described herein.
Example III
In one or more embodiments, a terminal device is disclosed that includes a server, the server including a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the multi-depth feature fusion image saliency target detection method of embodiment one. For brevity, the description is not repeated here.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software.
The multi-depth feature fusion image saliency target detection method of embodiment one may be executed directly by a hardware processor, or by a combination of hardware in the processor and software modules. The software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well known in the art. The storage medium is located in the memory; the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided here.
While the foregoing description of the embodiments of the present application has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the application, but rather, it is intended to cover all modifications or variations within the scope of the application as defined by the claims of the present application.

Claims (7)

1. The image saliency target detection method for multi-depth feature fusion is characterized by comprising the following steps of:
acquiring image information to be detected in a set scene;
inputting the image information into a trained multi-depth feature fusion neural network model;
the multi-depth feature fusion neural network model adopts convolution to perform feature extraction in the encoding stage, and combines the convolution and a bilinear interpolation up-sampling method to restore the information of an input image in the decoding stage, so as to output a feature map with significance information;
adopting a multi-level network to learn feature maps at different levels, and fusing the feature maps of different levels;
outputting a final salient target detection result;
the multi-depth feature fusion neural network model adopts convolution to extract features in the encoding stage, and specifically comprises the following steps:
sampling the image to 1/2 of the original image by a convolution layer;
capturing shallow detail features of the image through contour branches of the two convolution layers;
the up-sampling method combining convolution and bilinear interpolation in the decoding stage restores the information of the input image and outputs a feature map with significance information, which is specifically as follows:
performing four convolutions on the input image by using a re-weighted convolution structure to obtain four groups of saliency feature images with different scales;
upsampling the image by a factor of two through bilinear interpolation;
the feature map after up-sampling is fused with the corresponding scale feature map obtained in the encoding stage to form a multi-level feature map;
performing convolution operation on the multi-level feature map, and maintaining the number of channels as the number of categories;
up-sampling a plurality of groups of images with different scales by corresponding multiples through bilinear interpolation, so that the obtained predicted output image has the same size as the original image;
the re-weighted convolution structure specifically comprises: a branch of channel weight storage is introduced into the basic convolution unit, and the weight of each channel for significance contribution is adjusted through training.
2. The method for detecting the image saliency target by multi-depth feature fusion as claimed in claim 1, wherein a contour detection branch is added, the saliency target contour feature information is extracted through a multi-depth feature fusion neural network model, and the boundary details of the target to be detected are refined by the contour features; and then fusing the salient feature information of the image to be detected and the salient target contour feature information.
3. The method for detecting an image saliency target of multi-depth feature fusion as claimed in claim 1, wherein the training process for the multi-depth feature fusion neural network model comprises:
different image information under a set scene is obtained, binarization significance labeling is carried out on pixel points in each image, and a label is determined, so that a training set and a testing set are formed;
randomly scaling, cutting, filling boundaries and turning over the training set image to expand the data set;
establishing a background model by calculating the mean value and variance of each pixel point in the image, normalizing the pixel points, and extracting features;
inputting the extracted features into a multi-depth feature fusion neural network model for training;
and verifying the trained neural network model by using the test set image.
4. The method for detecting the image saliency target by multi-depth feature fusion as claimed in claim 3, wherein a background model is built by calculating the mean value and the variance of each pixel point in the image, the pixel points are normalized, and the feature extraction is carried out, and the specific process comprises the following steps:
calculating the average value and variance of all the image pixel points to obtain a background model;
the mean is subtracted from the pixel value of each pixel of the image, and the result is divided by the variance, so as to obtain data following a normal distribution.
5. An image saliency target detection system for multi-depth feature fusion, comprising:
means for acquiring image information to be detected in a set scene;
means for inputting the image information into a trained multi-depth feature fusion neural network model;
means for performing feature extraction by convolution in the encoding stage of the multi-depth feature fusion neural network model, restoring the information of the input image in the decoding stage by an up-sampling method combining convolution and bilinear interpolation, and outputting a feature map with saliency information;
means for learning feature maps of different levels using a multi-level network, the feature maps of different levels being fused;
means for outputting a final salient object detection result;
the multi-depth feature fusion neural network model adopts convolution to extract features in the encoding stage, and specifically comprises the following steps:
sampling the image to 1/2 of the original image by a convolution layer;
capturing shallow detail features of the image through contour branches of the two convolution layers;
the up-sampling method combining convolution and bilinear interpolation in the decoding stage restores the information of the input image and outputs a feature map with significance information, which is specifically as follows:
performing four convolutions on the input image by using a re-weighted convolution structure to obtain four groups of saliency feature images with different scales;
upsampling the image by a factor of two through bilinear interpolation;
the feature map after up-sampling is fused with the corresponding scale feature map obtained in the encoding stage to form a multi-level feature map;
performing convolution operation on the multi-level feature map, and maintaining the number of channels as the number of categories;
up-sampling a plurality of groups of images with different scales by corresponding multiples through bilinear interpolation, so that the obtained predicted output image has the same size as the original image;
the re-weighted convolution structure specifically comprises: a branch of channel weight storage is introduced into the basic convolution unit, and the weight of each channel for significance contribution is adjusted through training.
6. A terminal device comprising a processor and a computer-readable storage medium, the processor configured to implement instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform the multi-depth feature fused image salient object detection method of any one of claims 1-4.
7. A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to perform the multi-depth feature fused image salient object detection method of any one of claims 1-4.
CN202010832414.6A 2020-08-18 2020-08-18 Image saliency target detection method and system based on multi-depth feature fusion Active CN112132156B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010832414.6A CN112132156B (en) 2020-08-18 2020-08-18 Image saliency target detection method and system based on multi-depth feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010832414.6A CN112132156B (en) 2020-08-18 2020-08-18 Image saliency target detection method and system based on multi-depth feature fusion

Publications (2)

Publication Number Publication Date
CN112132156A CN112132156A (en) 2020-12-25
CN112132156B true CN112132156B (en) 2023-08-22

Family

ID=73850349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010832414.6A Active CN112132156B (en) 2020-08-18 2020-08-18 Image saliency target detection method and system based on multi-depth feature fusion

Country Status (1)

Country Link
CN (1) CN112132156B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837360B (en) * 2021-01-07 2023-08-11 北京百度网讯科技有限公司 Depth information processing method, apparatus, device, storage medium, and program product
CN114414578A (en) * 2021-01-18 2022-04-29 无锡金元启信息技术科技有限公司 Industrial hole wall defect detection system and identification algorithm based on AI
CN112766285B (en) * 2021-01-26 2024-03-19 北京有竹居网络技术有限公司 Image sample generation method and device and electronic equipment
CN113052188A (en) * 2021-03-26 2021-06-29 大连理工大学人工智能大连研究院 Method, system, equipment and storage medium for detecting remote sensing image target
CN112967322B (en) * 2021-04-07 2023-04-18 深圳创维-Rgb电子有限公司 Moving object detection model establishing method and moving object detection method
CN113313129B (en) * 2021-06-22 2024-04-05 中国平安财产保险股份有限公司 Training method, device, equipment and storage medium for disaster damage recognition model
CN113538379B (en) * 2021-07-16 2022-11-22 河南科技学院 Double-stream coding fusion significance detection method based on RGB and gray level images
CN113641845B (en) * 2021-07-16 2022-09-23 广西师范大学 Depth feature contrast weighted image retrieval method based on vector contrast strategy
CN113515660B (en) * 2021-07-16 2022-03-18 广西师范大学 Depth feature contrast weighted image retrieval method based on three-dimensional tensor contrast strategy
CN113567436A (en) * 2021-07-22 2021-10-29 上海交通大学 Saliency target detection device and method based on deep convolutional neural network
CN114332633B (en) * 2022-03-01 2022-06-10 北京化工大学 Radar image target detection and identification method and equipment and storage medium
CN115527027A (en) * 2022-03-04 2022-12-27 西南民族大学 Remote sensing image ground object segmentation method based on multi-feature fusion mechanism
CN115331082B (en) * 2022-10-13 2023-02-03 天津大学 Path generation method of tracking sound source, training method of model and electronic equipment
CN115908982A (en) * 2022-12-01 2023-04-04 北京百度网讯科技有限公司 Image processing method, model training method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816100A (en) * 2019-01-30 2019-05-28 中科人工智能创新技术研究院(青岛)有限公司 A kind of conspicuousness object detecting method and device based on two-way fusion network
CN110598610A (en) * 2019-09-02 2019-12-20 北京航空航天大学 Target significance detection method based on neural selection attention
CN110909594A (en) * 2019-10-12 2020-03-24 杭州电子科技大学 Video significance detection method based on depth fusion
CN111428805A (en) * 2020-04-01 2020-07-17 南开大学 Method and device for detecting salient object, storage medium and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816100A (en) * 2019-01-30 2019-05-28 中科人工智能创新技术研究院(青岛)有限公司 A kind of conspicuousness object detecting method and device based on two-way fusion network
CN110598610A (en) * 2019-09-02 2019-12-20 北京航空航天大学 Target significance detection method based on neural selection attention
CN110909594A (en) * 2019-10-12 2020-03-24 杭州电子科技大学 Video significance detection method based on depth fusion
CN111428805A (en) * 2020-04-01 2020-07-17 南开大学 Method and device for detecting salient object, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MDFN: Multi-scale deep feature learning network for object detection; Wenchi Ma et al.; Elsevier; full text *

Also Published As

Publication number Publication date
CN112132156A (en) 2020-12-25

Similar Documents

Publication Publication Date Title
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN108647585B (en) Traffic identifier detection method based on multi-scale circulation attention network
CN107274445B (en) Image depth estimation method and system
CN111666921B (en) Vehicle control method, apparatus, computer device, and computer-readable storage medium
US10614574B2 (en) Generating image segmentation data using a multi-branch neural network
CN109753913B (en) Multi-mode video semantic segmentation method with high calculation efficiency
CN112528878A (en) Method and device for detecting lane line, terminal device and readable storage medium
CN111696110B (en) Scene segmentation method and system
CN110163188B (en) Video processing and method, device and equipment for embedding target object in video
CN111915627A (en) Semantic segmentation method, network, device and computer storage medium
CN113284054A (en) Image enhancement method and image enhancement device
CN111369581A (en) Image processing method, device, equipment and storage medium
CN110781744A (en) Small-scale pedestrian detection method based on multi-level feature fusion
CN114359851A (en) Unmanned target detection method, device, equipment and medium
CN113591872A (en) Data processing system, object detection method and device
CN112651979A (en) Lung X-ray image segmentation method, system, computer equipment and storage medium
CN109871792B (en) Pedestrian detection method and device
CN112581379A (en) Image enhancement method and device
CN110807384A (en) Small target detection method and system under low visibility
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113065551A (en) Method for performing image segmentation using a deep neural network model
CN111325766A (en) Three-dimensional edge detection method and device, storage medium and computer equipment
CN112541394A (en) Black eye and rhinitis identification method, system and computer medium
CN113706562A (en) Image segmentation method, device and system and cell segmentation method
CN113673562A (en) Feature enhancement method, target segmentation method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant