CN112132156A - Multi-depth feature fusion image saliency target detection method and system - Google Patents

Multi-depth feature fusion image saliency target detection method and system

Info

Publication number
CN112132156A
CN112132156A
Authority
CN
China
Prior art keywords
image
information
feature
convolution
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010832414.6A
Other languages
Chinese (zh)
Other versions
CN112132156B (en)
Inventor
陈振学
闫星合
刘成云
孙露娜
段树超
朱凯
陆梦旭
李明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202010832414.6A priority Critical patent/CN112132156B/en
Publication of CN112132156A publication Critical patent/CN112132156A/en
Application granted granted Critical
Publication of CN112132156B publication Critical patent/CN112132156B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-depth feature fusion image saliency target detection method and system, wherein the method comprises the following steps: acquiring image information to be detected in a set scene; inputting the image information into a trained multi-depth feature fusion neural network model, which extracts features by convolution in the encoding stage, restores the information of the input image in the decoding stage with an up-sampling method combining convolution and bilinear interpolation, and outputs a feature map carrying saliency information; learning feature maps of different levels with a multi-level network and fusing them; and outputting the final salient target detection result. By using the multi-depth feature fusion neural network to detect salient targets in the scene image, the method guarantees detection accuracy while speeding up subsequent processing; a contour detection branch is also added, and contour features are used to refine the boundary details of the target to be detected.

Description

Multi-depth feature fusion image saliency target detection method and system
Technical Field
The invention relates to the technical field of image saliency target detection, in particular to a method and a system for detecting an image saliency target by multi-depth feature fusion.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Salient target detection uses a computer to simulate the human visual attention mechanism and separate the people or things that attract the most visual attention from the background. An image is composed of many pixels; their attributes such as brightness and color differ, and so do their corresponding saliency values. Unlike conventional target detection and semantic segmentation tasks, salient target detection focuses only on the part that draws the most visual attention without classifying it, and its result is at the pixel level, so it is often used as a preliminary step of other image processing methods to improve the accuracy of the subsequent processing flow.
At present, salient target detection is applied in fields such as medical image segmentation, intelligent photography, image retrieval, virtual backgrounds and intelligent unmanned systems. It is a basic task in an intelligent unmanned system and lays the foundation for subsequent target recognition and decision making. In recent years the artificial intelligence industry has developed rapidly, unmanned operation is pursued in intelligent daily life and in industry, and intelligent unmanned systems have become a research hotspot.
Taking an unmanned driving system as an example, autonomous driving is a complex computing task: the driver's visual attention mechanism must be simulated in changing scenes for fast and accurate perception, and the back-end computer must perceive the entire surrounding environment and different scenes well. Conventional target detection can only detect specific object classes, its results are imprecise bounding boxes, and it cannot respond accurately and quickly to unknown, sudden scenes, so salient target detection is a key technology in unmanned driving. A vehicle-mounted camera or lidar provides real-time road images, a salient target detection algorithm outputs a binary saliency feature map, and scene segmentation is then performed with the salient regions emphasized to obtain an image with semantic information, which is used to control the vehicle's motion and obstacle avoidance; this is fast, accurate, and saves computing resources.
Early saliency detection relied on features such as color, brightness, orientation and center-surround contrast, which can only be detected locally. Later, methods such as Markov chains and frequency-domain tuning brought global features into the detection from a mathematical perspective, but high accuracy is still difficult to achieve. An unmanned system needs very high precision and extremely fast response to guarantee safety and real-time performance. Moreover, problems such as very small targets to be detected, complex backgrounds and unclear target contours are encountered during unmanned driving, which affect the detection results and the precision of subsequent processing.
Disclosure of Invention
In order to solve the problems, the invention provides a method and a system for detecting an image saliency target by multi-depth feature fusion.
In some embodiments, the following technical scheme is adopted:
a method for detecting a multi-depth feature fused image saliency target comprises the following steps:
acquiring to-be-detected image information under a set scene;
inputting the image information into a trained multi-depth feature fusion neural network model;
the multi-depth feature fusion neural network model adopts convolution to extract features in an encoding stage, restores information of an input image in a decoding stage by combining an up-sampling method of convolution and bilinear interpolation, and outputs a feature map with significance information;
learning feature maps of different levels by adopting a multi-level network, and fusing the feature maps of the different levels;
and outputting a final significant target detection result.
Further, adding a contour detection branch, extracting the contour characteristic information of the salient target through a multi-depth characteristic fusion neural network model, and refining the boundary details of the target to be detected by using the contour characteristics; and then, fusing the salient feature information of the image to be detected and the salient target contour feature information.
In other embodiments, the following technical solutions are adopted:
a multi-depth feature fused image saliency target detection system comprising:
means for acquiring image information to be detected in a set scene;
means for inputting the image information into a trained multi-depth feature fusion neural network model;
means by which the multi-depth feature fusion neural network model extracts features by convolution in the encoding stage, restores the information of the input image in the decoding stage with an up-sampling method combining convolution and bilinear interpolation, and outputs a feature map with saliency information;
means for learning feature maps of different levels with a multi-level network and fusing the feature maps of the different levels;
and means for outputting the final salient target detection result.
In other embodiments, the following technical solutions are adopted:
a terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions, and the computer-readable storage medium being configured to store a plurality of instructions adapted to be loaded by the processor to execute the above multi-depth feature fusion image saliency target detection method.
In other embodiments, the following technical solutions are adopted:
a computer-readable storage medium, wherein a plurality of instructions are stored, and the instructions are suitable for being loaded by a processor of a terminal device and executing the multi-depth feature fusion image saliency target detection method.
Compared with the prior art, the invention has the beneficial effects that:
the multi-depth feature fusion neural network performs salient target detection on the scene image, guaranteeing detection accuracy while speeding up the subsequent processing flow;
the encoder-decoder structure in the multi-depth feature fusion neural network satisfies the precision requirement of saliency detection, and the multi-level, multi-task and multi-channel feature map fusion makes full use of both shallow and deep information;
the method adds contour feature detection, which refines the edge details of the salient target and yields a detection result with higher accuracy and a clearer contour, which in turn benefits subsequent processing tasks such as scene segmentation.
The salient target detection algorithm provided by the invention effectively assists intelligent unmanned systems such as unmanned driving, meets the requirements on accuracy and real-time performance, and alleviates the problems of very small targets to be detected, complex backgrounds, unclear target contours, high memory consumption and long training time.
Drawings
FIG. 1 is a flow chart of a salient object detection method in an embodiment of the present invention;
FIG. 2 is a schematic diagram of an image preprocessing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a multi-depth feature fusion neural network framework in an embodiment of the present invention;
fig. 4 is a schematic diagram of a network important component re-weighting module in an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
The method for detecting the salient object can be applied to the fields of medical image segmentation, intelligent photography, image retrieval, virtual backgrounds, intelligent unmanned systems and the like.
In this embodiment, the multi-depth feature fusion neural network refers to a significance detection neural network that fuses multi-level, multi-task, and multi-channel depth features.
In this embodiment, the method of the present invention is described in detail by taking an unmanned driving scene as an example:
a method for detecting a salient object of an image by multi-depth feature fusion refers to FIG. 1, and includes:
acquiring to-be-detected image information under a set scene;
inputting the image information into a trained multi-depth feature fusion neural network model;
the multi-depth feature fusion neural network model adopts convolution to extract features in an encoding stage, restores information of an input image in a decoding stage by combining an up-sampling method of convolution and bilinear interpolation, and outputs a feature map with significance information; learning feature maps of different levels by adopting a multi-level network, and fusing the feature maps of the different levels;
and outputting a final significant target detection result.
Adding a contour detection branch to assist a saliency detection task, and refining the boundary details of the target to be detected by using contour features;
introducing an adaptive channel re-weighting branch to re-calibrate the feature channel weights of the convolutional layers;
the neural network is optimized by a cross entropy loss function.
Specifically, S1: collect on-road driving images, apply binary saliency labeling to the images, determine the labels, and form a training set and a test set.
The specific process of step S1 is:
S1.1: images can be obtained by shooting videos and splitting them into frames, extracting one image every 10 frames; the extracted images are input to the neural network.
S1.2: label every pixel, with one number per category; there are 2 categories, distinguishing foreground from background, which gives a grayscale image used as the ground truth of the output image.
S2: referring to fig. 2, on the basis of the training set, the input and labeled images are randomly scaled, cropped, boundary-padded and flipped so that the training set is expanded; accuracy improves as the training set grows.
Each image contains a large number of pixels, and labeling every pixel is time-consuming, labor-intensive and prone to omissions or mislabeling, while a large number of images helps to improve accuracy; preprocessing the images in this way therefore achieves a good result with fewer labeled images.
The specific process of step S2 is:
S2.1: in each training iteration, the input image and its ground-truth label are randomly shrunk or enlarged.
S2.2: if the scaled image is larger than the original size, it is cropped starting from a random point; if it is smaller, the boundary is padded; finally a random horizontal or vertical flip is applied.
S2.3: the images differ in every training iteration, which effectively expands the training set.
S3: build a background model by calculating the mean and variance of the pixels in the images, normalize the pixels, and extract features.
The specific process of step S3 is:
S3.1: calculate the mean and variance of all image pixels to obtain the background model.
S3.2: subtract the mean from the image and divide by the variance to obtain approximately normally distributed data; this removes the average brightness of the image, and data normalization improves the numerical accuracy of the network.
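As an illustration of S3.1-S3.2 (the helper names are hypothetical), the background model and normalization could look like the sketch below; the text literally divides by the variance, while dividing by the standard deviation is the more common convention, so that alternative is noted in a comment:

```python
import numpy as np

def fit_background_model(images):
    """S3.1: per-channel mean and variance over all pixels of the training images
    (images is a list of HxWxC float arrays)."""
    stacked = np.concatenate([img.reshape(-1, img.shape[-1]) for img in images], axis=0)
    return stacked.mean(axis=0), stacked.var(axis=0)

def normalize(image, mean, var, eps=1e-8):
    """S3.2: subtract the mean and divide, as stated in the text, by the variance.
    Dividing by np.sqrt(var) (the standard deviation) is the more usual convention."""
    return (image - mean) / (var + eps)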
S4: input the preprocessed scene images of the training set into the convolutional network shown in fig. 3 for training; during training, a multi-level, multi-task, multi-channel structure learns image features of different aspects, and multiple feature maps are fused to improve accuracy while maintaining speed.
Multi-task means a detection mode in which the salient target detection task is primary and a salient contour detection task assists it; multi-level means combining the feature maps produced by different convolutional layers in the network so as to combine multi-scale features; multi-channel means re-weighting the channels according to how much each feature channel contributes to the saliency stimulus, so that channels with a large contribution receive a larger weight in the feature computation. Channel here refers to the channels of an image; for example, an RGB image has red, green and blue color channels, and each channel contributes differently to the saliency stimulus. The invention introduces an adaptive channel re-weighting branch to re-calibrate the feature channel weights of the convolutional layers; that is, the contribution of each feature channel is re-weighted and the weights are continuously optimized, improving the accuracy of target detection.
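The exact wiring of the re-weighting module in fig. 4 is not reproduced here; the PyTorch sketch below assumes a squeeze-and-excitation style gate (global pooling, a small fully connected bottleneck and a sigmoid), which matches the described behaviour of giving larger weights to channels that contribute more to the saliency stimulus; the reduction ratio is an assumption:

```python
import torch
import torch.nn as nn

class ChannelReweight(nn.Module):
    """Adaptive channel re-weighting: learn per-channel weights so that channels
    contributing more to the saliency stimulus are emphasized (assumed SE-style gate)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))                 # global average pooling -> (N, C)
        w = self.fc(w).view(x.size(0), -1, 1, 1)
        return x * w                           # re-weighted feature map
```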
The specific process of step S4 is:
S4.1: the encoding part first down-samples the 2048 × 1024 input image to 1/2 of its original size with a convolutional layer of stride 2 and a 3 × 3 kernel, reducing the computational burden. Two further convolutional layers with stride 1 and 3 × 3 kernels do not change the image size but capture shallow features. The feature map obtained after these 3 convolution operations is 1024 × 512 × 32.
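A PyTorch sketch of this stem under the stated sizes; only the final 32-channel width is given in the text, so the intermediate channel widths and the use of padding 1 are assumptions:

```python
import torch
import torch.nn as nn

# Stride-2 3x3 convolution halves the 2048x1024 input, two stride-1 3x3 convolutions
# keep the size and capture shallow features, giving a 1024x512x32 feature map.
stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # 2048x1024 -> 1024x512
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),  # size unchanged
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),  # size unchanged, 32 channels
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 1024, 2048)   # (N, C, H, W) with H=1024, W=2048
print(stem(x).shape)                 # torch.Size([1, 32, 512, 1024])
```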
S4.2: apply four convolution stages to the input features with the re-weighted convolution structure of fig. 4, where w, h and c in fig. 4 denote the width, height and number of channels of the feature map, obtaining four groups of salient feature maps at different scales.
S4.3: fuse the higher-level feature maps with the re-weighting fusion unit of fig. 4; the up-sampling operation described in S4.4 is performed during the fusion.
S4.4: the feature map is up-sampled by a factor of two with bilinear interpolation, giving a feature size of 512 × 256 × (number of classes). Given the pixel values at the four points (i, j), (i, j+1), (i+1, j) and (i+1, j+1), the value at point (i+u, j+v) is obtained by bilinear interpolation as follows:
f(i+u, j+v) = (1-u)(1-v)·f(i, j) + (1-u)v·f(i, j+1) + u(1-v)·f(i+1, j) + uv·f(i+1, j+1)
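The formula can be checked directly with a few lines of Python (the helper name is illustrative); in practice the doubling in S4.4 is usually a single framework call:

```python
def bilinear(f, i, j, u, v):
    """Pixel value at (i+u, j+v) with 0 <= u, v < 1, computed from the four
    neighbouring pixels exactly as in the formula above; f is a 2D array of values."""
    return ((1 - u) * (1 - v) * f[i][j]
            + (1 - u) * v * f[i][j + 1]
            + u * (1 - v) * f[i + 1][j]
            + u * v * f[i + 1][j + 1])

# In a deep learning framework the 2x up-sampling is typically one call, e.g.
# torch.nn.functional.interpolate(x, scale_factor=2, mode="bilinear").
```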
S4.5: fuse the up-sampled feature map with the shallow features extracted by the encoder to form a multi-level feature map. Since fusion increases the number of channels, a convolution with stride 1 and a 1 × 1 kernel is applied again to keep the number of channels equal to the number of categories. The resulting feature size is still 512 × 256 × (number of classes).
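A hedged PyTorch sketch of the S4.4-S4.5 decoder step: bilinear up-sampling by 2, concatenation with the shallow encoder features, and a 1 × 1, stride-1 convolution that keeps the channel count at the number of classes. The channel counts passed to the module are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseUp(nn.Module):
    """Upsample a deep feature map by 2x, concatenate it with the shallow encoder
    feature map, and reduce the channels back to the number of classes."""
    def __init__(self, deep_ch, shallow_ch, num_classes=2):
        super().__init__()
        self.reduce = nn.Conv2d(deep_ch + shallow_ch, num_classes, kernel_size=1, stride=1)

    def forward(self, deep, shallow):
        up = F.interpolate(deep, scale_factor=2, mode="bilinear", align_corners=False)
        fused = torch.cat([up, shallow], dim=1)   # multi-level feature map
        return self.reduce(fused)                 # channels kept at num_classes
```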
S4.6: finally, the groups of feature maps at different scales are up-sampled by the corresponding factors with bilinear interpolation so that the predicted output image has the same size as the original image; the feature size is 2048 × 1024 × (number of categories).
S4.7: each convolutional branch in the multi-depth feature fusion neural network is optimized with a cross entropy loss; the cross entropy function is:
loss(x, class) = weight[class] * (−x[class] + log(∑_j exp(x[j])))
where x denotes the prediction output of a pixel, class denotes the true category of the pixel (foreground or background), weight[class] is the weighting coefficient of that category, x[class] is the predicted score of the pixel for its true category class, and x[j] is the predicted score of the pixel for category j.
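The formula above is the standard weighted softmax cross entropy evaluated per pixel; a sketch of how it might be applied with PyTorch is shown below, where the class weights are placeholders rather than values given by the patent:

```python
import torch
import torch.nn as nn

# nn.CrossEntropyLoss computes exactly weight[class] * (-x[class] + log(sum_j exp(x[j])))
# per pixel when given an (N, num_classes, H, W) prediction and an (N, H, W) target map
# whose values are 0 (background) or 1 (foreground).
class_weights = torch.tensor([1.0, 1.0])                   # placeholder [background, foreground]
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(2, 2, 512, 256, requires_grad=True)   # prediction x for each pixel
target = torch.randint(0, 2, (2, 512, 256))                # true category per pixel
loss = criterion(logits, target)
loss.backward()
```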
In addition, a second, auxiliary salient target contour detection branch is introduced; its re-weighting module is the same as that in fig. 4, and the branch focuses only on low-level features, forming two groups of multi-scale feature maps;
the features are extracted and fused in the same way as above to obtain the fused contour feature information.
The fused target contour feature information and the target saliency feature information are then fused to obtain the final fused prediction image.
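The fusion operator between the contour branch and the saliency branch is not spelled out here; the sketch below assumes concatenation followed by a 1 × 1 convolution:

```python
import torch
import torch.nn as nn

class SaliencyContourFusion(nn.Module):
    """Combine the saliency feature map from the main branch with the contour feature
    map from the auxiliary branch into one prediction (assumed: concat + 1x1 conv)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.fuse = nn.Conv2d(2 * num_classes, num_classes, kernel_size=1)

    def forward(self, saliency_feat, contour_feat):
        return self.fuse(torch.cat([saliency_feat, contour_feat], dim=1))
```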
Finally, each test set is preprocessed in the same way as the training set, except that no random scaling, cropping, boundary padding or flipping is applied, and the detection accuracy is evaluated with quantitative metrics such as the P-R curve, the ROC curve and the MAE.
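For reference, a minimal sketch of two of these metrics (MAE and one point on the P-R curve); the function names are illustrative:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a predicted saliency map and the binary ground truth,
    both scaled to [0, 1]."""
    return np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean()

def precision_recall(pred, gt, threshold):
    """One point on the P-R curve: binarize the prediction at a threshold and compare."""
    binary = pred >= threshold
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max((gt > 0.5).sum(), 1)
    return precision, recall
```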
This embodiment addresses salient target detection in an intelligent unmanned system with the multi-depth feature fusion neural network. Images are extracted from practical application scenes (such as road scenes) and randomly scaled, cropped, boundary-padded and flipped to expand the training set; the pixels are normalized so that the pixel values lie between 0 and 1, eliminating the influence of other transformations on the image; in the decoding stage, the information of the input image is restored by combining convolution and bilinear interpolation to produce an output image carrying salient feature information. The encoder-decoder structure satisfies the detection precision requirement, the combined multi-level, multi-task and multi-channel feature map fusion fully exploits depth information from multiple aspects and further improves accuracy, and the small 1 × 1 and 3 × 3 convolution kernels used for feature extraction speed up the network.
The saliency target detection algorithm provided by the embodiment can effectively provide help for unmanned driving, meets the requirements of accuracy and real-time performance, and can solve the problems of too small target to be detected, complex background, unclear target outline, large occupied memory for calculation and long training time.
Example two
In one or more embodiments, disclosed is a multi-depth feature fused image saliency target detection system, comprising:
means for acquiring image information to be detected in a set scene;
means for inputting the image information into a trained multi-depth feature fusion neural network model;
means by which the multi-depth feature fusion neural network model extracts features by convolution in the encoding stage, restores the information of the input image in the decoding stage with an up-sampling method combining convolution and bilinear interpolation, and outputs a feature map with saliency information;
means for learning feature maps of different levels with a multi-level network and fusing the feature maps of the different levels;
and means for outputting the final salient target detection result.
It should be noted that the specific working manner of the apparatus is implemented by using the method disclosed in the first embodiment, and is not described again.
EXAMPLE III
In one or more embodiments, a terminal device is disclosed, which includes a server including a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the image saliency target detection method of multi-depth feature fusion in the first embodiment. For brevity, no further description is provided herein.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The multi-depth feature fusion image saliency target detection method of the first embodiment may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, details are not repeated here.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the scope of the present invention; it should be understood that those skilled in the art can make various modifications and variations, without inventive effort, on the basis of the technical solution of the present invention.

Claims (10)

1. A method for detecting a multi-depth feature fused image saliency target is characterized by comprising the following steps:
acquiring to-be-detected image information under a set scene;
inputting the image information into a trained multi-depth feature fusion neural network model;
the multi-depth feature fusion neural network model adopts convolution to extract features in an encoding stage, restores information of an input image in a decoding stage by combining an up-sampling method of convolution and bilinear interpolation, and outputs a feature map with significance information;
learning feature maps of different levels by adopting a multi-level network, and fusing the feature maps of the different levels;
and outputting a final significant target detection result.
2. The method for detecting the image saliency target of claim 1 characterized by adding a contour detection branch, extracting the contour feature information of a saliency target through a multi-depth feature fusion neural network model, and refining the boundary details of the target to be detected by using the contour features; and then, fusing the salient feature information of the image to be detected and the salient target contour feature information.
3. The method for detecting the image saliency target of claim 1 characterized in that the training process for the neural network model of multi-depth feature fusion comprises:
acquiring different image information under a set scene, performing binarization significance labeling on pixel points in each image, determining labels, and further forming a training set and a test set;
randomly zooming, cutting, filling a boundary and turning the training set image to expand the data set;
establishing a background model by calculating the mean value and variance of each pixel point in the image, normalizing the pixel points and extracting the characteristics;
inputting the extracted features into a multi-depth feature fusion neural network model for training;
and verifying the trained neural network model by using the test set image.
4. The method for detecting the image saliency target by fusion of the multiple depth features as claimed in claim 3, wherein a background model is established by calculating the mean and variance of each pixel point in the image, the pixel points are normalized, and the feature extraction is performed, the specific process includes:
calculating the average value and variance of all image pixel points to obtain a background model;
and subtracting the mean from the pixel value of each pixel of the image and dividing by the variance to obtain data that follow a normal distribution.
5. The method for detecting the image saliency target by multi-depth feature fusion according to claim 1, wherein the multi-depth feature fusion neural network model performs feature extraction by convolution in a coding stage, specifically:
down-sampling the image through a convolutional layer to 1/2 of the original image;
capturing shallow detail features of the image with two further convolutional layers.
6. The method for detecting the saliency target of the image fused with the multi-depth features as claimed in claim 1, wherein an up-sampling method combining convolution and bilinear interpolation is used in a decoding stage to restore the information of the input image and output a feature map with saliency information, and specifically:
carrying out four times of convolution on the input image by utilizing a re-weighted convolution structure to obtain four groups of significant feature maps with different scales;
the image is up-sampled by two times by a bilinear interpolation method;
fusing the up-sampled characteristic diagram with the corresponding scale characteristic diagram obtained in the encoding stage to form a multi-level characteristic diagram;
performing convolution operation on the multi-level feature graph, and maintaining the number of channels as the number of categories;
and up-sampling the groups of feature maps at different scales by the corresponding factors with bilinear interpolation, so that the obtained prediction output image has the same size as the original image.
7. The method for detecting the image saliency target by fusion of multiple depth features as claimed in claim 6, wherein said re-weighted convolution structure is specifically: a branch of channel weight storage is introduced into a basic convolution unit, and the weight of each channel on significance contribution is adjusted through training.
8. A multi-depth feature fused image saliency target detection system, comprising:
means for acquiring image information to be detected in a set scene;
means for inputting the image information into a trained multi-depth feature fusion neural network model;
means by which the multi-depth feature fusion neural network model extracts features by convolution in the encoding stage, restores the information of the input image in the decoding stage with an up-sampling method combining convolution and bilinear interpolation, and outputs a feature map with saliency information;
means for learning feature maps of different levels with a multi-level network and fusing the feature maps of the different levels;
and means for outputting the final salient target detection result.
9. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; a computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method for multi-depth feature fused image saliency target detection of any one of claims 1-7.
10. A computer-readable storage medium having stored therein a plurality of instructions, wherein the instructions are adapted to be loaded by a processor of a terminal device and to perform the multi-depth feature fused image saliency target detection method of any one of claims 1-7.
CN202010832414.6A 2020-08-18 2020-08-18 Image saliency target detection method and system based on multi-depth feature fusion Active CN112132156B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010832414.6A CN112132156B (en) 2020-08-18 2020-08-18 Image saliency target detection method and system based on multi-depth feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010832414.6A CN112132156B (en) 2020-08-18 2020-08-18 Image saliency target detection method and system based on multi-depth feature fusion

Publications (2)

Publication Number Publication Date
CN112132156A true CN112132156A (en) 2020-12-25
CN112132156B CN112132156B (en) 2023-08-22

Family

ID=73850349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010832414.6A Active CN112132156B (en) 2020-08-18 2020-08-18 Image saliency target detection method and system based on multi-depth feature fusion

Country Status (1)

Country Link
CN (1) CN112132156B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766285A (en) * 2021-01-26 2021-05-07 北京有竹居网络技术有限公司 Image sample generation method and device and electronic equipment
CN112837360A (en) * 2021-01-07 2021-05-25 北京百度网讯科技有限公司 Depth information processing method, apparatus, device, storage medium, and program product
CN112903692A (en) * 2021-01-18 2021-06-04 无锡金元启信息技术科技有限公司 Industrial hole wall defect detection system and identification algorithm based on AI
CN112967322A (en) * 2021-04-07 2021-06-15 深圳创维-Rgb电子有限公司 Moving object detection model establishing method and moving object detection method
CN113052188A (en) * 2021-03-26 2021-06-29 大连理工大学人工智能大连研究院 Method, system, equipment and storage medium for detecting remote sensing image target
CN113313129A (en) * 2021-06-22 2021-08-27 中国平安财产保险股份有限公司 Method, device and equipment for training disaster recognition model and storage medium
CN113515660A (en) * 2021-07-16 2021-10-19 广西师范大学 Depth feature contrast weighted image retrieval method based on three-dimensional tensor contrast strategy
CN113538379A (en) * 2021-07-16 2021-10-22 河南科技学院 Double-stream coding fusion significance detection method based on RGB and gray level image
CN113567436A (en) * 2021-07-22 2021-10-29 上海交通大学 Saliency target detection device and method based on deep convolutional neural network
CN113641845A (en) * 2021-07-16 2021-11-12 广西师范大学 Depth feature contrast weighted image retrieval method based on vector contrast strategy
CN113724286A (en) * 2021-08-09 2021-11-30 浙江大华技术股份有限公司 Method and device for detecting saliency target and computer-readable storage medium
CN114332633A (en) * 2022-03-01 2022-04-12 北京化工大学 Radar image target detection and identification method, equipment and storage medium
CN115035377A (en) * 2022-06-15 2022-09-09 天津大学 Significance detection network system based on double-stream coding and interactive decoding
CN115331082A (en) * 2022-10-13 2022-11-11 天津大学 Path generation method of tracking sound source, training method of model and electronic equipment
CN115527027A (en) * 2022-03-04 2022-12-27 西南民族大学 Remote sensing image ground object segmentation method based on multi-feature fusion mechanism
CN115908982A (en) * 2022-12-01 2023-04-04 北京百度网讯科技有限公司 Image processing method, model training method, device, equipment and storage medium
CN116703817A (en) * 2023-03-23 2023-09-05 国网山东省电力公司莱芜供电公司 Transmission line detection method and system based on saliency target detection

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816100A (en) * 2019-01-30 2019-05-28 中科人工智能创新技术研究院(青岛)有限公司 A kind of conspicuousness object detecting method and device based on two-way fusion network
CN110598610A (en) * 2019-09-02 2019-12-20 北京航空航天大学 Target significance detection method based on neural selection attention
CN110909594A (en) * 2019-10-12 2020-03-24 杭州电子科技大学 Video significance detection method based on depth fusion
CN111428805A (en) * 2020-04-01 2020-07-17 南开大学 Method and device for detecting salient object, storage medium and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816100A (en) * 2019-01-30 2019-05-28 中科人工智能创新技术研究院(青岛)有限公司 A kind of conspicuousness object detecting method and device based on two-way fusion network
CN110598610A (en) * 2019-09-02 2019-12-20 北京航空航天大学 Target significance detection method based on neural selection attention
CN110909594A (en) * 2019-10-12 2020-03-24 杭州电子科技大学 Video significance detection method based on depth fusion
CN111428805A (en) * 2020-04-01 2020-07-17 南开大学 Method and device for detecting salient object, storage medium and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WENCHI MA et al.: "MDFN: Multi-scale deep feature learning network for object detection", ELSEVIER
裴晓康 et al.: "Automatic image matting algorithm based on saliency detection fused with fuzzy estimation", 计算机应用研究 (Application Research of Computers), vol. 29, no. 10

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837360A (en) * 2021-01-07 2021-05-25 北京百度网讯科技有限公司 Depth information processing method, apparatus, device, storage medium, and program product
CN112837360B (en) * 2021-01-07 2023-08-11 北京百度网讯科技有限公司 Depth information processing method, apparatus, device, storage medium, and program product
CN112903692A (en) * 2021-01-18 2021-06-04 无锡金元启信息技术科技有限公司 Industrial hole wall defect detection system and identification algorithm based on AI
CN112766285A (en) * 2021-01-26 2021-05-07 北京有竹居网络技术有限公司 Image sample generation method and device and electronic equipment
CN112766285B (en) * 2021-01-26 2024-03-19 北京有竹居网络技术有限公司 Image sample generation method and device and electronic equipment
CN113052188A (en) * 2021-03-26 2021-06-29 大连理工大学人工智能大连研究院 Method, system, equipment and storage medium for detecting remote sensing image target
CN112967322A (en) * 2021-04-07 2021-06-15 深圳创维-Rgb电子有限公司 Moving object detection model establishing method and moving object detection method
CN113313129A (en) * 2021-06-22 2021-08-27 中国平安财产保险股份有限公司 Method, device and equipment for training disaster recognition model and storage medium
CN113313129B (en) * 2021-06-22 2024-04-05 中国平安财产保险股份有限公司 Training method, device, equipment and storage medium for disaster damage recognition model
CN113515660A (en) * 2021-07-16 2021-10-19 广西师范大学 Depth feature contrast weighted image retrieval method based on three-dimensional tensor contrast strategy
CN113641845A (en) * 2021-07-16 2021-11-12 广西师范大学 Depth feature contrast weighted image retrieval method based on vector contrast strategy
CN113538379A (en) * 2021-07-16 2021-10-22 河南科技学院 Double-stream coding fusion significance detection method based on RGB and gray level image
CN113567436A (en) * 2021-07-22 2021-10-29 上海交通大学 Saliency target detection device and method based on deep convolutional neural network
CN113724286A (en) * 2021-08-09 2021-11-30 浙江大华技术股份有限公司 Method and device for detecting saliency target and computer-readable storage medium
CN114332633A (en) * 2022-03-01 2022-04-12 北京化工大学 Radar image target detection and identification method, equipment and storage medium
CN115527027A (en) * 2022-03-04 2022-12-27 西南民族大学 Remote sensing image ground object segmentation method based on multi-feature fusion mechanism
CN115035377A (en) * 2022-06-15 2022-09-09 天津大学 Significance detection network system based on double-stream coding and interactive decoding
CN115035377B (en) * 2022-06-15 2024-08-23 天津大学 Significance detection network system based on double-flow coding and interactive decoding
CN115331082B (en) * 2022-10-13 2023-02-03 天津大学 Path generation method of tracking sound source, training method of model and electronic equipment
CN115331082A (en) * 2022-10-13 2022-11-11 天津大学 Path generation method of tracking sound source, training method of model and electronic equipment
CN115908982A (en) * 2022-12-01 2023-04-04 北京百度网讯科技有限公司 Image processing method, model training method, device, equipment and storage medium
CN115908982B (en) * 2022-12-01 2024-07-02 北京百度网讯科技有限公司 Image processing method, model training method, device, equipment and storage medium
CN116703817A (en) * 2023-03-23 2023-09-05 国网山东省电力公司莱芜供电公司 Transmission line detection method and system based on saliency target detection

Also Published As

Publication number Publication date
CN112132156B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN111666921B (en) Vehicle control method, apparatus, computer device, and computer-readable storage medium
CN107274445B (en) Image depth estimation method and system
CN109753913B (en) Multi-mode video semantic segmentation method with high calculation efficiency
CN111696110B (en) Scene segmentation method and system
CN109960742B (en) Local information searching method and device
WO2021218786A1 (en) Data processing system, object detection method and apparatus thereof
CN110298262A (en) Object identification method and device
CN111260666B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN111814902A (en) Target detection model training method, target identification method, device and medium
CN110781744A (en) Small-scale pedestrian detection method based on multi-level feature fusion
CN109871792B (en) Pedestrian detection method and device
CN112446292B (en) 2D image salient object detection method and system
CN113673562B (en) Feature enhancement method, object segmentation method, device and storage medium
CN107563290A (en) A kind of pedestrian detection method and device based on image
CN111553414A (en) In-vehicle lost object detection method based on improved Faster R-CNN
CN112597918A (en) Text detection method and device, electronic equipment and storage medium
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN114220126A (en) Target detection system and acquisition method
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN112597996A (en) Task-driven natural scene-based traffic sign significance detection method
CN114743045B (en) Small sample target detection method based on double-branch area suggestion network
CN116309050A (en) Image super-resolution method, program product, storage medium and electronic device
CN111539420B (en) Panoramic image saliency prediction method and system based on attention perception features
CN115731530A (en) Model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant