CN108734210A - Object detection method based on cross-modal multi-scale feature fusion - Google Patents

Object detection method based on cross-modal multi-scale feature fusion

Info

Publication number
CN108734210A
CN108734210A (application CN201810474925.8A)
Authority
CN
China
Prior art keywords
network model
rgb
trained
depth map
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810474925.8A
Other languages
Chinese (zh)
Other versions
CN108734210B (en)
Inventor
刘盛
尹科杰
刘儒瑜
陈彬
陈一彬
沈康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201810474925.8A priority Critical patent/CN108734210B/en
Publication of CN108734210A publication Critical patent/CN108734210A/en
Application granted granted Critical
Publication of CN108734210B publication Critical patent/CN108734210B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/56: Extraction of image or video features relating to colour

Abstract

The invention discloses an object detection method based on cross-modal multi-scale feature fusion. A depth-map detection network model is initialized with the network parameters of an RGB detection network model; the trained RGB detection network model and depth-map detection network model are then used to initialize the feature-extraction weights of a fusion network model, and final training yields a multi-scale fusion network model that fuses cross-modal features. The invention does not depend on a large annotated depth-image dataset, can fuse depth-image and RGB-image features across modalities, and completes object recognition, localization and detection in real time, efficiently and accurately. The fusion network model designed by the invention needs only one consumer-grade graphics card and a CPU as hardware to reach real-time detection speed.

Description

Object detection method based on cross-modal multi-scale feature fusion
Technical field
The present invention relates to the technical field of image recognition, and in particular to an object detection method based on cross-modal multi-scale feature fusion, which simultaneously completes the tasks of detecting, locating and accurately recognizing objects in color-depth images (RGB-D images, containing color information and depth information).
Background technology
In industry, faster, more accurate and more widely applicable object detection methods are in constant demand. RGB images can be severely affected in certain special environments; for example, motion or glare can degrade the image data, so that detection based on RGB image features alone often cannot reach the expected accuracy. It is therefore necessary to use information from other sensors, such as depth information, to improve object detection performance.
Since convolutional neural networks began to be used for object recognition and detection tasks, most high-accuracy object detection methods have been based on convolutional neural networks. These networks can learn generic feature representations of objects from large-scale labeled RGB image datasets. To use depth-map data to improve detection accuracy, a generic depth feature representation of objects must be extracted. However, there is no large-scale, fully annotated depth-image dataset covering enough categories, so a generic feature representation of depth information cannot be obtained directly.
On the other hand, existing fused-feature detection methods are limited in speed: they generally require a high-performance GPU and long computation times to produce results, and cannot meet the strict real-time requirements of industrial systems.
Summary of the invention
The object of the present invention is to provide an RGB-D image detection method based on cross-modal multi-scale feature fusion, designing a fusion model that balances real-time performance and high accuracy, using the multi-modal features of objects for accurate detection, and simultaneously completing the tasks of detecting, locating and accurately recognizing objects in images.
To achieve the above object, the technical solution of the present invention is as follows:
An object detection method based on cross-modal multi-scale feature fusion, comprising:
training a pre-training model with RGB images in a first dataset annotated with object categories, and initializing a single-modality RGB detection network model from the pre-training model;
training the RGB detection network model with RGB images in a second dataset annotated with object categories and positions;
initializing a single-modality depth-map detection network model from the trained RGB detection network model;
training the depth-map detection network model with depth images in the second dataset that correspond to the RGB images and are annotated with object categories and positions;
initializing a fusion network model from the trained RGB detection network model and depth-map detection network model, and performing multi-scale feature fusion;
training the fusion network model with paired RGB images and depth images annotated with object categories and positions;
detecting objects in color-depth images with the trained fusion network model.
Further, initializing the single-modality depth-map detection network model from the trained RGB detection network model comprises:
copying the network parameters of the RGB detection network model as the network parameters of the depth-map detection network model.
Further, initializing the fusion network model from the trained RGB detection network model and depth-map detection network model and performing multi-scale feature fusion comprises:
copying the network parameters of the RGB detection network model and the depth-map detection network model as the weights of the two feature-extraction parts in the fusion network model;
combining, with multiple fusion layers, the multi-scale features extracted by the two feature-extraction parts.
Further, the fusion network model uses MultiBox Loss as its loss function during training.
Further, when training the RGB detection network model, training the depth-map detection network model and training the fusion network model, the method further comprises:
applying data augmentation to the input data.
Further, when training the fusion network model, the method further comprises:
freezing the weights of the feature-extraction parts.
The object detection method based on cross-modal multi-scale feature fusion proposed by the present invention improves detection performance by combining RGB and depth features: the depth-map detection network model is initialized with the network parameters of the RGB detection network model; the obtained RGB detection network model and depth-map detection network model are then used to initialize the feature-extraction weights of the fusion network model, and final training yields a multi-scale fusion network model that fuses cross-modal features. The invention does not depend on a large annotated depth-image dataset, can fuse depth-image and RGB-image features across modalities, and completes object recognition, localization and detection in real time, efficiently and accurately. The fusion network model designed by the present invention needs only one consumer-grade graphics card and a CPU as hardware, for example a GTX 1080 graphics card and an Intel 7700K CPU, to reach real-time detection speed.
Description of the drawings
Fig. 1 is a flow chart of the object detection method based on cross-modal multi-scale feature fusion of the present invention;
Fig. 2 is a structural schematic diagram of the fusion network model.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments; the following embodiments do not constitute a limitation of the invention.
The general idea of the present invention is to fuse depth-image and RGB-image features across modalities without depending on a large annotated depth-image dataset, and to complete object recognition, localization and detection in real time, efficiently and accurately. Training yields a fusion model that accepts cross-modal RGB and depth image inputs and obtains the positions and category information of multiple objects in real time. The solution requires cross-modal feature transfer: the depth-map network is initialized from the RGB model parameters and trained to obtain the depth-map model; the obtained RGB model and depth-map model are then used to initialize, respectively, the feature-extraction parts of the fusion network proposed by the present invention, and final training yields a multi-scale network model that fuses cross-modal features. The multi-scale cross-modal fusion network with high real-time performance and detection accuracy designed by the present invention is the key element of the solution.
As shown in Fig. 1, an object detection method based on cross-modal multi-scale feature fusion comprises:
training a pre-training model with RGB images in a first dataset annotated with object categories, and initializing a single-modality RGB detection network model from the pre-training model;
training the RGB detection network model with RGB images in a second dataset annotated with object categories and positions;
initializing a single-modality depth-map detection network model from the trained RGB detection network model;
training the depth-map detection network model with depth images in the second dataset that correspond to the RGB images and are annotated with object categories and positions;
initializing a fusion network model from the trained RGB detection network model and depth-map detection network model, and performing multi-scale feature fusion;
training the fusion network model with paired RGB images and depth images annotated with object categories and positions;
detecting objects in color-depth images with the trained fusion network model.
The steps of the above method are described in detail below. In this embodiment, model training comprises three stages: the first stage trains the RGB detection network model; the second stage trains the depth-map detection network model with the training method of cross-modal supervision transfer; the third stage trains the fusion network model based on the trained RGB detection network model and depth-map detection network model.
Since convolutional neural networks began to be used for object recognition and detection tasks, most high-accuracy object detection methods have been based on convolutional neural networks. These networks can learn generic feature representations of objects from large-scale labeled RGB image datasets. The present technical solution improves object detection accuracy by using depth-map data, which requires extracting a generic depth feature representation of objects. However, there is no large-scale, fully annotated depth-image dataset covering enough categories, so a generic feature representation of depth information cannot be obtained directly. This embodiment therefore first trains a single-modality RGB detection network model and then trains the depth-map detection network model with the training method of cross-modal supervision transfer, so that the depth-map detection network model can be obtained with only a small-scale dataset.
First stage: first, a pre-training model is trained with RGB images in the first dataset annotated with object categories, and the single-modality RGB detection network model is initialized from the pre-training model; then the RGB detection network model is trained with RGB images in the second dataset annotated with object categories and positions.
Among current pre-training models, those trained on large-scale annotated RGB image datasets are relatively mature, for example the VGG16 model pre-trained on the ImageNet dataset, and can be used directly by this technical solution. The pre-training model is trained with a large-scale annotated RGB image dataset (also called the first dataset), in which the object categories in the RGB images have typically been annotated.
After the pre-training model is chosen, the RGB detection network model is initialized from it, that is, the parameters of the pre-training model's neural network are copied into the RGB detection network model. The RGB detection network model is then fine-tuned with the RGB images of a small-scale dataset (also called the second dataset). The categories and positions of the objects to be detected (i.e. the objects in the RGB images) in the small-scale dataset are annotated in advance, and the small-scale dataset also contains depth images corresponding to the RGB images, in which the categories and positions of the objects to be detected are likewise annotated.
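For illustration, this parameter-copying step can be sketched as follows, assuming a recent torchvision and a hypothetical detector whose feature-extraction part is exposed as a backbone attribute mirroring the convolutional layers of VGG16; the sketch is one possible realization, not the only one.

```python
# Minimal sketch: copy ImageNet-pretrained VGG16 convolution/pooling weights into the
# detector's feature-extraction part ("backbone" is an assumed attribute name); the
# detection-specific layers keep their own initialization and are learned during
# fine-tuning on the second dataset.
import torch
import torchvision


def init_rgb_detector_from_pretrained(detector: torch.nn.Module) -> torch.nn.Module:
    vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
    src = vgg.features.state_dict()               # pretrained convolution/pooling layers
    dst = detector.backbone.state_dict()          # detector's feature-extraction part
    # Copy every parameter whose name and shape match.
    dst.update({k: v for k, v in src.items() if k in dst and v.shape == dst[k].shape})
    detector.backbone.load_state_dict(dst)
    return detector
```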
Second stage: based on the trained RGB detection network model, this embodiment initializes the single-modality depth-map detection network model, and then trains the depth-map detection network model with the depth images in the second dataset that correspond to the RGB images and are annotated with object categories and positions.
The RGB detection network model and the depth-map detection network model in this embodiment are both single-modality models and are both expressed as layered neural network models, where the RGB image modality is expressed as

Φ = {φ_1, φ_2, …, φ_#l},

where φ_i is the i-th layer feature representation learned from the large-scale labeled dataset, #l is the number of layers of the neural network, and the parameters of the neural network are denoted W_Φ^[1,#l].

The depth image modality is expressed as

Ψ = {ψ_1, ψ_2, …, ψ_#u},

where ψ_i is the i-th layer feature representation, #u is the number of layers of the neural network, and W_Ψ^[1,#u] likewise denotes the parameters of this layered network.
Based on the trained RGB detection network model, this embodiment initializes the single-modality depth-map detection network model by copying the network parameters W_Φ^[1,#l] of the RGB detection network model as the network parameters of the depth-map detection network model, which is then fine-tuned with the depth-image portion of the small-scale dataset; the network parameters of the trained depth-map detection network model are W_Ψ^[1,#u]. At this point the depth-map detection network model can recognize object categories and positions in depth maps.
The method of cross-modal supervision transfer (Supervision Transfer) is employed here: the neural network of the depth-information modality is initialized with the neural network representation of the RGB modality, and this cross-modal transfer is verified at the convolution and pooling layers. Suppose there exists a large dataset of paired but unlabeled images from both modalities, denoted P_{l,u}. By matching the image feature representations φ_#l and ψ_#u on the paired RGB image I_l and depth map I_u of the two modalities (φ denotes the representation of the RGB network part; I_l and I_u are respectively the RGB and depth images in the dataset), a rich representation of the depth map can be learned. A transform function t is used to make the dimensions of the two representations identical, and a loss function f defined on the above networks is introduced (f can take an arbitrary functional form); the parameters of the depth-map network can then be obtained by training:

W_Ψ^[1,#u] = argmin Σ_{(I_l, I_u) ∈ P_{l,u}} f( φ_#l(I_l), t(ψ_#u(I_u)) ).
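A minimal sketch of this supervision-transfer training, assuming PyTorch, hypothetical rgb_net and paired_loader objects, and concrete choices of f as an L2 loss and t as a 1x1 convolution (both are allowed to be arbitrary in the text):

```python
# Minimal sketch of cross-modal supervision transfer: the depth network is initialized
# from the RGB network's parameters and its features are trained to match the RGB
# features on paired, unlabeled RGB-D data P_{l,u}.
import copy
import torch
import torch.nn as nn


def supervision_transfer(rgb_net, paired_loader, depth_channels, rgb_channels, epochs=1):
    depth_net = copy.deepcopy(rgb_net)           # initialize depth net from RGB parameters
    t = nn.Conv2d(depth_channels, rgb_channels, kernel_size=1)   # dimension-matching t
    f = nn.MSELoss()                                             # loss function f
    rgb_net.eval()                                               # frozen RGB "teacher"
    opt = torch.optim.SGD(list(depth_net.parameters()) + list(t.parameters()),
                          lr=1e-3, momentum=0.9)
    for _ in range(epochs):
        for rgb_img, depth_img in paired_loader:   # paired RGB-D images, no labels needed
            with torch.no_grad():
                phi = rgb_net(rgb_img)             # RGB feature phi_#l(I_l)
            psi = depth_net(depth_img)             # depth feature psi_#u(I_u)
            loss = f(t(psi), phi)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return depth_net
```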
In addition to the conventional convolution, pooling and fully connected layers of the single-modality detection networks of the present invention, the network here also contains element_wise-sum layers, permute layers, flatten layers and priorbox layers. First, the element_wise-sum layer sums feature maps, which can be regarded as element-wise addition of multi-dimensional matrices. Second, the permute layer changes the order of the data dimensions, which is equivalent to multiplying by a spatially permuted identity matrix (a permutation matrix). Then, the flatten layer collapses a multi-dimensional matrix into one dimension. Finally, the priorbox layer is used to handle bounding boxes and does not affect the image features. All of these layers can be unified into a single transform function s, and the loss function of the network can then be described as:

W_Ψ^[1,#u] = argmin Σ_{(I_l, I_u) ∈ P_{l,u}} f( s(φ_#l(I_l)), s(t(ψ_#u(I_u))) ).
Based on this loss function, the cross-modal model transfer between the single-modality networks can be realized.
It should be noted that the small-scale dataset contains RGB images annotated with object categories and positions, together with corresponding depth maps annotated with object categories and positions. When the RGB detection network model is fine-tuned with the small-scale dataset, the RGB images in the small-scale dataset are used; when the depth-map detection network model is fine-tuned with the small-scale dataset, the depth maps in the small-scale dataset are used, and the depth maps are represented in HHA format (HHA encoding has three channels: horizontal disparity, height above ground and angle with the direction of gravity), so that their dimensions are consistent with the RGB images.
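A minimal sketch of assembling such an HHA image, assuming the three channels have already been computed from the raw depth map by the usual geometric procedure; here they are only normalized and stacked so that the result has the same 3-channel layout as an RGB image:

```python
# Minimal sketch: stack precomputed HHA channels into an RGB-like 3-channel uint8 image.
import numpy as np


def to_hha_image(horizontal_disparity, height_above_ground, angle_with_gravity):
    channels = []
    for c in (horizontal_disparity, height_above_ground, angle_with_gravity):
        c = c.astype(np.float32)
        c = (c - c.min()) / (c.max() - c.min() + 1e-6)   # normalize each channel to [0, 1]
        channels.append((255.0 * c).astype(np.uint8))
    return np.stack(channels, axis=-1)                    # H x W x 3, like an RGB image
```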
When the pre-training model of this embodiment is the VGG16 model, the RGB detection network model and the depth-map detection network model have identical network structures and can take the form of SSD-VGG16, where SSD-VGG16 is an SSD network using VGG16 as its feature-extraction part. These networks are not fixed, however, and other network forms can be adopted; for example, ResNet can also be used as the pre-training model, in which case the corresponding RGB detection network model and depth-map detection network model can take the form of SSD-ResNet.
Third stage: the fusion network model is initialized from the trained RGB detection network model and depth-map detection network model, and multi-scale feature fusion is performed; the fusion network model is then trained with paired RGB images and depth images annotated with object categories and positions.
In this embodiment, the network parameters of the trained RGB detection network model and depth-map detection network model are used to initialize the weights of the feature-extraction parts in the fusion network model; that is, the network parameters of the RGB detection network model and the depth-map detection network model are copied as the weights of the feature-extraction parts in the fusion network model. The input data are then paired RGB pictures and depth pictures, and fine-tuning training is performed; the second dataset may be used for this training. As shown in Fig. 2, taking SSD-VGG16 as the feature-extraction part as an example, the fusion network copies all the parameters of the SSD-VGG16 modules obtained from the RGB detection network and the depth-map detection network as the weights of its two feature-extraction parts (the RGB feature-extraction part and the depth-map feature-extraction part). In this way, after an RGB image and a depth map are input into the fusion network and pass through the two feature-extraction parts, multi-level RGB generic features and depth-map generic features (multi-scale features) are obtained, respectively.
After the two feature-extraction parts (the RGB feature-extraction part and the depth-map feature-extraction part) have extracted the multi-scale features of the two modalities, the fusion network model merges the features from different scales with a multi-layer structure (multiple fusion layers combine the multi-scale features extracted by the two feature-extraction parts); these features come from the convolution and pooling layers with more significant semantic content, corresponding to the higher layers of the network structure. There are two classes of feasible merging points in the fusion network architecture: one class is the lower network layers at the front of the feature-extraction part; the other class is at the higher-layer positions after the feature-extraction layers. Lower layers of the network contain more spatial features, while higher layers contain more semantic features. If two objects to be detected are the same object, their high-level generic feature representations are close to each other, but their low-level representations may be very different. The present invention therefore chooses high-level fusion rather than low-level fusion, and experiments also confirm that high-level fusion gives better results. In its fusion network architecture, the present invention does not fuse the features of only a single layer, but selects the features of several specific network layers for fusion, including the multi-scale features of several semantically significant convolution and pooling layers. As shown in Fig. 2, continuing with SSD-VGG16 as the feature-extraction part, fusion layers can be used to fuse the features of the conv4-3, fc7, conv6_2, conv7_2, conv8_2 and conv9_2 layers to obtain the high-level fusion features of the network. It should be noted that Fig. 2 only shows conv4-3 and fc7; the schematics for the fusion of the other layers are omitted and not repeated here.
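A minimal sketch of this dual-branch multi-scale fusion, assuming PyTorch and hypothetical rgb_branch and depth_branch modules that each return the list of feature maps taken from the fused layers (for example conv4_3, fc7, conv6_2, conv7_2, conv8_2 and conv9_2) in the same order and with matching shapes:

```python
# Minimal sketch of the dual-branch element-wise-sum fusion; both branches are assumed
# to be initialized from the two trained single-modality detectors.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    def __init__(self, rgb_branch: nn.Module, depth_branch: nn.Module):
        super().__init__()
        self.rgb_branch = rgb_branch        # weights copied from the trained RGB detector
        self.depth_branch = depth_branch    # weights copied from the trained depth detector

    def forward(self, rgb_img: torch.Tensor, depth_img: torch.Tensor):
        rgb_feats = self.rgb_branch(rgb_img)        # multi-scale RGB generic features
        depth_feats = self.depth_branch(depth_img)  # multi-scale depth generic features
        # element_wise-sum fusion at each scale, one fusion layer per fused feature map
        return [r + d for r, d in zip(rgb_feats, depth_feats)]
```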
Experimental results show that replacing the element_wise-sum layers with concatenate layers for feature fusion gives worse results; the present invention therefore fuses the RGB and depth-map cross-modal features with a corresponding number of element_wise-sum merging layers. The category and position of objects can be predicted from these features: after the fused features are obtained, regression prediction is performed by convolutional layers containing two 3*3 convolution kernels to obtain multiple results, where the first convolution kernel completes the position prediction (1*4 dimensions) and the other convolution kernel completes the object-category prediction (1*(number of categories to be predicted) dimensions). Finally, the multiple predictions obtained are processed by the NMS (Non-Maximum Suppression) method to obtain the final result. The fusion network uses MultiBox Loss as its loss function during training.
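The prediction step on each fused scale can be sketched as follows, assuming PyTorch and, as in SSD, a fixed number of default boxes (priors) per feature-map cell; the boxes decoded over all scales would then be filtered with non-maximum suppression, for example torchvision.ops.nms(boxes, scores, iou_threshold).

```python
# Minimal sketch of the per-scale prediction head; one such head is attached to each
# fused feature map.
import torch
import torch.nn as nn


class FusionPredictionHead(nn.Module):
    def __init__(self, in_channels: int, num_priors: int, num_classes: int):
        super().__init__()
        # One 3x3 convolution predicts the box offsets (4 values per prior),
        # the other predicts the class scores (num_classes values per prior).
        self.loc = nn.Conv2d(in_channels, num_priors * 4, kernel_size=3, padding=1)
        self.conf = nn.Conv2d(in_channels, num_priors * num_classes, kernel_size=3, padding=1)

    def forward(self, fused_feat: torch.Tensor):
        return self.loc(fused_feat), self.conf(fused_feat)
```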
In order to make better use of the input data, this embodiment applies data augmentation to the input data, for example rotation, mirroring and cropping, so that the spatial diversity of the pictures is represented and the trained model accordingly has better robustness.
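A minimal sketch of such paired augmentation, assuming NumPy H x W x C arrays and showing only mirroring and cropping; the same random parameters are applied to the RGB image and its depth map, consistent with the requirement below that both modalities be augmented identically, and the box annotations would have to be transformed with the same parameters.

```python
# Minimal sketch: identical random mirror and crop for an RGB image and its depth map.
import random
import numpy as np


def augment_pair(rgb: np.ndarray, depth: np.ndarray, crop_ratio: float = 0.9):
    if random.random() < 0.5:                           # horizontal mirror
        rgb, depth = rgb[:, ::-1], depth[:, ::-1]
    h, w = rgb.shape[:2]
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    top, left = random.randint(0, h - ch), random.randint(0, w - cw)
    rgb = rgb[top:top + ch, left:left + cw]             # identical random crop
    depth = depth[top:top + ch, left:left + cw]         # applied to both modalities
    return rgb.copy(), depth.copy()
```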
When the fusion network model is trained, the weights of the feature-extraction parts are frozen and only the fusion part is trained; that is, the learning rate of the feature-extraction parts is set below a chosen threshold (for example 0, or a very low value such as 10e-8), so that the training process can concentrate on the fusion part and does not excessively change the weights of the RGB and depth feature-extraction parts. By freezing the module weights copied from the RGB and depth models and training only the fusion part, the fine-tuning of the fusion network is completed. The number of training iterations is generally 40,000 or more, and the base learning rate is set to about 0.001. The fusion part here means the parts of the fusion network model other than the feature-extraction parts.
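This freezing step can be sketched as follows, assuming PyTorch and the hypothetical rgb_branch and depth_branch attribute names used in the fusion sketch above; only the parameters of the fusion part remain trainable.

```python
# Minimal sketch: freeze the copied feature-extraction weights and optimize only the
# remaining (fusion) parameters.
import torch


def make_fusion_optimizer(model, base_lr: float = 1e-3):
    for p in list(model.rgb_branch.parameters()) + list(model.depth_branch.parameters()):
        p.requires_grad = False                 # freeze the copied extractor weights
    fusion_params = [p for p in model.parameters() if p.requires_grad]
    # Training then runs for on the order of 40,000 iterations at this base learning rate.
    return torch.optim.SGD(fusion_params, lr=base_lr, momentum=0.9)
```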
The technical solution of the present invention improves detection performance by combining RGB and depth features. Before feature merging, the RGB and depth images are each converted into generic feature representations by the two feature-extraction parts in the fusion network. These two feature-extraction parts are composed of multiple convolution and pooling layers and are the RGB feature-extraction part and the depth-map feature-extraction part, respectively; their weights are initialized and trained from the two single-modality models described above, the RGB detection network model and the depth-map detection model. These two single-modality networks are individually fine-tuned before the fusion network is trained and use the same architecture. This embodiment can thus obtain generic feature representations of depth images without a large depth-annotated dataset. In addition, during the training of the fusion network, the input data of the two different modalities must keep the same dimensions, and the data-augmentation operations for the two kinds of input images (RGB and depth images) must also be identical.
The above embodiments are only used to illustrate the technical solution of the present invention and not to limit it; without departing from the spirit and essence of the invention, those skilled in the art can make various corresponding changes and modifications in accordance with the present invention, and these corresponding changes and modifications shall all fall within the protection scope of the appended claims of the invention.

Claims (6)

1. An object detection method based on cross-modal multi-scale feature fusion, characterized in that the object detection method based on cross-modal multi-scale feature fusion comprises:
training a pre-training model with RGB images in a first dataset annotated with object categories, and initializing a single-modality RGB detection network model from the pre-training model;
training the RGB detection network model with RGB images in a second dataset annotated with object categories and positions;
initializing a single-modality depth-map detection network model from the trained RGB detection network model;
training the depth-map detection network model with depth images in the second dataset that correspond to the RGB images and are annotated with object categories and positions;
initializing a fusion network model from the trained RGB detection network model and depth-map detection network model, and performing multi-scale feature fusion;
training the fusion network model with paired RGB images and depth images annotated with object categories and positions;
detecting objects in color-depth images with the trained fusion network model.
2. The object detection method based on cross-modal multi-scale feature fusion according to claim 1, characterized in that initializing the single-modality depth-map detection network model from the trained RGB detection network model comprises:
copying the network parameters of the RGB detection network model as the network parameters of the depth-map detection network model.
3. The object detection method based on cross-modal multi-scale feature fusion according to claim 1, characterized in that initializing the fusion network model from the trained RGB detection network model and depth-map detection network model and performing multi-scale feature fusion comprises:
copying the network parameters of the RGB detection network model and the depth-map detection network model as the weights of the two feature-extraction parts in the fusion network model;
combining, with multiple fusion layers, the multi-scale features extracted by the two feature-extraction parts.
4. The object detection method based on cross-modal multi-scale feature fusion according to claim 3, characterized in that the fusion network model uses MultiBox Loss as its loss function during training.
5. The object detection method based on cross-modal multi-scale feature fusion according to claim 1, characterized in that, when training the RGB detection network model, training the depth-map detection network model and training the fusion network model, the method further comprises:
applying data augmentation to the input data.
6. The object detection method based on cross-modal multi-scale feature fusion according to claim 3, characterized in that, when training the fusion network model, the method further comprises:
freezing the weights of the feature-extraction parts.
CN201810474925.8A 2018-05-17 2018-05-17 Object detection method based on cross-modal multi-scale feature fusion Active CN108734210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810474925.8A CN108734210B (en) 2018-05-17 2018-05-17 Object detection method based on cross-modal multi-scale feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810474925.8A CN108734210B (en) 2018-05-17 2018-05-17 Object detection method based on cross-modal multi-scale feature fusion

Publications (2)

Publication Number Publication Date
CN108734210A true CN108734210A (en) 2018-11-02
CN108734210B CN108734210B (en) 2021-10-15

Family

ID=63938564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810474925.8A Active CN108734210B (en) 2018-05-17 2018-05-17 Object detection method based on cross-modal multi-scale feature fusion

Country Status (1)

Country Link
CN (1) CN108734210B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800079A (en) * 2012-08-03 2012-11-28 西安电子科技大学 Multimode image fusion method based on SCDPT transformation and amplitude-phase combination thereof
US20170014203A1 (en) * 2014-02-24 2017-01-19 Universite De Strasbourg (Etablissement Public National A Caractere Scientifiqu, Culturel Et Prof Automatic multimodal real-time tracking of a moving marker for image plane alignment inside a mri scanner
CN106981059A (en) * 2017-03-30 2017-07-25 中国矿业大学 Two-dimensional empirical mode decomposition image fusion method combining PCNN and compressed sensing
CN107066583A (en) * 2017-04-14 2017-08-18 华侨大学 Image-text cross-modal sentiment classification method based on compact bilinear fusion
CN107463952A (en) * 2017-07-21 2017-12-12 清华大学 Object material classification method based on multi-modal fusion deep learning
CN107403201A (en) * 2017-08-11 2017-11-28 强深智能医疗科技(昆山)有限公司 Intelligent and automated delineation method for tumor radiotherapy target volumes and organs at risk

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070072A (en) * 2019-05-05 2019-07-30 厦门美图之家科技有限公司 A method of generating object detection model
CN110334708A (en) * 2019-07-03 2019-10-15 中国科学院自动化研究所 Automatic difference calibration method, system and device for cross-modal object detection
CN110334769A (en) * 2019-07-09 2019-10-15 北京华捷艾米科技有限公司 Target identification method and device
CN110852350A (en) * 2019-10-21 2020-02-28 北京航空航天大学 Pulmonary nodule benign and malignant classification method and system based on multi-scale migration learning
WO2021088300A1 (en) * 2019-11-09 2021-05-14 北京工业大学 Rgb-d multi-mode fusion personnel detection method based on asymmetric double-stream network
CN110956094A (en) * 2019-11-09 2020-04-03 北京工业大学 RGB-D multi-mode fusion personnel detection method based on asymmetric dual-stream network
CN110956094B (en) * 2019-11-09 2023-12-01 北京工业大学 RGB-D multi-mode fusion personnel detection method based on asymmetric double-flow network
CN113033258A (en) * 2019-12-24 2021-06-25 百度国际科技(深圳)有限公司 Image feature extraction method, device, equipment and storage medium
CN111242238A (en) * 2020-01-21 2020-06-05 北京交通大学 Method for acquiring RGB-D image saliency target
CN111242238B (en) * 2020-01-21 2023-12-26 北京交通大学 RGB-D image saliency target acquisition method
CN111540343A (en) * 2020-03-17 2020-08-14 北京捷通华声科技股份有限公司 Corpus identification method and apparatus
CN111540343B (en) * 2020-03-17 2021-02-05 北京捷通华声科技股份有限公司 Corpus identification method and apparatus
CN111723649A (en) * 2020-05-08 2020-09-29 天津大学 Short video event detection method based on semantic decomposition
CN112183619A (en) * 2020-09-27 2021-01-05 南京三眼精灵信息技术有限公司 Digital model fusion method and device
CN113077491A (en) * 2021-04-02 2021-07-06 安徽大学 RGBT target tracking method based on cross-modal sharing and specific representation form
CN114581838A (en) * 2022-04-26 2022-06-03 阿里巴巴达摩院(杭州)科技有限公司 Image processing method and device and cloud equipment

Also Published As

Publication number Publication date
CN108734210B (en) 2021-10-15

Similar Documents

Publication Publication Date Title
CN108734210A (en) A kind of method for checking object based on cross-module state multi-scale feature fusion
CN110276269B (en) Remote sensing image target detection method based on attention mechanism
Zalpour et al. A new approach for oil tank detection using deep learning features with control false alarm rate in high-resolution satellite imagery
CN106991382A (en) A kind of remote sensing scene classification method
CN107330357A (en) Vision SLAM closed loop detection methods based on deep neural network
Zhang et al. Fast and accurate land-cover classification on medium-resolution remote-sensing images using segmentation models
Gu et al. Hard pixel mining for depth privileged semantic segmentation
Wang et al. FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection
CN108257154A (en) Polarimetric SAR Image change detecting method based on area information and CNN
Wang et al. Relation-attention networks for remote sensing scene classification
Liu et al. Subtler mixed attention network on fine-grained image classification
CN104298974A (en) Human body behavior recognition method based on depth video sequence
CN107767416A (en) The recognition methods of pedestrian's direction in a kind of low-resolution image
CN109522961A (en) A kind of semi-supervision image classification method based on dictionary deep learning
Brekke et al. Multimodal 3d object detection from simulated pretraining
CN105046269A (en) Multi-instance multi-label scene classification method based on multinuclear fusion
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
CN107967480A (en) A kind of notable object extraction method based on label semanteme
Fan Research and realization of video target detection system based on deep learning
CN109034213A (en) Hyperspectral image classification method and system based on joint entropy principle
Xu et al. Concrete crack segmentation based on convolution–deconvolution feature fusion with holistically nested networks
Xiang et al. Semi-supervised learning framework for crack segmentation based on contrastive learning and cross pseudo supervision
Du et al. Improved detection method for traffic signs in real scenes applied in intelligent and connected vehicles
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
Sen et al. A hierarchical approach to remote sensing scene classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant