CN112487927B - Method and system for realizing indoor scene recognition based on object associated attention - Google Patents
- Publication number
- CN112487927B (application CN202011344887.8A)
- Authority
- CN
- China
- Prior art keywords
- feature
- feature vector
- expression
- objects
- input image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
- G06V20/36—Indoor scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
The invention discloses a method and system for indoor scene recognition based on object-associated attention. The method comprises the following steps: extracting a semantic feature vector for each spatial position of an input image through a backbone network; assembling the semantic feature vectors of all spatial positions into a feature map and passing it to a segmentation module, which calculates the probability that each spatial position of the input image belongs to each object; and calculating the feature vector of each object through an object feature aggregation module, which multiplies the feature vectors at all spatial positions by the probability that each position belongs to the object and takes a weighted average, thereby obtaining the feature vector expression of each object. Because the objects present differ from scene to scene, the object feature aggregation module detects the features of all objects in the input image, so that the information contained in the image is better expressed.
Description
Technical Field
The invention relates to an intelligent recognition method and software system, and in particular to an improved method and system for recognizing indoor scenes using object-associated attention features.
Background
In the prior art, the ability to perceive environmental information is indispensable to a robot: accurate perception of the surrounding scene helps the robot make correct judgments and take correct actions.
With advances in technology and computing power, many scene recognition algorithms based on deep learning have been proposed. Herranz et al. found that feature extraction must adapt to different image scales, and recognized scenes through multi-scale fusion of features obtained from models trained on different datasets; see CVPR 2016, pages 571-579, "Scene Recognition with CNNs: Objects, Scales and Dataset Bias" (CVPR: IEEE Conference on Computer Vision and Pattern Recognition).
However, the improvement achievable from global picture information alone is limited: such methods are not only difficult to interpret semantically, but are also easily confused by objects that commonly appear across scenes.
Some researchers have therefore attempted to combine context information with local object associations for scene recognition. López-Cifuentes et al. obtain context information through semantic segmentation to help resolve the ambiguity caused by objects common to different scenes; see Pattern Recognition, Vol. 102, article 107256, "Semantic-Aware Scene Recognition".
Wang et al trains Patchet Net based on weak supervision training, guides local feature extraction based thereon, and finally aggregates local features based on semantic probability to achieve scene recognition, see Pattern Recognition, vol.26, pages 2028-2041, weakly Supervised Patchnets: describing and Aggregating Local Patches for Scene Recognition.
Many studies also enhance a model's scene understanding by combining multi-modal features. However, most prior indoor scene recognition methods combine hand-crafted features with global features, which incurs a large computational cost and cannot effectively learn the relationships between objects that are needed to recognize a scene accurately.
Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention
The invention aims to provide a method and system for indoor scene recognition based on object-associated attention, offering a fast and accurate object-association recognition scheme that addresses the inaccurate overall recognition and excessively redundant network structures of the prior art.
The technical scheme of the invention is as follows:
an indoor scene recognition implementation method based on object associated attention comprises the following steps:
A. extracting semantic feature vectors of each spatial position in an input image through a backbone network;
B. the semantic feature vectors of all the spatial positions form a feature map according to the spatial positions and are transmitted to a segmentation module to calculate the probability that each spatial position in the input image belongs to different objects;
C. calculating the feature vector of each object through an object feature aggregation module, multiplying the feature vector of all the spatial positions of each object by the probability that the spatial positions belong to the object, and carrying out weighted average so as to obtain the feature vector expression of each object;
D. and splicing the feature vectors of all the objects to form the object feature expression of the input image.
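Steps A to D can be sketched in NumPy as follows. This is a minimal illustration under assumed array shapes, not the patented implementation; the function and variable names are hypothetical, and the backbone and segmentation networks are taken as given inputs:

```python
import numpy as np

def aggregate_object_features(feature_map, prob_map):
    """Weighted-average pooling of spatial features into per-object vectors.

    feature_map: (H, W, C) semantic feature vectors from the backbone.
    prob_map:    (H, W, K) per-position probabilities over K object classes
                 from the segmentation module.
    Returns:     (K, C) matrix; row j is the feature vector expression of object j.
    """
    H, W, C = feature_map.shape
    K = prob_map.shape[-1]
    F = feature_map.reshape(-1, C)           # (HW, C) position features F_i
    S = prob_map.reshape(-1, K)              # (HW, K) probabilities S_ij
    # B_ij = 1 if position i belongs to object j with the highest probability
    B = (S.argmax(axis=1)[:, None] == np.arange(K)).astype(F.dtype)
    weights = B * S                          # (HW, K)
    denom = weights.sum(axis=0)              # (K,) normalisation terms
    O = weights.T @ F                        # (K, C) weighted sums
    present = denom > 0
    O[present] /= denom[present, None]       # weighted average; absent objects stay zero
    return O
```

Concatenating the K rows then gives the object feature expression of the image, with absent objects left as zero vectors as described for fig. 2.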
According to the method for realizing the indoor scene recognition based on the object-associated attention, the backbone network and the object feature aggregation module can calculate feature expressions of different objects based on feature hidden vectors of different spatial positions.
The method for realizing the indoor scene recognition based on the object associated attention, wherein the method further comprises the following steps after the step D:
E. the object feature expressions are input to a lightweight object-associated attention module, which is implemented using neural networks to calculate relationships between objects.
The method for realizing the indoor scene recognition based on the object associated attention, wherein the step E specifically further comprises the following steps:
and E1, calculating a relation feature vector expression of each object and all other objects based on a neural network and cosine similarity by the light-weight object association attention module, and splicing the relation feature vector expression into a feature vector expression of the object.
The method for realizing the indoor scene recognition based on the object associated attention, wherein the step E further comprises the following steps:
F. the feature vector expression of the object and the relation feature vector expression are input to a global association aggregation module to aggregate the relation among all objects and form a common feature expression vector of all objects.
The method for realizing the indoor scene recognition based on the object associated attention, wherein the step F further comprises the following steps:
G. and inputting the common characteristic expression vector to a classification and identification module of the neural network full-connection layer, so as to identify the scene to which the input image belongs.
An indoor scene recognition implementation system based on object-associated attention, comprising:
a backbone network for extracting semantic feature vectors of each spatial location for the input image;
the segmentation module is used for forming a feature map of semantic feature vectors of all the spatial positions according to the spatial positions of the semantic feature vectors, and calculating the probability that each spatial position in the input image belongs to different objects;
the object feature aggregation module is used for calculating the feature vector of each object, multiplying the feature vector of all the spatial positions of each object by the probability that the spatial positions belong to the object and carrying out weighted average so as to obtain the feature vector expression of each object;
the object feature aggregation module is also used for splicing feature vectors of all objects to form object feature expression of the input image.
The system for realizing the recognition of the indoor scene based on the object associated attention comprises the following components: and the light-weight object associated attention module is used for calculating the relation feature vector expression of each object and all other objects based on the neural network and the cosine similarity, and splicing the relation feature vector expression into the feature vector expression of the object.
The system for realizing the recognition of the indoor scene based on the object associated attention comprises the following components: and the global association aggregation module is used for taking the characteristic vector expression of the object and the relation characteristic vector expression as input, and aggregating the relation among all the objects to form a common characteristic expression vector of all the objects.
The system for realizing the recognition of the indoor scene based on the object associated attention comprises the following components: and the classification and identification module is used for inputting the common characteristic expression vector and identifying the scene to which the input image belongs.
In the method and system provided by the invention, the object feature aggregation module detects the features of all objects in the input image, addressing the fact that different scenes contain different objects, so that the information contained in the image is better expressed. Meanwhile, because the distribution of coexisting objects differs between scenes, a lightweight object-associated attention module and a global association aggregation module learn and aggregate the relationships between objects, finally generating a common feature expression vector from which the subsequent classification module distinguishes scenes. The method is efficient and accurate, and is suitable for recognizing and distinguishing different indoor scenes.
Drawings
Fig. 1 is a block diagram and a flowchart illustrating a preferred embodiment of a method and a system for implementing indoor scene recognition based on object-related attention according to the present invention.
Fig. 2 is a schematic diagram illustrating an object feature aggregation module processing example of a preferred embodiment of the method and system for implementing indoor scene recognition based on object-related attention according to the present invention.
Fig. 3 is a schematic diagram of an example of a light-weight object-associated attention module according to a preferred embodiment of the method and system for implementing object-associated attention-based indoor scene recognition.
Fig. 4 is a schematic diagram of a global association aggregation module according to a preferred embodiment of the method and system for implementing indoor scene recognition based on object associated attention.
Detailed Description
The preferred embodiments of the present invention are described in detail below.
In developing the neural-network recognition process of the invention, analysis showed that the distribution of coexisting objects differs between scenes, so learning object relationships can improve indoor scene recognition performance. The invention therefore provides an object feature aggregation module that detects and extracts the features of all objects in a picture, learns the relationships between objects through the proposed lightweight object-associated attention module, and finally aggregates the object features and object relationships through the global association aggregation module, recognizing the indoor scene through a fully connected layer. The invention realizes scene recognition from a new angle and is more effective than prior-art methods.
In the preferred embodiment, as shown in fig. 1, the input image may be a still image acquired by a camera or a single frame captured from a video. A backbone network capable of extracting feature hidden vectors at different spatial positions produces a high-level semantic feature vector for each position of the input image; the semantic feature vectors of all spatial positions then form a feature map that is passed to the segmentation module, which calculates the probability that each spatial position of the input image belongs to each object.
Based on the feature map calculated by the backbone network and the object attribution probability map calculated by the segmentation module, a newly proposed object feature aggregation module is then used to calculate feature vectors of each object, and the implementation process of the module is to multiply feature vectors of all spatial positions of each object by the probability that the spatial positions belong to the object and then perform weighted average, so as to obtain feature vector expression of each object. And finally, the feature vectors of all the objects are spliced to form the object feature expression of the picture.
And then inputting the object feature expression into a light-weight object associated attention module newly proposed by the invention for calculating the relation between the objects, wherein the light-weight object associated attention module calculates the relation feature vector expression of each object and all other objects based on the neural network and cosine similarity, and splices the relation feature vector expression into the feature vector expression of the object, thereby enriching the object features.
The invention further inputs the object self feature vector expression and the object relation feature vector expression into a newly proposed global association aggregation module for aggregating the relations among all objects, thereby forming a common feature expression vector of all objects on the input image. And finally, inputting the feature expression vector into a classification recognition module formed by the neural network full-connection layer to recognize which scene the picture belongs to.
Specifically, after the camera acquires an input image of an indoor scene, object features are analyzed at all spatial positions of the image, so that the scene is judged from all object features contained in the image. The judgment does not depend on local object features alone but considers all object relationships in the input image simultaneously, so indoor scenes such as a kitchen, a bedroom, a living room, or a restaurant can be distinguished more accurately and effectively; recognizing the relationship features between objects prevents interference from object features that commonly appear across different scenes, making scene recognition more accurate.
Fig. 2 shows a preferred implementation example of the object feature aggregation module in the method and system for implementing indoor scene recognition based on object-related attention according to the present invention. In order to effectively extract the object features in the input image, the invention provides an implementation scheme of the object feature aggregation module in fig. 2. Firstly, calculating a spatial position feature map F and an object attribution probability map S based on a backbone network of scene segmentation aiming at an input image, and then carrying out weighted summation on all spatial position feature vectors of each object and object attribution probabilities of corresponding positions to obtain a feature vector expression O of the object.
Finally, the feature vector expressions of all objects are spliced to obtain the object feature expression of the input image, wherein absent objects are represented by zero vectors and each present object has its own feature vector; the final feature dimension is 1024x150x1, as shown in the example in fig. 2.
The feature vector of each object is calculated in the object feature aggregation module by the following formula, wherein O_j denotes the feature vector of object j, B_ij indicates whether the i-th pixel position belongs to object j with the highest probability, S_ij denotes the probability that the i-th pixel position belongs to object j, and F_i denotes the feature vector of the i-th pixel position:

O_j = ( Σ_i B_ij · S_ij · F_i ) / ( Σ_i B_ij · S_ij )

The object feature vector expression is thus calculated for each segmented region of the input image, enabling different objects to be distinguished; however, object features alone can hardly express the coexistence relationships between objects.
Accordingly, the present invention further provides a lightweight object-related attention module for calculating coexistence relationships between objects, as shown in fig. 3.
In order to effectively transfer object features from scene segmentation to scene recognition and to learn the potential relationships between objects, the invention further provides a lightweight object-associated attention module, implemented as a neural network and composed of one or more cascaded lightweight object-associated attention blocks, as shown in fig. 3. Q refines the object features into higher-level semantic information while reducing the data dimension; compared with existing methods, computing K and V from Q not only reduces the computation by 50%, but also allows the dimensions of Q, K, V and the output features to be controlled simultaneously by adjusting only the value of alpha. The relation expression of each object is obtained from K and V by matrix multiplication, and finally the object relations are spliced with the original object features and output to the next module, giving a feature vector expression of both the object relations and the object features.
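One such attention block can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the projection matrices Wq, Wk, Wv, the softmax normalisation, and the exact placement of the cosine similarity are not fully specified in the text and are hypothetical; d = alpha * C is the reduced dimension controlled by alpha:

```python
import numpy as np

def lightweight_object_attention(X, Wq, Wk, Wv):
    """One lightweight object-associated attention block (illustrative sketch).

    X:      (K, C) object feature expressions.
    Wq:     (C, d) projects objects to a lower-dimensional Q (d = alpha * C).
    Wk, Wv: (d, d) derive K and V from Q rather than from X, which is
            what reduces the computation in the description.
    Returns (K, C + d): original features spliced with relation vectors.
    """
    Q = X @ Wq                                    # (K, d) refined object features
    Km = Q @ Wk                                   # (K, d) keys computed from Q
    V = Q @ Wv                                    # (K, d) values computed from Q
    # cosine-similarity relations between every pair of objects
    Qn = Q / (np.linalg.norm(Q, axis=1, keepdims=True) + 1e-8)
    Kn = Km / (np.linalg.norm(Km, axis=1, keepdims=True) + 1e-8)
    A = Qn @ Kn.T                                 # (K, K) pairwise relation scores
    A = np.exp(A) / np.exp(A).sum(axis=1, keepdims=True)  # softmax over objects
    R = A @ V                                     # (K, d) relation feature vectors
    return np.concatenate([X, R], axis=1)         # splice relations onto features
```

Because the relation vectors are concatenated rather than added, the number of output channels can be tuned independently of C by choosing d.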
In order to aggregate the object features and relationships into a hidden vector expression with as few parameters and as little computation as possible (an over-complex module would process inefficiently and struggle to extract the critical information), the invention proposes a global association aggregation module, shown in fig. 4, which uses strip depthwise convolution; unlike the block convolution in conventional depthwise convolution, it models object features that have no positional relationship. The module first aggregates the information of all objects in each channel with a 150x1 strip depthwise convolution applied per channel.
At this point, however, information does not yet flow between channels, so a 1x1 pointwise convolution then aggregates the information of all channels, generating a high-level semantic feature expression vector that describes the scene. Finally, this scene expression vector is passed to an ordinary fully connected layer, i.e. the classification recognition module, to obtain the final scene recognition result for the input image.
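The aggregation and classification stage can be sketched as follows. This is a minimal NumPy illustration under assumed shapes (1024 channels reduced here for brevity, 150 object slots); the ReLU, the weight layouts, and the function names are assumptions, not the patented implementation:

```python
import numpy as np

def global_association_aggregation(X, strip_w, point_w, fc_w):
    """Sketch of the global association aggregation module plus classifier.

    X:       (C, K) feature map: C channels x K object slots (e.g. 1024 x 150).
    strip_w: (C, K) one Kx1 strip depthwise filter per channel; each channel
             aggregates the information of all K objects into a single value.
    point_w: (C, C) 1x1 pointwise convolution mixing information across channels.
    fc_w:    (C, n_classes) fully connected classification layer.
    Returns: (n_classes,) unnormalised scene scores.
    """
    per_channel = (X * strip_w).sum(axis=1)   # strip depthwise conv -> (C,)
    mixed = point_w @ per_channel             # 1x1 pointwise conv -> (C,)
    scores = fc_w.T @ np.maximum(mixed, 0)    # ReLU + fully connected classifier
    return scores
```

Because the strip filter spans all K object slots at once, the module needs no spatial position information in its input, matching the description of the bar depthwise convolution.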
In the method and system of the invention, the object feature aggregation module adopts a new aggregation scheme: after the spatial position feature map F and the object attribution probability map S are calculated by the scene segmentation algorithm, the feature vectors of all spatial positions of each object are weighted and summed with the object attribution probabilities of the corresponding positions to obtain the feature vector expression O of the object. Finally, the feature vectors of all objects are spliced to obtain the object feature expression of the image.
Secondly, compared with a traditional attention module, the proposed lightweight object-associated attention module is lighter when learning object relationships, and the number of output feature channels can be controlled freely.
The invention further provides a global association aggregation module that adopts strip depthwise convolution in place of the block convolution of traditional depthwise convolution. The strip depthwise convolution aggregates the features and relationships of all objects, so that even though its input carries no spatial position information, it can still aggregate the final feature vector expression of all objects.
By adopting object feature aggregation and lightweight associated attention processing, the method and system achieve computational efficiency and accuracy that meet practical requirements, facilitating the recognition and judgment of the indoor scene in an input image.
It will be understood that modifications and variations will be apparent to those skilled in the art from the foregoing description, and it is intended that all such modifications and variations be included within the scope of the following claims.
Claims (3)
1. An indoor scene recognition implementation method based on object associated attention comprises the following steps:
A. extracting semantic feature vectors of each spatial position in an input image through a backbone network;
B. the semantic feature vectors of all the spatial positions form a feature map according to the spatial positions and are transmitted to a segmentation module to calculate the probability that each spatial position in the input image belongs to different objects;
C. calculating the feature vector expression of each object through an object feature aggregation module, multiplying the semantic feature vector of all the spatial positions of each object by the probability that the spatial positions belong to the object, and carrying out weighted average so as to obtain the feature vector expression of each object;
C1, the feature vector expression of each object is calculated in the object feature aggregation module by the following formula:

O_j = ( Σ_i B_ij · S_ij · F_i ) / ( Σ_i B_ij · S_ij )

wherein O_j represents the feature vector expression of object j, B_ij indicates whether the i-th pixel position belongs to object j with the highest probability, S_ij represents the probability that the i-th pixel position belongs to object j, and F_i represents the semantic feature vector of the i-th pixel position; the object feature vector expression is thereby calculated for each segmented region of the input image, realizing the discrimination of different objects;
D. splicing the feature vector expressions of all the objects to form the object feature expression of the input image;
E. inputting the object feature expression into a light-weight object associated attention module, wherein the light-weight object associated attention module is realized by a neural network and is used for calculating the relation between objects;
e1, the light-weight object associated attention module calculates the relation feature vector expression of each object and all other objects based on a neural network and cosine similarity, and splices the relation feature vector expression into the feature vector expression of the object;
F. inputting the feature vector expression of the object and the relation feature vector expression into a global association aggregation module to aggregate the relation among all the objects and form a common feature expression vector of all the objects;
G. and inputting the common characteristic expression vector to a classification and identification module of the neural network full-connection layer, so as to identify the scene to which the input image belongs.
2. The method according to claim 1, wherein the backbone network and the object feature aggregation module can calculate feature vector expressions of different objects based on different spatial location feature hidden vectors.
3. An indoor scene recognition implementation system based on object-associated attention, comprising:
a backbone network for extracting semantic feature vectors of each spatial location for the input image;
the segmentation module is used for forming a feature map of semantic feature vectors of all the spatial positions according to the spatial positions of the semantic feature vectors, and calculating the probability of each spatial position in the input image corresponding to different objects;
the object feature aggregation module is used for calculating the feature vector expression of each object, multiplying the semantic feature vectors of all the spatial positions of each object by the probability that the spatial positions are the object and carrying out weighted average so as to obtain the feature vector expression of each object;
the calculation method of each object feature vector expression in the object feature aggregation module specifically comprises the following steps:
wherein Oj represents the feature vector expression of the object j, bij represents whether the ith pixel position belongs to the object j with the highest probability, sij represents the probability of the ith pixel position belonging to the object j, fi represents the semantic feature vector of the ith pixel position, so that the object feature vector expression of each segmentation area of an input image is calculated, and the judgment of different objects is realized;
the object feature aggregation module is also used for splicing the feature vector expressions of all objects to form the object feature expression of the input image;
the light-weight object associated attention module is used for calculating the relation feature vector expression of each object and all other objects based on the neural network and cosine similarity, and splicing the relation feature vector expression into the feature vector expression of the object;
the global association aggregation module takes the feature vector expression of the object and the relation feature vector expression as input, and aggregates the relation among all the objects to form a common feature expression vector of all the objects;
and the classification and identification module is used for inputting the common characteristic expression vector and identifying the scene to which the input image belongs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011344887.8A CN112487927B (en) | 2020-11-26 | 2020-11-26 | Method and system for realizing indoor scene recognition based on object associated attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112487927A CN112487927A (en) | 2021-03-12 |
CN112487927B true CN112487927B (en) | 2024-02-13 |
Family
ID=74934952
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011344887.8A Active CN112487927B (en) | 2020-11-26 | 2020-11-26 | Method and system for realizing indoor scene recognition based on object associated attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112487927B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113470048B (en) * | 2021-07-06 | 2023-04-25 | 北京深睿博联科技有限责任公司 | Scene segmentation method, device, equipment and computer readable storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084128A (en) * | 2019-03-29 | 2019-08-02 | 安徽艾睿思智能科技有限公司 | Scene chart generation method based on semantic space constraint and attention mechanism |
CN110245665A (en) * | 2019-05-13 | 2019-09-17 | 天津大学 | Image, semantic dividing method based on attention mechanism |
CN111932553A (en) * | 2020-07-27 | 2020-11-13 | 北京航空航天大学 | Remote sensing image semantic segmentation method based on area description self-attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||