CN112487927B - Method and system for realizing indoor scene recognition based on object associated attention


Info

Publication number
CN112487927B
Authority
CN
China
Prior art keywords
feature
feature vector
expression
objects
input image
Prior art date
Legal status
Active
Application number
CN202011344887.8A
Other languages
Chinese (zh)
Other versions
CN112487927A (en)
Inventor
Miao Bo (苗博)
Zhou Liguang (周立广)
Lin Tianlin (林天麟)
Xu Yangsheng (徐扬生)
Current Assignee
Chinese University of Hong Kong Shenzhen
Shenzhen Institute of Artificial Intelligence and Robotics
Original Assignee
Chinese University of Hong Kong Shenzhen
Shenzhen Institute of Artificial Intelligence and Robotics
Priority date
Filing date
Publication date
Application filed by Chinese University of Hong Kong Shenzhen, Shenzhen Institute of Artificial Intelligence and Robotics filed Critical Chinese University of Hong Kong Shenzhen
Priority to CN202011344887.8A
Publication of CN112487927A
Application granted
Publication of CN112487927B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/35 - Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/36 - Indoor scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

The invention discloses a method and system for realizing indoor scene recognition based on object-associated attention, wherein the method comprises the following steps: extracting a semantic feature vector for each spatial position in an input image through a backbone network; forming the semantic feature vectors of all spatial positions into a feature map according to their spatial positions and transmitting it to a segmentation module to calculate the probability that each spatial position in the input image belongs to different objects; and calculating the feature vector of each object through an object feature aggregation module, which multiplies the feature vectors of all spatial positions of each object by the probability that those positions belong to the object and takes a weighted average, thereby obtaining the feature vector expression of each object. Because different scenes contain different objects, the object feature aggregation module detects all object features in the input image, so that the information contained in the image is better expressed.

Description

Method and system for realizing indoor scene recognition based on object associated attention
Technical Field
The invention relates to an intelligent recognition method and software system, and in particular to an improved object-associated attention feature recognition method and system for indoor scene recognition.
Background
The ability to perceive environmental information is indispensable for a robot: accurate perception of the surrounding scene helps the robot make correct judgments and take correct actions.
With advances in technology and computing power, many scene recognition algorithms based on deep learning have been proposed. Herranz et al. found that feature extraction must adapt to different image scales, and recognized scenes by multi-scale fusion of features obtained from models trained on different datasets; see Scene Recognition with CNNs: Objects, Scales and Dataset Bias, CVPR 2016, pages 571-579 (CVPR is the IEEE Conference on Computer Vision and Pattern Recognition).
However, the improvement attainable from global picture information alone is limited: such methods are not only difficult to interpret semantically, but are also easily disturbed by common objects that appear across scenes.
Thus, some researchers have attempted to combine context information with local object associations to achieve scene recognition. López-Cifuentes et al. obtain context information through semantic segmentation to help eliminate the ambiguity caused by objects common to different scenes; see Semantic-Aware Scene Recognition, Pattern Recognition, vol. 102, article 107256.
Wang et al. train PatchNet under weak supervision, use it to guide local feature extraction, and finally aggregate the local features according to semantic probabilities to achieve scene recognition; see Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition, IEEE Transactions on Image Processing, vol. 26, pages 2028-2041.
At the same time, many studies enhance the scene understanding capability of models by combining multi-modal features. However, most indoor scene recognition methods in the prior art combine hand-crafted features with global features; they require a large amount of computation and cannot effectively learn the relationships between objects, and therefore cannot recognize scenes accurately.
Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention
The invention aims to provide a method and system for realizing indoor scene recognition based on object-associated attention, offering rapid and accurate object-association recognition in response to the inaccurate recognition and redundant network structures of the prior art.
The technical scheme of the invention is as follows:
an indoor scene recognition implementation method based on object associated attention comprises the following steps:
A. extracting semantic feature vectors of each spatial position in an input image through a backbone network;
B. the semantic feature vectors of all the spatial positions form a feature map according to the spatial positions and are transmitted to a segmentation module to calculate the probability that each spatial position in the input image belongs to different objects;
C. calculating the feature vector of each object through an object feature aggregation module, multiplying the feature vector of all the spatial positions of each object by the probability that the spatial positions belong to the object, and carrying out weighted average so as to obtain the feature vector expression of each object;
D. splicing the feature vectors of all the objects to form the object feature expression of the input image.
According to the method for realizing the indoor scene recognition based on the object-associated attention, the backbone network and the object feature aggregation module can calculate feature expressions of different objects based on feature hidden vectors of different spatial positions.
The method for realizing the indoor scene recognition based on the object associated attention, wherein the method further comprises the following steps after the step D:
E. the object feature expressions are input to a lightweight object-associated attention module, which is implemented using neural networks to calculate relationships between objects.
The method for realizing the indoor scene recognition based on the object associated attention, wherein the step E specifically further comprises the following steps:
E1. calculating, by the lightweight object-associated attention module, the relation feature vector expression of each object with all other objects based on a neural network and cosine similarity, and splicing the relation feature vector expression into the feature vector expression of the object.
The method for realizing the indoor scene recognition based on the object associated attention, wherein the step E further comprises the following steps:
F. the feature vector expression of the object and the relation feature vector expression are input to a global association aggregation module to aggregate the relation among all objects and form a common feature expression vector of all objects.
The method for realizing the indoor scene recognition based on the object associated attention, wherein the step F further comprises the following steps:
G. inputting the common feature expression vector into a classification and identification module of the neural network fully connected layer, thereby identifying the scene to which the input image belongs.
An indoor scene recognition implementation system based on object-associated attention, comprising:
a backbone network for extracting semantic feature vectors of each spatial location for the input image;
the segmentation module is used for forming a feature map of semantic feature vectors of all the spatial positions according to the spatial positions of the semantic feature vectors, and calculating the probability that each spatial position in the input image belongs to different objects;
the object feature aggregation module is used for calculating the feature vector of each object, multiplying the feature vector of all the spatial positions of each object by the probability that the spatial positions belong to the object and carrying out weighted average so as to obtain the feature vector expression of each object;
the object feature aggregation module is also used for splicing feature vectors of all objects to form object feature expression of the input image.
The system for realizing indoor scene recognition based on object-associated attention further comprises: a lightweight object-associated attention module for calculating the relation feature vector expression of each object with all other objects based on a neural network and cosine similarity, and splicing the relation feature vector expression into the feature vector expression of the object.
The system for realizing indoor scene recognition based on object-associated attention further comprises: a global association aggregation module for taking the feature vector expression of the object and the relation feature vector expression as input, and aggregating the relations among all objects to form a common feature expression vector of all objects.
The system for realizing indoor scene recognition based on object-associated attention further comprises: a classification and identification module for taking the common feature expression vector as input and identifying the scene to which the input image belongs.
According to the method and system for realizing indoor scene recognition based on object-associated attention provided by the invention, because different scenes contain different objects, all object features in the input image are detected using the object feature aggregation module, so that the information contained in the image is better expressed. Meanwhile, because the distributions of co-occurring objects differ across scenes, the lightweight object-associated attention module and the global association aggregation module are used to learn and aggregate the relations between objects, finally generating a common feature expression vector that allows the subsequent classification module to distinguish different scenes. The method is efficient and accurate, and is suitable for recognizing and judging different indoor scenes.
Drawings
Fig. 1 is a block diagram and a flowchart illustrating a preferred embodiment of a method and a system for implementing indoor scene recognition based on object-related attention according to the present invention.
Fig. 2 is a schematic diagram illustrating an object feature aggregation module processing example of a preferred embodiment of the method and system for implementing indoor scene recognition based on object-related attention according to the present invention.
Fig. 3 is a schematic diagram of an example of a light-weight object-associated attention module according to a preferred embodiment of the method and system for implementing object-associated attention-based indoor scene recognition.
Fig. 4 is a schematic diagram of a global association aggregation module according to a preferred embodiment of the method and system for implementing indoor scene recognition based on object associated attention.
Detailed Description
The preferred embodiments of the present invention are described in detail below.
According to the method and system for realizing indoor scene recognition based on object-associated attention, analysis shows that the distributions of co-occurring objects differ between scenes, so indoor scene recognition performance can be improved by learning object relationships. The invention therefore provides an object feature aggregation module for detecting and extracting the features of all objects in the picture, learns the relations among objects through the proposed lightweight object-associated attention module, and finally aggregates the object features and object relations through the global association aggregation module, realizing indoor scene recognition through a fully connected layer. The invention realizes scene recognition from a brand-new angle and is more effective than prior-art methods.
In the preferred embodiment of the method and system for indoor scene recognition based on object-associated attention, as shown in fig. 1, the input image may be a still image acquired by a camera or a single frame captured from a video. High-level semantic feature vectors for each position of the input image are extracted through a backbone network capable of extracting feature hidden vectors of different spatial positions; the semantic feature vectors of all spatial positions are then formed into a feature map and transmitted to the segmentation module to calculate the probability that each spatial position of the input image belongs to different objects.
Based on the feature map calculated by the backbone network and the object attribution probability map calculated by the segmentation module, the newly proposed object feature aggregation module is then used to calculate the feature vector of each object: the feature vectors of all spatial positions of each object are multiplied by the probability that those positions belong to the object and then weighted-averaged, obtaining the feature vector expression of each object. Finally, the feature vectors of all objects are spliced to form the object feature expression of the picture.
The object feature expression is then input into the lightweight object-associated attention module newly proposed by the invention for calculating the relations between objects; this module calculates the relation feature vector expression of each object with all other objects based on a neural network and cosine similarity, and splices it into the feature vector expression of the object, thereby enriching the object features.
The invention further inputs the object feature vector expressions and the object relation feature vector expressions into a newly proposed global association aggregation module for aggregating the relations among all objects, thereby forming a common feature expression vector of all objects in the input image. Finally, this feature expression vector is input into a classification recognition module formed by a fully connected neural network layer to recognize which scene the picture belongs to.
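By way of illustration, the forward pass just described could be composed as in the following minimal PyTorch sketch; the class and function names, the softmax over the segmentation logits, and the scene count are assumptions made for this example rather than part of the patent disclosure (the three helper modules are sketched after the corresponding sections below).

```python
import torch.nn as nn

class SceneRecognizer(nn.Module):
    # Hypothetical composition of the five stages; 150 objects and 1024
    # channels follow the example dimensions given in the description.
    def __init__(self, backbone, seg_head, num_objects=150, channels=1024,
                 num_scenes=10):  # num_scenes is an arbitrary placeholder
        super().__init__()
        self.backbone = backbone      # feature map F: (B, C, H, W)
        self.seg_head = seg_head      # object attribution logits: (B, N, H, W)
        self.attention = LightweightObjectAttention(channels)
        self.aggregator = GlobalAssociationAggregation(num_objects, channels)
        self.classifier = nn.Linear(channels, num_scenes)

    def forward(self, image):
        F = self.backbone(image)             # semantic feature per position
        S = self.seg_head(F).softmax(dim=1)  # probability per object, per position
        O = aggregate_object_features(F, S)  # (B, N, C) object feature expressions
        R = self.attention(O)                # splice in object-relation features
        z = self.aggregator(R)               # common feature expression vector
        return self.classifier(z)            # scene logits (fully connected layer)
```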
Specifically, after the camera acquires an input image of an indoor scene, object feature analysis is performed on all spatial positions of the input image, so that the scene is judged from all object features contained in the image. The judging process does not depend on local object features alone but considers all object relationships in the input image simultaneously, so indoor scenes such as a kitchen, a bedroom, a living room, or a restaurant can be judged more accurately and effectively; by identifying the relationship features among objects, interference from object features that commonly appear across different scenes is prevented, making scene recognition more accurate.
Fig. 2 shows a preferred implementation of the object feature aggregation module in the method and system for realizing indoor scene recognition based on object-associated attention according to the present invention. In order to effectively extract the object features in the input image, the invention provides the implementation scheme of the object feature aggregation module in fig. 2. First, a spatial position feature map F and an object attribution probability map S are calculated for the input image by a scene segmentation backbone network; then, for each object, the feature vectors of all spatial positions are weighted and summed with the object attribution probabilities of the corresponding positions to obtain the feature vector expression O of the object.
Finally, the feature vector expressions of all objects are spliced to obtain the object feature expression of the input image, wherein objects that are not present are represented by all-zero vectors and objects that are present each have a distinct feature vector; the final feature dimension is 1024x150x1, as shown in the example in fig. 2.
The calculation of each object feature vector in the object feature aggregation module is given by the formula below, wherein O_j represents the feature vector of object j, B_ij indicates whether the ith pixel position belongs to object j with the highest probability, S_ij represents the probability of the ith pixel position belonging to object j, and F_i represents the feature vector of the ith pixel position. The feature vector expression of each object is determined by the following weighted average:

O_j = ( Σ_i B_ij · S_ij · F_i ) / ( Σ_i B_ij · S_ij )
In this way, an object feature vector expression is calculated for each segmented region of the input image, enabling different objects to be distinguished; however, it is difficult to express the coexistence relations of the objects through the object features alone.
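Before turning to relation modelling, the aggregation step just described can be sketched as follows; this is a minimal interpretation of the formula above, assuming the tensors are laid out as in the 1024x150x1 example, and is not the authoritative implementation.

```python
import torch

def aggregate_object_features(F, S):
    """Weighted-average object features per the formula above (a sketch).

    F: (B, C, H, W) spatial position feature map from the backbone.
    S: (B, N, H, W) probability of each position belonging to each of N objects.
    Returns O: (B, N, C), one feature vector expression per object; objects
    that win no pixel come out as all-zero vectors, as in the description.
    """
    F = F.flatten(2)  # (B, C, HW): feature vector F_i for each position i
    S = S.flatten(2)  # (B, N, HW): probability S_ij
    # B_ij: 1 where object j has the highest probability at position i
    B_mask = torch.zeros_like(S).scatter_(1, S.argmax(dim=1, keepdim=True), 1.0)
    w = B_mask * S                                     # B_ij * S_ij
    num = torch.einsum('bnp,bcp->bnc', w, F)           # sum_i B_ij S_ij F_i
    den = w.sum(dim=2, keepdim=True).clamp_min(1e-6)   # sum_i B_ij S_ij
    return num / den                                   # O_j
```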
Accordingly, the present invention further provides a lightweight object-related attention module for calculating coexistence relationships between objects, as shown in fig. 3.
In order to effectively transfer object features from scene segmentation to scene recognition and to learn the potential relationships between objects, the invention further provides a lightweight object-associated attention module. The module consists of a cascade of one or more lightweight object-associated attention blocks, as shown in fig. 3, implemented as a neural network. The projection Q refines object features with higher semantic information while reducing the data dimension. Compared with existing methods, computing K and V from Q not only reduces the amount of computation by 50%, but also allows the dimensions of Q, K, V and the output features to be controlled simultaneously by adjusting only the value of alpha. The relation expression of each object is obtained from K and V through matrix multiplication; finally, the object relations are spliced with the original object features and output to the next module, yielding a feature vector expression that combines the object relations with the object features.
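One way such a block might look in code is sketched below; the interpretation of alpha as a channel-reduction ratio, the softmax over the cosine-similarity scores, and the output projection back to the input width are assumptions for illustration (the patent specifies only that K and V are computed from Q and that relations are scored by cosine similarity and spliced onto the object features).

```python
import torch
import torch.nn as nn
import torch.nn.functional as Fn

class LightweightObjectAttention(nn.Module):
    """Sketch of one lightweight object-associated attention block."""
    def __init__(self, channels, alpha=0.5):
        super().__init__()
        d = int(channels * alpha)      # alpha controls the Q/K/V and output widths
        self.q = nn.Linear(channels, d)
        self.k = nn.Linear(d, d)       # K computed from Q, not from the input,
        self.v = nn.Linear(d, d)       # which is where the computation saving lies
        self.out = nn.Linear(channels + d, channels)  # assumed projection back to C

    def forward(self, O):              # O: (B, N, C) object feature expressions
        Q = self.q(O)
        K, V = self.k(Q), self.v(Q)
        # pairwise cosine similarity between objects via L2-normalized dot products
        attn = Fn.normalize(Q, dim=-1) @ Fn.normalize(K, dim=-1).transpose(1, 2)
        rel = attn.softmax(dim=-1) @ V  # relation feature expression per object
        # splice relations onto the original object features, as described
        return self.out(torch.cat([O, rel], dim=-1))
```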
In order to aggregate the object features and relations into a hidden vector expression with as few parameters and as little computation as possible (overly complex parameters and computation would make processing inefficient and critical information hard to extract), the invention proposes a global association aggregation module, as shown in fig. 4, in which a strip depthwise convolution is used; unlike the block convolution of a conventional depthwise convolution, it models object features that carry no positional relationship. The module first aggregates the information of all objects within each channel by applying a 150x1 strip depthwise convolution to each channel of the features.
At this point, however, information still does not flow between channels, so a 1x1 point convolution is used to aggregate the information across all channels, generating a high-level semantic feature expression vector that expresses the scene information. Finally, this final scene expression vector is passed to an ordinary fully connected layer, namely the classification recognition module, to obtain the final scene recognition result for the input image.
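The two convolutions just described might be realized as follows; treating the spliced object features as a (channels, objects, 1) "image" and inserting a ReLU between the strip and point convolutions are assumptions consistent with, but not dictated by, the description.

```python
import torch
import torch.nn as nn

class GlobalAssociationAggregation(nn.Module):
    """Sketch: a 150x1 strip depthwise convolution aggregates all objects
    within each channel, then a 1x1 point convolution mixes the channels."""
    def __init__(self, num_objects=150, channels=1024):
        super().__init__()
        # one strip filter per channel (depthwise: groups=channels)
        self.strip = nn.Conv2d(channels, channels,
                               kernel_size=(num_objects, 1), groups=channels)
        # 1x1 point convolution lets information flow between channels
        self.point = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                       # x: (B, N, C) spliced features
        x = x.transpose(1, 2).unsqueeze(-1)     # -> (B, C, N, 1)
        x = self.point(torch.relu(self.strip(x)))  # objects, then channels
        return x.flatten(1)                     # (B, C) common feature vector
```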
In the method and system for realizing indoor scene recognition based on object-associated attention, a brand new object feature aggregation method is adopted in the object feature aggregation module: after the spatial position feature map F and the object attribution probability map S are calculated by the scene segmentation algorithm, the feature vector expression O of each object is obtained by weighting and summing the feature vectors of all spatial positions of the object with the object attribution probabilities of the corresponding positions. Finally, the feature vectors of all objects are spliced to obtain the object feature expression of the image.
Secondly, the invention further provides an object-associated attention module; compared with a traditional attention module, the lightweight object-associated attention module is lighter when learning object relationships, and the number of output feature channels can be controlled at will.
In the method and system for realizing indoor scene recognition based on object-associated attention, the invention further provides a global association aggregation module that adopts a strip depthwise convolution in place of the block convolution of a conventional depthwise convolution; the strip depthwise convolution aggregates the features and relations of all objects, so that even though its input carries no spatial position information, it can aggregate them into the final feature vector expression of all objects.
By adopting the object feature aggregation and lightweight associated attention processing methods and modules, the method and system achieve computational efficiency and accuracy that meet practical requirements, facilitating the recognition and judgment of indoor scenes in input images.
It will be understood that modifications and variations will be apparent to those skilled in the art from the foregoing description, and it is intended that all such modifications and variations be included within the scope of the following claims.

Claims (3)

1. An indoor scene recognition implementation method based on object associated attention comprises the following steps:
A. extracting semantic feature vectors of each spatial position in an input image through a backbone network;
B. the semantic feature vectors of all the spatial positions form a feature map according to the spatial positions and are transmitted to a segmentation module to calculate the probability that each spatial position in the input image belongs to different objects;
C. calculating the feature vector expression of each object through an object feature aggregation module, multiplying the semantic feature vector of all the spatial positions of each object by the probability that the spatial positions belong to the object, and carrying out weighted average so as to obtain the feature vector expression of each object;
C1. the feature vector expression of each object in the object feature aggregation module is calculated as

O_j = ( Σ_i B_ij · S_ij · F_i ) / ( Σ_i B_ij · S_ij )

wherein O_j represents the feature vector expression of object j, B_ij indicates whether the ith pixel position belongs to object j with the highest probability, S_ij represents the probability of the ith pixel position belonging to object j, and F_i represents the semantic feature vector of the ith pixel position, so that the object feature vector expression of each segmented region of the input image is calculated and the judgment of different objects is realized;
D. splicing the feature vector expressions of all the objects to form the object feature expression of the input image;
E. inputting the object feature expression into a light-weight object associated attention module, wherein the light-weight object associated attention module is realized by a neural network and is used for calculating the relation between objects;
E1. the lightweight object-associated attention module calculates the relation feature vector expression of each object with all other objects based on a neural network and cosine similarity, and splices the relation feature vector expression into the feature vector expression of the object;
F. inputting the feature vector expression of the object and the relation feature vector expression into a global association aggregation module to aggregate the relation among all the objects and form a common feature expression vector of all the objects;
G. inputting the common feature expression vector into a classification and identification module of the neural network fully connected layer, thereby identifying the scene to which the input image belongs.
2. The method according to claim 1, wherein the backbone network and the object feature aggregation module calculate the feature vector expressions of different objects based on the feature hidden vectors of different spatial positions.
3. An indoor scene recognition implementation system based on object-associated attention, comprising:
a backbone network for extracting semantic feature vectors of each spatial location for the input image;
the segmentation module is used for forming a feature map of semantic feature vectors of all the spatial positions according to the spatial positions of the semantic feature vectors, and calculating the probability of each spatial position in the input image corresponding to different objects;
the object feature aggregation module is used for calculating the feature vector expression of each object, multiplying the semantic feature vectors of all the spatial positions of each object by the probability that the spatial positions are the object and carrying out weighted average so as to obtain the feature vector expression of each object;
the calculation method of each object feature vector expression in the object feature aggregation module specifically comprises the following steps:
wherein Oj represents the feature vector expression of the object j, bij represents whether the ith pixel position belongs to the object j with the highest probability, sij represents the probability of the ith pixel position belonging to the object j, fi represents the semantic feature vector of the ith pixel position, so that the object feature vector expression of each segmentation area of an input image is calculated, and the judgment of different objects is realized;
the object feature aggregation module is also used for splicing the feature vector expressions of all objects to form the object feature expression of the input image;
the light-weight object associated attention module is used for calculating the relation feature vector expression of each object and all other objects based on the neural network and cosine similarity, and splicing the relation feature vector expression into the feature vector expression of the object;
the global association aggregation module takes the feature vector expression of the object and the relation feature vector expression as input, and aggregates the relation among all the objects to form a common feature expression vector of all the objects;
and the classification and identification module is used for taking the common feature expression vector as input and identifying the scene to which the input image belongs.
CN202011344887.8A 2020-11-26 2020-11-26 Method and system for realizing indoor scene recognition based on object associated attention Active CN112487927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011344887.8A CN112487927B (en) 2020-11-26 2020-11-26 Method and system for realizing indoor scene recognition based on object associated attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011344887.8A CN112487927B (en) 2020-11-26 2020-11-26 Method and system for realizing indoor scene recognition based on object associated attention

Publications (2)

Publication Number Publication Date
CN112487927A CN112487927A (en) 2021-03-12
CN112487927B 2024-02-13

Family

ID=74934952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011344887.8A Active CN112487927B (en) 2020-11-26 2020-11-26 Method and system for realizing indoor scene recognition based on object associated attention

Country Status (1)

Country Link
CN (1) CN112487927B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470048B (en) * 2021-07-06 2023-04-25 北京深睿博联科技有限责任公司 Scene segmentation method, device, equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084128A (en) * 2019-03-29 2019-08-02 安徽艾睿思智能科技有限公司 Scene chart generation method based on semantic space constraint and attention mechanism
CN110245665A (en) * 2019-05-13 2019-09-17 天津大学 Image, semantic dividing method based on attention mechanism
CN111932553A (en) * 2020-07-27 2020-11-13 北京航空航天大学 Remote sensing image semantic segmentation method based on area description self-attention mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084128A (en) * 2019-03-29 2019-08-02 安徽艾睿思智能科技有限公司 Scene chart generation method based on semantic space constraint and attention mechanism
CN110245665A (en) * 2019-05-13 2019-09-17 天津大学 Image, semantic dividing method based on attention mechanism
CN111932553A (en) * 2020-07-27 2020-11-13 北京航空航天大学 Remote sensing image semantic segmentation method based on area description self-attention mechanism

Also Published As

Publication number Publication date
CN112487927A (en) 2021-03-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant