CN110852206A - Scene recognition method and device combining global features and local features - Google Patents

Scene recognition method and device combining global features and local features

Info

Publication number
CN110852206A
Authority
CN
China
Prior art keywords
features
daisy
image
histogram
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911033329.7A
Other languages
Chinese (zh)
Inventor
樊硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yingpu Technology Co Ltd
Original Assignee
Beijing Yingpu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yingpu Technology Co Ltd filed Critical Beijing Yingpu Technology Co Ltd
Priority to CN201911033329.7A
Publication of CN110852206A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/507 Summing image-intensity values; Histogram projection analysis

Abstract

The application discloses a scene recognition method and device combining global features and local features, and relates to the field of computer vision. The method comprises the following steps: feature extraction: for each image of the training data set, DAISY features are extracted as local features and HOG features as global features of the image; image coding: each image is encoded as a histogram of visual words using the local DAISY features corresponding to each keypoint; constructing a pooling device: the DAISY histogram feature serves as the first layer, the L2-normalized global feature as the second layer, and the two layers are connected in series to form a hybrid feature; scene recognition: a classifier is trained with the hybrid features to form the final scene recognizer, which is used for scene recognition. The device comprises: a feature extraction module, an image coding module, a pooling device construction module and a scene recognition module. The method and the device can improve the accuracy of scene recognition.

Description

Scene recognition method and device combining global features and local features
Technical Field
The present application relates to the field of computer vision, and in particular, to a scene recognition method and apparatus combining global features and local features.
Background
Scene recognition is a hot problem in the field of computer vision. Its research goal is to process video or image information and automatically recognize the scene it contains. The technique has rich application fields, such as automatic monitoring, human-computer interaction, video indexing and image indexing.
Feature extraction methods in scene recognition fall into three categories: bottom-layer feature methods, middle-layer semantic methods and high-layer feature methods. Bottom-layer features are basic descriptors of image color, shape, texture and the like; they are simple in form and easy to obtain. They treat a scene as an object with structure and shape and represent its overall information by analyzing spectral information, so they suit outdoor scene recognition of low complexity. The middle-layer semantic method combines features to form new features and aims to bridge the semantic gap between features and semantics; it is generally realized with a visual bag-of-words model. Its main defect is that spatial information is ignored, and its recognition effect depends heavily on the performance of the selected features. High-layer features are more complex and closer to image semantics; they are generally constructed by combining bottom-layer features and are richer in expressive power, so they can handle scene classification with a large number of categories and carry more scene information that is closer to the real semantics of the image. However, they are generally of higher dimensionality and more complex to compute and extract.
Each of the three feature extraction methods therefore has its own advantages and disadvantages, and different methods can be adopted for different application requirements. Traditional scene recognition methods generally use bottom-layer or high-layer features, which are easy to understand and simple to implement. However, none of the three methods alone fully exploits the feature information of the image or represents richer image scene information, which reduces the accuracy of scene recognition.
Disclosure of Invention
It is an object of the present application to overcome the above problems or at least partially solve or mitigate them.
According to an aspect of the present application, there is provided a scene recognition method combining global features and local features, including:
feature extraction: for each image of a training data set, extracting DAISY features corresponding to key points detected from the image, taking the DAISY features as local features of the image, extracting standard HOG features corresponding to the whole image at different granularities, and taking the HOG features as global features of the image;
image coding: encoding each image as a histogram of visual words using local DAISY features corresponding to each keypoint;
constructing a double-layer pooling device: adopting the DAISY histogram feature as the first layer of the pooling device, adopting the L2-normalized global feature as the second layer, and connecting the L2-normalized global feature in series with the corresponding DAISY histogram feature to form a hybrid feature; the DAISY histogram feature is constructed as follows: a histogram representing the frequency of each visual word in each image is built from the DAISY features by selecting each key point in the image, determining the DAISY feature corresponding to that key point and looking up the cluster ID corresponding to the DAISY feature; the resulting histogram is L2-normalized to form the DAISY histogram feature;
scene recognition: training a classifier with the hybrid features to form the final scene recognizer, and performing scene recognition with the scene recognizer.
Optionally, the DAISY features are extracted using the scikit-image (skimage) library in Python.
Optionally, the encoding each image into a histogram of visual words using the local DAISY feature corresponding to each keypoint comprises:
quantizing the DAISY features into 'K' clusters with the Mini-Batch KMeans algorithm to form the 'visual words' of a vocabulary, where K denotes the vocabulary size;
forming a histogram with 'K' as its dimension using the vocabulary.
Optionally, the classifier is an SVM classifier.
Optionally, K has a value of 700.
According to another aspect of the present application, there is provided a scene recognition apparatus combining global features and local features, including:
a feature extraction module configured to extract, for each image of the training data set, DAISY features corresponding to key points detected from the image and use the DAISY features as local features of the image, extract standard HOG features corresponding to the entire image at different granularities, and use the HOG features as global features of the image;
an image encoding module configured to encode each image as a histogram of visual words using local DAISY features corresponding to each keypoint;
a pooling device building module configured to use the DAISY histogram feature as the first layer of the pooling device, use the L2-normalized global feature as the second layer, and connect the L2-normalized global feature in series with the corresponding DAISY histogram feature to form a hybrid feature; the DAISY histogram feature is constructed as follows: a histogram representing the frequency of each visual word in each image is built from the DAISY features by selecting each key point in the image, determining the DAISY feature corresponding to that key point and looking up the cluster ID corresponding to the DAISY feature; the resulting histogram is L2-normalized to form the DAISY histogram feature;
and a scene recognition module configured to train a classifier with the hybrid features to form the final scene recognizer and to perform scene recognition with the scene recognizer.
Optionally, the DAISY features are extracted using the scikit-image (skimage) library in Python.
Optionally, the encoding each image into a histogram of visual words using the local DAISY feature corresponding to each keypoint comprises:
quantizing the DAISY features into 'K' clusters with the Mini-Batch KMeans algorithm to form the 'visual words' of a vocabulary, where K denotes the vocabulary size;
forming a histogram with 'K' as its dimension using the vocabulary.
Optionally, the classifier is an SVM classifier.
Optionally, K has a value of 700.
According to the scene recognition method and device combining global features and local features, DAISY features are used as the local features of the images and clustered with the Mini-Batch KMeans algorithm to form a visual bag of words, while HOG features are used as the global features of the images and connected in series with the corresponding DAISY histogram features to form hybrid features that comprehensively represent the image. The accuracy of scene recognition can thereby be improved.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a schematic flow chart diagram of a method for scene recognition combining global and local features according to one embodiment of the present application;
FIG. 2 is a schematic flow chart of the method shown in FIG. 1 for encoding each image as a histogram of visual words using local DAISY features corresponding to each keypoint;
FIG. 3 is a block diagram of a scene recognition apparatus that combines global features and local features according to an embodiment of the present application;
FIG. 4 is a block schematic diagram of a computing device of one embodiment of the present application;
fig. 5 is a schematic block diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Fig. 1 is a schematic flow chart of a scene recognition method combining global features and local features according to an embodiment of the present application. The method may generally include:
S1, feature extraction: for each image of a training data set, extracting DAISY features corresponding to key points detected from the image, taking the DAISY features as local features of the image, extracting standard HOG features corresponding to the whole image at different granularities, and taking the HOG features as global features of the image;
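A minimal Python sketch of this extraction step, assuming the scikit-image library; the resize target, grid step, radii and HOG cell sizes below are illustrative assumptions, as the application does not fix them:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import daisy, hog
from skimage.transform import resize

def extract_features(image_rgb):
    """Extract dense DAISY descriptors (local) and a HOG vector (global)."""
    # Resize so that the global feature has the same length for every image
    # (assumed convention; the application does not specify an image size).
    gray = resize(rgb2gray(image_rgb), (256, 256))
    # DAISY descriptors on a dense grid; each row is one keypoint descriptor.
    descs = daisy(gray, step=16, radius=15, rings=3,
                  histograms=8, orientations=8)
    local_feats = descs.reshape(-1, descs.shape[-1])
    # Standard HOG over the whole image at two granularities (cell sizes)
    # standing in for the "different granularities" of the method.
    hog_fine = hog(gray, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), feature_vector=True)
    hog_coarse = hog(gray, orientations=9, pixels_per_cell=(16, 16),
                     cells_per_block=(2, 2), feature_vector=True)
    global_feat = np.concatenate([hog_fine, hog_coarse])
    return local_feats, global_feat
```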
S2, image coding: each image is encoded as a histogram of visual words using the local DAISY features corresponding to each keypoint, as shown in fig. 2:
S21, the image coding uses the local DAISY features corresponding to each key point to encode each image as a histogram of visual words. Specifically, the Mini-Batch KMeans algorithm is applied to quantize the DAISY features extracted from all training images into 'K' clusters, which form the 'visual words' of a vocabulary, where K denotes the vocabulary size. The advantage of the Mini-Batch KMeans algorithm is that it greatly reduces computation time while preserving clustering accuracy as far as possible: it works on small-batch subsets of the data to cut computation time while still optimizing the objective function. The optimal number of visual words K was determined to be 700 by cross-validation and experience;
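A minimal sketch of this quantization step, assuming the MiniBatchKMeans implementation from scikit-learn (the batch size and random seed are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

K = 700  # vocabulary size chosen in the application by cross-validation

def build_vocabulary(all_local_feats):
    """Cluster the DAISY descriptors of all training images into K visual words.

    all_local_feats: list of (n_i, d) descriptor arrays, one per training image.
    """
    stacked = np.vstack(all_local_feats)
    kmeans = MiniBatchKMeans(n_clusters=K, batch_size=1024, random_state=0)
    kmeans.fit(stacked)  # the cluster centers act as the visual words
    return kmeans
```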
S22, forming, for each image, a histogram with 'K' as its dimension using the vocabulary.
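Continuing the sketch, each image can be encoded by assigning every keypoint descriptor to its nearest visual word from the vocabulary built above and counting the assignments into a K-dimensional histogram:

```python
import numpy as np

def encode_image(local_feats, kmeans, K=700):
    """Encode one image as a K-dimensional histogram of visual-word counts."""
    cluster_ids = kmeans.predict(local_feats)  # one cluster ID per keypoint
    return np.bincount(cluster_ids, minlength=K).astype(np.float64)
```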
S3, constructing a double-layer pooling device: the DAISY histogram feature is adopted as the first layer of the pooling device and the L2-normalized global feature as the second layer, and the L2-normalized global feature is connected in series with the corresponding DAISY histogram feature to form a hybrid feature. The DAISY histogram feature is constructed as follows: each key point in the image is selected, the DAISY feature corresponding to that key point is determined, the cluster ID corresponding to the DAISY feature is looked up, and the histogram obtained in step S22 is L2-normalized using the cluster IDs to form the DAISY histogram feature;
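A minimal sketch of the double-layer pooling device under the same assumptions: the DAISY histogram forms the L2-normalized first layer, the HOG global feature the L2-normalized second layer, and the two are connected in series:

```python
import numpy as np

def l2_normalize(v, eps=1e-10):
    """Scale a vector to unit L2 norm; eps guards against all-zero vectors."""
    return v / (np.linalg.norm(v) + eps)

def hybrid_feature(daisy_hist, hog_global):
    """Concatenate the two L2-normalized layers into one hybrid feature."""
    return np.concatenate([l2_normalize(daisy_hist),
                           l2_normalize(hog_global)])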
S4, scene recognition: training a classifier with the hybrid features to form the final scene recognizer, and performing scene recognition with the scene recognizer:
Through the above three steps, a hybrid feature description of the training data set is obtained, and the classifier is trained with the hybrid features to form the final scene recognizer. An SVM classifier is used as the scene recognizer, and cross-validation is performed by randomly splitting the data set into a training set and a validation set. Experimental results show that the scene recognition method combining global features and local features comprehensively extracts the global and local information of the image and improves the accuracy of scene recognition.
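A minimal sketch of this training step, assuming scikit-learn's SVM; the linear kernel and the 80/20 split are illustrative assumptions, as the application only specifies an SVM classifier with cross-validation:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_recognizer(hybrid_feats, labels):
    """Train the SVM scene recognizer and report held-out accuracy."""
    X_train, X_val, y_train, y_val = train_test_split(
        np.asarray(hybrid_feats), np.asarray(labels),
        test_size=0.2, random_state=0)
    clf = SVC(kernel='linear')  # kernel choice is an assumption
    clf.fit(X_train, y_train)
    print('validation accuracy:', clf.score(X_val, y_val))
    return clf
```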
Fig. 3 is a schematic structural block diagram of a scene recognition apparatus combining global features and local features according to an embodiment of the present application. The apparatus may generally include: a feature extraction module 1, an image coding module 2, a pooling device construction module 3 and a scene recognition module 4.
The feature extraction module 1 is configured to extract, for each image of a training data set, DAISY features corresponding to key points detected from the image and use them as the local features of the image, and to extract standard HOG features of the whole image at different granularities and use them as the global features of the image;
The DAISY features extracted from all training images are clustered with the Mini-Batch KMeans algorithm to form a visual bag of words. Mini-Batch KMeans is a clustering model that preserves clustering accuracy as far as possible while greatly reducing computation time: it works on small-batch subsets of the data to cut computation time while still optimizing the objective function. In this application the number of best visual words (K) is defined empirically by cross-validation, and K is set to 700.
The image encoding module 2 is configured to encode each image into a histogram of visual words using the local DAISY features corresponding to each keypoint, as follows:
The image coding uses the local DAISY features corresponding to each key point to encode each image as a histogram of visual words. Specifically, the Mini-Batch KMeans algorithm is applied to quantize the DAISY features extracted from all training images into 'K' clusters, which form the 'visual words' of a vocabulary, where K denotes the vocabulary size. The advantage of the Mini-Batch KMeans algorithm is that it greatly reduces computation time while preserving clustering accuracy as far as possible, and the optimal number of visual words K was determined to be 700 by cross-validation and experience.
A histogram with 'K' as its dimension is then formed for each image using the vocabulary.
The pooling device construction module 3 is configured to adopt the DAISY histogram feature as the first layer of the pooling device and the L2-normalized global feature as the second layer, and to connect the L2-normalized global feature in series with the corresponding DAISY histogram feature to form a hybrid feature. The DAISY histogram feature is constructed as follows: each key point in the image is selected, the DAISY feature corresponding to that key point is determined, the cluster ID corresponding to the DAISY feature is looked up, and the histogram obtained by the image coding module 2 is L2-normalized using the cluster IDs to form the DAISY histogram feature.
The scene recognition module 4 is configured to train the classifier with the hybrid features to form the final scene recognizer and to perform scene recognition with the scene recognizer:
Through the feature extraction module 1, the image coding module 2 and the pooling device construction module 3, a hybrid feature description of the training data set is obtained, and the classifier is trained with the hybrid features to form the final scene recognizer. In this embodiment an SVM classifier is used as the scene recognizer, and cross-validation is performed by randomly splitting the data set into a training set and a validation set. Experimental results show that the scene recognition method combining global features and local features comprehensively extracts the global and local information of the image and improves the accuracy of scene recognition.
Embodiments of the present application also provide a computing device. Referring to fig. 4, the computing device comprises a memory 1120, a processor 1110 and a computer program stored in the memory 1120 and executable by the processor 1110; the computer program is stored in a space 1130 for program code in the memory 1120 and, when executed by the processor 1110, implements the method steps 1131 for performing any of the methods according to the present application.
The embodiment of the application also provides a computer-readable storage medium. Referring to fig. 5, the computer-readable storage medium comprises a storage unit for program code, which is provided with a program 1131' for performing the steps of the method according to the present application; the program is executed by a processor.
The embodiment of the application also provides a computer program product containing instructions which, when run on a computer, cause the computer to carry out the steps of the method according to the present application.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program stored in a computer-readable storage medium, where the storage medium is a non-transitory medium such as a random access memory, read-only memory, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disk, or any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A scene recognition method combining global features and local features comprises the following steps:
feature extraction: for each image of a training data set, extracting DAISY features corresponding to key points detected from the image, taking the DAISY features as local features of the image, extracting standard HOG features corresponding to the whole image at different granularities, and taking the HOG features as global features of the image;
image coding: encoding each image as a histogram of visual words using local DAISY features corresponding to each keypoint;
constructing a double-layer pooling device: adopting the DAISY histogram feature as the first layer of the pooling device, adopting the L2-normalized global feature as the second layer, and connecting the L2-normalized global feature in series with the corresponding DAISY histogram feature to form a hybrid feature; the DAISY histogram feature is constructed as follows: a histogram representing the frequency of each visual word in each image is built from the DAISY features by selecting each key point in the image, determining the DAISY feature corresponding to that key point and looking up the cluster ID corresponding to the DAISY feature; the resulting histogram is L2-normalized to form the DAISY histogram feature;
scene recognition: training a classifier with the hybrid features to form the final scene recognizer, and performing scene recognition with the scene recognizer.
2. The method as claimed in claim 1, wherein the DAISY features are extracted using the scikit-image (skimage) library in Python.
3. The method of claim 1 or 2, wherein encoding each image as a histogram of visual words using local DAISY features corresponding to each keypoint comprises:
quantizing the DAISY features into 'K' clusters with the Mini-Batch KMeans algorithm to form the 'visual words' of a vocabulary, wherein K represents the vocabulary size;
forming a histogram with 'K' as its dimension using the vocabulary.
4. A method according to any one of claims 1-3, wherein the classifier is an SVM classifier.
5. The method of any one of claims 1-4, wherein K has a value of 700.
6. A scene recognition apparatus that combines global features and local features, comprising:
a feature extraction module configured to extract, for each image of the training data set, DAISY features corresponding to key points detected from the image and use the DAISY features as local features of the image, extract standard HOG features corresponding to the entire image at different granularities, and use the HOG features as global features of the image;
an image encoding module configured to encode each image as a histogram of visual words using local DAISY features corresponding to each keypoint;
a pooling device building module configured to use the DAISY histogram feature as the first layer of the pooling device, use the L2-normalized global feature as the second layer, and connect the L2-normalized global feature in series with the corresponding DAISY histogram feature to form a hybrid feature; the DAISY histogram feature is constructed as follows: a histogram representing the frequency of each visual word in each image is built from the DAISY features by selecting each key point in the image, determining the DAISY feature corresponding to that key point and looking up the cluster ID corresponding to the DAISY feature; the resulting histogram is L2-normalized to form the DAISY histogram feature;
and a scene recognition module configured to train a classifier with the hybrid features to form the final scene recognizer and to perform scene recognition with the scene recognizer.
7. The apparatus of claim 6, wherein the DAISY features are extracted using the scikit-image (skimage) library in Python.
8. The apparatus of claim 6 or 7, wherein encoding each image as a histogram of visual words using local DAISY features corresponding to each keypoint comprises:
quantizing the DAISY features into 'K' clusters with the Mini-Batch KMeans algorithm to form the 'visual words' of a vocabulary, wherein K represents the vocabulary size;
forming a histogram with 'K' as its dimension using the vocabulary.
9. The apparatus of any one of claims 6-8, wherein said classifier is an SVM classifier.
10. The apparatus of any one of claims 6-9, wherein K has a value of 700.
CN201911033329.7A 2019-10-28 2019-10-28 Scene recognition method and device combining global features and local features Pending CN110852206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911033329.7A CN110852206A (en) 2019-10-28 2019-10-28 Scene recognition method and device combining global features and local features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911033329.7A CN110852206A (en) 2019-10-28 2019-10-28 Scene recognition method and device combining global features and local features

Publications (1)

Publication Number Publication Date
CN110852206A 2020-02-28

Family

ID=69598952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911033329.7A Pending CN110852206A (en) 2019-10-28 2019-10-28 Scene recognition method and device combining global features and local features

Country Status (1)

Country Link
CN (1) CN110852206A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699855A (en) * 2021-03-23 2021-04-23 腾讯科技(深圳)有限公司 Image scene recognition method and device based on artificial intelligence and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596195A (en) * 2018-05-09 2018-09-28 福建亿榕信息技术有限公司 A kind of scene recognition method based on sparse coding feature extraction

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596195A (en) * 2018-05-09 2018-09-28 福建亿榕信息技术有限公司 A kind of scene recognition method based on sparse coding feature extraction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JOBIN WILSON et al., "Scene Recognition by Combining Local and Global Image Descriptors", arXiv *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699855A (en) * 2021-03-23 2021-04-23 腾讯科技(深圳)有限公司 Image scene recognition method and device based on artificial intelligence and electronic equipment
CN112699855B (en) * 2021-03-23 2021-10-22 腾讯科技(深圳)有限公司 Image scene recognition method and device based on artificial intelligence and electronic equipment

Similar Documents

Publication Publication Date Title
Husain et al. REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval
US10489681B2 (en) Method of clustering digital images, corresponding system, apparatus and computer program product
CN110175249A (en) A kind of search method and system of similar pictures
CN116049412B (en) Text classification method, model training method, device and electronic equipment
CN112804558B (en) Video splitting method, device and equipment
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN113593661A (en) Clinical term standardization method, device, electronic equipment and storage medium
CN114332500A (en) Image processing model training method and device, computer equipment and storage medium
CN108090117B (en) A kind of image search method and device, electronic equipment
CN115062709A (en) Model optimization method, device, equipment, storage medium and program product
CN110688515A (en) Text image semantic conversion method and device, computing equipment and storage medium
CN116343190B (en) Natural scene character recognition method, system, equipment and storage medium
CN110852206A (en) Scene recognition method and device combining global features and local features
CN113704534A (en) Image processing method and device and computer equipment
CN116957036A (en) Training method, training device and computing equipment for fake multimedia detection model
CN115187910A (en) Video classification model training method and device, electronic equipment and storage medium
CN115222047A (en) Model training method, device, equipment and storage medium
CN115240647A (en) Sound event detection method and device, electronic equipment and storage medium
Hua et al. Cross-modal correlation learning with deep convolutional architecture
CN113408282A (en) Method, device, equipment and storage medium for topic model training and topic prediction
CN115331062B (en) Image recognition method, image recognition device, electronic device and computer-readable storage medium
Zhang et al. Short video fingerprint extraction: from audio–visual fingerprint fusion to multi-index hashing
CN111625672B (en) Image processing method, image processing device, computer equipment and storage medium
CN116977888A (en) Video processing method, apparatus, device, storage medium, and computer program product
Tong et al. A compact discriminant hierarchical clustering approach for action recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228