WO2022100133A1 - Scene recognition method and apparatus, smart device, storage medium and computer program - Google Patents
Scene recognition method and apparatus, smart device, storage medium and computer program
- Publication number
- WO2022100133A1 (PCT/CN2021/106936)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- feature
- processed
- semantic mask
- semantic
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 91
- 238000004590 computer program Methods 0.000 title claims description 9
- 239000013598 vector Substances 0.000 claims abstract description 87
- 230000002776 aggregation Effects 0.000 claims abstract description 38
- 238000004220 aggregation Methods 0.000 claims abstract description 38
- 238000012545 processing Methods 0.000 claims abstract description 28
- 230000011218 segmentation Effects 0.000 claims description 25
- 238000000605 extraction Methods 0.000 claims description 6
- 238000004422 calculation algorithm Methods 0.000 claims description 5
- 230000008569 process Effects 0.000 claims description 4
- 238000010586 diagram Methods 0.000 description 6
- 230000006870 function Effects 0.000 description 6
- 230000002452 interceptive effect Effects 0.000 description 4
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000013507 mapping Methods 0.000 description 3
- 230000000007 visual effect Effects 0.000 description 3
- 241001464837 Viridiplantae Species 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 230000004807 localization Effects 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 230000001932 seasonal effect Effects 0.000 description 2
- 241000282472 Canis lupus familiaris Species 0.000 description 1
- 241000282326 Felis catus Species 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000019771 cognition Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 230000006386 memory function Effects 0.000 description 1
- 238000012549 training Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24554—Unary operations; Data partitioning operations
- G06F16/24556—Aggregation; Duplicate elimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
- G06F16/532—Query formulation, e.g. graphical querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Definitions
- The present application relates to the technical field of image retrieval, and in particular to a scene recognition method, apparatus, smart device, storage medium and computer program.
- Scene recognition has important applications in the field of computer vision, such as Simultaneous Localization and Mapping (SLAM), Structure from Motion (SFM), and Visual Localization (VL).
- The core of the scene recognition problem is to identify the scene corresponding to a given image: to give the name or geographic location of the scene, or to select images of a similar scene from a database, in which case it can also be viewed as an image retrieval problem.
- Embodiments of the present application provide a scene recognition method, apparatus, smart device, storage medium, and computer program.
- An embodiment of the present application provides a scene recognition method, including: acquiring an image to be processed and a semantic mask map corresponding to the image to be processed, where the image to be processed includes a query image and an image to be recognized, and the corresponding semantic mask map includes the semantic mask map of the query image and the semantic mask map of the image to be recognized; performing feature aggregation processing on the image to be processed according to the semantic mask map to obtain a feature vector of the image to be processed; and determining, by using the feature vector of the image to be processed, an image matching the scene of the query image from the images to be recognized.
- In this way, the features of the image to be processed are obtained by combining the semantic mask map with the feature aggregation method, which reduces the interference from distracting factors and improves the robustness of scene recognition.
- In some embodiments, acquiring the image to be processed and its corresponding semantic mask map includes: performing semantic segmentation on the image to be recognized and the query image to obtain the category of each pixel and the probability corresponding to that category; setting a weight for the category of each pixel according to set conditions; and obtaining the semantic mask corresponding to each pixel from the probability and weight of its category, where the semantic masks of all pixels constitute the semantic mask map.
- After the semantic mask map is combined with the feature aggregation method to obtain the features of the image to be processed, it reduces the interference from distracting factors and improves the robustness of scene recognition.
- In some embodiments, before setting the weights for the category of each pixel according to the set conditions, the method further includes: classifying all pixels by attribute to obtain one or more sub-categories; setting a weight for each sub-category according to the set conditions; and obtaining the semantic mask of each pixel from the probability and weight corresponding to its sub-category, where the semantic masks of all pixels constitute the semantic mask map. Setting a weight for each sub-category reduces the interference from distracting factors and improves the robustness of scene recognition.
- In some embodiments, the sub-categories include at least two of a fixed sub-category, a non-fixed sub-category, a dynamic sub-category and an unknown sub-category; the weight of the dynamic sub-category is smaller than the weights of the fixed, non-fixed and unknown sub-categories. For example, setting a lower weight for the non-fixed sub-category and a higher weight for the fixed sub-category removes the interference of non-fixed features with feature recognition and improves the robustness of scene recognition.
- In some embodiments, obtaining the semantic mask corresponding to each pixel according to the probability and weight of its sub-category includes calculating the semantic mask as m_i = p_i × w_i, where m_i represents the semantic mask corresponding to the i-th pixel (the image generated from all m_i is the semantic mask map), p_i represents the probability of the sub-category to which the i-th pixel belongs, and w_i represents the weight corresponding to the category or sub-category to which the i-th pixel belongs.
- In some embodiments, performing feature aggregation processing on the image to be processed according to the semantic mask map to obtain its feature vector includes: performing feature extraction on the image to be processed to obtain a feature set; forming a plurality of cluster centers from the feature set; obtaining, from the plurality of cluster centers, the cluster center corresponding to each feature in each image to be processed; and determining the value of each feature in a first dimension as well as the value of its corresponding cluster center in the same first dimension. The cluster centers and these per-dimension values, combined with the semantic mask map of the query image, are then used to perform feature aggregation processing on the query image to obtain its feature vector; likewise, combined with the semantic mask map of the image to be recognized, they are used to perform feature aggregation processing on the image to be recognized to obtain its feature vector.
- In some embodiments, forming a plurality of cluster centers from the feature set includes processing the feature set with a clustering algorithm; obtaining the cluster center corresponding to each feature in each image to be processed includes taking the cluster center closest to each feature as that feature's cluster center.
- In some embodiments, determining an image matching the scene of the query image from the images to be recognized by using the feature vector of the image to be processed includes: determining the matching image according to the distance between the feature vector of the image to be recognized and the feature vector of the query image. Since the calculation of the feature vector incorporates the semantic mask map, the interference of non-fixed features is reduced, and an image to be recognized that is more similar to the query image is obtained.
- In some embodiments, this step includes determining the image to be recognized whose feature vector is nearest to the feature vector of the query image as the image matching the query image. In this way, an image to be recognized that is more similar to the query image is obtained.
- In some embodiments, when multiple images match the query image, the method further includes ranking the matching images with a spatial consistency method so as to obtain the image most similar to the query image. In this way, the retrieved scene is more similar and more accurate.
- An embodiment of the present application provides a scene recognition apparatus, including: an acquisition module configured to acquire an image to be processed and a semantic mask map corresponding to the image to be processed, where the image to be processed includes a query image and an image to be recognized; a feature aggregation module configured to perform feature aggregation processing on the image to be processed according to the semantic mask map to obtain a feature vector of the image to be processed; and an image matching module configured to determine, by using the feature vector of the image to be processed, an image matching the scene of the query image from the images to be recognized.
- In this way, the features of the image to be processed are obtained by combining the semantic mask map with the feature aggregation method, reducing the interference from distracting factors and improving the robustness of scene recognition.
- An embodiment of the present application provides a smart device, including: a processor and a memory coupled to each other, wherein the memory is used to store program instructions for implementing the scene recognition method described in any one of the above.
- An embodiment of the present application provides a computer-readable storage medium storing a program file, where the program file can be executed to implement any one of the scene recognition methods described above.
- An embodiment of the present application provides a computer program including computer-readable code; when the code runs in a smart device, a processor in the smart device executes a program implementing any of the scene recognition methods described above.
- The embodiments of the present application provide a scene recognition method, device, smart device, storage medium and computer program. Feature aggregation processing is performed on the image to be processed according to the semantic mask map to obtain its feature vector, and the feature vector is then used to determine, from the images to be recognized, an image matching the scene of the query image. Because the semantic mask map captures the high-level semantic information of the image, the interference caused by distracting factors in the image is eliminated, thereby improving the robustness of scene recognition.
- FIG. 1 is a schematic flowchart of an embodiment of a scene recognition method according to an embodiment of the present application;
- FIG. 2 is a schematic flowchart of an embodiment of step S11 in FIG. 1 according to the embodiment of the present application;
- FIG. 3 is a schematic flowchart of another embodiment of step S11 in FIG. 1 according to the embodiment of the present application;
- FIG. 4 is a schematic structural diagram of an embodiment of a scene recognition apparatus according to an embodiment of the present application;
- FIG. 5 is a schematic structural diagram of an embodiment of a smart device according to an embodiment of the present application;
- FIG. 6 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
- Scene recognition has an important application in the field of computer vision.
- The core of the scene recognition problem is to identify the scene corresponding to a given image: to give the name or geographic location of the scene, or to pick out images of a similar scene from a database, which can also be seen as an image retrieval problem.
- The core of this type of problem is to accurately describe the image or the scene in the image. There are two commonly used methods: one directly computes a global description of the image, and the other aggregates local features.
- In the method that directly computes a global description, the input is a complete image and the output is the image's global descriptor.
- The simplest approach is to concatenate all the pixel values of the image as its descriptor, or to use a histogram to summarize the grayscale or gradient information of the pixels; such methods have extremely poor robustness.
- In the local feature aggregation method, the input is the local features extracted from the image, and the output is an encoded feature vector. This method only uses local features, lacks high-level semantic information, and is not robust to illumination changes and dynamic scenes.
- Semantic information, as high-level visual information, provides good guidance for scene recognition.
- the use of semantic information is also more in line with human cognition.
- In view of this, an embodiment of the present application proposes a scene recognition method based on semantic masks. The method uses semantic segmentation results to apply different weights to different regions of the image, effectively handling the negative impact of dynamic, unstable objects on scene recognition.
- Because a soft weighting method is used, the influence of instability in the semantic segmentation is effectively avoided. Not only that, the method is also robust to seasonal changes.
- FIG. 1 is a schematic flowchart of a first embodiment of a scene recognition method according to an embodiment of the present application.
- the scene recognition method is executed by a smart device, and the method includes:
- Step S11: Acquire the image to be processed and the semantic mask map corresponding to the image to be processed; the image to be processed includes a query image and an image to be recognized.
- Here, the semantic mask map corresponding to the image to be processed includes a semantic mask map of the query image and a semantic mask map of the image to be recognized.
- In some embodiments, obtaining the semantic mask map corresponding to the image to be processed includes:
- Step S21: Perform semantic segmentation processing on the image to be recognized and the query image to obtain the category of each pixel and the probability corresponding to the category.
- the query image is a user-defined image, which may be an image currently captured by the user, or an image stored in advance by the user.
- the image to be recognized is an image that matches the query image and is searched from the database according to the query image.
- For example, the database resides on a server: a query image is input, and the server returns a plurality of images to be recognized whose scenes are similar to that of the query image.
- Semantic segmentation is performed on the image to be recognized and the query image to obtain the category of each pixel in the image and the corresponding probability of the category.
- Step S22: Set a weight for each pixel's category according to the set condition.
- For example, the weight of the dynamic sub-category is set to be the lowest, smaller than the weights of the fixed, non-fixed and unknown sub-categories.
- In another embodiment, the weight of the non-fixed sub-category is set to be the lowest, smaller than the weights of the fixed, dynamic and unknown sub-categories.
- Step S23: Obtain the semantic mask corresponding to each pixel according to the probability corresponding to the sub-category and the weight corresponding to the sub-category, where the semantic masks corresponding to all pixels constitute the semantic mask map.
- For example, the following formula (1) is used to calculate the semantic mask corresponding to each pixel: m_i = p_i × w_i (1), where m_i represents the semantic mask corresponding to the i-th pixel (the image generated from all m_i is the semantic mask map), p_i represents the probability of the sub-category to which the i-th pixel belongs, and w_i represents the weight corresponding to the category or sub-category to which the i-th pixel belongs.
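- As an illustration only, a minimal NumPy sketch of this per-pixel mask computation might look as follows; the function name, array shapes and the example weights are assumptions for the sketch, not values fixed by the patent:

```python
import numpy as np

def semantic_mask_map(probs, labels, class_weights):
    """Compute the semantic mask map m_i = p_i * w_i per formula (1).

    probs:   (H, W) array with the probability of each pixel's predicted class.
    labels:  (H, W) integer array with each pixel's predicted class index.
    class_weights: sequence mapping class index -> weight w_i.
    """
    weights = np.asarray(class_weights)[labels]  # look up w_i for every pixel
    return probs * weights                       # m_i = p_i * w_i

# Hypothetical weights for stable, volatile, dynamic, unknown classes.
class_weights = [1.0, 0.5, 0.1, 0.3]
labels = np.random.randint(0, 4, size=(480, 640))
probs = np.random.rand(480, 640)
mask_map = semantic_mask_map(probs, labels, class_weights)
```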
- If the category result after semantic segmentation does not already consist of the four sub-categories (fixed, non-fixed, dynamic and unknown), refer to FIG. 3, where step S31 is the same as step S21 in FIG. 2; in this case, the method further includes:
- Step S32: Perform attribute classification on all pixels to obtain one or more sub-categories.
- The sub-categories include at least two of the fixed, non-fixed, dynamic and unknown sub-categories.
- Step S33: Set weights for each sub-category according to the set conditions.
- weights are set for the pixels of each sub-category.
- When the sub-categories obtained by classifying the semantic segmentation results by attribute include all four (fixed, non-fixed, dynamic and unknown), in order to reduce the interference of dynamic features with scene recognition, in one embodiment the weight of the dynamic sub-category is set to be the lowest, smaller than the weights of the fixed, non-fixed and unknown sub-categories.
- In another embodiment, the weight of the non-fixed sub-category is set to be the lowest, smaller than the weights of the fixed, dynamic and unknown sub-categories.
- Step S34: Obtain a semantic mask corresponding to each pixel according to the probability corresponding to the sub-category and the weight corresponding to the sub-category, where the semantic masks corresponding to all pixels constitute the semantic mask map.
- For example, the following formula (2) is used to calculate the semantic mask corresponding to each pixel: m_i = p_i × w_i (2), where m_i represents the semantic mask corresponding to the i-th pixel (the image generated from all m_i is the semantic mask map), p_i represents the probability of the sub-category to which the i-th pixel belongs, and w_i represents the weight corresponding to the category or sub-category to which the i-th pixel belongs.
- In this way, different weights are set for the pixel categories after semantic segmentation, reducing the interference those categories cause in feature recognition and thereby improving the robustness of scene recognition.
- Step S12: Perform feature aggregation processing on the image to be processed according to the semantic mask map to obtain a feature vector of the image to be processed.
- An existing way of performing feature aggregation on the extracted features to obtain a feature vector is VLAD (Vector of Locally Aggregated Descriptors) encoding. Obtaining the feature vector by VLAD encoding includes: performing feature extraction on the image to be processed to obtain a feature set. Feature extraction may also be performed on a preset image set; this preset set may be all images in the database or on the server, some of the images on the server, or a collection of pictures gathered by the user, which is not limited here. The feature set X is then aggregated into a fixed-length feature vector through a codebook C.
- A cluster center corresponding to each feature x_i in each image to be processed is obtained from the plurality of cluster centers: the position of the feature x_i is determined, and the cluster center closest to x_i is taken as its cluster center c_k. Next, the value of each feature x_i in the first dimension and the value of its cluster center c_k in the same dimension are determined; the dimensionality of c_k is the same as that of the features assigned to it, and aggregation accumulates, dimension by dimension, the residual between each feature x_i and its cluster center c_k. Here, the "first dimension" may be dimension 1, dimension 2, dimension 3 and so on; the term is used only to make clear that cluster centers and features are compared in the same dimension. The feature vectors of the query image and of the image to be recognized are then obtained from the cluster centers c_k and these per-dimension values.
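- For concreteness, a sketch of building the codebook C and assigning each feature to its nearest cluster center might look like the following; the use of scikit-learn's KMeans, the codebook size and the descriptor dimensionality are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(features, k=64):
    """Cluster the feature set X into k cluster centers (the codebook C)."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    return kmeans.cluster_centers_               # shape (k, d)

def assign_clusters(features, centers):
    """For each feature x_i, return the index k of its nearest center c_k."""
    # Squared Euclidean distance between every feature and every center.
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)                     # shape (n,)

# Hypothetical example: 1000 local descriptors of dimension 128.
X = np.random.rand(1000, 128).astype(np.float32)
C = build_codebook(X, k=64)
assignments = assign_clusters(X, C)
```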
- The prior art generally obtains the feature vector of the query image or the image to be recognized through the following formula (3): v(k, j) = Σ_i a_k(x_i) × (x_i(j) − c_k(j)) (3), where v(k, j) represents the feature vector of the query image or the image to be recognized, a_k(x_i) represents the selection function, x_i is a feature, and c_k is the cluster center corresponding to x_i; when c_k is the cluster center corresponding to x_i, a_k(x_i) is equal to 1, otherwise a_k(x_i) is equal to 0; x_i(j) is the value of the i-th feature in the j-th dimension, and c_k(j) is the value of the k-th cluster center in the j-th dimension.
- For the query image, v(k, j) represents its feature vector, a_k(x_i) represents the selection function, and x_i is a feature of the query image; when c_k is the cluster center corresponding to x_i, a_k(x_i) is equal to 1, otherwise a_k(x_i) is equal to 0; x_i(j) represents the value of the i-th feature of the query image in the j-th dimension, and c_k(j) represents the value of the k-th cluster center of the query image in the j-th dimension.
- Likewise, for the image to be recognized, v(k, j) represents its feature vector, a_k(x_i) represents the selection function, and x_i is a feature of the image to be recognized; when c_k is the cluster center corresponding to x_i, a_k(x_i) is equal to 1, otherwise a_k(x_i) is equal to 0; x_i(j) represents the value of the i-th feature of the image to be recognized in the j-th dimension, and c_k(j) represents the value of the k-th cluster center of the image to be recognized in the j-th dimension.
- In contrast, the embodiment of the present application uses, for each feature x_i in the image to be processed, its corresponding cluster center c_k, the value of c_k in the first dimension, and the value of x_i in the first dimension, combined with the semantic mask map of the query image, to perform feature aggregation processing on the query image and obtain its feature vector; likewise, combined with the semantic mask map of the image to be recognized, feature aggregation processing is performed on the image to be recognized to obtain its feature vector.
- Specifically, the embodiment of the present application obtains the feature vectors of the query image and the image to be recognized by the following formula (4): v(k, j)' = Σ_i m_i × a_k(x_i) × (x_i(j) − c_k(j)) (4), where v(k, j)' represents the feature vector of the query image or the image to be recognized, a_k(x_i) represents the selection function, x_i is a feature, and c_k is the cluster center corresponding to x_i; when c_k is the cluster center corresponding to x_i, a_k(x_i) is equal to 1, otherwise a_k(x_i) is equal to 0; x_i(j) is the value of the i-th feature in the j-th dimension, c_k(j) is the value of the k-th cluster center in the j-th dimension, and m_i is taken from the semantic mask map of the query image or of the image to be recognized.
- weighting can be performed by using a semantic mask, thereby reducing the weight of dynamic objects and improving the robustness of feature recognition.
- If a feature is a pixel-level feature, the semantic mask of the corresponding position can be read directly according to the position of the feature in the image; if the feature is a sub-pixel-level feature, the mask can be obtained by interpolating at the same position on the semantic mask map.
- After the feature vectors of the query image and the image to be recognized are obtained in the above manner, each of the K cluster-center blocks of a feature vector may be normalized separately, and then the entire vector may be normalized.
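- Putting formula (4) and this normalization together, a sketch of the mask-weighted aggregation could read as follows; function and variable names are illustrative, and the per-feature mask values are assumed to have been looked up (or interpolated) beforehand:

```python
import numpy as np

def masked_vlad(features, masks, centers):
    """Aggregate v(k, j)' = sum_i m_i * a_k(x_i) * (x_i(j) - c_k(j)).

    features: (n, d)  local features x_i.
    masks:    (n,)    semantic mask value m_i at each feature's position.
    centers:  (K, d)  cluster centers c_k.
    """
    K, d = centers.shape
    v = np.zeros((K, d))
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)          # a_k(x_i) = 1 only for this k
    for i, k in enumerate(nearest):
        v[k] += masks[i] * (features[i] - centers[k])
    # Intra-normalization per cluster center, then global L2 normalization.
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    norms[norms == 0] = 1.0              # leave empty clusters untouched
    v = (v / norms).ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```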
- Step S13: Determine an image matching the scene of the query image from the images to be recognized by using the feature vector of the image to be processed.
- In some embodiments, the image matching the scene of the query image is determined from the images to be recognized by the distance between the feature vector of the image to be recognized and the feature vector of the query image.
- the to-be-identified image corresponding to the feature vector closest to the feature vector of the query image is determined as the image matching the query image.
- When there are multiple images matching the query image, the matching images are ranked by using a spatial consistency method, so as to obtain the image most similar to the query image.
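- One common way to realize such a spatial consistency check is to count geometrically verified local-feature matches, e.g. with a RANSAC-estimated homography. The OpenCV-based sketch below is an assumption for illustration, since the patent does not fix a particular spatial consistency method:

```python
import cv2
import numpy as np

def spatial_consistency_score(kp_q, kp_db, matches):
    """Count RANSAC inliers between matched keypoints of two images.

    kp_q, kp_db: lists of cv2.KeyPoint for the query and candidate image.
    matches:     list of cv2.DMatch from a descriptor matcher.
    """
    if len(matches) < 4:                 # a homography needs >= 4 pairs
        return 0
    src = np.float32([kp_q[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_db[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    _, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return 0 if inliers is None else int(inliers.sum())

# Candidate images returned by feature-vector retrieval are re-ranked by
# this score; the candidate with the most inliers is kept as most similar.
```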
- The method combines the semantic mask map with the traditional feature aggregation method, using semantic mask weighting to reduce the interference of dynamic features in the image with feature recognition and effectively avoiding the negative impact of unstable objects on scene recognition.
- Because a soft weighting method is used, the impact of instability in the semantic segmentation is effectively avoided, thereby improving robustness.
- The method of the embodiments of the present application also has good robustness under seasonal changes.
- the embodiments of the present application further provide a scene recognition method.
- This scene recognition method uses the semantic segmentation result to weight different areas of the image when generating the image's global feature vector, improving the robustness of scene recognition when the scene contains a large number of dynamic objects or changes seasonally.
- the scene recognition method can be implemented in the following ways:
- (1) Semantic segmentation: the input is an image, and the output is the semantic segmentation result.
- a semantic segmentation network may be used to perform semantic segmentation on the input image.
- the result of semantic segmentation contains the class of each pixel and the probability of belonging to that class.
- the semantic segmentation network can be any network, and the segmented categories can be customized and trained, or can be directly trained using categories defined on public datasets.
- The segmentation results can be further divided into four categories: stable, volatile, dynamic and unknown. If the segmentation already produces these four categories, this further division is not performed; otherwise, the categories can be grouped according to the actual usage scenario. For example, in an indoor environment, the floor, walls and ceiling can be considered stable categories; beds, tables, chairs and the like volatile categories; and people, cats, dogs and the like dynamic categories. For outdoor scenes, buildings, roads, street lights and the like can be regarded as stable categories; green plants, the sky and the like as volatile categories; and pedestrians and vehicles as dynamic categories. Of course, this grouping can be adjusted to the actual usage scenario; for example, in some indoor scenarios the table can be regarded as a stable class.
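- As an illustration, such a grouping can be expressed as a simple lookup from segmentation classes to the four groups and their weights. The class names and weight values below are hypothetical examples for an indoor scene, not values prescribed by the embodiment:

```python
# Hypothetical indoor grouping: segmentation class -> (group, weight w).
CLASS_GROUPS = {
    "floor":   ("stable",   1.0),
    "wall":    ("stable",   1.0),
    "ceiling": ("stable",   1.0),
    "bed":     ("volatile", 0.5),
    "table":   ("volatile", 0.5),
    "chair":   ("volatile", 0.5),
    "person":  ("dynamic",  0.1),
    "cat":     ("dynamic",  0.1),
    "dog":     ("dynamic",  0.1),
}

def class_weight(name):
    """Return the weight for a class, falling back to the 'unknown' group."""
    return CLASS_GROUPS.get(name, ("unknown", 0.3))[1]
```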
- (2) Semantic mask: the input is the semantic segmentation result, and the output is the semantic mask map.
- the weights corresponding to the stable category, the volatile category, the dynamic category, and the unknown category are w 1 , w 2 , w 3 , and w 4 , respectively.
- These weights can be set manually, e.g. 1.0, 0.5, 0.1 and 0.3 for the four categories, respectively.
- For each pixel i, let p_i be the probability of its category and w_i the weight of that category; then m_i = p_i × w_i is called the semantic mask corresponding to pixel i, and the image generated from all m_i is the semantic mask map.
- The generated semantic masks can be embedded in current local feature aggregation methods, as well as in end-to-end deep learning methods.
- (3) Feature aggregation: the input is the image and its corresponding semantic mask map, and the output is the image feature vector, which can be expressed by the following formula (5): v(k, j) = Σ_i m_i × a_k(x_i) × (x_i(j) − c_k(j)) (5), where a_k(x_i) is the selection function that picks the nearest cluster center of the feature x_i (it is 1 at the position of the nearest cluster center and 0 otherwise); x_i(j) represents the value of the feature x_i in the j-th dimension; c_k(j) represents the value of the k-th cluster center in the j-th dimension; and m_i is the semantic mask corresponding to the i-th feature. If the feature is a pixel-level feature, the semantic mask of the corresponding position can be read directly at that position in the image; if the feature is a sub-pixel-level feature, it can be obtained by interpolation at the same position on the semantic mask map.
- (4) Scene recognition: the input is the feature vector obtained from the image and its semantic mask, and the output is the most similar scene.
- Using step (3), feature vectors are extracted from all database images to construct an image feature database. A feature vector is then extracted from the query image in the same way, and the distances between the query feature vector and the image feature vectors in the database are compared to find the few images with the smallest distance as the retrieval result; these images are then reordered to obtain the most similar scene image.
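- A minimal sketch of this retrieval step, assuming L2-normalized feature vectors such as those produced by the aggregation sketch above, might be:

```python
import numpy as np

def retrieve(query_vec, db_vecs, top_k=5):
    """Return the indices of the top_k database vectors nearest to the query.

    query_vec: (D,)   L2-normalized feature vector of the query image.
    db_vecs:   (N, D) L2-normalized feature vectors of the database images.
    """
    # For L2-normalized vectors, the smallest Euclidean distance is
    # equivalent to the largest cosine similarity (dot product).
    sims = db_vecs @ query_vec
    order = np.argsort(-sims)            # most similar first
    return order[:top_k]

# The top_k candidates would then be re-ranked with a spatial consistency
# check (e.g. geometric verification of local-feature matches).
```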
- The scene recognition method using semantic masks provided by the embodiments of the present application can effectively deal with dynamic objects: by assigning them a lower weight, their interference with the image description is effectively alleviated.
- In addition, the method can assign higher weights to strongly discriminative categories, increasing their proportion in the image description and effectively suppressing non-discriminative regions such as roads and floors.
- The usage scenarios of the scene recognition method provided by the embodiments of the present application include the following. In visual positioning algorithms, an image-level description is usually used to first retrieve a similar scene, thereby narrowing the matching range of local features. If the target scene contains a large number of dynamic objects during mapping or positioning, such as pedestrians coming and going in shopping malls or vehicles on the road, using the images directly without processing greatly degrades retrieval performance and reduces the success rate of retrieval. For outdoor environments, if mapping and positioning take place in different seasons, outdoor green plants change shape with the seasons, which also greatly affects the effect of scene recognition. The methods proposed in the embodiments of the present application can effectively deal with these problems. Of course, the scene recognition method provided in the embodiments of the present application also supports other usage scenarios, which those skilled in the art can adopt according to actual needs.
- FIG. 4 is a schematic structural diagram of an embodiment of a scene recognition apparatus according to an embodiment of the present application. The apparatus includes: an acquisition module 41, a feature aggregation module 42 and an image matching module 43.
- The acquisition module 41 is configured to acquire the image to be processed and the semantic mask map corresponding to the image to be processed; the image to be processed includes the query image and the image to be recognized, and the corresponding semantic mask map includes the semantic mask map of the query image and the semantic mask map of the image to be recognized.
- In some embodiments, the acquisition module 41 is configured to acquire a query image and, according to the query image, acquire from a database a plurality of images to be recognized that match it; perform semantic segmentation processing on the images to be recognized and the query image to obtain the category of each pixel and the corresponding probability; set a weight for each pixel's category according to the set conditions; and obtain the semantic mask corresponding to each pixel from the probability and weight of its category, where the semantic masks of all pixels constitute the semantic mask map.
- The acquisition module 41 is further configured to perform attribute classification on all pixels to obtain one or more sub-categories; set a weight for each sub-category according to set conditions; and obtain the semantic mask corresponding to each pixel from the probability and weight of its sub-category, where the semantic masks of all pixels constitute the semantic mask map.
- the feature aggregation module 42 is configured to perform feature aggregation processing on the image to be processed according to the semantic mask map to obtain a feature vector of the image to be processed.
- In some embodiments, the feature aggregation module 42 is configured to perform feature extraction on the image to be processed to obtain a feature set; form a plurality of cluster centers from the feature set; obtain, from the plurality of cluster centers, the cluster center corresponding to each feature in each image to be processed; and determine the value of each feature in the first dimension and the value of its corresponding cluster center in the first dimension. Using the cluster center corresponding to each feature, the value of that cluster center in the first dimension, and the value of each feature in the first dimension, combined with the semantic mask map of the query image, the module performs feature aggregation processing on the query image to obtain its feature vector; likewise, combined with the semantic mask map of the image to be recognized, it performs feature aggregation processing on the image to be recognized to obtain its feature vector.
- the image matching module 43 is configured to use the feature vector of the image to be processed to determine the image matching the scene of the query image from the image to be recognized.
- the image matching module 43 is configured to determine, from the to-be-recognized image, an image that matches the query image scene according to the distance between the feature vector of the to-be-recognized image and the feature vector of the query image.
- In some embodiments, the image matching module 43 is configured to determine the image to be recognized whose feature vector is closest to the feature vector of the query image as the image matching the query image.
- the image matching module 43 is further configured to, when there are multiple images matching the query image in the to-be-recognized image, use a spatial consistency method to arrange the images matching the query image, to obtain the most similar image to the query image.
- The scene recognition apparatus provided by the embodiments of the present application combines the semantic mask map with the traditional feature aggregation method, reducing the interference of dynamic features in the image with feature recognition by means of semantic mask weighting and thereby improving robustness.
- FIG. 5 is a schematic structural diagram of a smart device according to an embodiment of the present application.
- the smart device includes a memory 52 and a processor 51 that are interconnected.
- the memory 52 is used to store program instructions for implementing any one of the above-mentioned scene recognition methods.
- the processor 51 is used to execute program instructions stored in the memory 52 .
- the processor 51 may also be referred to as a central processing unit (Central Processing Unit, CPU).
- the processor 51 may be an integrated circuit chip with signal processing capability.
- the processor 51 may also be a general-purpose processor, a digital signal processor (Digital Signal Process, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other Programming logic devices, discrete gate or transistor logic devices, discrete hardware components.
- a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
- The memory 52 may be a memory stick, a flash memory card (Trans-flash, TF card for short), or the like, and can store all the information in the smart device, including the input raw data, computer programs, intermediate running results and final running results. It stores and retrieves information according to the locations specified by the controller; with memory, the smart device can retain state and work normally.
- The memory of a smart device can be divided by purpose into main memory (internal memory) and auxiliary memory (external memory). External storage is usually a magnetic medium or an optical disc that can hold information for a long time, while internal memory refers to the storage components on the motherboard that hold the data and programs currently being executed; it is only for temporary storage, and its contents are lost when the power is turned off.
- the disclosed method and apparatus may be implemented in other manners.
- The apparatus implementations described above are only illustrative; for example, the division into modules or units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, and some features may be ignored or not implemented.
- The mutual coupling, direct coupling or communication connections shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connections between devices or units may be electrical, mechanical or in other forms.
- Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this implementation manner.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
- the integrated unit if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium.
- FIG. 6 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
- The computer-readable storage medium of the embodiments of the present application stores a program file 61 capable of implementing all of the above scene recognition methods. The program file 61 may be stored in the storage medium in the form of a software product and includes several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) or a processor to execute all or part of the steps of the methods of the various embodiments of the present application.
- The aforementioned storage media include: USB flash drives, removable hard disks, Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, optical discs and other media that can store program code, as well as terminal devices such as computers, servers, mobile phones and tablets.
- the embodiments of the present application provide a computer program, including computer-readable code, when the computer-readable code is executed in a smart device, a processor in the smart device executes the above method.
- Embodiments of the present application provide a scene recognition method, apparatus, smart device, storage medium and computer program. The scene recognition method includes: acquiring an image to be processed and a semantic mask map corresponding to the image to be processed, where the image to be processed includes a query image and an image to be recognized, and the corresponding semantic mask map includes the semantic mask map of the query image and the semantic mask map of the image to be recognized; performing feature aggregation processing on the image to be processed according to the semantic mask map to obtain a feature vector of the image to be processed; and determining, by using the feature vector of the image to be processed, an image matching the scene of the query image from the images to be recognized. In this way, the features of the image to be processed are obtained by combining the semantic mask map with the feature aggregation method, reducing the interference from distracting factors and improving the robustness of scene recognition.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (14)
- A scene recognition method, wherein the method is executed by a smart device, and the method includes: acquiring an image to be processed and a semantic mask map corresponding to the image to be processed, wherein the image to be processed includes a query image and an image to be recognized, and the semantic mask map corresponding to the image to be processed includes a semantic mask map of the query image and a semantic mask map of the image to be recognized; performing feature aggregation processing on the image to be processed according to the semantic mask map to obtain a feature vector of the image to be processed; and determining, by using the feature vector of the image to be processed, an image matching the scene of the query image from the image to be recognized.
- The scene recognition method according to claim 1, wherein acquiring the image to be processed and the semantic mask map corresponding to the image to be processed includes: performing semantic segmentation processing on the image to be recognized and the query image to obtain the category of each pixel and the probability corresponding to the category; setting a weight for the category of each pixel according to set conditions; and obtaining the semantic mask corresponding to each pixel according to the probability corresponding to the category and the weight corresponding to the category, wherein the semantic masks corresponding to all pixels constitute the semantic mask map.
- The method according to claim 2, wherein before setting a weight for the category of each pixel according to the set conditions, the method further includes: performing attribute classification on all pixels to obtain one or more sub-categories; setting a weight for each sub-category according to the set conditions; and obtaining the semantic mask corresponding to each pixel according to the probability corresponding to the sub-category and the weight corresponding to the sub-category, wherein the semantic masks corresponding to all pixels constitute the semantic mask map.
- The method according to claim 3, wherein the sub-categories include at least two of a fixed sub-category, a non-fixed sub-category, a dynamic sub-category and an unknown sub-category; and the weight of the dynamic sub-category is smaller than the weights of the fixed sub-category, the non-fixed sub-category and the unknown sub-category.
- The method according to claim 4, wherein obtaining the semantic mask corresponding to each pixel according to the probability corresponding to the sub-category and the weight corresponding to the sub-category includes: calculating the semantic mask corresponding to the pixel using the formula m_i = p_i × w_i, wherein m_i represents the semantic mask corresponding to the i-th pixel, the image generated therefrom is the semantic mask map, p_i represents the probability of the sub-category to which the i-th pixel belongs, and w_i represents the weight corresponding to the category or sub-category to which the i-th pixel belongs.
- The method according to claim 1, wherein performing feature aggregation processing on the image to be processed according to the semantic mask map to obtain the feature vector of the image to be processed includes: performing feature extraction on the image to be processed to obtain a feature set; forming a plurality of cluster centers according to the feature set; obtaining, according to the plurality of cluster centers, the cluster center corresponding to each feature in each image to be processed; determining the value of each feature in the image to be processed in a first dimension, and determining the value, in the first dimension, of the cluster center corresponding to each feature in the image to be processed; performing feature aggregation processing on the query image, by using the cluster center corresponding to each feature in the image to be processed, the value of that cluster center in the first dimension, and the value of each feature in the first dimension, combined with the semantic mask map of the query image, to obtain the feature vector of the query image; and performing feature aggregation processing on the image to be recognized, by using the cluster center corresponding to each feature in the image to be processed, the value of that cluster center in the first dimension, and the value of each feature in the first dimension, combined with the semantic mask map of the image to be recognized, to obtain the feature vector of the image to be recognized.
- The method according to claim 6, wherein forming a plurality of cluster centers according to the feature set includes: processing the feature set with a clustering algorithm to form the plurality of cluster centers; and obtaining the cluster center corresponding to each feature in each image to be processed includes: taking the cluster center closest to each feature as the cluster center corresponding to that feature in the image to be processed.
- The method according to any one of claims 1 to 7, wherein determining, by using the feature vector of the image to be processed, an image matching the scene of the query image from the image to be recognized includes: determining the image matching the scene of the query image from the image to be recognized according to the distance between the feature vector of the image to be recognized and the feature vector of the query image.
- The method according to claim 8, wherein determining the image matching the scene of the query image from the image to be recognized according to the distance between the feature vector of the image to be recognized and the feature vector of the query image includes: determining the image to be recognized corresponding to the feature vector closest to the feature vector of the query image as the image matching the query image.
- The method according to claim 9, wherein there are multiple images among the images to be recognized that match the query image; and after determining the image to be recognized corresponding to the feature vector closest to the feature vector of the query image as the image matching the query image, the method further includes: arranging the images matching the query image by using a spatial consistency method, so as to obtain the image most similar to the query image.
- A scene recognition apparatus, including: an acquisition module configured to acquire an image to be processed and a semantic mask map corresponding to the image to be processed, wherein the image to be processed includes a query image and an image to be recognized, and the semantic mask map corresponding to the image to be processed includes a semantic mask map of the query image and a semantic mask map of the image to be recognized; a feature aggregation module configured to perform feature aggregation processing on the image to be processed according to the semantic mask map to obtain a feature vector of the image to be processed; and an image matching module configured to determine, by using the feature vector of the image to be processed, an image matching the scene of the query image from the image to be recognized.
- A smart device, including: a processor and a memory coupled to each other, wherein the memory is used to store program instructions for implementing the scene recognition method according to any one of claims 1 to 10.
- A computer-readable storage medium, wherein a program file is stored therein, and the program file can be executed to implement the scene recognition method according to any one of claims 1 to 10.
- A computer program, including computer-readable code, wherein when the computer-readable code runs in a smart device, a processor in the smart device executes the scene recognition method according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022543759A JP2023510945A (ja) | 2020-11-10 | 2021-07-16 | Scene recognition method and apparatus, intelligent device, storage medium and computer program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011249944.4 | 2020-11-10 | ||
CN202011249944.4A CN112329660B (zh) | 2020-11-10 | 2020-11-10 | Scene recognition method and apparatus, smart device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022100133A1 true WO2022100133A1 (zh) | 2022-05-19 |
Family
ID=74317739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/106936 WO2022100133A1 (zh) | 2020-11-10 | 2021-07-16 | 场景识别方法、装置、智能设备、存储介质和计算机程序 |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP2023510945A (zh) |
CN (1) | CN112329660B (zh) |
WO (1) | WO2022100133A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117009532A (zh) * | 2023-09-21 | 2023-11-07 | 腾讯科技(深圳)有限公司 | Semantic type recognition method and apparatus, computer-readable medium and electronic device |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112329660B (zh) | 2020-11-10 | 2024-05-24 | 浙江商汤科技开发有限公司 | Scene recognition method and apparatus, smart device and storage medium |
CN113393515B (zh) * | 2021-05-21 | 2023-09-19 | 杭州易现先进科技有限公司 | Visual positioning method and system combining scene annotation information |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140133759A1 (en) * | 2012-11-14 | 2014-05-15 | Nec Laboratories America, Inc. | Semantic-Aware Co-Indexing for Near-Duplicate Image Retrieval |
CN105335757A (zh) * | 2015-11-03 | 2016-02-17 | 电子科技大学 | Vehicle type recognition method based on local feature aggregation descriptors |
CN107239535A (zh) * | 2017-05-31 | 2017-10-10 | 北京小米移动软件有限公司 | Similar picture retrieval method and apparatus |
US20190295260A1 (en) * | 2016-10-31 | 2019-09-26 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for image segmentation using controlled feedback |
CN111027493A (zh) * | 2019-12-13 | 2020-04-17 | 电子科技大学 | Pedestrian detection method based on deep-learning multi-network soft fusion |
CN112329660A (zh) * | 2020-11-10 | 2021-02-05 | 浙江商汤科技开发有限公司 | Scene recognition method and apparatus, smart device and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107871143B (zh) * | 2017-11-15 | 2019-06-28 | 深圳云天励飞技术有限公司 | Image recognition method and apparatus, computer apparatus and computer-readable storage medium |
CN108710847B (zh) * | 2018-05-15 | 2020-11-27 | 北京旷视科技有限公司 | Scene recognition method and apparatus, and electronic device |
JP7026826B2 (ja) * | 2018-09-15 | 2022-02-28 | 北京市商▲湯▼科技▲開▼▲發▼有限公司 | Image processing method, electronic device and storage medium |
CN109829383B (zh) * | 2018-12-29 | 2024-03-15 | 平安科技(深圳)有限公司 | Palmprint recognition method and apparatus, and computer device |
CN111709398A (zh) * | 2020-07-13 | 2020-09-25 | 腾讯科技(深圳)有限公司 | Image recognition method, and image recognition model training method and apparatus |
-
2020
- 2020-11-10 CN CN202011249944.4A patent/CN112329660B/zh active Active
-
2021
- 2021-07-16 WO PCT/CN2021/106936 patent/WO2022100133A1/zh active Application Filing
- 2021-07-16 JP JP2022543759A patent/JP2023510945A/ja not_active Withdrawn
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140133759A1 (en) * | 2012-11-14 | 2014-05-15 | Nec Laboratories America, Inc. | Semantic-Aware Co-Indexing for Near-Duplicate Image Retrieval |
CN105335757A (zh) * | 2015-11-03 | 2016-02-17 | 电子科技大学 | Vehicle type recognition method based on local feature aggregation descriptors |
US20190295260A1 (en) * | 2016-10-31 | 2019-09-26 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for image segmentation using controlled feedback |
CN107239535A (zh) * | 2017-05-31 | 2017-10-10 | 北京小米移动软件有限公司 | Similar picture retrieval method and apparatus |
CN111027493A (zh) * | 2019-12-13 | 2020-04-17 | 电子科技大学 | Pedestrian detection method based on deep-learning multi-network soft fusion |
CN112329660A (zh) * | 2020-11-10 | 2021-02-05 | 浙江商汤科技开发有限公司 | Scene recognition method and apparatus, smart device and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117009532A (zh) * | 2023-09-21 | 2023-11-07 | 腾讯科技(深圳)有限公司 | Semantic type recognition method and apparatus, computer-readable medium and electronic device |
CN117009532B (zh) * | 2023-09-21 | 2023-12-19 | 腾讯科技(深圳)有限公司 | Semantic type recognition method and apparatus, computer-readable medium and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN112329660B (zh) | 2024-05-24 |
JP2023510945A (ja) | 2023-03-15 |
CN112329660A (zh) | 2021-02-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022100133A1 (zh) | Scene recognition method and apparatus, smart device, storage medium and computer program | |
CN107679250B (zh) | Multi-task hierarchical image retrieval method based on a deep auto-encoding convolutional neural network | |
US11244205B2 (en) | Generating multi modal image representation for an image | |
US9075824B2 (en) | Retrieval system and method leveraging category-level labels | |
CN108038122B (zh) | Trademark image retrieval method | |
US9330341B2 (en) | Image index generation based on similarities of image features | |
CN103207898B (zh) | Fast similar-face retrieval method based on locality-sensitive hashing | |
CN104036012B (zh) | Dictionary learning and bag-of-visual-words feature extraction method, and retrieval system | |
US9600738B2 (en) | Discriminative embedding of local color names for object retrieval and classification | |
WO2013053320A1 (zh) | Image retrieval method and apparatus | |
US10839006B2 (en) | Mobile visual search using deep variant coding | |
WO2023142602A1 (zh) | Image processing method and apparatus, and computer-readable storage medium | |
WO2023142551A1 (zh) | Model training and image recognition methods and apparatuses, device, storage medium and computer program product | |
CN112561976A (zh) | Image dominant color feature extraction method, image retrieval method, storage medium and device | |
CN103761503A (zh) | Adaptive training sample selection method for relevance-feedback image retrieval | |
CN110188864B (zh) | Few-shot learning method based on distribution representation and distribution metrics | |
CN111709317A (zh) | Pedestrian re-identification method based on multi-scale features under a saliency model | |
CN114743139A (zh) | Video scene retrieval method and apparatus, electronic device and readable storage medium | |
CN108694411A (zh) | Method for recognizing similar images | |
Yan | Accurate Image Retrieval Algorithm Based on Color and Texture Feature. | |
CN105205497B (zh) | Image representation method and processing apparatus based on local PCA whitening | |
WO2017143979A1 (zh) | Image retrieval method and apparatus | |
WO2017168601A1 (ja) | Similar image retrieval method and system | |
Jena et al. | Content based image retrieval using adaptive semantic signature | |
Islam et al. | Texture feature based image retrieval algorithms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21890664 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022543759 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21890664 Country of ref document: EP Kind code of ref document: A1 |
|