WO2022148372A1 - Visual phrase construction method and apparatus based on image feature space and spatial-domain space - Google Patents


Info

Publication number
WO2022148372A1
Authority
WO
WIPO (PCT)
Prior art keywords
visual
word
phrases
space
target image
Application number
PCT/CN2022/070305
Other languages
French (fr)
Chinese (zh)
Inventor
王亚楠
Original Assignee
瞬联软件科技(南京)有限公司
Application filed by 瞬联软件科技(南京)有限公司
Publication of WO2022148372A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Definitions

  • The invention relates to a visual phrase construction method based on an image feature space and an image spatial-domain space, and to a corresponding visual phrase construction device, in the technical field of image recognition.
  • The extraction and expression of image visual features is the most basic and central part of image retrieval, segmentation and recognition algorithms. Discriminative features are of great significance for image retrieval, segmentation and recognition.
  • Image features can usually be divided into low-level, middle-level and high-level features.
  • Low-level features are typically built from basic units such as edges, colors and textures.
  • Middle-level features, proposed to narrow the semantic gap between the low and high levels, are derived from analysis of the low-level features. From the perspective of human cognition, understanding an image first involves highly abstract high-level semantic features while also including simple low-level features. Human visual understanding of images is therefore a process of acquiring semantic information at different levels and granularities.
  • The primary technical problem to be solved by the present invention is to provide a visual phrase construction method based on an image feature space and an image spatial-domain space.
  • Another technical problem to be solved by the present invention is to provide a visual phrase construction device based on an image feature space and an image spatial-domain space.
  • The present invention adopts the following technical scheme:
  • A method for constructing visual phrases based on an image feature space and an image spatial-domain space, including the following steps:
  • For each key feature word, extract from the visual word set the neighborhood feature words that have a geometric relationship with it, and form a corresponding visual phrase together with the key feature word;
  • Extracting the visual words in the target image that satisfy preset conditions to form a visual word set specifically includes the following steps:
  • The frequency of occurrence of each category of visual word is counted, and the visual words whose frequency is higher than a preset frequency are selected to form the visual word set.
  • Extracting from the visual word set the neighborhood feature words that have a geometric relationship with the key feature word specifically includes the following steps:
  • Forming a corresponding visual phrase with the key feature word specifically includes the following steps:
  • Establishing a set of visual phrases describing the features of the target image specifically includes the following steps:
  • A set of visual phrases describing the features of the target image is established.
  • Classifying the formed visual phrases specifically includes the following steps:
  • Otherwise, the two visual phrases are of different types.
  • Encoding the visual phrases of the same category specifically includes the following steps:
  • The corresponding key feature words or neighborhood feature words are encoded according to their position information;
  • The code of the current category of visual phrase is composed from the codes of its key feature word and neighborhood feature words.
  • Establishing the set of visual phrases according to the codes of each category of visual phrases specifically includes the following steps:
  • The code set is taken as the set of visual phrases describing the features of the target image.
  • Judging whether the positions of the key feature words and neighborhood feature words of any two visual phrases in the target image can be aligned one-to-one specifically includes the following steps:
  • If the calculated minimum position distance is equal to zero, the positions of the two corresponding key feature words or neighborhood feature words in the target image are determined to be aligned.
  • An apparatus for constructing visual phrases based on an image feature space and an image spatial-domain space includes a processor and a memory, the processor reading a computer program in the memory to perform the following operations:
  • For each key feature word, extract from the visual word set the neighborhood feature words that have a geometric relationship with it, and form a corresponding visual phrase together with the key feature word;
  • The visual phrase construction method and device provided by the present invention make full use of the contextual constraints of the image's local feature space and the useful information in the image spatial-domain space, greatly improving the accuracy of image visual feature extraction. Applied to image recognition, they can greatly improve the accuracy of image retrieval, classification and recognition.
  • FIG. 1 is a schematic flowchart of a method for constructing a visual phrase provided by an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of matching of two visual phrases in an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an apparatus for constructing a visual phrase provided by an embodiment of the present invention.
  • The method for constructing visual phrases based on an image feature space and an image spatial-domain space includes the following steps:
  • Each small circle in the figure represents a visual word.
  • The target image is an image in the database.
  • Visual words that occur frequently in the target image are selected to avoid having too few visual phrases in some images.
  • As an example, a visual phrase is formed from a stable triangular structure of locally co-occurring words in image space.
  • Each key feature word is represented in the target image as a micro-region o_1 (one of the small circles in FIG. 2).
  • A circle is drawn centered at the current micro-region o_1 with the predetermined distance as its radius.
  • A and B represent two different visual phrases; vw is the code of the visual word to which a vertex of a visual phrase belongs.
  • If D_vp = 0, the three vertices (small circles) of the two visual phrases are aligned one-to-one.
  • Otherwise, the two visual phrases are of different types.
  • If the three vertices of two visual phrases can be aligned one-to-one, the two visual phrases belong to the same type, because once the three vertices of two matched visual phrases are aligned, their corresponding angles and edges are aligned as well.
  • Each vertex belongs to a key feature word or a neighborhood feature word, so the position information of these words refers to the positions of the vertices in the target image.
  • The position information includes information about the visual word to which the vertex belongs, the angle at the vertex, and the edge opposite the vertex.
  • The corresponding key feature words or neighborhood feature words are encoded:
  • vertex a = {vw_a, ang_a, eg_a};
  • vw_a is the code of the visual word to which vertex a belongs;
  • ang_a is the normalized code of the angle at vertex a;
  • eg_a is the normalized code of the length of the edge opposite vertex a.
  • The code of the current category of visual phrase is composed from these vertex codes.
  • For each category of visual phrases, a set of visual phrases describing the features of the target image is established;
  • the code set is taken as the set of visual phrases that characterizes the target image.
  • The frequency of occurrence of every category of visual phrase in the target image is counted, and the visual phrases with higher frequencies are selected as features of the image, giving the visual phrase set VP(vp_1, vp_2, ..., vp_n).
  • the set VP (vp 1 , vp 2 , . precision.
  • The present invention also provides a visual phrase construction device based on an image feature space and an image spatial-domain space, including a processor 21 and a memory 22; depending on actual needs, it may further include communication components, sensor components, power supply components, multimedia components and input/output interfaces.
  • The memory, communication components, sensor components, power supply components, multimedia components and input/output interfaces are all connected to the processor 21.
  • The memory 22 in the node device may be static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, etc.
  • The processor may be a central processing unit (CPU), a graphics processing unit (GPU), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a digital signal processing (DSP) chip, etc.
  • The other communication, sensor, power supply and multimedia components can all be implemented with common components of existing smart terminals and are not described in detail here.
  • The processor 21 reads the computer program in the memory 22 to perform the following operations:
  • For each key feature word, extract from the visual word set the neighborhood feature words that have a geometric relationship with it, and form a corresponding visual phrase together with the key feature word;
  • By jointly constructing visual phrases from the local feature space of the image and the image spatial-domain space, the visual phrase construction method and device provided by the present invention can greatly reduce the ambiguity of visual phrases during image matching and obtain more discriminative visual phrases.
  • The present invention classifies and encodes visual phrases based on the feature-space attributes of their vertices and the relationships between the vertices. These codes represent image features more accurately, which can greatly improve the accuracy of image retrieval, segmentation and recognition.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

Provided are a visual phrase construction method and apparatus based on an image feature space and a spatial-domain space. The method comprises the following steps: extracting, from a target image, visual words meeting a preset condition to form a visual word set (S101); selecting, from the visual word set, each key feature word in a target area of the target image (S102); for each key feature word, extracting from the visual word set a neighborhood feature word having a geometric relationship with the key feature word, which together with the key feature word forms a corresponding visual phrase (S103); and, on the basis of the formed visual phrases, establishing a visual phrase set describing the features of the target image (S104). Because the local feature space and the spatial-domain space of an image are combined to jointly construct visual phrases, the ambiguity of visual phrases during image matching can be greatly reduced, and more discriminative visual phrases are obtained.

Description

Method and device for constructing visual phrases based on image feature space and spatial-domain space
Technical Field
The invention relates to a visual phrase construction method based on an image feature space and an image spatial-domain space, and to a corresponding visual phrase construction device, in the technical field of image recognition.
Background
The extraction and expression of image visual features is the most basic and central part of image retrieval, segmentation and recognition algorithms. Highly discriminative features are of great significance for image retrieval as well as for image segmentation and recognition.
According to how they are expressed, image features can usually be divided into low-level, middle-level and high-level features. Low-level features are typically built from basic units such as edges, colors and textures; high-level features capture the high-level semantic information of an image in the way humans understand it; and middle-level features, proposed to narrow the semantic gap between the low and high levels, are derived from analysis of the low-level features. From the perspective of human cognition, understanding an image first involves highly abstract high-level semantic features while also including simple low-level features. Human visual understanding of images is therefore a process of acquiring semantic information at different levels and granularities.
Traditional image feature models usually take low-level features such as feature points, edges, colors and textures as basic units and build complex semantics and conceptual abstractions on top of them. However, low-level features and the local features built from them often suffer from synonymy and polysemy: similar local features may be quantized to different local features, and dissimilar local features may be quantized to the same local feature. Semantic features extracted by deep learning also still lack discrimination. Moreover, images come from diverse sources and in many types, and are affected by factors such as scale, illumination, viewpoint and complex backgrounds, which creates a semantic gap between low-level features and high-level semantics. How to define highly discriminative image features, and how to overcome the synonymy and polysemy of image features, therefore remain urgent open problems.
Summary of the Invention
The primary technical problem to be solved by the present invention is to provide a visual phrase construction method based on an image feature space and an image spatial-domain space.
Another technical problem to be solved by the present invention is to provide a visual phrase construction device based on an image feature space and an image spatial-domain space.
To achieve the above objects, the present invention adopts the following technical solutions:
According to a first aspect of the embodiments of the present invention, a method for constructing visual phrases based on an image feature space and an image spatial-domain space is provided, including the following steps:
Extract the visual words in the target image that satisfy preset conditions to form a visual word set;
Select, from the visual word set, each key feature word within the target area of the target image;
For each key feature word, extract from the visual word set the neighborhood feature words that have a geometric relationship with it, and form a corresponding visual phrase together with the key feature word;
Based on the formed visual phrases, establish a set of visual phrases describing the features of the target image.
Preferably, extracting the visual words in the target image that satisfy preset conditions to form the visual word set specifically includes the following steps:
Quantize the local features of the target image into visual words;
For each category of visual word, count its frequency of occurrence, and select the visual words whose frequency is higher than a preset frequency to form the visual word set.
Preferably, extracting from the visual word set the neighborhood feature words that have a geometric relationship with the key feature word specifically includes the following steps:
Draw a circle centered at the position of the current key feature word in the target image, with a predetermined distance as its radius;
Find, in the visual word set, the neighborhood feature words of the current key feature word; the position of each neighborhood feature word in the target image must lie within the drawn circle.
Preferably, forming a corresponding visual phrase with the key feature word specifically includes the following steps:
Form a triangle whose vertices are the position of the current key feature word and the positions of any two of its neighborhood feature words;
If the shortest side of the triangle is longer than a preset side length and the smallest angle of the triangle is larger than a preset angle, select the current key feature word and the two neighborhood feature words forming the triangle as a visual phrase of the target image.
Preferably, establishing the set of visual phrases describing the features of the target image based on the formed visual phrases specifically includes the following steps:
Classify the formed visual phrases;
Encode the visual phrases of the same category;
Establish, according to the codes of each category of visual phrases, the set of visual phrases describing the features of the target image.
Preferably, classifying the formed visual phrases specifically includes the following steps:
Judge whether the positions of the key feature words and neighborhood feature words of any two visual phrases in the target image can be aligned one-to-one;
If they can be aligned, the two visual phrases belong to the same type;
If they cannot be aligned, the two visual phrases belong to different types.
Preferably, encoding the visual phrases of the same category specifically includes the following steps:
Obtain the position information of the key feature word and neighborhood feature words of the current category of visual phrase according to their positions in the target image;
Encode each key feature word or neighborhood feature word according to its position information;
Compose the code of the current category of visual phrase from the codes of its key feature word and neighborhood feature words.
Preferably, establishing the set of visual phrases describing the features of the target image according to the codes of each category of visual phrases specifically includes the following steps:
Count the frequency of occurrence of each category of visual phrase;
Collect the codes of the visual phrases whose frequency is higher than a predetermined frequency into a code set;
Take the code set as the set of visual phrases describing the features of the target image.
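The frequency-based selection of visual phrase codes described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes each visual phrase has already been reduced to a hashable code (here a tuple of integers), and the function name phrase_code_set and the threshold parameter min_count are hypothetical.

```python
from collections import Counter

def phrase_code_set(phrase_codes, min_count):
    """Keep the codes of visual phrases that occur more than min_count times.

    phrase_codes: one hashable code per visual phrase found in the target image
    (hypothetical representation). min_count stands in for the patent's
    "predetermined frequency".
    """
    counts = Counter(phrase_codes)
    # Sorted only so the resulting code set has a deterministic order.
    return sorted(code for code, n in counts.items() if n > min_count)
```

For example, phrase_code_set([(1, 2, 3), (4, 5, 6), (1, 2, 3), (7, 8, 9), (1, 2, 3), (4, 5, 6)], 1) keeps the two codes that occur more than once, (1, 2, 3) and (4, 5, 6).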
Preferably, judging whether the positions of the key feature words and neighborhood feature words of any two visual phrases in the target image can be aligned one-to-one specifically includes the following steps:
Obtain the code of the visual word to which each key feature word and neighborhood feature word belongs at its position in the target image;
According to these visual word codes, calculate the minimum position distance in the target image between the corresponding key feature words or neighborhood feature words of two different visual phrases;
If the calculated minimum position distance is equal to zero, determine that the positions of the two corresponding key feature words or neighborhood feature words in the target image are aligned.
According to a second aspect of the embodiments of the present invention, an apparatus for constructing visual phrases based on an image feature space and an image spatial-domain space is provided, including a processor and a memory, the processor reading a computer program in the memory to perform the following operations:
Extract the visual words in the target image that satisfy preset conditions to form a visual word set;
Select, from the visual word set, each key feature word within the target area of the target image;
For each key feature word, extract from the visual word set the neighborhood feature words that have a geometric relationship with it, and form a corresponding visual phrase together with the key feature word;
Based on the formed visual phrases, establish a set of visual phrases describing the features of the target image.
The visual phrase construction method and device provided by the present invention make full use of the contextual constraints of the image's local feature space and the useful information in the image spatial-domain space, greatly improving the accuracy of image visual feature extraction. Applying the method and device to image recognition can greatly improve the accuracy of image retrieval, classification and recognition.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the visual phrase construction method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the matching of two visual phrases in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the visual phrase construction apparatus provided by an embodiment of the present invention.
Detailed Description of the Embodiments
The technical content of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the method for constructing visual phrases based on an image feature space and an image spatial-domain space provided by an embodiment of the present invention includes the following steps:
101. Extract the visual words in the target image that satisfy preset conditions to form a visual word set;
As shown in FIG. 2, each small circle in the figure represents a visual word.
Specifically, this includes the following steps:
1011. Quantize the local features of the target image into visual words;
1012. For each category of visual word, count its frequency of occurrence, and select the visual words whose frequency is higher than a preset frequency to form the visual word set W(w_1, w_2, …, w_n).
In one embodiment of the present invention, the target image is an image in a database. Visual words that occur frequently in the target image are selected to avoid having too few visual phrases in some images.
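Steps 1011 and 1012 can be illustrated with a short sketch. The patent does not say how local features are quantized, so this code assumes a precomputed codebook (for example, cluster centers obtained by k-means) and assigns each descriptor to its nearest center by Euclidean distance. The names quantize, frequent_visual_words and min_count are hypothetical.

```python
import math
from collections import Counter

def quantize(descriptors, codebook):
    """Assign each local descriptor the index of its nearest codebook center (Euclidean).

    The codebook is assumed to be given; how it is built is not specified in the patent.
    """
    return [min(range(len(codebook)), key=lambda k: math.dist(d, codebook[k]))
            for d in descriptors]

def frequent_visual_words(keypoints, descriptors, codebook, min_count):
    """Return (x, y, word) for every keypoint whose visual word occurs more than min_count times.

    min_count is a hypothetical stand-in for the patent's "preset frequency".
    """
    words = quantize(descriptors, codebook)
    counts = Counter(words)
    return [(x, y, w) for (x, y), w in zip(keypoints, words) if counts[w] > min_count]
```

With codebook [(0.0, 0.0), (10.0, 10.0)], the descriptors [(0.5, 0.2), (9.0, 9.5), (0.1, 0.3), (10.2, 9.9), (0.0, 1.0)] quantize to words [0, 1, 0, 1, 0]. With min_count = 2 only word 0 (three occurrences) survives, so the keypoints [(1, 1), (5, 5), (2, 2), (6, 6), (3, 3)] reduce to [(1, 1, 0), (2, 2, 0), (3, 3, 0)].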
102. Select, from the visual word set, each key feature word within the target area of the target image;
103. For each key feature word, extract from the visual word set the neighborhood feature words that have a geometric relationship with it, and form a corresponding visual phrase together with the key feature word;
Specifically, this includes the following steps:
1031. Draw a circle centered at the position of the current key feature word in the target image, with a predetermined distance as its radius;
In one embodiment of the present invention, visual phrases are built from stable triangular structures of locally co-occurring words in image space. Each key feature word is represented in the target image as a micro-region o_1 (one of the small circles in FIG. 2). A circle is drawn centered at the current micro-region o_1 with the predetermined distance as its radius.
1032. Find, in the visual word set, the neighborhood feature words of the current key feature word; the position of each neighborhood feature word in the target image must lie within the drawn circle.
Within the drawn circle, two other small circles are found and used as the neighborhood feature words of the current micro-region o_1.
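The circular neighborhood search can be sketched as follows. This assumes the visual word set is a list of (x, y, word) tuples and that points lying exactly on the circle count as inside. The patent does not settle either detail, and the function name neighborhood_words is hypothetical.

```python
import math

def neighborhood_words(key_index, word_set, radius):
    """Return indices of entries in word_set lying on or inside the circle of the given
    radius centered on word_set[key_index]; the key feature word itself is excluded.

    word_set: list of (x, y, word) tuples (hypothetical representation).
    Treating points on the boundary as inside is an assumption.
    """
    kx, ky, _ = word_set[key_index]
    return [i for i, (x, y, _) in enumerate(word_set)
            if i != key_index and math.hypot(x - kx, y - ky) <= radius]
```

For word_set = [(0, 0, 3), (1, 0, 5), (0, 2, 7), (5, 5, 3)] and radius 2.0, the neighbors of entry 0 are entries 1 and 2. Entry 3 lies about 7.07 away, outside the circle.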
1033. Form a triangle whose vertices are the position of the current key feature word and the positions of any two of its neighborhood feature words;
The three small circles found in the above steps (the current micro-region o_1 and its two neighborhood feature words) are taken as vertices to form a triangle; this triangle represents one visual phrase.
1034. If the shortest side of the triangle is longer than the preset side length and the smallest angle of the triangle is larger than the preset angle, select the current key feature word and the two neighborhood feature words forming the triangle as a visual phrase of the target image.
The preset side length and preset angle are computed in advance. Among the formed triangles, those whose smallest angle is too small or whose shortest side is too short are discarded, so that the visual phrases are as regular as possible in image space.
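The triangle construction and filtering described above can be sketched like this. The sketch assumes the same hypothetical (x, y, word) representation, computes interior angles with the law of cosines, and requires min_side to be non-negative so that no side length can be zero. The names triangle_ok and build_phrases, and the parameters min_side and min_angle_deg, are illustrative stand-ins for the patent's preset side length and preset angle.

```python
import math
from itertools import combinations

def triangle_ok(p, q, r, min_side, min_angle_deg):
    """True if triangle pqr has shortest side > min_side and smallest angle > min_angle_deg.

    min_side is assumed to be >= 0, so all sides are positive once the first check passes.
    """
    a, b, c = math.dist(q, r), math.dist(p, r), math.dist(p, q)  # sides opposite p, q, r
    if min(a, b, c) <= min_side:
        return False

    def angle(opp, s1, s2):
        # Interior angle (degrees) opposite side `opp`, by the law of cosines.
        cos_v = (s1 * s1 + s2 * s2 - opp * opp) / (2 * s1 * s2)
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_v))))

    return min(angle(a, b, c), angle(b, a, c), angle(c, a, b)) > min_angle_deg

def build_phrases(key_index, neighbor_indices, word_set, min_side, min_angle_deg):
    """Form (key, n1, n2) index triples from every pair of neighbors whose triangle passes the filter."""
    key_xy = word_set[key_index][:2]
    return [(key_index, i, j)
            for i, j in combinations(neighbor_indices, 2)
            if triangle_ok(key_xy, word_set[i][:2], word_set[j][:2], min_side, min_angle_deg)]
```

Trying every pair of neighbors mirrors the "any two neighborhood feature words" wording. The two thresholds remove thin or tiny triangles, whose shape would be unstable under small shifts in keypoint position.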
104. Based on the formed visual phrases, establish a set of visual phrases describing the features of the target image;
Specifically, this includes the following steps:
1041. Classify the formed visual phrases; specifically, this includes the following steps:
10411. Judge whether the positions of the key feature words and neighborhood feature words of any two visual phrases in the target image can be aligned one-to-one;
Specifically, this includes the following steps:
104111. Obtain the code of the visual word to which each key feature word and neighborhood feature word belongs at its position in the target image;
In one embodiment of the present invention, let the three small circles (vertices) of the current visual phrase (triangle) be a, b and c; the codes of the visual words to which a, b and c belong are then vw_a, vw_b and vw_c, respectively.
104112. According to these visual word codes, calculate the minimum position distance in the target image between the corresponding key feature words or neighborhood feature words of two different visual phrases;
As shown in FIG. 2, two small circles belonging to two different visual phrases (the two ends of the horizontal line in the figure) are matched. The distance is calculated as:
$$D_{vp} = \min \sum_{i \in A,\ j \in B} \lvert vw_i - vw_j \rvert, \qquad i, j = 1, 2, 3 \tag{1}$$
In formula (1), A and B denote two different visual phrases, and vw is the code of the visual word to which a vertex of a visual phrase belongs.
104113.若计算出的最小位置距离等于零,则判定两个对应关键特征单词或邻域特征单词在目标图像中的位置对齐。104113. If the calculated minimum position distance is equal to zero, then determine that the positions of the two corresponding key feature words or neighborhood feature words are aligned in the target image.
在本发明的一个实施例中,若D vp=0,则代表两个视觉词组对应的三个顶点(小圆圈)一一对齐。 In an embodiment of the present invention, if D vp =0, it means that the three vertices (small circles) corresponding to the two visual phrases are aligned one-to-one.
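Formula (1) can be read as a minimum over the ways of pairing the three vertices of A with the three vertices of B. This sketch rests on that interpretation, which the text does not spell out. It assumes each vertex is represented only by its integer visual word code, such as a codebook cluster index. It tries all 3! one-to-one pairings and keeps the smallest summed difference. Because codes are cluster indices, only the test $D_{vp} = 0$ is meaningful. Nonzero magnitudes carry no geometric meaning.

```python
from itertools import permutations


def phrase_distance(words_a, words_b):
    """D_vp of formula (1), taking the min over one-to-one vertex pairings (assumed reading)."""
    if len(words_a) != len(words_b):
        raise ValueError("visual phrases must have the same number of vertices")
    # Each permutation of B's codes is one candidate one-to-one correspondence with A.
    return min(
        sum(abs(wa - wb) for wa, wb in zip(words_a, perm))
        for perm in permutations(words_b)
    )


def vertices_aligned(words_a, words_b):
    """Step 104113: the phrases align one-to-one exactly when D_vp is zero."""
    return phrase_distance(words_a, words_b) == 0
```

For example, the phrases with vertex codes (3, 7, 12) and (12, 3, 7) align. The phrases (3, 7, 12) and (3, 7, 13) do not align.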
10412. If they can be aligned, the two visual phrases belong to the same type.
10413. If they cannot be aligned, the two visual phrases belong to different types.
In this embodiment, if the three vertices of two visual phrases can be aligned one-to-one, the two visual phrases belong to the same class. This is because, when two visual phrases are matched, aligning the three vertices also aligns the corresponding angles and edges.
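The classification rule in steps 10412 and 10413 can be implemented without comparing every pair of phrases, as sketched below. The sketch assumes a phrase is a tuple of three non-negative integer visual word codes, with the same assumed reading of formula (1) as a minimum over vertex pairings. Under those assumptions, $D_{vp} = 0$ holds exactly when two phrases contain the same codes in some order. The sorted tuple of codes therefore works as a class key. `classify_phrases` is a hypothetical helper name.

```python
from collections import defaultdict


def classify_phrases(phrases):
    """Group visual phrases whose vertices align one-to-one (D_vp == 0).

    phrases: list of 3-tuples of visual word codes (vw_a, vw_b, vw_c).
    Returns {class_key: [indices of phrases in that class]}.
    """
    classes = defaultdict(list)
    for idx, words in enumerate(phrases):
        # Same multiset of codes <=> some vertex pairing gives zero distance.
        classes[tuple(sorted(words))].append(idx)
    return dict(classes)
```

Grouping by a canonical key takes linear time in the number of phrases. The pairwise check in step 10411 is quadratic.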
1042. Encode the visual phrases of the same class.
Specifically, this includes the following steps:
10421. From the positions, in the target image, of the key feature words and neighborhood feature words of the current class of visual phrases, obtain their position information.
In an embodiment of the present invention, each vertex (small circle) belongs to a particular key feature word or neighborhood feature word. The position information of the key feature words and neighborhood feature words therefore refers to the position information of the vertices in the target image. This information covers the visual word to which the vertex belongs, the angle at the vertex, and the side opposite the vertex.
10422. Encode the corresponding key feature word or neighborhood feature word according to the position information.
Taking vertex a as an example, obtain the visual word to which vertex a belongs, the angle at vertex a, and the length of the side opposite it. This gives the code of vertex a: $v_a = \{vw_a, ang_a, eg_a\}$,
where $vw_a$ is the code of the visual word to which vertex a belongs, $ang_a$ is the normalized code of the angle at vertex a, and $eg_a$ is the normalized code of the length of the side opposite vertex a.
10423. Compose the code of the current class of visual phrases from the codes of its key feature words and neighborhood feature words.
The code of the visual phrase is determined from the codes of vertices a, b and c: $vp = \{v_a, v_b, v_c\}$.
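Steps 10421 to 10423 can be sketched as below. The text does not define the normalization, so this sketch makes its own choices, all of which are assumptions:

- The angle at a vertex is divided by π.
- The opposite side is divided by the triangle's perimeter.
- Both values are quantized into `n_bins` integer bins.

`VertexCode`, `encode_phrase` and `n_bins` are hypothetical names. The input triangle is assumed to have already passed the side-length filter, so no side is zero.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VertexCode:
    vw: int   # visual word code of the vertex (vw_a)
    ang: int  # quantized, normalized interior angle at the vertex (ang_a)
    eg: int   # quantized, normalized length of the opposite side (eg_a)


def _quantize(x, n_bins):
    # Map x in [0, 1] to an integer bin in [0, n_bins - 1].
    return min(int(x * n_bins), n_bins - 1)


def encode_phrase(points, words, n_bins=8):
    """Return vp = (v_a, v_b, v_c) for a triangle given vertex positions and visual words.

    Assumes a non-degenerate triangle (no zero-length side).
    """
    pa, pb, pc = points
    # opp[k] is the side opposite vertex k.
    opp = (math.dist(pb, pc), math.dist(pa, pc), math.dist(pa, pb))
    perimeter = sum(opp)
    codes = []
    for k in range(3):
        # The two sides adjacent to vertex k.
        s1, s2 = opp[(k + 1) % 3], opp[(k + 2) % 3]
        cos_v = (s1 * s1 + s2 * s2 - opp[k] * opp[k]) / (2 * s1 * s2)
        angle = math.acos(max(-1.0, min(1.0, cos_v)))
        codes.append(VertexCode(words[k],
                                _quantize(angle / math.pi, n_bins),
                                _quantize(opp[k] / perimeter, n_bins)))
    return tuple(codes)
```

Normalizing by π and by the perimeter makes the code invariant to scaling of the triangle. Whether the original method intends this invariance is not stated.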
1043. Using the codes of each class of visual phrases, establish a visual phrase set describing the features of the target image.
Specifically, this includes the following steps:
10431. Count the frequency of occurrence of each class of visual phrases.
10432. Form a code set from the codes of the visual phrases whose frequency is higher than a predetermined frequency.
10433. Take this code set as the visual phrase set describing the features of the target image.
In an embodiment of the present invention, the frequencies of occurrence of all classes of visual phrases are counted in the target image. The visual phrases with higher frequencies are selected to represent the image features, and a visual phrase set $VP(vp_1, vp_2, \ldots, vp_n)$ describing the image features is established.
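Steps 10431 to 10433 amount to a frequency count followed by a threshold, as sketched below. The sketch assumes each phrase has already been reduced to a hashable class code, for example the class key or tuple code from the earlier steps. It also assumes the predetermined frequency is an absolute count, `min_count`, which is a hypothetical parameter. A relative frequency would work the same way after dividing by the total.

```python
from collections import Counter


def build_phrase_set(phrase_codes, min_count):
    """Return VP: the codes occurring more than min_count times, most frequent first."""
    counts = Counter(phrase_codes)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return [code for code, n in counts.most_common() if n > min_count]
```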
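The set $VP(vp_1, vp_2, \ldots, vp_n)$ describes the features of the target image precisely. Applying the image features described by this set to image retrieval, segmentation and recognition can substantially improve their accuracy.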
As shown in Figure 3, to implement the visual phrase construction method provided by the present invention, the present invention further provides a visual phrase construction apparatus based on the image feature space and the image spatial-domain space. The apparatus includes a processor 21 and a memory 22. As actually needed, it may further include a communication component, a sensor component, a power component, a multimedia component and an input/output interface. The memory, communication component, sensor component, power component, multimedia component and input/output interface are all connected to the processor 21. As mentioned above, the memory 22 in the node device may be static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, and so on. The processor may be a central processing unit (CPU), graphics processing unit (GPU), field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), digital signal processing (DSP) chip, and so on. The other components, such as the communication, sensor, power and multimedia components, can be implemented with common parts of existing smart terminals and are not described in detail here.
In addition, in the above visual phrase construction apparatus based on the image feature space and the image spatial-domain space, the processor 21 reads the computer program in the memory 22 to perform the following operations:
Extract the visual words in the target image that satisfy preset conditions to form a visual word set;
Select, from the visual word set, the key feature words within the target region of the target image;
For each key feature word, extract from the visual word set the neighborhood feature words that have a geometric relationship with it, and form a corresponding visual phrase together with that key feature word;
Based on the formed visual phrases, establish a visual phrase set describing the features of the target image.
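The first three operations can be sketched using the details given in claims 2 to 4. Local features are quantized to visual words, and frequent words are kept. Neighbors of a key word are those inside a circle of a predetermined radius. Each pair of neighbors forms a candidate triangle with the key word.

This is a plain-Python illustration under stated assumptions. Nearest-centroid quantization against a precomputed codebook is assumed, since the text does not name a quantizer. Keypoints are assumed to be `(x, y, word)` tuples. All function names and the `min_count` and `radius` parameters are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def quantize(descriptor, codebook):
    """Assumed quantizer: index of the nearest codebook centroid (the visual word)."""
    return min(range(len(codebook)), key=lambda k: math.dist(descriptor, codebook[k]))


def select_visual_words(keypoints, min_count):
    """Claim 2: keep keypoints whose visual word occurs more than min_count times."""
    counts = Counter(word for _, _, word in keypoints)
    return [kp for kp in keypoints if counts[kp[2]] > min_count]


def neighbours_within(keypoints, center_idx, radius):
    """Claim 3: indices of other keypoints inside the circle around keypoints[center_idx]."""
    cx, cy, _ = keypoints[center_idx]
    return [i for i, (x, y, _) in enumerate(keypoints)
            if i != center_idx and math.hypot(x - cx, y - cy) <= radius]


def candidate_triples(keypoints, center_idx, radius):
    """Claim 4: the key word plus every pair of its neighbours, before the regularity filter."""
    nbrs = neighbours_within(keypoints, center_idx, radius)
    return [(center_idx, i, j) for i, j in combinations(nbrs, 2)]
```

Each candidate triple would then go through the shortest-side and smallest-angle check before it is accepted as a visual phrase.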
The visual phrase construction method and apparatus provided by the present invention combine the local feature space of the image with the image spatial-domain space to construct visual phrases jointly. This can greatly reduce the ambiguity of visual phrases during image matching and yield more discriminative visual phrases. In addition, the present invention classifies and encodes visual phrases based on the feature-space attributes of their vertices and the relationships between those vertices. This encoding represents image features more accurately, which can substantially improve the accuracy of image retrieval, segmentation and recognition.
The method and apparatus for constructing visual phrases based on the image feature space and the spatial-domain space provided by the present invention have been described in detail above. For a person of ordinary skill in the art, any obvious modification made without departing from the essence of the present invention will constitute an infringement of the patent right of the present invention and will bear the corresponding legal liability.

Claims (10)

  1. A method for constructing visual phrases based on an image feature space and a spatial-domain space, characterized by comprising the following steps:
    extracting the visual words in a target image that satisfy preset conditions to form a visual word set;
    selecting, from the visual word set, the key feature words within a target region of the target image;
    for each key feature word, extracting from the visual word set the neighborhood feature words that have a geometric relationship with the key feature word, and forming a corresponding visual phrase together with the key feature word;
    establishing, based on the formed visual phrases, a visual phrase set describing the features of the target image.
  2. The method for constructing visual phrases based on an image feature space and a spatial-domain space according to claim 1, characterized in that extracting the visual words in the target image that satisfy preset conditions to form a visual word set specifically comprises the following steps:
    quantizing the local features of the target image into visual words;
    counting, by visual word category, the frequency of occurrence of each category of visual words, and selecting the visual words whose frequency is higher than a preset frequency to form the visual word set.
  3. The method for constructing visual phrases based on an image feature space and a spatial-domain space according to claim 1, characterized in that extracting from the visual word set the neighborhood feature words that have a geometric relationship with the key feature word specifically comprises the following steps:
    drawing a circle centered at the position of the current key feature word in the target image, with a predetermined distance as the radius;
    searching the visual word set for the neighborhood feature words corresponding to the current key feature word, wherein the positions of the neighborhood feature words in the target image must lie within the drawn circle.
  4. The method for constructing visual phrases based on an image feature space and a spatial-domain space according to claim 3, characterized in that forming a corresponding visual phrase together with the key feature word specifically comprises the following steps:
    forming a triangle with the position of the current key feature word and the positions of any two corresponding neighborhood feature words as vertices;
    after determining that the shortest side of the triangle is longer than a preset side length and that the smallest angle of the triangle is larger than a preset angle, selecting the current key feature word and the two corresponding neighborhood feature words of the triangle as a visual phrase of the target image.
  5. The method for constructing visual phrases based on an image feature space and a spatial-domain space according to claim 1, characterized in that establishing, based on the formed visual phrases, a visual phrase set describing the features of the target image specifically comprises the following steps:
    classifying the formed visual phrases;
    encoding the visual phrases of the same class;
    establishing, according to the codes of each class of visual phrases, a visual phrase set describing the features of the target image.
  6. The method for constructing visual phrases based on an image feature space and a spatial-domain space according to claim 5, characterized in that classifying the formed visual phrases specifically comprises the following steps:
    determining whether the positions, in the target image, of the key feature words and neighborhood feature words of any two visual phrases can be aligned one-to-one;
    if they can be aligned, the two visual phrases belong to the same type;
    if they cannot be aligned, the two visual phrases belong to different types.
  7. The method for constructing visual phrases based on an image feature space and a spatial-domain space according to claim 5, characterized in that encoding the visual phrases of the same class specifically comprises the following steps:
    obtaining the position information of the key feature words and neighborhood feature words of the current class of visual phrases according to their positions in the target image;
    encoding the corresponding key feature word or neighborhood feature word according to the position information;
    composing the code of the current class of visual phrases from the codes of its key feature words and neighborhood feature words.
  8. The method for constructing visual phrases based on an image feature space and a spatial-domain space according to claim 5, characterized in that establishing, according to the codes of each class of visual phrases, a visual phrase set describing the features of the target image specifically comprises the following steps:
    counting the frequency of occurrence of each class of visual phrases;
    forming a code set from the codes of the visual phrases whose frequency is higher than a predetermined frequency;
    taking the code set as the visual phrase set describing the features of the target image.
  9. The method for constructing visual phrases based on an image feature space and a spatial-domain space according to claim 6, characterized in that determining whether the positions, in the target image, of the key feature words and neighborhood feature words of any two visual phrases can be aligned one-to-one specifically comprises the following steps:
    obtaining the code of the visual word to which the position of each key feature word and neighborhood feature word in the target image belongs;
    calculating, according to these visual word codes, the minimum position distance in the target image between two corresponding key feature words or neighborhood feature words belonging to different visual phrases;
    if the calculated minimum position distance is zero, determining that the positions of the two corresponding key feature words or neighborhood feature words in the target image are aligned.
  10. An apparatus for constructing visual phrases based on an image feature space and a spatial-domain space, characterized by comprising a processor and a memory, wherein the processor reads a computer program in the memory to perform the following operations:
    extracting the visual words in a target image that satisfy preset conditions to form a visual word set;
    selecting, from the visual word set, the key feature words within a target region of the target image;
    for each key feature word, extracting from the visual word set the neighborhood feature words that have a geometric relationship with the key feature word, and forming a corresponding visual phrase together with the key feature word;
    establishing, based on the formed visual phrases, a visual phrase set describing the features of the target image.
PCT/CN2022/070305 2021-01-05 2022-01-05 Visual phrase construction method and apparatus based on image feature space and spatial-domain space WO2022148372A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110008899.1A CN112668590A (en) 2021-01-05 2021-01-05 Visual phrase construction method and device based on image feature space and airspace space
CN202110008899.1 2021-01-05

Publications (1)

Publication Number Publication Date
WO2022148372A1 true WO2022148372A1 (en) 2022-07-14

Family

ID=75412990

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070305 WO2022148372A1 (en) 2021-01-05 2022-01-05 Visual phrase construction method and apparatus based on image feature space and spatial-domain space

Country Status (2)

Country Link
CN (1) CN112668590A (en)
WO (1) WO2022148372A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668590A (en) * 2021-01-05 2021-04-16 瞬联软件科技(南京)有限公司 Visual phrase construction method and device based on image feature space and airspace space

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080069456A1 (en) * 2006-09-19 2008-03-20 Xerox Corporation Bags of visual context-dependent words for generic visual categorization
CN101894276A (en) * 2010-06-01 2010-11-24 中国科学院计算技术研究所 Training method of human action recognition and recognition method
WO2012156774A1 (en) * 2011-05-18 2012-11-22 Ltu Technologies Method and apparatus for detecting visual words which are representative of a specific image category
CN103310208A (en) * 2013-07-10 2013-09-18 西安电子科技大学 Identifiability face pose recognition method based on local geometrical visual phrase description
CN103970838A (en) * 2014-04-12 2014-08-06 北京工业大学 Society image tag ordering method based on compressed domains
CN104299010A (en) * 2014-09-23 2015-01-21 深圳大学 Image description method and system based on bag-of-words model
CN107944454A (en) * 2017-11-08 2018-04-20 国网电力科学研究院武汉南瑞有限责任公司 A kind of semanteme marking method based on machine learning for substation
CN112668590A (en) * 2021-01-05 2021-04-16 瞬联软件科技(南京)有限公司 Visual phrase construction method and device based on image feature space and airspace space

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440508B (en) * 2013-08-26 2016-06-08 河海大学 The Remote Sensing Target recognition methods of view-based access control model word bag model
CN105404886B (en) * 2014-09-16 2019-01-18 株式会社理光 Characteristic model generation method and characteristic model generating means
CN107480718A (en) * 2017-08-17 2017-12-15 南京信息工程大学 A kind of high-resolution remote sensing image sorting technique of view-based access control model bag of words


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIANG YUE, WANG RUNSHENG, WANG CHENG: "Scene Classification with Context Pyramid Features", JOURNAL OF COMPUTER-AIDED DESIGN & COMPUTER GRAPHICS, vol. 22, no. 8, 15 August 2010 (2010-08-15), pages 1366 - 1373, XP055949272 *

Also Published As

Publication number Publication date
CN112668590A (en) 2021-04-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22736534

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22736534

Country of ref document: EP

Kind code of ref document: A1