CN107239571B - Index construction method based on multidimensional data space technology - Google Patents


Info

Publication number
CN107239571B
Authority
CN
China
Prior art keywords
index
file
dimension
multidimensional
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710506059.1A
Other languages
Chinese (zh)
Other versions
CN107239571A (en)
Inventor
孙成通
董毅
付宪瑞
王玉奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Financial Information Technology Co Ltd
Original Assignee
Inspur Financial Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-06-28
Filing date
2017-06-28
Publication date
2021-04-09
Application filed by Inspur Financial Information Technology Co Ltd
Priority to CN201710506059.1A
Publication of CN107239571A
Application granted
Publication of CN107239571B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to an index construction method based on multidimensional data space technology. The method comprises initializing a multidimensional data retrieval system, loading the data files to be indexed, and setting multidimensional file index items, which comprise at least a file type index dimension, a file attribute index dimension, a file feature index dimension and a file content index dimension. Because files differ in type, content and features, the method can narrow the retrieval range step by step, one index dimension at a time, and thereby improve retrieval speed. The method can create index data for many types of files, such as picture, audio, video, compressed and encrypted files, which greatly improves the speed of retrieving their contents. It has important application value in fields such as fingerprint comparison and face recognition, and offers higher efficiency and better accuracy than traditional methods.

Description

Index construction method based on multidimensional data space technology
Technical Field
The invention relates to an index construction method based on a multidimensional data space technology, and belongs to the technical field of software.
Background
As informatization spreads through traditional industries such as finance, security and government affairs, more and more user data must be stored and retrieved. Large-scale systems that serve group users, such as those of banks, carry hundreds of millions of records, and locating valid data quickly and accurately has become an important problem in these industries. A traditional index is usually created from readable characters so that maximum performance can be ensured. An index generated by a program in this way cannot capture the main features of the data; it is treated only as a simple identifier with no meaning of its own, so it cannot effectively locate data by its attribute features, and the improvement in retrieval performance is not significant.
Taking the retrieval of fingerprint data as an example, conventional fingerprint search technology is increasingly strained now that user data is growing on a large scale, and improving hardware performance alone is not enough to cope with the huge volume of retrieval. On the one hand, retrieving image-type data such as fingerprints is a complicated process; on the other hand, traditional data retrieval methods do not suit the retrieval requirements of multimedia data. How to improve retrieval efficiency for large amounts of non-readable data has become a technical problem that urgently needs to be solved to keep pace with the rapid development of the market.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art and to provide an index construction method based on multidimensional data space technology whose indexes offer high indexing efficiency, uniqueness, group isolation and similar properties.
In order to solve the above technical problems, the present invention provides an index construction method based on multidimensional data space technology, which includes initializing a multidimensional data retrieval system and loading data files requiring index establishment, and is characterized by further including the following steps:
step one, setting a multidimensional file index item, wherein the multidimensional file index item at least comprises a file type index dimension, a file attribute index dimension, a file feature index dimension and a file content index dimension;
step two, combining all the dimensional feature indexes constructed in step one to create a multidimensional index structure in a data space;
step three, loading the multidimensional index structure into the current retrieval data space and fusing it with the current index content to form a multidimensional data space search tree;
step four, completing the construction of the multidimensional index.
The invention is further defined by the following technical characteristics: the multidimensional file index item of step one also comprises a spatial texture index.
Furthermore, the construction of the file type index dimension is to establish a corresponding data space dimension according to the file type.
Further, the file types include a file type determined based on a file name suffix, a file type determined based on a file header, and a file type having a self-encoding characteristic.
Further, the file feature index dimension comprises a uniqueness index, an index for acquiring a file feature code by an auxiliary means, and a spatial geometry feature index.
Further, the spatial geometrical features comprise audio features and graphic texture features.
Further, the file content index dimension includes an index of visible characters and an index of non-visible characters.
Further, the non-visible characters are associated through file commonality characteristic content.
Further, the data space includes spatial coordinates, spatial geometry, and interactions between geometries.
Further, when the data file needing to establish the index is loaded into the data space, the binary data is converted into three-dimensional data, and then the index dimension is established and points to the specific coordinate in the space geometry.
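The text does not say how binary data is converted into three-dimensional data. The following is a minimal sketch under one assumed mapping: consecutive bytes are grouped into (x, y, z) triples, and an index maps each coordinate back to the files that occupy it. The function names, the triple grouping and the zero padding are illustrative assumptions, not the patented conversion.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int, int]


def bytes_to_points(data: bytes) -> List[Point]:
    """Group the byte stream into (x, y, z) triples, zero-padding the tail.

    Assumed mapping for illustration only; the source does not define one.
    """
    padded = data + b"\x00" * (-len(data) % 3)
    return [
        (padded[i], padded[i + 1], padded[i + 2])
        for i in range(0, len(padded), 3)
    ]


def index_points(file_id: str, data: bytes, index: Dict[Point, List[str]]) -> None:
    """Make each distinct coordinate point back to the files that occupy it."""
    for point in set(bytes_to_points(data)):
        index.setdefault(point, []).append(file_id)
```

With this mapping, two files that share a byte triple land on the same coordinate, so a lookup on that coordinate returns both file ids.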
The invention has the following beneficial effects: the main difference between the multidimensional fast retrieval technology and traditional search technology lies in how the multidimensional index is created. In hybrid retrieval, factors such as different file types, different data contents and different data features mean that the indexing methods for the various types of data differ greatly. Based on these differences, the method can narrow the retrieval range step by step, one index dimension at a time, and thereby improve retrieval speed. The method can create index data for many types of files, such as picture, audio, video, compressed and encrypted files, which greatly improves the speed of retrieving their contents. It has important application value in fields such as fingerprint comparison and face recognition, and offers higher efficiency and better accuracy than traditional methods.
Drawings
FIG. 1 is a flow chart of index construction according to the present invention.
Detailed Description
Example 1
The index construction method based on the multidimensional data space technology provided by the embodiment is as shown in fig. 1: the method comprises initializing a multi-dimensional data retrieval system and loading data files needing to be indexed, and is characterized by further comprising the following steps:
step one, setting the index dimension of the file according to the difference of the loaded file type and content, comprising the following steps:
1) file type index dimension
An index is created according to file type characteristics, which mainly include: the file type determined from the filename suffix, the file type determined from the file header, and content characteristics specific to certain file types. Taking fingerprint image files as an example, the file type can be classified quickly from the filename suffix or the file header definition, but files in PNG or JPG format have their own coding characteristics, and these special formats need to be identified at this stage so that a suitable content identification method can be selected in the subsequent process.
2) File attribute index dimension
This dimension includes basic file attributes such as the creation time, creator and file size.
3) File feature index dimension
The feature index comprises the traditional uniqueness index as well as several other feature identification and indexing methods, such as indexing a file feature code obtained by auxiliary means. Multidimensional data additionally exhibits spatial geometric features, and different data types have different abstract geometric features in the data space. For example, an image file has a planar feature in the time dimension, whereas video and audio files have a stereo feature in the time dimension.
4) File content index dimension
Indexes built from file content comprise indexes of visible characters and indexes of non-visible characters. Indexes of visible characters can create associations between different files through manual classification, whereas non-visible characters need to be associated through common content; for example, files from the same manufacturer carry the same file signature.
5) Spatial texture index dimension
When the file content index is not sufficient, the spatial texture index is used to narrow the retrieval range further. Taking fingerprint data as an example, the size and feature structure of a fingerprint can serve as the basis for creating a spatial texture index; in audio and video files, a person's facial features, voiceprint features and the like can be indexed in the same way.
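The type, attribute and feature dimensions of step one can be sketched as a small record built from a file name and its bytes. This is an illustrative sketch only: the `DimensionEntry` record, the `MAGIC` table and the use of SHA-256 as the "feature code obtained by auxiliary means" are assumptions, since the source does not name a concrete header list or hashing scheme.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional

# Well-known file signatures; which headers the method checks is not stated.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"PK\x03\x04": "zip",
}


@dataclass(frozen=True)
class DimensionEntry:
    type_by_suffix: str            # file type index, from the filename suffix
    type_by_header: Optional[str]  # file type index, from the file header
    size: int                      # file attribute index (one basic attribute)
    feature_code: str              # file feature index (assumed: SHA-256 digest)


def type_from_header(data: bytes) -> Optional[str]:
    """Return the type whose signature prefixes the data, if any is known."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return None


def describe(name: str, data: bytes) -> DimensionEntry:
    """Collect one value per index dimension for a single file."""
    suffix = os.path.splitext(name)[1].lstrip(".").lower()
    return DimensionEntry(
        type_by_suffix=suffix,
        type_by_header=type_from_header(data),
        size=len(data),
        feature_code=hashlib.sha256(data).hexdigest(),
    )
```

Keeping the suffix-based and header-based types as separate fields lets a later stage notice when they disagree, which is one way to flag the "special formats" that the text says must be identified before content recognition.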
Step two, creating a multidimensional file index
The dimensional feature indexes created in step one are combined to create a multidimensional index structure in a data space.
Step three, loading the multidimensional data space
The multidimensional index structure is loaded into the data space used by the current retrieval.
Step four, configuring the file index
The file index is fused with the existing index content to form a multidimensional data space search tree.
Step five, completing the creation of the multidimensional index.
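The combine, load and fuse steps above can be sketched with a nested dictionary that has one level per index dimension. This is a minimal sketch under assumptions: the tree layout, the `LEAF` key and the function names are hypothetical, and the source does not specify the search tree's actual structure.

```python
from typing import Any, Dict, List, Sequence

Tree = Dict[Any, Any]
LEAF = "_files"  # hypothetical reserved key; dimension values must not equal it


def build_tree(records: Dict[str, Dict[str, Any]], dims: Sequence[str]) -> Tree:
    """Nest one level per dimension; leaves hold the matching file ids."""
    root: Tree = {}
    for file_id, record in records.items():
        node = root
        for dim in dims:
            node = node.setdefault(record[dim], {})
        node.setdefault(LEAF, []).append(file_id)
    return root


def fuse(existing: Tree, new: Tree) -> Tree:
    """Merge `new` into `existing` in place and return `existing`."""
    for key, value in new.items():
        if key == LEAF:
            existing.setdefault(LEAF, []).extend(value)
        else:
            fuse(existing.setdefault(key, {}), value)
    return existing


def search(tree: Tree, values: Sequence[Any]) -> List[str]:
    """Follow one value per dimension; an unknown value yields no candidates."""
    node = tree
    for value in values:
        node = node.get(value)
        if node is None:
            return []
    return node.get(LEAF, [])
```

Each level of the tree discards every file that does not match on that dimension, which is the step-by-step narrowing of the retrieval range that the description attributes to the multidimensional index.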
The method uses the concept of a multidimensional data space to summarize the index attributes of data along multiple dimensions, and indexes the data within each data type one by one according to similarity. For example, when retrieving fingerprint data, the data is first indexed by fingerprint size, so that fingerprints of different sizes are classified into several levels; feature indexes are then created among the fingerprints within the same level according to special attributes such as geometric features, forming a hierarchical structure in the data space. In addition, picture files that consist mainly of fingerprint data share some commonalities with picture files of other content, and these commonalities are important features for handling picture-type indexes: if the picture file is used as an indexing dimension, useful content such as picture size, color depth and creation time becomes an index.
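The fingerprint example can be sketched as a two-level grouping: first by a size level, then by a geometric attribute within each level. The `Fingerprint` record, the `pattern` labels and the bucket width `step` are illustrative assumptions; the source does not define how sizes are bucketed or which geometric features are used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Fingerprint(NamedTuple):
    fid: str
    width: int
    height: int
    pattern: str  # hypothetical geometric label, e.g. "loop" or "whorl"


def size_level(fp: Fingerprint, step: int = 100) -> int:
    """Bucket by the longer side in pixels; `step` is an assumed granularity."""
    return max(fp.width, fp.height) // step


def build_levels(prints: Iterable[Fingerprint]) -> Dict[int, Dict[str, List[str]]]:
    """First level: size bucket. Second level: geometric pattern."""
    levels: Dict[int, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for fp in prints:
        levels[size_level(fp)][fp.pattern].append(fp.fid)
    return levels


def candidates(levels: Dict[int, Dict[str, List[str]]], probe: Fingerprint) -> List[str]:
    """Return only the ids that share the probe's size level and pattern."""
    return levels.get(size_level(probe), {}).get(probe.pattern, [])
```

A probe is compared in detail only against the ids returned by `candidates`, so the expensive image comparison runs on a small subset rather than on the whole collection.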
The method therefore creates indexes for binary data files based on multidimensional data space technology. The principle of index creation rests on the features common to files of the same type, and such a feature must satisfy the uniqueness principle for data of that type, so that other types of data can be eliminated quickly and the retrieval range narrowed as fast as possible.
In addition to the above embodiments, the present invention may have other embodiments. All technical solutions formed by adopting equivalent substitutions or equivalent transformations fall within the protection scope of the claims of the present invention.

Claims (5)

1. The index construction method based on the multidimensional data space technology comprises the steps of initializing a multidimensional data retrieval system and loading data files needing to establish indexes, and is characterized by further comprising the following steps:
step one, setting a multi-dimensional file index item, wherein the multi-dimensional file index item at least comprises a file type index dimension, a file attribute index dimension, a file characteristic index dimension, a file content index dimension and a spatial texture index dimension, the file type index dimension is constructed by establishing a corresponding data space dimension according to a file type, and the file type comprises a file type determined based on a file name suffix, a file type determined based on a file header and a file type with own coding characteristics; the file feature index dimension comprises a uniqueness index, an index for acquiring a file feature code by an auxiliary means and a space geometric feature index;
step two, combining all the dimensional feature indexes constructed in the step one to create a multi-dimensional index structure in a data space, wherein the data space comprises space coordinates, a space geometric structure and interaction among the geometric structures;
step three, loading the multidimensional index structure into the current retrieval data space and fusing the multidimensional index structure with the current index content to form a multidimensional data space search tree;
and step four, completing the construction of the multi-dimensional index.
2. The index building method based on multidimensional data space technology as claimed in claim 1, wherein: the spatial geometry features include audio features and graphical texture features.
3. The index building method based on multidimensional data space technology as claimed in claim 1, wherein: the file content index dimension includes an index of visible characters and an index of non-visible characters.
4. The index building method based on multidimensional data space technology as claimed in claim 3, wherein: the non-visible characters are associated through file commonality characteristic content.
5. The index building method based on multidimensional data space technology according to any one of claims 1 to 4, wherein: when a data file needing to establish an index is loaded into a data space, the binary data is converted into three-dimensional data, then an index dimension is established, and the index dimension points to a specific coordinate in a space geometric body.
CN201710506059.1A 2017-06-28 2017-06-28 Index construction method based on multidimensional data space technology Active CN107239571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710506059.1A CN107239571B (en) 2017-06-28 2017-06-28 Index construction method based on multidimensional data space technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710506059.1A CN107239571B (en) 2017-06-28 2017-06-28 Index construction method based on multidimensional data space technology

Publications (2)

Publication Number Publication Date
CN107239571A 2017-10-10
CN107239571B 2021-04-09

Family

ID=59989989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710506059.1A Active CN107239571B (en) 2017-06-28 2017-06-28 Index construction method based on multidimensional data space technology

Country Status (1)

Country Link
CN (1) CN107239571B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154024B (en) * 2017-12-20 2020-07-28 北京奇艺世纪科技有限公司 Data retrieval method and device and electronic equipment
CN109144962A (en) * 2018-08-31 2019-01-04 北京诚志重科海图科技有限公司 A kind of date storage method, querying method, storage device and inquiry unit
CN110162645A (en) * 2019-05-28 2019-08-23 广东三维家信息科技有限公司 Image search method, device and electronic equipment based on index
CN115756552B (en) * 2023-01-06 2023-04-28 山东矩阵软件工程股份有限公司 Application system function self-configuration method, system and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510222A (en) * 2009-02-20 2009-08-19 北京大学 Multilayer index voice document searching method and system thereof
CN102708148A (en) * 2012-03-31 2012-10-03 深圳祥云信息科技有限公司 Duplication eliminating method based on multidimensional lattice data spatial model
CN103377237A (en) * 2012-04-27 2013-10-30 常州市图佳网络科技有限公司 High dimensional data neighbor search method and fast approximate image search method
CN103593363A (en) * 2012-08-15 2014-02-19 中国科学院声学研究所 Video content indexing structure building method and video searching method and device
CN105574212A (en) * 2016-02-24 2016-05-11 北京大学 Image retrieval method for multi-index disk Hash structure
CN106095951A (en) * 2016-06-13 2016-11-09 哈尔滨工程大学 Data space multi-dimensional indexing method based on load balancing and inquiry log
CN106503092A (en) * 2016-10-13 2017-03-15 浪潮(苏州)金融技术服务有限公司 A kind of method using multidimensional technique construction Spatial Multi-Dimensional degree search tree

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070130112A1 (en) * 2005-06-30 2007-06-07 Intelligentek Corp. Multimedia conceptual search system and associated search method
CN102270201B (en) * 2010-06-01 2013-07-17 富士通株式会社 Multi-dimensional indexing method and device for network files
KR101764615B1 (en) * 2015-04-13 2017-08-03 숭실대학교산학협력단 Spatial knowledge extractor and extraction method
CN105808747A (en) * 2016-03-14 2016-07-27 浪潮(苏州)金融技术服务有限公司 Method for quickly searching and comparing fingerprint data by using multidimensional technology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
《Multimedia medical case retrieval using decision trees》;Gwenole Quellec等;《2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society》;20071022;全文 *

Also Published As

Publication number Publication date
CN107239571A (en) 2017-10-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant