WO2023078130A1 - Index creation method and apparatus, and computer-readable storage medium

Info

Publication number
WO2023078130A1
WO2023078130A1 PCT/CN2022/127445 CN2022127445W WO2023078130A1 WO 2023078130 A1 WO2023078130 A1 WO 2023078130A1 CN 2022127445 W CN2022127445 W CN 2022127445W WO 2023078130 A1 WO2023078130 A1 WO 2023078130A1
Authority
WO
WIPO (PCT)
Prior art keywords
index
fields
frequency
data table
field
Application number
PCT/CN2022/127445
Other languages
French (fr)
Chinese (zh)
Inventor
魏铮
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司
Publication of WO2023078130A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided are an index creation method and apparatus, and a computer-readable storage medium. The method comprises: acquiring an application algorithm set (100); according to the application algorithm set, obtaining the frequency of use of a data table and the frequency of a field use mode (200); and creating an index according to the frequency of use of the data table and the frequency of the field use mode (300).

Description

Index creation method, device and computer-readable storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202111294583.X, filed on November 3, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of information technology, and in particular to an index creation method, device and computer-readable storage medium.
Background
In a relational database, an index is a separate, physical storage structure that sorts the values of one or more columns of a table. It consists of a collection of the values of one or more columns of a table and a corresponding list of logical pointers to the data pages that physically hold those values. An index works like the table of contents of a book: the page numbers in the table of contents make it possible to find the required content quickly.
In the related art, creating an index requires business developers to judge from past experience which fields are suitable for indexing. Index fields chosen this way may be inappropriate, which in turn degrades data access performance.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
The main purpose of the embodiments of the present application is to provide an index creation method, device and computer-readable storage medium.
In a first aspect, an embodiment of the present application provides an index creation method, the method including: acquiring an application algorithm set; obtaining a data table usage frequency and a field usage mode frequency according to the application algorithm set; and creating an index according to the data table usage frequency and the field usage mode frequency.
In a second aspect, an embodiment of the present application provides an index creation device, the index creation device including a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory, wherein the program, when executed by the processor, implements the index creation method of the first aspect.
In a third aspect, the present application provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the index creation method of the first aspect.
Other features and advantages of the present application will be set forth in the description that follows and will in part become apparent from the description, or may be learned by practicing the application. The objectives and other advantages of the application may be realized and obtained by the structures particularly pointed out in the description, the claims and the accompanying drawings.
Brief Description of the Drawings
FIG. 1 is a flowchart of an index creation method provided by an embodiment of the present application;
FIG. 2 is a flowchart of frequency statistics provided by an embodiment of the present application;
FIG. 3 is a flowchart of data acquisition provided by an embodiment of the present application;
FIG. 4 is a flowchart of data screening provided by an embodiment of the present application;
FIG. 5 is a flowchart of field acquisition provided by an embodiment of the present application;
FIG. 6 is a flowchart of obtaining index fields provided by an embodiment of the present application; and
FIG. 7 is another flowchart of the index creation method provided by an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it.
It should be noted that although functional modules are divided in the schematic diagram of the system architecture and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that of the device, or in an order different from that of the flowcharts.
The present application provides an index creation method, device and computer-readable storage medium. An application algorithm set is acquired first. The application algorithm set can access and retrieve multiple data sets, each of which contains multiple different fields, and the application algorithms reveal the usage mode of each field. The data table usage frequency and the field usage mode frequency can therefore be obtained, and an index created from them. These two types of frequency show which terms occur with high frequency in the current retrieval process, so the resulting index suits the current data, which improves retrieval efficiency, the user's experience of using the data, and data access performance.
The embodiments of the present application are further described below with reference to the accompanying drawings.
FIG. 1 is a flowchart of an index creation method provided by an embodiment of the present application. It can be understood that the present application proposes an index creation method that includes, but is not limited to, step S100, step S200 and step S300.
Step S100: acquire an application algorithm set.
It can be understood that the application algorithm set refers to the set of all algorithms that are used, or may be used, in the current retrieval process. When a user needs a certain type of database, or a certain type of big data component, to clean and process massive data, the set of all algorithms in that database or big data component can be taken as the application algorithm set, and indexes can be created for it by the index creation method of the present application.
It can be understood that the embodiments provided in the present application use a set of multiple SQL (Structured Query Language) algorithms in an application program as the application algorithm set in order to illustrate the specific flow of the index creation method. When the index creation method proposed in the present application is applied, the application algorithm set may be a set of SQL algorithms, a set of other types of algorithms, or a set of algorithms in other types of databases and big data components. The present application does not specifically limit the objects targeted by the index creation method.
Step S200: obtain the data table usage frequency and the field usage mode frequency according to the application algorithm set.
It can be understood that the application algorithm set uses multiple initial data tables, each of which includes multiple fields, and that the application algorithms reveal which fields of the initial data tables are used and how they are used. Counting over the content of the application algorithm set yields the data table usage frequency and the field usage mode frequency. Here, the frequency of an object is the number of times that object appears in the application algorithm set. Once the parsed content of the application algorithm set has been counted, the resulting frequencies show which content occurs most often in actual engineering use.
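As an illustration of the counting described for step S200, the following minimal sketch derives a data table usage frequency from a small set of SQL statements. It rests on several assumptions that are not part of the application: the application algorithm set is available as plain SQL strings, a table name is whatever identifier follows `FROM` or `JOIN`, and the names `SQL_STATEMENTS`, `TABLE_REF` and `table_usage_frequency` are hypothetical. A regular expression is not a SQL parser, so this only approximates the parsing step.

```python
import re
from collections import Counter

# Hypothetical application algorithm set: plain SQL strings (an assumption).
SQL_STATEMENTS = [
    "SELECT name FROM users WHERE age > 30",
    "SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id",
    "SELECT status, COUNT(*) FROM orders GROUP BY status",
]

# Assumption: a table name is the identifier right after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def table_usage_frequency(statements):
    """Count how many times each table is referenced across all statements."""
    counts = Counter()
    for sql in statements:
        counts.update(name.lower() for name in TABLE_REF.findall(sql))
    return counts


table_freq = table_usage_frequency(SQL_STATEMENTS)
```

With these three statements, `users` and `orders` are each counted twice. A production implementation would more likely rely on the database's own parser or query log than on pattern matching.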
Step S300: create an index according to the data table usage frequency and the field usage mode frequency.
It can be understood that once the data table usage frequency and the field usage mode frequency have been obtained, the two types of frequency are statistically analyzed and the index is created according to the result of the analysis. That is, it is determined which data tables, fields and usage modes have frequencies that satisfy the preset conditions, and the index is then created from this high-frequency content. An index built by this method fully reflects the retrieval requirements that arise when processing big data, which improves the user's experience of using the data and the performance of data access.
It should be noted that the data table usage frequency proposed in the present application is the usage frequency of each of the multiple data tables used in the application algorithm set. Each data table also contains fields of different types, and the application algorithms use these fields in different ways. The fields of a data table and their corresponding usage modes can therefore be counted to obtain the field usage frequency and the field usage mode frequency. These usage frequencies then determine which information is suitable for creating the index, thereby improving data access performance.
FIG. 2 is a flowchart of frequency statistics provided by an embodiment of the present application. It can be understood that step S200 in the embodiment shown in FIG. 1 includes, but is not limited to, step S210, step S220, step S230 and step S240.
Step S210: obtain multiple initial data tables, and the data table usage frequency corresponding to each initial data table, according to the application algorithm set.
It can be understood that after the application algorithm set is obtained, multiple initial data tables are first derived from it. Counting over these initial data tables then yields the data table usage frequency corresponding to each initial data table.
It should be noted that when a SQL algorithm set is used as the application algorithm set, the initial data tables obtained from the SQL algorithm set are data tables. When SQL algorithms are applied in practice, the fields in the algorithms are stored in different data tables according to preset categories. Meanwhile, the way the SQL algorithms call the data tables reflects that data tables belong to different levels; the notion of table level is comparable to the distinction between a first-level heading and lower-level headings in a book's table of contents. Specifically, the data table usage frequency mentioned above is the number of times each data table is accessed while the SQL algorithms run. Each data table stores multiple fields and also records the usage mode of each field as obtained through the SQL algorithms. The present application uses a SQL algorithm set and data-table storage only to describe the index creation method simply; neither the SQL algorithm set nor the way the data tables are recorded limits the implementations of the present application.
It should be noted that a data table stores multiple fields and their corresponding usage modes. The fields come from the original data for which an index is to be created. The original data includes multiple fields, and only after the SQL algorithms have processed the original data is the usage mode of each field known. The fields and their usage modes are then stored in the data table so that the field usage mode frequency and related statistics can be computed later to decide which information should serve as retrieval fields, thereby improving data access performance.
It should be noted that an initial data table likewise stores multiple fields and their corresponding usage modes, obtained from the original data in the same way, so that the initial data tables can subsequently be screened to obtain candidate data tables.
Step S220: screen the multiple initial data tables to obtain candidate data tables.
It can be understood that after the multiple initial data tables are obtained, they need to be screened to obtain candidate data tables. In actual index-based retrieval, the retrieval demand differs from one piece of data to another, and that demand is determined by the content of the algorithms in the application algorithm set. The initial data tables are therefore first screened according to the actual retrieval demand. The initial data tables that meet the retrieval requirement are the candidate data tables, whose fields and field usage modes are then counted so that the index can finally be created and data access performance improved.
It should be noted that, in obtaining candidate data tables, screening means judging whether each initial data table satisfies a first preset condition. Specifically, the first preset condition is that the data table usage frequency corresponding to the initial data table is not less than a first preset value. An initial data table whose usage frequency is greater than or equal to the first preset value satisfies the first preset condition and is marked as a candidate data table. An initial data table whose usage frequency is less than the first preset value is not marked as a candidate data table, and the judgment moves on to the other initial data tables until all of them have been judged.
It should be noted that the first preset value may be a threshold set in advance according to actual needs, or a value computed by a preset algorithm. When judging whether an initial data table satisfies the first preset condition, it is sufficient that the initial data tables are classified according to the actual retrieval demand. The present application does not specifically limit how the first preset value is obtained.
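The screening in step S220 can be sketched as a simple threshold filter. The sketch below is illustrative only: the function names, the sample frequencies and the use of the mean frequency as a "computed" first preset value are all assumptions, since the application leaves the way the first preset value is obtained open.

```python
def select_candidate_tables(table_freq, first_preset_value):
    """Keep tables whose usage frequency is not less than the first preset value."""
    return {table for table, n in table_freq.items() if n >= first_preset_value}


def mean_threshold(table_freq):
    """One hypothetical way to compute the first preset value: the mean frequency."""
    return sum(table_freq.values()) / len(table_freq)


# Hypothetical data table usage frequencies.
table_freq = {"users": 12, "orders": 7, "audit_log": 1}

# Variant 1: the first preset value is a fixed threshold chosen in advance.
fixed_candidates = select_candidate_tables(table_freq, first_preset_value=10)

# Variant 2: the first preset value is computed from the data (here 20 / 3).
computed_candidates = select_candidate_tables(table_freq, mean_threshold(table_freq))
```

With the fixed threshold of 10 only `users` qualifies, whereas the computed threshold of about 6.67 also admits `orders`, which shows how the choice of the first preset value changes the candidate set.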
Step S230: scan the candidate data tables to obtain multiple fields, the usage modes corresponding to the fields, and the field usage mode frequency.
It can be understood that a candidate data table stores multiple fields and their corresponding usage modes, so once the candidate data tables are obtained they need to be scanned to retrieve these fields and usage modes. It should be noted that the usage mode of a field refers to the operation performed on the field in the application algorithm set, including but not limited to outputting the field, using the field as a filter condition, using the field for association (joining), and using the field as an aggregation condition. During actual execution of the algorithms a field may be used in several ways, but not every usage mode helps to judge whether the field is suitable as a retrieval field. A plain output operation, for example, is not sufficient as a judgment condition. The usage modes relevant to the judgment include, but are not limited to, filter fields, association fields and aggregation fields. The overall frequency with which each field of a data table is used for filtering, aggregation or association in the algorithms is counted, and the index fields are selected on that basis to improve data access performance.
It should be noted that a candidate data table stores multiple fields and their corresponding usage modes. The fields come from the original data for which an index is to be created. The original data includes multiple fields, and only after the SQL algorithms have processed the original data is the usage mode of each field known. The fields and their usage modes are stored in the initial data tables, and the initial data tables are screened to obtain the candidate data tables, so that the field usage mode frequency and related statistics can be computed later to determine which information should serve as retrieval fields.
It can be understood that after the fields and their corresponding usage modes are obtained, this information must be counted to obtain the field usage mode frequency. Once the frequency of the usage mode corresponding to each field is known, the information used to create the index is chosen according to the magnitude of the frequencies or similar criteria.
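The counting in step S230 can be sketched as follows, assuming the scan of the candidate data tables has already produced one record per field occurrence. The record layout `(table, field, usage)`, the mode labels `"filter"`, `"join"`, `"aggregate"` and `"output"`, and all names in the code are hypothetical; the application only states that plain output does not count towards the judgment while filtering, association and aggregation do.

```python
from collections import Counter

# Usage modes that count towards index selection; plain output is ignored.
INDEX_RELEVANT = {"filter", "join", "aggregate"}

# Hypothetical scan result: one (table, field, usage) record per occurrence.
usage_records = [
    ("orders", "user_id", "join"),
    ("orders", "user_id", "filter"),
    ("orders", "status", "aggregate"),
    ("orders", "total", "output"),
    ("users", "id", "join"),
    ("users", "name", "output"),
]


def field_usage_frequency(records):
    """Return per-(table, field, usage) counts and overall per-(table, field) counts."""
    per_mode = Counter()
    per_field = Counter()
    for table, field, usage in records:
        if usage not in INDEX_RELEVANT:
            continue  # e.g. "output" alone says nothing about index suitability
        per_mode[(table, field, usage)] += 1
        per_field[(table, field)] += 1
    return per_mode, per_field


per_mode, per_field = field_usage_frequency(usage_records)
```

Here `orders.user_id` reaches an overall frequency of 2 (one join plus one filter), while `orders.total` and `users.name` never enter the counts because they are only output.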
FIG. 3 is a flowchart of data acquisition provided by an embodiment of the present application. It can be understood that step S230 in the embodiment shown in FIG. 2 includes, but is not limited to, step S231 and step S232.
Step S231: add the candidate data tables to a set of data tables to be indexed.
Step S232: scan the candidate data tables in the set of data tables to be indexed to obtain the fields, the usage modes corresponding to the fields, and the field usage mode frequency.
It can be understood that after each initial data table has been judged against the first preset condition and the initial data tables that satisfy it have been marked as candidate data tables, step S231 is executed to add the candidate data tables to the set of data tables to be indexed. After all candidate data tables have been added to the set, step S232 is executed to scan each initial data table (that is, each candidate data table) in the set and obtain the fields and the usage mode corresponding to each field. Only after the fields, their usage modes and the field usage mode frequency have been obtained can this information be counted and the index created.
It should be noted that, in the index creation method proposed in the present application, scanning of the candidate data tables is not limited to first adding all candidate data tables to the set of data tables to be indexed and then scanning the set as a whole. Alternatively, each time an initial data table is found to satisfy the first preset condition, it can be marked as a candidate data table, added to the set of data tables to be indexed, and scanned immediately. After that candidate data table has been scanned and its fields and usage modes obtained, the remaining initial data tables are checked one by one against the first preset condition, and those that satisfy it are processed in the same way. In either case it is sufficient that the initial data tables are judged, that those satisfying the condition are added to the set of data tables to be indexed and scanned, and that the corresponding fields and usage modes are obtained. Both technical solutions achieve the scanning of the candidate data tables, and the present application does not specifically limit the order in which this step is executed.
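The two execution orders described above can be compared in a small sketch. Everything here is hypothetical: `FIELDS_BY_TABLE` and `scan` stand in for the real scan of a candidate data table, and the frequencies are made-up sample values. The point is only that collecting first and scanning afterwards gives the same result as scanning each table as soon as it qualifies.

```python
# Hypothetical scan results: table -> fields found by scanning that table.
FIELDS_BY_TABLE = {
    "users": ["id", "name"],
    "orders": ["user_id", "status"],
    "audit_log": ["ts"],
}


def scan(table):
    """Stand-in for the real scan of a candidate data table (an assumption)."""
    return FIELDS_BY_TABLE[table]


def scan_after_collecting(table_freq, first_preset_value):
    """Order 1: build the whole set of tables to be indexed, then scan it."""
    to_index = [t for t, n in table_freq.items() if n >= first_preset_value]
    return {t: scan(t) for t in to_index}


def scan_while_collecting(table_freq, first_preset_value):
    """Order 2: scan each table as soon as it is marked as a candidate."""
    results = {}
    for t, n in table_freq.items():
        if n >= first_preset_value:
            results[t] = scan(t)
    return results


table_freq = {"users": 12, "orders": 7, "audit_log": 1}
batch = scan_after_collecting(table_freq, 5)
streamed = scan_while_collecting(table_freq, 5)
```

Both variants return the fields of `users` and `orders` and skip `audit_log`, so the choice between them is a matter of scheduling rather than of outcome.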
As shown in FIG. 4, FIG. 4 is a flowchart of data screening provided by an embodiment of the present application. It can be understood that step S300 in the embodiment shown in FIG. 1 includes, but is not limited to, step S310 and step S320.
Step S310: screen the data table usage frequency and the field usage mode frequency to obtain index fields.
Step S320: create an index according to the index fields.
It can be understood that, in index creation methods of the prior art, business developers usually decide from past experience which fields should serve as index fields. Such experience is not guaranteed to be accurate. When the data is accessed infrequently, the read, write, and retrieval performance of the application data after an index is created may differ little from its performance without the index. If the selected index field is not a keyword actually needed to access the application data, that is, if no suitable index field was chosen, the index brings no obvious benefit to data retrieval and merely adds the resource overhead of creating it.
Therefore, in the index creation method proposed in this application, after the data table usage frequency and the field usage mode frequency are obtained, step S310 is executed to screen them and obtain the index fields, and step S320 is then executed to create the index according to the index fields. The selected index fields correspond to the data tables, fields, and usage modes that appear with high frequency in the application algorithm set, that is, the information that occurs most often when data is accessed and retrieved. Once it is known which fields and usage modes appear frequently, using them as index fields allows the index to be created according to actual application requirements, which improves data access performance and lowers the cost and barrier of development for users.
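As a rough illustration of step S320, the sketch below turns a set of selected index fields into index creation statements. The application does not name a database or a statement syntax; the `CREATE INDEX ... ON ... (...)` form, the `idx_` naming scheme, and the table and field names are assumptions chosen because that form is common to many relational databases. Identifiers are not escaped, so this is not suitable for untrusted input.

```python
from typing import Dict, List


def build_create_index_statements(index_fields: Dict[str, List[str]]) -> List[str]:
    """Return one CREATE INDEX statement per (table, field) pair.

    `index_fields` maps a table name to the fields selected for it
    (a hypothetical representation of the output of step S310).
    """
    statements = []
    for table, fields in index_fields.items():
        for field in fields:
            statements.append(
                f"CREATE INDEX idx_{table}_{field} ON {table} ({field});"
            )
    return statements


# Illustrative input only.
ddl = build_create_index_statements({"orders": ["customer_id", "created_at"]})
```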
As shown in FIG. 5, FIG. 5 is a flowchart of field acquisition provided by an embodiment of the present application. It can be understood that step S310 in the embodiment shown in FIG. 4 includes, but is not limited to, step S311 and step S312.
Step S311: add the fields that satisfy a second preset condition to the set of fields to be indexed, where the second preset condition is that the field usage mode frequency corresponding to the field is not less than a second preset value.
Step S312: obtain the index fields according to the set of data tables to be indexed and the set of fields to be indexed.
It can be understood that, after the data table usage frequency, the field usage frequency, and the field usage mode frequency are obtained, step S311 is executed first to determine whether each field in each candidate data table satisfies the second preset condition, and the fields that satisfy it are added to the set of fields to be indexed. Step S312 is then executed to obtain the index fields according to the set of data tables to be indexed and the set of fields to be indexed, so that the index can be created. Specifically, the second preset condition is that the field usage mode frequency corresponding to the field is not less than the second preset value. A field is considered to satisfy the second preset condition only when this frequency is greater than or equal to the second preset value, which indicates that the field is used often during big-data retrieval and access and therefore needs to be added to the set of fields to be indexed. If the frequency is less than the second preset value, the field appears rarely during big-data retrieval and access, so there is no need to add it to the set; the remaining fields are then evaluated in turn until every field that satisfies the second preset condition has been added to the set of fields to be indexed.
It should be noted that the second preset value may be a threshold set in advance according to actual requirements, or a value calculated by a preset algorithm. When determining whether a field satisfies the second preset condition, it is only necessary to classify the fields according to the actual retrieval requirements; this application does not specifically limit how the second preset value is obtained.
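The second preset condition can be sketched in Python as follows. Two points are assumptions not stated in the application: a field is treated as qualifying if any one of its usage modes reaches the threshold, and the "value calculated by a preset algorithm" is illustrated with the arithmetic mean of all usage mode frequencies. All names and data are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional, Set

ModeFreq = Dict[str, Dict[str, int]]  # {field: {usage_mode: frequency}}


def second_preset_value(mode_freq: ModeFreq, fixed: Optional[int] = None) -> float:
    """Return a fixed threshold if given, otherwise a computed one.

    The computed variant (mean of all frequencies) is only an example of
    "a value calculated by a preset algorithm"; it needs non-empty input.
    """
    if fixed is not None:
        return fixed
    all_freqs = [f for modes in mode_freq.values() for f in modes.values()]
    return mean(all_freqs)


def fields_to_index(mode_freq: ModeFreq, threshold: float) -> Set[str]:
    """Step S311: keep fields whose usage mode frequency is not less than the threshold."""
    return {
        field
        for field, modes in mode_freq.items()
        if any(freq >= threshold for freq in modes.values())
    }


# Illustrative data only.
mode_freq = {
    "customer_id": {"equality": 9, "join": 4},
    "created_at": {"range": 6},
    "note": {"like": 1},
}
with_fixed = fields_to_index(mode_freq, second_preset_value(mode_freq, fixed=8))
with_computed = fields_to_index(mode_freq, second_preset_value(mode_freq))
```

With a fixed threshold of 8 only `customer_id` qualifies, whereas the computed threshold (the mean, 5) also admits `created_at`, which shows how the choice of second preset value changes the selected set.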
As shown in FIG. 6, FIG. 6 is a flowchart of obtaining index fields provided by an embodiment of the present application. It can be understood that step S312 in the embodiment shown in FIG. 5 includes, but is not limited to, step S3121, step S3122, and step S3123.
Step S3121: obtain a first mapping relationship according to the set of data tables to be indexed and the set of fields to be indexed, where the first mapping relationship is the mapping between each candidate data table and its fields.
Step S3122: obtain a second mapping relationship according to the set of fields to be indexed and the field usage mode frequency, where the second mapping relationship is the mapping between each field and the usage mode corresponding to that field.
Step S3123: obtain the index fields according to the first mapping relationship and the second mapping relationship.
It can be understood that the set of data tables to be indexed contains multiple data tables, and each data table corresponds to its own field set. For each data table that satisfies the first preset condition, the fields that satisfy the second preset condition are added to the field set corresponding to that data table, and all of these fields together constitute the set of fields to be indexed.
It can be understood that, in the process of obtaining the index fields from the set of data tables to be indexed, the set of fields to be indexed, and the field usage mode frequency, step S3121 is executed first to obtain the first mapping relationship between the set of data tables to be indexed and the set of fields to be indexed, that is, the mapping between each data table and its corresponding fields. Step S3122 is then executed to obtain the second mapping relationship between the set of fields to be indexed and the field usage mode frequency, that is, the mapping between each field and its corresponding usage mode. Finally, step S3123 is executed to obtain the index fields according to the first and second mapping relationships and to create the index.
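Steps S3121 to S3123 can be pictured as building two dictionaries and joining them. The sketch below is one possible reading: the data structures and names are hypothetical, and keeping only the most frequent usage mode per field in the second mapping is an assumption, since the application does not say how multiple usage modes of one field are handled.

```python
from typing import Dict, List, Set, Tuple

FieldKey = Tuple[str, str]  # (table, field)


def first_mapping(to_index_tables: Set[str],
                  to_index_fields: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """S3121: map each candidate table to its fields to be indexed."""
    return {t: sorted(to_index_fields.get(t, set()))
            for t in sorted(to_index_tables)}


def second_mapping(to_index_fields: Dict[str, Set[str]],
                   mode_freq: Dict[FieldKey, Dict[str, int]]) -> Dict[FieldKey, str]:
    """S3122: map each field to a usage mode (here: its most frequent one)."""
    mapping = {}
    for table, fields in to_index_fields.items():
        for field in fields:
            modes = mode_freq[(table, field)]
            mapping[(table, field)] = max(modes, key=modes.get)
    return mapping


def index_fields(first: Dict[str, List[str]],
                 second: Dict[FieldKey, str]) -> List[Tuple[str, str, str]]:
    """S3123: combine both mappings into (table, field, usage_mode) entries."""
    return [(t, f, second[(t, f)]) for t, fields in first.items() for f in fields]


# Illustrative data only.
tables = {"orders", "users"}
fields = {"orders": {"customer_id", "created_at"}, "users": {"email"}}
freqs = {
    ("orders", "customer_id"): {"equality": 9, "join": 4},
    ("orders", "created_at"): {"range": 6},
    ("users", "email"): {"equality": 5},
}
result = index_fields(first_mapping(tables, fields), second_mapping(fields, freqs))
```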
It should be noted that, once the fields satisfying the second preset condition in every candidate data table have been added to the set of fields to be indexed, that is, once all suitable fields in the application algorithm set have been found, the index fields are obtained according to the set of data tables to be indexed, the set of fields to be indexed, and the field usage mode frequency. For data access, the table-level data table usage frequency, the field usage frequency, and the field usage mode frequency each affect access efficiency to a different degree depending on the actual situation. Therefore, when creating an index from the index fields, different weights can be assigned to data such as the data tables and the usage modes corresponding to the fields according to actual requirements, and the most suitable scheme can be selected to create the index, so that the index improves data access performance as much as possible.
It should be noted that, besides using only the two types of data, namely the data tables and the field usage modes, as the criteria for selecting index fields, the data tables, the fields, and the usage modes corresponding to the fields may also be combined in different ways or assigned different weights according to actual requirements, thereby determining the fields to be used for retrieval.
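One way to read "assigning different weights" is as a weighted sum over the three frequency levels. The following sketch is purely illustrative: the linear scoring formula, the weight values, and the `top_n` cutoff are assumptions, as the application does not define how the weights are combined.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, str, str]  # (table, field, usage_mode)


def rank_candidates(table_freq: Dict[str, int],
                    field_freq: Dict[Tuple[str, str], int],
                    mode_freq: Dict[Candidate, int],
                    weights: Tuple[float, float, float],
                    top_n: int) -> List[Candidate]:
    """Score each candidate by a weighted sum of its three frequencies."""
    w_table, w_field, w_mode = weights
    scored = []
    for (table, field, mode), m_freq in mode_freq.items():
        score = (w_table * table_freq[table]
                 + w_field * field_freq[(table, field)]
                 + w_mode * m_freq)
        scored.append((score, (table, field, mode)))
    # Highest score first; ties broken by candidate name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [candidate for _, candidate in scored[:top_n]]


# Illustrative data only.
table_freq = {"orders": 12, "users": 7}
field_freq = {("orders", "customer_id"): 10,
              ("orders", "created_at"): 6,
              ("users", "email"): 5}
mode_freq = {("orders", "customer_id", "equality"): 9,
             ("orders", "created_at", "range"): 6,
             ("users", "email", "equality"): 5}
ranked = rank_candidates(table_freq, field_freq, mode_freq,
                         weights=(0.2, 0.3, 0.5), top_n=2)
```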
It should be noted that when the business application algorithms change, for example when an algorithm is added, deleted, or updated, the data tables, the fields, and the usage modes corresponding to the fields change as well, and so do their respective frequencies. Therefore, after the business application algorithms change, the index creation method proposed in this application also actively acquires the updated application algorithm set and updates the data through the index creation method described above in order to create the index, so that the improvement in data retrieval and access performance brought by the index is maintained to the greatest possible extent.
It should be noted that, after the business application algorithms change, business personnel may actively input the updated application algorithm set into the index creation method proposed in this application. Alternatively, the processor may periodically check the business application algorithms and, upon detecting a change, actively acquire the new data and update the application algorithm set in order to create the index. In this way the index adapts to changes in the business algorithms, which improves the cost-effectiveness of the index and the data access performance. This application does not specifically limit how the application algorithm set is updated.
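Periodic change detection could, for instance, compare a fingerprint of the application algorithm set between checks. The sketch below is an assumption about how such a check might be done, using a SHA-256 digest over the sorted statements; the application does not describe the detection mechanism, and the SQL-like statements are illustrative only.

```python
import hashlib
from typing import List, Optional


def fingerprint(algorithm_set: List[str]) -> str:
    """Return an order-independent digest of the algorithm statements."""
    digest = hashlib.sha256()
    for statement in sorted(algorithm_set):
        digest.update(statement.encode("utf-8"))
        digest.update(b"\x00")  # separator so adjacent statements cannot merge
    return digest.hexdigest()


def needs_reindex(previous: Optional[str], algorithm_set: List[str]) -> bool:
    """True when the set differs from the one seen at the last check."""
    return previous != fingerprint(algorithm_set)


# Illustrative statements only.
v1 = ["SELECT * FROM orders WHERE customer_id = ?"]
v2 = v1 + ["SELECT * FROM users WHERE email = ?"]
last_seen = fingerprint(v1)
```

A scheduler would call `needs_reindex(last_seen, current_set)` at each interval and rerun the index creation method only when it returns `True`.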
As shown in FIG. 7, FIG. 7 is another flowchart of the index creation method provided by an embodiment of the present application. It can be understood that the index creation method proposed in other embodiments of the present application further includes, but is not limited to, step S400, step S410, step S420, step S430, step S440, step S450, step S460, step S470, step S480, and step S490.
Step S400: acquire the application algorithm set.
Step S410: scan the table-level usage frequency.
Step S420: determine whether the data in all data tables has been scanned.
Step S430: determine whether the usage frequency of the data table is not less than a first preset threshold.
Step S440: add the data table to the set of tables to be indexed.
Step S450: scan the fields in the data table one by one to obtain the field usage frequency and the usage mode frequency corresponding to each field.
Step S460: determine whether the fields in the data table have all been scanned.
Step S470: determine, one by one, whether the usage mode frequency of each field in the data table is not less than a second preset threshold.
Step S480: add the field to the set of fields to be indexed corresponding to the data table.
Step S490: create the index according to the index fields and usage modes corresponding to the data table.
It can be understood that, in this index creation method, step S400 is executed first: the application algorithm set is acquired, the algorithm statements in the set are parsed, and the data tables and fields used in the algorithms are identified. Step S410 is then executed to perform statistical analysis on the data tables and fields identified in step S400, yielding the data tables at different levels and the data table usage frequency corresponding to each data table. Next, step S420 is executed to determine whether all field information in all data tables has been scanned. If so, the method proceeds to step S490 to create the index. If not, the method proceeds to step S430 to evaluate each data table, specifically to determine whether its usage frequency is not less than the first preset threshold. If the usage frequency is not less than the first preset threshold, step S440 is executed to add the data table to the set of tables to be indexed; if it is less than the first preset threshold, the data table is skipped and evaluation continues with the remaining data tables. After the suitable data tables have been added to the set of tables to be indexed, step S450 is executed to scan the fields of each data table in the set one by one and obtain the usage mode frequency corresponding to each field. Step S460 is then executed to determine whether the fields in the current data table have all been scanned. If not, step S470 is executed to evaluate the fields in the data table, specifically to determine whether the usage mode frequency of each field is not less than the second preset threshold, until all fields whose usage mode frequency is not less than the second preset threshold have been selected; step S480 is then executed to add these fields to the set of fields to be indexed corresponding to the data table. Once the fields in the current data table have all been scanned and all qualifying fields have been selected, the method returns to step S420 to determine whether the field information in every data table has been scanned; if so, the index is created according to the index fields, usage modes, and other information corresponding to each data table. By obtaining the data tables, the fields, and the usage modes corresponding to the fields, and by using information such as the data table usage frequency and the field usage mode frequency, the currently high-frequency terms can be identified. The index is thus created adaptively and efficiently from the currently input application algorithm set, which improves retrieval efficiency, improves the user's experience of using the data, improves data access performance, and avoids the degradation of data access performance caused by improper index configuration.
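The flow of steps S400 to S490 can be condensed into a short Python sketch. It assumes the parsing in step S400 has already produced one `(table, field, usage_mode)` record per occurrence, and it counts each record as one use of its table; both are simplifying assumptions, and the function name, thresholds, and sample records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Usage = Tuple[str, str, str]  # (table, field, usage_mode), one per occurrence


def create_index_plan(usages: Iterable[Usage],
                      first_threshold: int,
                      second_threshold: int) -> Dict[str, List[Tuple[str, str]]]:
    """Return {table: [(field, usage_mode), ...]} as the input to step S490."""
    usages = list(usages)                           # S400: parsed algorithm set
    table_freq = Counter(t for t, _, _ in usages)   # S410: table-level frequency
    mode_freq = Counter(usages)                     # S450: usage mode frequency

    plan: Dict[str, List[Tuple[str, str]]] = {}
    for table, freq in table_freq.items():          # S420: loop over all tables
        if freq < first_threshold:                  # S430: skip rarely used tables
            continue
        plan[table] = sorted(                       # S440 plus S460 to S480
            (field, mode)
            for (t, field, mode), f in mode_freq.items()
            if t == table and f >= second_threshold
        )
    return plan


# Illustrative records only.
usages = (
    [("orders", "customer_id", "equality")] * 3
    + [("orders", "created_at", "range")] * 2
    + [("orders", "note", "like")]
    + [("users", "email", "equality")] * 2
    + [("audit_log", "ts", "range")]
)
plan = create_index_plan(usages, first_threshold=2, second_threshold=2)
```

In this example `audit_log` is dropped at the table level and `orders.note` at the field level, so only frequently used table, field, and usage mode combinations reach index creation.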
In addition, another embodiment of the present application provides an index creation apparatus, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor.
The processor and the memory may be connected by a data bus or by other means.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the index creation method of the above embodiments are stored in the memory and, when executed by the processor, perform the index creation method of the above embodiments, for example, method steps S100 to S300 in FIG. 1, method steps S210 to S230 in FIG. 2, method steps S231 to S232 in FIG. 3, method steps S310 to S320 in FIG. 4, method steps S311 to S312 in FIG. 5, method steps S3121 to S3123 in FIG. 6, and method steps S400 to S490 in FIG. 7, all described above.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions. When executed by a processor or controller, for example by a processor in the above index creation apparatus embodiment, the computer-executable instructions cause the processor to perform the index creation method of the above embodiments, for example, method steps S100 to S300 in FIG. 1, method steps S210 to S230 in FIG. 2, method steps S231 to S232 in FIG. 3, method steps S310 to S320 in FIG. 4, method steps S311 to S312 in FIG. 5, method steps S3121 to S3123 in FIG. 6, and method steps S400 to S490 in FIG. 7, all described above.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices disclosed above, may be implemented as software, firmware, hardware, or an appropriate combination thereof.
In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but they do not thereby limit the scope of rights of the present application. Any modifications, equivalent replacements, and improvements made by those skilled in the art without departing from the scope and essence of the present application shall fall within the scope of rights of the present application.

Claims (10)

  1. An index creation method, comprising:
    acquiring an application algorithm set;
    obtaining a data table usage frequency and a field usage mode frequency according to the application algorithm set; and
    creating an index according to the data table usage frequency and the field usage mode frequency.
  2. The index creation method according to claim 1, wherein obtaining the data table usage frequency and the field usage mode frequency according to the application algorithm set comprises:
    obtaining, according to the application algorithm set, a plurality of initial data tables and the data table usage frequency corresponding to each initial data table;
    screening the plurality of initial data tables to obtain candidate data tables; and
    scanning the candidate data tables to obtain a plurality of fields, the usage modes corresponding to the fields, and the field usage mode frequency.
  3. The index creation method according to claim 2, wherein screening the plurality of initial data tables to obtain the candidate data tables comprises:
    marking each initial data table that satisfies a first preset condition as a candidate data table, the first preset condition being that the data table usage frequency corresponding to the initial data table is not less than a first preset value.
  4. The index creation method according to claim 2, wherein scanning the candidate data tables to obtain the plurality of fields, the usage modes corresponding to the fields, and the field usage mode frequency comprises:
    adding the candidate data tables to a set of data tables to be indexed; and
    scanning the candidate data tables in the set of data tables to be indexed to obtain the fields, the usage modes corresponding to the fields, and the field usage mode frequency.
  5. The index creation method according to claim 2, wherein creating the index according to the data table usage frequency and the field usage mode frequency comprises:
    screening the data table usage frequency and the field usage mode frequency to obtain index fields; and
    creating the index according to the index fields.
  6. The index creation method according to claim 5, wherein screening the data table usage frequency and the field usage mode frequency to obtain the index fields comprises:
    adding the fields that satisfy a second preset condition to a set of fields to be indexed, the second preset condition being that the field usage mode frequency corresponding to the field is not less than a second preset value; and
    obtaining the index fields according to the set of data tables to be indexed and the set of fields to be indexed.
  7. The index creation method according to claim 6, wherein obtaining the index fields according to the set of data tables to be indexed and the set of fields to be indexed comprises:
    obtaining a first mapping relationship according to the set of data tables to be indexed and the set of fields to be indexed, the first mapping relationship being the mapping between each candidate data table and the fields;
    obtaining a second mapping relationship according to the set of fields to be indexed and the field usage mode frequency, the second mapping relationship being the mapping between each field and the usage mode corresponding to the field; and
    obtaining the index fields according to the first mapping relationship and the second mapping relationship.
  8. The index creation method according to claim 1, wherein acquiring the application algorithm set comprises:
    when it is determined that the content of the application algorithm set has changed, acquiring the updated application algorithm set.
  9. An index creation apparatus, comprising a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for implementing connection and communication between the processor and the memory, wherein the program, when executed by the processor, implements the steps of the index creation method according to any one of claims 1 to 8.
  10. A computer-readable storage medium storing a computer-executable program, wherein the computer-executable program is configured to cause a computer to execute the index creation method according to any one of claims 1 to 8.
PCT/CN2022/127445 2021-11-03 2022-10-25 Index creation method and apparatus, and computer-readable storage medium WO2023078130A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111294583.XA CN116069777A (en) 2021-11-03 2021-11-03 Index creation method, apparatus, and computer-readable storage medium
CN202111294583.X 2021-11-03

Publications (1)

Publication Number Publication Date
WO2023078130A1 true WO2023078130A1 (en) 2023-05-11

Family

ID=86175693

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/127445 WO2023078130A1 (en) 2021-11-03 2022-10-25 Index creation method and apparatus, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN116069777A (en)
WO (1) WO2023078130A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080126311A1 (en) * 2006-11-29 2008-05-29 Red Hat, Inc. Automatic index creation based on unindexed search evaluation
CN105320679A (en) * 2014-07-11 2016-02-10 中国移动通信集团重庆有限公司 Data table index set generation method and device
CN105630803A (en) * 2014-10-30 2016-06-01 国际商业机器公司 Method and apparatus for establishing index for document database
CN107239451A (en) * 2016-03-28 2017-10-10 北京京东尚科信息技术有限公司 Database index creation method and device
CN113590632A (en) * 2021-08-11 2021-11-02 平安普惠企业管理有限公司 Database index creating method, device, equipment and medium

Also Published As

Publication number Publication date
CN116069777A (en) 2023-05-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22889155

Country of ref document: EP

Kind code of ref document: A1