CN118296001A - Data query method, device, computer equipment, storage medium and program product

Info

Publication number: CN118296001A
Application number: CN202410377131.5A
Authority: CN
Inventors: 盛利
Original assignee: Mgjia Beijing Technology Co ltd
Current assignee: Mgjia Beijing Technology Co ltd
Filing date: 2024-03-29
Publication date: 2024-07-05

Abstract

The invention relates to the technical field of data analysis, and discloses a data query method, a data query device, computer equipment, a storage medium and a program product. The method comprises the following steps: acquiring a data table to be analyzed; respectively calculating the repeatability of each column of data in the data table to be analyzed; the repeatability is used for indicating the proportion of repeated data in each column of data to the total number of data; and selecting a column which corresponds to the task identifier to be analyzed and has the repeatability meeting the preset condition as an index column based on the repeatability of each column of data and the preset task identifier to be analyzed so as to perform data query. According to the scheme, accuracy is high and efficiency is high when the data query function is realized.

Description

Data query method, device, computer equipment, storage medium and program product

Technical Field

The present invention relates to the field of data analysis technologies, and in particular, to a data query method, a data query device, a computer device, a storage medium, and a program product.

Background

MySQL is a relational database commonly used in software development. It is organized into databases and data tables, and software developers store data in MySQL by designing data tables.

Over time, and as traffic grows, software developers keep adding newly acquired data to data tables, so a single data table holds more and more data and may contain a large amount of duplicate, redundant data. High repetition of the data in a table can cause index failure, leading to full-table scans and slow queries; a large amount of redundant data can also cause problems such as high master-slave synchronization delay and slow table backups. Many of these problems are discovered only when a query is executed, which seriously affects query speed. In the related art, deciding how to set an index and how to query data more effectively depends on the experience of the software developer, so accuracy and efficiency are both low.

Disclosure of Invention

In view of the above, the present invention provides a data query method, apparatus, computer device, storage medium and program product, so as to solve the problems of low accuracy and low efficiency in data query.

In a first aspect, the present invention provides a data query method, the method comprising: acquiring a data table to be analyzed; respectively calculating the repeatability of each column of data in the data table to be analyzed; the repeatability is used for indicating the proportion of repeated data in each column of data to the total number of data; and selecting a column which corresponds to the task identifier to be analyzed and has the repeatability meeting the preset condition as an index column based on the repeatability of each column of data and the preset task identifier to be analyzed so as to perform data query.

According to this scheme, the repeatability of each column of data in the data table to be analyzed is first calculated to determine how repetitive each column is. The task identifier to be analyzed is then combined with the repeatability of each column, and a column that corresponds to the task identifier to be analyzed and whose repeatability meets the preset condition is selected as the index column. As a result, when data is subsequently queried, the index column has a reasonable repeatability for the task identifier to be analyzed, so accuracy is high; interference from highly repetitive data and from data only weakly related to the task to be analyzed is excluded from the query process, so query speed and efficiency are improved.

In an alternative embodiment, the calculating the repeatability of each column of the data in the data table to be analyzed includes: acquiring the total number of data of a data table to be analyzed; performing a de-duplication operation on each column of data, and recording the de-duplication data amount of each column of data; the degree of repetition of each column of data is determined based on the quotient of the amount of deduplicated data per column of data and the total number of data.

The scheme defines how to calculate the repeatability of each column of data in the data table to be analyzed, and refines the scheme.

In an optional implementation manner, the selecting, based on the repeatability of each column of data and the preset task identifier to be analyzed, a column corresponding to the task identifier to be analyzed and having the repeatability meeting the preset condition as an index column includes: sorting the repeatability of each column of data from small to large; and selecting, as index columns, a first target number of columns that correspond to the task identifier to be analyzed and have the smallest repeatability.

The scheme defines how to select the index column, and refines the scheme.

In an alternative embodiment, the method further comprises: detecting whether the data table to be analyzed contains a preset index column; if so, calculating the repeatability of the preset index column; comparing the repeatability of the preset index column with a preset repeatability threshold, and if the repeatability of the preset index column exceeds the preset repeatability threshold, selecting a column which corresponds to the task identifier to be analyzed and has the repeatability meeting a preset condition as a recommended index column based on the repeatability of each column of data and the task identifier to be analyzed.

According to the scheme, whether the data table to be analyzed has the preset index column is detected, if yes, whether the preset index column is reasonable is judged, if not, the recommended index column is given, and accuracy and efficiency of data query are guaranteed.

In an alternative embodiment, the method further comprises: detecting the number of preset index columns; and comparing the number of preset index columns with a preset column number threshold, and if the number exceeds the preset column number threshold, sorting the repetition degrees of the preset index columns from small to large and selecting a second target number of columns with the smallest repetition degree as combined index columns.

According to the scheme, whether the number of the preset index columns is reasonable is judged, if the number is too large, the preset index columns with small repeatability are selected for combination and used as combined index columns, and the efficiency of data query is guaranteed.

In an alternative embodiment, the method further comprises: comparing the total data number of the data table to be analyzed with a preset data quantity threshold value; if the total data number of the data table to be analyzed exceeds a preset data quantity threshold, displaying a warning control in an interface of the data table to be analyzed; and the warning control is used for prompting a user that the total number of the data table to be analyzed is excessive.

According to the scheme, whether the total number of data in the data table to be analyzed is reasonable is judged; if it is excessive, the user is warned so that the table can be optimized, which guarantees the efficiency of data query.

In a second aspect, the present invention provides a data query apparatus, the apparatus comprising:

The data acquisition module is used for acquiring a data table to be analyzed;

The repeatability calculation module is used for calculating the repeatability of each column of data in the data table to be analyzed respectively; the repeatability is used for indicating the proportion of repeated data in each column of data to the total number of data;

And the index column determining module is used for selecting a column which corresponds to the task identifier to be analyzed and has the repeatability meeting the preset condition as an index column based on the repeatability of each column of data and the preset task identifier to be analyzed so as to perform data query.

In a third aspect, the present invention provides a computer device comprising a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions, and the processor executes the computer instructions so as to perform the data query method of the first aspect or any of its corresponding embodiments.

In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon computer instructions for causing a computer to perform the data query method of the first aspect or any of its corresponding embodiments.

In a fifth aspect, the present invention provides a computer program product comprising computer instructions for causing a computer to perform the data querying method of the first aspect or any of its corresponding embodiments.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a data query method according to an embodiment of the invention;

FIG. 2 is a flow chart of another data query method according to an embodiment of the present invention;

FIG. 3 is a flow chart of yet another data query method according to an embodiment of the present invention;

FIG. 4 is a block diagram of a data query device according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Over time, and as traffic grows, software developers keep adding newly acquired data to data tables, so a single data table stores more and more data. Because the software development process may contain repeated steps or modules, the data table may also hold a large amount of duplicate, redundant data, which occupies memory space. High repetition of the data in a table can cause index failure, leading to full-table scans and slow queries; a large amount of redundant data can also cause problems such as high master-slave synchronization delay and slow table backups. Many of these problems are discovered only when a specific query operation is performed, which seriously affects query speed. In the related art, deciding how to set an index and how to query data more effectively depends on the experience of the software developer, so accuracy and efficiency are both low.

Therefore, the embodiment of the invention provides a data query method, which achieves the effect of improving query accuracy and efficiency by analyzing the repeatability in the data table and selecting the index column.

In accordance with an embodiment of the present invention, there is provided a data query method embodiment, it being noted that the steps shown in the flowchart of the figures may be performed in a computer system, such as a set of computer executable instructions, and, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in an order other than that shown or described herein.

In this embodiment, a data query method is provided, which may be used in a terminal such as a notebook computer or a desktop computer. FIG. 1 is a flowchart of a data query method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:

Step S101, a data table to be analyzed is obtained.

The data table to be analyzed may be a data table recorded and stored in advance by the user. It consists of rows and columns, which are recorded and set according to requirements. For example, if the data table records the scores of the students in a certain school, the columns may be school, school year, class, student number, student, subject, score and the like, and each row holds the information for a single student.

Step S102, calculating the repeatability of each column of data in the data table to be analyzed.

The repetition degree is used for indicating the proportion of repeated data in each column of data to the total number of data. Once the repetition degree of each column in the data table to be analyzed is obtained, it can be used to judge whether that column is suitable as an index for data query. For example, a repetition degree of 100% indicates that all values in the column are identical, so the column cannot serve as an index.

Step S103, selecting a column which corresponds to the task identifier to be analyzed and has the repeatability meeting the preset condition as an index column based on the repeatability of each column of data and the preset task identifier to be analyzed so as to perform data query.

The task identifier to be analyzed contains characteristic information of the task to be analyzed, is related to the data in the data table to be analyzed, and can be set by the user according to task requirements. For example, if the user needs to analyze the scores of the students in a certain class in a certain school year, and the pre-stored data table to be analyzed records at least the school, school year, class, student and score, then the task identifier to be analyzed includes the school, school year, class, student and score. It should be noted that the data table to be analyzed may also contain data unrelated to the task identifier to be analyzed, such as the age of the student.

When a user performs data query, the task identifier to be analyzed can be selected according to task requirements, the computer equipment can analyze the task identifier to be analyzed and the repeatability of each column of data, and a column which corresponds to the task identifier to be analyzed and has the repeatability meeting the preset condition is selected as an index column. The preset condition may be set according to the requirement, for example, to a repetition degree of less than 20%.

According to the data query method provided by this embodiment, the repetition degree of each column of data in the data table to be analyzed is first calculated to determine how repetitive each column is. The task identifier to be analyzed is then combined with the repetition degree of each column, and a column that corresponds to the task identifier to be analyzed and whose repetition degree meets the preset condition is selected as the index column. As a result, when data is subsequently queried, the index column has a reasonable repetition degree for the task identifier to be analyzed, so accuracy is high; interference from highly repetitive data and from data only weakly related to the task to be analyzed is excluded from the query process, so query speed and efficiency are improved.

In this embodiment, a data query method is provided, which may be used in a terminal such as a notebook computer or a desktop computer. FIG. 2 is a flowchart of the data query method according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:

Step S201, a data table to be analyzed is obtained.

Optionally, the computer device may also obtain, in advance, the task identifier to be analyzed input by the user. The user can input the task identifier to be analyzed through an interactive interface. For example, in response to a descriptive statement input by the user, the computer device splits the statement into keywords and compares the keywords with the type information in the database to screen out the task identifier to be analyzed; alternatively, in response to keywords input by the user, the computer device compares them with the type information in the database to screen out the task identifier to be analyzed. For example, if the task to be analyzed is to query the results of all students in class Y of school X, the user may input the descriptive statement "query the results of all students in class Y of school X", which the computer device splits into the keywords "X school", "Y class", "all students" and "results", or the user may input these keywords directly.
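The keyword screening described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `match_task_identifiers`, the word-based splitting, and the rule that a keyword matches a column type when it equals the type name or its simple plural are all assumptions made for the example.

```python
import re


def match_task_identifiers(statement, known_types):
    """Return the known column types mentioned in a descriptive statement.

    Hypothetical helper: splits the statement into lowercase words and keeps
    each type whose singular or simple plural form appears among them.
    """
    words = set(re.findall(r"[a-z]+", statement.lower()))
    return [t for t in known_types if t in words or t + "s" in words]


# Column types assumed to be recorded for the data table (illustrative only).
known_types = ["school", "class", "student", "score", "age"]
statement = "query the scores of all students in class Y of school X"
task_identifiers = match_task_identifiers(statement, known_types)
print(task_identifiers)  # ['school', 'class', 'student', 'score']
```

The "age" type is not returned because the statement never mentions it, which mirrors the point that the data table may hold columns unrelated to the task.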

The client logs in to the MySQL database, which stores at least the data related to the task to be analyzed. After logging in, the user can select, export or download the required data table to be analyzed in the interface. The table names of all data tables are first displayed in the interface, and the user selects the data table to be analyzed by its table name. After the data table to be analyzed is selected, its DDL (Data Definition Language, used for defining database objects such as databases, tables and columns) information can be obtained, so that all columns of the data table to be analyzed can be obtained.

Optionally, whether each column is allowed to be null is analyzed to determine the validity of the column data. For example, suppose column A must always contain a value, but the table-creation statement declares that column A may be null. In that case, either null is a legitimate state value and the data in column A is valid, or the table-creation statement is faulty and needs to be repaired.
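As a rough sketch of the nullability check, the snippet below reads column definitions and flags columns that are expected to always hold a value but are declared nullable. It uses Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info` purely as a stand-in so that the example is self-contained; against MySQL the same information would come from the table's DDL or from `information_schema.COLUMNS`. The table `grades`, its columns and the `required` list are hypothetical.

```python
import sqlite3


def nullable_columns(conn, table):
    """Map each column name to True if its definition allows NULL."""
    if not table.isidentifier():  # identifiers cannot be bound as parameters
        raise ValueError(f"unsafe table name: {table!r}")
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {name: not notnull for _, name, _, notnull, _, _ in rows}


def columns_to_review(nullable, required):
    """Columns that must hold a value but are declared nullable."""
    return [col for col in required if nullable.get(col, False)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (student TEXT NOT NULL, score INTEGER)")
nullable = nullable_columns(conn, "grades")
flagged = columns_to_review(nullable, required=["student", "score"])
print(nullable, flagged)
```

A flagged column is not necessarily wrong: as the text notes, null may be a legitimate state value, so the result is a prompt for review rather than an automatic repair.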

Step S202, calculating the repeatability of each column of data in the data table to be analyzed.

The repetition degree is used for indicating the proportion of repeated data in each column of data to the total number of data.

Specifically, the step S202 includes:

In step S2021, the total number of data in the data table to be analyzed is obtained.

The data table to be analyzed consists of rows and columns, and the total number of data indicates how many rows it contains. After the data table to be analyzed is obtained, the data is first aligned so that each row corresponds to the items of information of one statistical object (such as the school, class and student number of the same student) and each column corresponds to one type of statistical information (such as the student numbers in the same column), yielding at least one column of data from which an index column can be selected.

In step S2022, a deduplication operation is performed on each column of data, and the deduplication data amount of each column of data is recorded.

The deduplicated data amount is the number of distinct values remaining in a column after deduplication. The smaller the deduplicated data amount of a column, the higher its repetition degree, the less the column can distinguish one row from another, and the less suitable it is as an index.

In step S2023, the repetition degree of each column of data is determined based on the quotient of the amount of the de-duplicated data and the total number of data for each column of data.

Optionally, the repetition degree of each column of data is calculated by the formula: repetition degree of a column = 1 - (deduplicated data amount of the column / total number of data).
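Steps S2021 to S2023 can be illustrated with a small in-memory example. The sketch below applies the formula above to a table held as a list of row dictionaries; the helper names and the sample rows are hypothetical, and treating an empty column as having repetition degree 0 is an assumption, since the text does not cover that case.

```python
def repetition_degree(values):
    """Repetition degree = 1 - distinct count / total count."""
    total = len(values)
    if total == 0:
        return 0.0  # assumption: an empty column is treated as not repetitive
    return 1 - len(set(values)) / total


def column_repetition(rows):
    """Compute the repetition degree of every column in a list of row dicts."""
    if not rows:
        return {}
    return {col: repetition_degree([row[col] for row in rows]) for col in rows[0]}


rows = [
    {"school": "X", "class": "Y", "student": "Ann"},
    {"school": "X", "class": "Y", "student": "Bob"},
    {"school": "X", "class": "Z", "student": "Cid"},
    {"school": "X", "class": "Z", "student": "Dee"},
]
print(column_repetition(rows))
# {'school': 0.75, 'class': 0.5, 'student': 0.0}
```

Under this formula a column whose values are all identical has repetition degree 1 - 1/N for N rows, which approaches 100% as the row count grows.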

Step S203, selecting a column corresponding to the task identifier to be analyzed and having the repetition degree meeting the preset condition as an index column based on the repetition degree of each column of data and the task identifier to be analyzed, so as to perform data query.

Optionally, when the task identifier to be analyzed is preset or input by the user, step S203 executed by the computer device includes:

In step S2031, the repetition degree of each column of data is sorted from small to large.

Optionally, the first n pieces of data with the highest repetition degree in each column and the number of times of each occurrence of the first n pieces of data are recorded, so as to analyze the specific situation of each column of data.
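One simple way to record the most repeated values of a column, shown here only as an illustration, is the standard-library `collections.Counter`. The function name `top_repeated` and the sample data are hypothetical.

```python
from collections import Counter


def top_repeated(values, n=10):
    """Return the n most frequent values with their occurrence counts."""
    return Counter(values).most_common(n)


subjects = ["math", "math", "art", "math", "art", "pe"]
print(top_repeated(subjects, n=2))  # [('math', 3), ('art', 2)]
```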

Step S2032, selecting a first target number of columns with the minimum repetition corresponding to the task identifier to be analyzed as an index column.

The first target number may be set according to actual requirements, for example, set to 4, and then represents that 4 columns corresponding to the task identifier to be analyzed and having the minimum repeatability are selected as index columns, so as to perform subsequent data query.
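Steps S2031 and S2032 amount to a filter followed by a sort. The following sketch assumes the repetition degrees have already been computed and are held in a dictionary; the function name and the sample values are hypothetical.

```python
def pick_index_columns(repetition, task_columns, first_target_number):
    """Pick the least repetitive columns among those tied to the task."""
    # Keep only columns that correspond to the task identifier to be analyzed.
    candidates = [col for col in task_columns if col in repetition]
    # Sort the repetition degrees from small to large.
    candidates.sort(key=lambda col: repetition[col])
    return candidates[:first_target_number]


repetition = {"school": 0.99, "class": 0.95, "student": 0.02, "score": 0.15, "age": 0.01}
task_columns = ["school", "class", "student", "score"]
print(pick_index_columns(repetition, task_columns, first_target_number=2))
# ['student', 'score']
```

The "age" column has the lowest repetition degree in this sample but is not selected, because it does not correspond to the task identifier to be analyzed.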

In another possible implementation of the embodiment of the present application, the computer device selects, based on the repetition degree of each column of data, the columns whose repetition degree meets the preset condition as candidate index columns, and then displays the keywords corresponding to the candidate index columns on the interface. When a selection operation of the user on a keyword is received (for example, clicking the keyword or entering the keyword in an input box), the candidate index column corresponding to the keyword is selected as the index column to be used by the user.

Optionally, since the data table to be analyzed may already contain a default index column or have an index column set, when the data table to be analyzed is obtained it may first be detected whether the table contains a preset index column, and if so, whether the preset index column is reasonable. Specifically, it is first detected whether the data table to be analyzed contains a preset index column. If it does, the repetition degree of the preset index column is calculated and compared with a preset repetition degree threshold; if the repetition degree of the preset index column exceeds the threshold, a column that corresponds to the task identifier to be analyzed and whose repetition degree meets the preset condition is selected as a recommended index column, based on the repetition degree of each column of data and the task identifier to be analyzed. If it does not, an index prompt control is displayed in the interface of the data table to be analyzed; the index prompt control prompts the user that the data table to be analyzed contains no index column and suggests adding one to improve query efficiency.

That is, when a preset index column exists and its repetition degree exceeds the preset repetition degree threshold, the preset index column is unsuitable as an index for data query, so a column with a low repetition degree can be recommended to the user, who can then conveniently switch to a more reasonable index column. The preset repetition degree threshold may be set according to actual requirements, for example to 20%. When no preset index column exists, a reasonable index column can be recommended to the user by the index column selection method of this scheme (namely, selecting the first target number of columns that correspond to the task identifier to be analyzed and have the smallest repetition degree as index columns).

If the data table to be analyzed does not contain a preset index column, the index prompt control gives a prompt such as "currently no index; it is suggested to add an index according to the repetition degree to speed up queries".
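The preset index check described above can be sketched as a small decision function. This is an illustrative reading of the text, not the implementation of the embodiment: the function name, the returned status strings and the default values (a 20% threshold and two recommended columns) are assumptions.

```python
def review_preset_indexes(preset_indexes, repetition, task_columns,
                          repetition_threshold=0.20, first_target_number=2):
    """Judge preset index columns and recommend replacements when needed."""
    # Rank the task-related columns by repetition degree, small to large.
    ranked = sorted((c for c in task_columns if c in repetition),
                    key=lambda c: repetition[c])
    recommended = ranked[:first_target_number]
    if not preset_indexes:
        # No preset index column: the interface would show the index prompt.
        return {"status": "no_index", "flagged": [], "recommended": recommended}
    flagged = [c for c in preset_indexes if repetition[c] > repetition_threshold]
    if flagged:
        # At least one preset index column is too repetitive to be useful.
        return {"status": "replace", "flagged": flagged, "recommended": recommended}
    return {"status": "ok", "flagged": [], "recommended": []}


repetition = {"school": 0.99, "class": 0.95, "student": 0.02, "score": 0.15}
task_columns = ["school", "class", "student", "score"]
print(review_preset_indexes(["school"], repetition, task_columns))
print(review_preset_indexes([], repetition, task_columns))
```

With the sample values, a preset index on "school" is flagged and "student" and "score" are recommended instead, while an empty list of preset indexes yields the "no_index" status together with the same recommendation.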

Further, the data table to be analyzed may contain multiple preset index columns; if there are too many, they occupy a large amount of space, increase maintenance difficulty and reduce data query efficiency. Specifically, the number of preset index columns is first detected and compared with a preset column number threshold. If the number exceeds the threshold, the repetition degrees of the preset index columns are sorted from small to large, and a second target number of columns with the smallest repetition degree are selected as index columns. The preset column number threshold can be set according to actual requirements, for example to 4, in which case more than 4 preset index columns is judged to be excessive and in need of optimization. The second target number can also be set according to actual requirements: 1 means that the column with the smallest repetition degree is used as the index column, and 2 means that the 2 columns with the smallest repetition degree are used as index columns. After the second target number of columns with the smallest repetition degree are selected as index columns, the other preset index columns are canceled. Furthermore, the second target number of columns with the smallest repetition degree can be combined into a combined index column to facilitate data query. For example, if the columns with the smallest repetition degree are the subject column and the score column, the two can be combined in JSON format into a single column, which reduces the total amount of data and improves query speed.

Optionally, when the total number of data in the data table to be analyzed is excessive, the user can be prompted to optimize the table. Specifically, the total number of data in the data table to be analyzed is compared with a preset data amount threshold, and if it exceeds the threshold, a warning control is displayed in the interface of the data table to be analyzed. The warning control prompts the user that the total number of data in the table is excessive, and may display a message such as "the current data amount is excessive and query performance may degrade; consider read-write separation, database sharding or table sharding to improve query performance". The preset data amount threshold depends on factors such as the hardware, system environment and usage scenario; exceeding it slows down query, insert, delete and update operations in the MySQL database. The threshold can be set according to actual requirements, for example to twenty million.

As one or more specific application examples of the embodiments of the present invention, the best mode or the most desirable mode of the inventors will be described below in connection with a specific application scenario.

FIG. 3 is a flowchart of a data query method according to an embodiment of the present invention. As shown in FIG. 3, the method first connects to the MySQL database using an account, password and IP address. When the table to be analyzed (the data table to be analyzed) is selected, the DDL information of the table is obtained, from which the column names are obtained, and the data amount of the table (the total number of data, dataCount) is obtained. Illustratively, the total number of data of the data table to be analyzed is obtained by the instruction "select count(*) from t", whose result is recorded as dataCount. Next, the repetition degree (repeat) of each column of data in the table is calculated, together with the number of occurrences of each of the 10 most frequently repeated values in each column. Illustratively, the deduplication operation is performed on a column by the instruction "select count(distinct columnName) from t" to obtain the deduplicated data amount of the column. The 10 most frequently repeated values in a column and their numbers of occurrences are obtained by the instruction "SELECT column_name, COUNT(*) AS count FROM table_name GROUP BY column_name ORDER BY count DESC LIMIT 10". The index information of the current table is then queried. If the current table has no index, the prompt "currently no index; it is suggested to add an index according to the repetition degree to speed up queries" is given, along with an optimization suggestion.
If the current table has indexes, the number of indexes is queried first. If the number is greater than 4, the prompt "the current table has too many indexes and can be optimized into a combined index to reduce space usage and index maintenance cost" is given, along with an optimization suggestion. Otherwise, whether each index is suitable is judged: if it is suitable, the current index does not need to be modified and the flow ends; if it is not, the prompt "the index repetition degree is greater than 20% and the index is prone to failure; consider using other columns as index columns" is given, along with an optimization suggestion. The optimization suggestion is obtained by sorting the previously calculated repetition degrees of the columns from low to high and selecting suitable index columns. Finally, whether the data amount of the table is appropriate is judged; if it is excessive, optimization suggestions such as read-write separation and database and table sharding are given.
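The three SQL instructions quoted above can be exercised end to end with a short script. The sketch below runs them against an in-memory SQLite database through Python's built-in `sqlite3` module, which is only a stand-in chosen so that the example is self-contained; the embodiment itself targets MySQL. The function name `analyze_table`, the sample table `t` and its rows are hypothetical.

```python
import sqlite3


def analyze_table(conn, table, columns, top_n=10):
    """Run the count, distinct-count and top-N queries for each column."""
    # Identifiers cannot be bound as parameters, so only plain names are accepted.
    for name in [table, *columns]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    data_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    report = {}
    for col in columns:
        distinct = conn.execute(
            f"SELECT COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()[0]
        top = conn.execute(
            f"SELECT {col}, COUNT(*) AS cnt FROM {table} "
            f"GROUP BY {col} ORDER BY cnt DESC LIMIT ?",
            (top_n,),
        ).fetchall()
        # repeat = 1 - deduplicated data amount / total number of data
        repeat = 1 - distinct / data_count if data_count else 0.0
        report[col] = {"repeat": repeat, "top": top}
    return data_count, report


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (school TEXT, student TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("X", "Ann", "math"), ("X", "Bob", "math"),
     ("X", "Cid", "art"), ("X", "Dee", "math")],
)
data_count, report = analyze_table(conn, "t", ["school", "student", "subject"])
for col, info in report.items():
    print(col, info["repeat"], info["top"][:1])
```

Computing every column in one pass keeps the later steps of the flow (index prompts and optimization suggestions) independent of the database: they only need the `report` dictionary and `data_count`.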

The embodiment also provides a data query device for implementing the foregoing embodiments and preferred embodiments; what has already been described will not be repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

The present embodiment provides a data query device, as shown in fig. 4, including:

A data acquisition module 401, configured to acquire a data table to be analyzed;

A repetition degree calculation module 402, configured to respectively calculate the repetition degree of each column of data in the data table to be analyzed, where the repetition degree indicates the proportion of repeated data in each column of data to the total number of data;

An index column determining module 403, configured to select, based on the repetition degree of each column of data and a preset task identifier to be analyzed, a column that corresponds to the task identifier to be analyzed and whose repetition degree meets a preset condition as an index column, so as to perform data query.

In an alternative embodiment, the repetition degree calculation module is further configured to: acquire the total number of data of the data table to be analyzed; perform a de-duplication operation on each column of data and record the de-duplicated data amount of each column of data; and determine the repetition degree of each column of data based on the quotient of the de-duplicated data amount of that column and the total number of data.
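Illustratively, the calculation may be sketched as follows. The sketch makes several assumptions: Python's built-in sqlite3 module stands in for a MySQL connection so that the example is self-contained; the table t and its columns are hypothetical; and the repetition degree is taken as 1 minus the quotient of the de-duplicated data amount and the total number of data, which is one reading of "based on the quotient" that agrees with the definition of the repetition degree as the proportion of repeated data.

```python
import sqlite3
from typing import Dict, List


def column_repetition(conn: sqlite3.Connection,
                      table: str,
                      columns: List[str]) -> Dict[str, float]:
    """Map each column to 1 - (distinct values / total rows).

    Table and column names are interpolated into the SQL text, so they
    must come from a trusted source such as the table's own DDL.
    """
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    result: Dict[str, float] = {}
    for col in columns:
        # COUNT(DISTINCT ...) yields the de-duplicated data amount;
        # NULL values are not counted.
        distinct = conn.execute(
            f'SELECT COUNT(DISTINCT "{col}") FROM "{table}"').fetchone()[0]
        result[col] = 0.0 if total == 0 else 1 - distinct / total
    return result


# Hypothetical sample data: four rows, unique ids, two distinct statuses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "ok"), (2, "ok"), (3, "ok"), (4, "fail")])
rep = column_repetition(conn, "t", ["id", "status"])
print(rep)  # {'id': 0.0, 'status': 0.5}
```

With this reading, a column of unique values has a repetition degree of 0 and a column holding a single repeated value approaches 1. Against MySQL itself, identifiers would be quoted with backticks rather than double quotation marks.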

In an alternative embodiment, the index column determining module is further configured to: sort the repetition degrees of the columns of data from small to large; and select, as index columns, a first target number of columns that correspond to the task identifier to be analyzed and have the lowest repetition degree.
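Illustratively, the sorting and selection may be sketched as follows. The function name select_index_columns is hypothetical, and the list task_columns is an assumed representation of the columns that correspond to the task identifier to be analyzed; the way in which a task identifier is resolved to columns is not shown here.

```python
from typing import Dict, List


def select_index_columns(repetition: Dict[str, float],
                         task_columns: List[str],
                         target_number: int) -> List[str]:
    """Pick the target_number task-related columns with the lowest repetition."""
    # Keep only the columns that relate to the task and have a known score.
    candidates = [c for c in task_columns if c in repetition]
    # Sort from small to large repetition degree.
    candidates.sort(key=lambda c: repetition[c])
    return candidates[:target_number]


# Hypothetical repetition degrees for four columns.
repetition = {"order_id": 0.0, "user_id": 0.35, "status": 0.98, "city": 0.80}
chosen = select_index_columns(repetition, ["user_id", "status", "city"], 2)
print(chosen)  # ['user_id', 'city']
```

In this example order_id has the lowest repetition degree overall but is not selected, because it does not correspond to the task being analyzed.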

In an alternative embodiment, the device further comprises a preset index detection module, configured to: detect whether the data table to be analyzed contains a preset index column; if so, calculate the repetition degree of the preset index column; and compare the repetition degree of the preset index column with a preset repetition degree threshold and, if the repetition degree of the preset index column exceeds the preset repetition degree threshold, select, based on the repetition degree of each column of data and the task identifier to be analyzed, a column that corresponds to the task identifier to be analyzed and whose repetition degree meets the preset condition as a recommended index column.
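Illustratively, the check of a preset index column may be sketched as follows. The function name recommend_replacement is hypothetical; the default threshold of 0.20 is borrowed from the 20% figure in the prompt of the flow of Fig. 3; and the preset condition is assumed to mean a repetition degree not exceeding the same threshold, which the description does not state.

```python
from typing import Dict, List, Optional


def recommend_replacement(repetition: Dict[str, float],
                          preset_index_column: Optional[str],
                          task_columns: List[str],
                          threshold: float = 0.20) -> List[str]:
    """Return recommended index columns, or an empty list if none are needed."""
    if preset_index_column is None:
        return []  # the table contains no preset index column
    if repetition[preset_index_column] <= threshold:
        return []  # the preset index column is selective enough
    # Assumed preset condition: repetition degree within the threshold.
    candidates = [c for c in task_columns
                  if c != preset_index_column and repetition[c] <= threshold]
    return sorted(candidates, key=repetition.__getitem__)


# Hypothetical repetition degrees.
repetition = {"status": 0.98, "user_id": 0.05, "city": 0.80}
task_columns = ["status", "user_id", "city"]
print(recommend_replacement(repetition, "status", task_columns))   # ['user_id']
print(recommend_replacement(repetition, "user_id", task_columns))  # []
```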

In an alternative embodiment, the preset index detection module is further configured to: detect the number of preset index columns; and compare the number of preset index columns with a preset column number threshold and, if the number of preset index columns exceeds the preset column number threshold, sort the repetition degrees of the preset index columns from small to large and select a second target number of columns with the lowest repetition degree as combined index columns.
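Illustratively, the selection of combined index columns may be sketched as follows. The function name combined_index_columns is hypothetical; the default column number threshold of 4 follows the flow of Fig. 3, while the default second target number of 3 is an arbitrary placeholder.

```python
from typing import Dict, List


def combined_index_columns(index_repetition: Dict[str, float],
                           column_threshold: int = 4,
                           target_number: int = 3) -> List[str]:
    """Pick columns for a combined index when there are too many index columns."""
    if len(index_repetition) <= column_threshold:
        return []  # the number of preset index columns is acceptable
    # Sort the preset index columns from small to large repetition degree.
    ranked = sorted(index_repetition, key=index_repetition.__getitem__)
    return ranked[:target_number]


# Hypothetical table with five single-column indexes.
indexed = {"a": 0.5, "b": 0.1, "c": 0.9, "d": 0.0, "e": 0.3}
print(combined_index_columns(indexed))  # ['d', 'b', 'e']
```

The returned order places the column with the lowest repetition degree first, which is also a common choice for the leading column of a combined index.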

In an alternative embodiment, the device further comprises a data amount comparison module, configured to: compare the total number of data of the data table to be analyzed with a preset data amount threshold; and, if the total number of data of the data table to be analyzed exceeds the preset data amount threshold, display a warning control in an interface of the data table to be analyzed, where the warning control is used for prompting a user that the total number of data of the data table to be analyzed is excessive.

Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.

The data query device in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (Application-Specific Integrated Circuit), a processor and a memory executing one or more software or firmware programs, and/or other devices that can provide the above functions.

An embodiment of the present invention also provides a computer device having the data query device shown in Fig. 4.

Referring to Fig. 5, which is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention, the computer device includes one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is taken as an example in Fig. 5.

The processor 10 may be a central processing unit, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field-programmable gate array, a generic array logic, or any combination thereof.

The memory 20 stores instructions executable by at least one processor 10, so that the at least one processor 10 performs the method shown in the above embodiments.

The memory 20 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to the use of the computer device, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may optionally include memory located remotely from the processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.

The computer device further comprises an input device 30 and an output device 40. The processor 10, the memory 20, the input device 30 and the output device 40 may be connected by a bus or by other means; connection by a bus is taken as an example in Fig. 5.

The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer device; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, and the like. The output device 40 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. Such display devices include, but are not limited to, liquid crystal displays, light-emitting diode displays and plasma displays. In some alternative implementations, the display device may be a touch screen.

The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code that is originally stored in a remote storage medium or a non-transitory machine-readable storage medium, downloaded through a network and stored in a local storage medium, so that the method described herein can be processed by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor, controller or programmable hardware includes a storage element that can store or receive software or computer code which, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.

Portions of the present invention may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or aspects in accordance with the present invention by way of operation of the computer. Those skilled in the art will appreciate that the form of computer program instructions present in a computer readable medium includes, but is not limited to, source files, executable files, installation package files, etc., and accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims

1. A method of querying data, the method comprising:

Acquiring a data table to be analyzed;

respectively calculating the repeatability of each column of data in the data table to be analyzed; the repeatability is used for indicating the proportion of repeated data in each column of data to the total number of data;

selecting, based on the repeatability of each column of data and a preset task identifier to be analyzed, a column which corresponds to the task identifier to be analyzed and whose repeatability meets a preset condition as an index column, so as to perform data query.

2. The method according to claim 1, wherein respectively calculating the repeatability of each column of data in the data table to be analyzed comprises:

Acquiring the total number of data of a data table to be analyzed;

performing a de-duplication operation on each column of data, and recording the de-duplication data amount of each column of data;

determining the repeatability of each column of data based on the quotient of the de-duplicated data amount of that column of data and the total number of data.

3. The method according to claim 1, wherein selecting, based on the repeatability of each column of data and the preset task identifier to be analyzed, a column which corresponds to the task identifier to be analyzed and whose repeatability meets the preset condition as the index column comprises:

sequencing the repeatability of each column of data from small to large;

and selecting, as index columns, a first target number of columns which correspond to the task identifier to be analyzed and have the minimum repeatability.

4. The method according to claim 3, wherein the method further comprises:

detecting whether a data table to be analyzed contains a preset index column or not;

if so, calculating the repeatability of the preset index column;

Comparing the repeatability of the preset index column with a preset repeatability threshold, and if the repeatability of the preset index column exceeds the preset repeatability threshold, selecting a column which corresponds to the task identifier to be analyzed and has the repeatability meeting a preset condition as a recommended index column based on the repeatability of each column of data and the task identifier to be analyzed.

5. The method according to claim 4, wherein the method further comprises:

detecting the number of columns of the preset index columns;

and comparing the number of columns of the preset index columns with a preset column number threshold, and if the number of columns of the preset index columns exceeds the preset column number threshold, sorting the repeatability of each of the preset index columns from small to large and selecting a second target number of columns with the minimum repeatability as combined index columns.

6. The method according to claim 2, wherein the method further comprises:

comparing the total data number of the data table to be analyzed with a preset data quantity threshold value;

If the total data number of the data table to be analyzed exceeds a preset data quantity threshold, displaying a warning control in an interface of the data table to be analyzed; and the warning control is used for prompting a user that the total number of the data table to be analyzed is excessive.

7. A data querying device, the device comprising:

The data acquisition module is used for acquiring a data table to be analyzed;

8. A computer device, comprising:

A memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the data query method of any of claims 1 to 6.

9. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the data query method of any of claims 1 to 6.

10. A computer program product comprising computer instructions for causing a computer to perform the data query method of any one of claims 1 to 6.