CN117785923A

CN117785923A - Method and system for improving query performance of high-line number scene of small file

Info

Publication number: CN117785923A
Application number: CN202311783407.1A
Authority: CN
Inventors: 甘雨; 李扬; 韩卿
Original assignee: Shanghai Kyligence Information Technology Co ltd
Current assignee: Shanghai Kyligence Information Technology Co ltd
Priority date: 2023-12-22
Filing date: 2023-12-22
Publication date: 2024-03-29

Abstract

An embodiment of the invention discloses a method and a system for improving query performance in a small-file high-line-number scene. The method comprises the following steps: acquiring a high-line-number small file and constructing a task based on the high-line-number small file; determining whether an idle processor exists in a scheduling system and, if so, splitting the task based on the number of idle processors to obtain a corresponding number of subtasks; and issuing the subtasks to the corresponding idle processors. The method addresses the poor query performance of small-file high-line-number scenes in the prior art.

Description

Method and system for improving query performance of high-line number scene of small file

Technical Field

The invention relates to the technical field of computers, and in particular to a method, a system, an electronic device and a storage medium for improving query performance in a small-file high-line-number scene.

Background

OLAP (online analytical processing) is a computing approach for processing and analyzing multidimensional data. It allows users to view and analyze data across different dimensions and levels to support deeper business insight. OLAP systems typically use multidimensional databases and provide intuitive interfaces and flexible query functions. MPP (massively parallel processing) is a computer architecture, or data processing method, that aims to speed up processing by using multiple processing units or nodes simultaneously. In an MPP system, tasks are broken down and distributed to multiple processors or nodes, each of which can execute its assigned tasks independently.

In the fields of data warehousing and business intelligence, OLAP relies on an MPP architecture or a distributed computing framework (such as Spark) to support complex multidimensional data analysis, and SQL is the common language for performing query and analysis operations in these systems. To achieve high-performance data query and analysis, the data is typically partitioned into multiple portions that are processed in parallel by separate processing units (typically nodes or servers). For computers, parallelism means that multiple tasks can be executed simultaneously, and a scheduling system implemented in software is responsible for dividing tasks and scheduling their execution. The number of tasks determines the parallel processing capability: in general, the more tasks there are, the faster the work executes, and the fewer tasks there are, the slower it executes.

However, system resources are limited, and too many tasks increase the scheduling cost and slow down query execution. Therefore, the number of tasks is generally determined by the size and the number of the files read, with each task processing a certain amount of data. In addition, because the computing power of a processor is limited, the more data lines a single task processes, the slower its computation. Query performance in the small-file high-line-number scene is poor because small files yield few tasks while each task must process many lines, so computation is slow; and although the system still has ample computing resources, they cannot be scheduled and used.

Therefore, a method for improving the performance of the small file high-line number scene query is needed.

Disclosure of Invention

The embodiment of the invention aims to provide a method, a system, electronic equipment and a storage medium for improving the query performance of a small-file high-line number scene, which are used for solving the problem of poor query performance of the small-file high-line number scene in the prior art.

In order to achieve the above objective, an embodiment of the present invention provides a method for improving the performance of a small file high-line number scene query, where the method specifically includes:

acquiring a high-line-number small file, and constructing a task based on the high-line-number small file;

judging whether an idle processor exists in a scheduling system, if so, splitting the tasks based on the number of the idle processors to obtain a corresponding number of subtasks;

and issuing the subtasks to the corresponding idle processors.

Based on the technical scheme, the invention can also be improved as follows:

Further, acquiring the high-line-number small file and constructing a task based on the high-line-number small file comprises:

reading the file size and the number of files of the high-line-number small files, and constructing tasks based on the file size and the number of files.

Further, determining whether an idle processor exists in the scheduling system and, if so, splitting the task based on the number of idle processors to obtain a corresponding number of subtasks, further comprises:

when no idle processor exists in the scheduling system, assigning the task to a single processor for data operation.

Further, issuing the subtasks to the corresponding idle processors comprises:

determining whether all subtasks have been issued to the corresponding idle processors, and, when they have, completing task scheduling;

and, when not all subtasks have been issued to the corresponding idle processors, generating prompt information that includes the undelivered subtasks.
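The delivery check described above might look like the following minimal sketch. The `send` callback, the `Subtask` type and the shape of the returned report are hypothetical; the source states only that prompt information naming the undelivered subtasks is generated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Subtask:
    name: str          # e.g. "task 1-3"
    processor_id: int  # idle processor the subtask is meant for


def issue_subtasks(subtasks: List[Subtask],
                   send: Callable[[Subtask], bool]) -> Dict[str, object]:
    """Issue every subtask and report whether scheduling completed.

    `send` is a hypothetical transport hook that returns True on delivery.
    """
    undelivered = [s.name for s in subtasks if not send(s)]
    if not undelivered:
        return {"complete": True, "prompt": None}
    # The prompt information names the subtasks that were not delivered.
    prompt = "Undelivered subtasks: " + ", ".join(undelivered)
    return {"complete": False, "prompt": prompt}


# Usage: pretend that processor 3 rejects its subtask.
subtasks = [Subtask(f"task 1-{i}", i) for i in range(1, 5)]
report = issue_subtasks(subtasks, send=lambda s: s.processor_id != 3)
```

Returning the undelivered names rather than raising an error keeps the decision about retrying or alerting with the caller, which is one possible reading of "prompt information".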

A system for improving query performance in a small-file high-line-number scene comprises:

the acquisition module is used for acquiring the high-line number small file;

the task construction module is used for constructing tasks based on the high-line-number small files;

the subtask splitting module is used for judging whether idle processors exist in the scheduling system, if yes, splitting the tasks based on the number of the idle processors to obtain a corresponding number of subtasks;

and the subtask issuing module is used for issuing the subtasks to the corresponding idle processors.

Further, the task construction module is further configured to read a file size and a file number of the high-line-count small file, and construct a task based on the file size and the file number of the high-line-count small file.

Further, the system for improving query performance in a small-file high-line-number scene further comprises a task issuing module;

the task issuing module is used for assigning the task to a single processor for data operation when no idle processor exists in the scheduling system.

Further, the system for improving query performance in a small-file high-line-number scene further comprises a judging module;

the judging module is used for determining whether all subtasks have been issued to the corresponding idle processors, and for completing task scheduling when they have.

An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when the computer program is executed.

A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method.

The embodiment of the invention has the following advantages:

According to the method for improving query performance in a small-file high-line-number scene, a high-line-number small file is acquired and a task is constructed based on it; whether an idle processor exists in the scheduling system is determined and, if so, the task is split based on the number of idle processors to obtain a corresponding number of subtasks; and the subtasks are issued to the corresponding idle processors. This addresses the poor query performance of small-file high-line-number scenes in the prior art.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those skilled in the art from this disclosure that the drawings described below are merely exemplary and that other embodiments may be derived from the drawings provided without undue effort.

The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the scope of the invention.

FIG. 1 is a flow chart of the method for improving query performance in a small-file high-line-number scene according to the invention;

FIG. 2 is a first architecture diagram of the system for improving query performance in a small-file high-line-number scene according to the invention;

FIG. 3 is a second architecture diagram of the system for improving query performance in a small-file high-line-number scene according to the invention;

FIG. 4 is a third architecture diagram of the system for improving query performance in a small-file high-line-number scene according to the invention;

FIG. 5 is a schematic diagram of the physical structure of an electronic device according to the invention.

Wherein the reference numerals are as follows:

the system comprises an acquisition module 10, a task construction module 20, a subtask splitting module 30, a subtask issuing module 40, a task issuing module 50, a judging module 60, an electronic device 70, a processor 701, a memory 702 and a bus 703.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description of specific embodiments. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.

Examples

FIG. 1 is a flowchart of an embodiment of the method for improving query performance in a small-file high-line-number scene according to the present invention. As shown in FIG. 1, the method provided by the embodiment of the present invention includes the following steps:

S101, acquiring a high-line-number small file, and constructing a task based on the high-line-number small file;

specifically, the file size and the number of files of the high-line-number small files are read, and tasks are constructed based on the file size and the number of files.

S102, determining whether an idle processor exists in the scheduling system and, if so, splitting the task based on the number of idle processors to obtain a corresponding number of subtasks;

specifically, when no idle processor exists in the scheduling system, the task is assigned to a single processor for data operation.

S103, issuing the subtasks to the corresponding idle processors;

specifically, whether all subtasks have been issued to the corresponding idle processors is determined, and when they have, task scheduling is complete.
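Step S101 can be pictured with the sketch below, which groups files into tasks using both a byte budget and a file-count limit. The thresholds, the `FileInfo` and `Task` types and the greedy grouping are assumptions made for illustration; the source states only that tasks are built from the file size and the number of files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FileInfo:
    path: str
    size_bytes: int


@dataclass
class Task:
    files: List[FileInfo] = field(default_factory=list)

    @property
    def size_bytes(self) -> int:
        return sum(f.size_bytes for f in self.files)


def build_tasks(files, max_bytes=128 * 2**20, max_files=4):
    """Greedily pack files into tasks (both limits are assumed values)."""
    tasks, current = [], Task()
    for f in files:
        too_big = current.size_bytes + f.size_bytes > max_bytes
        too_many = len(current.files) >= max_files
        if current.files and (too_big or too_many):
            tasks.append(current)
            current = Task()
        current.files.append(f)
    if current.files:
        tasks.append(current)
    return tasks


# Three small files fit into one task, however many lines they hold.
files = [FileInfo("small_file_1", 20 * 2**20),
         FileInfo("small_file_2", 30 * 2**20),
         FileInfo("small_file_3", 25 * 2**20)]
tasks = build_tasks(files)
```

Because the planner looks only at bytes and file counts, the three files land in a single task regardless of their line count, which is the situation the later splitting step is meant to relieve.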

As shown in fig. 4, one embodiment of the method for improving query performance in a small-file high-line-number scene is as follows:

High-line-number small files 1, 2 and 3 are processed by task 1. Processor 5 is responsible for computing task 1 while the other 9 processors are idle. After the scheduling system detects this situation, it splits task 1 into task 1-1, task 1-2, ..., task 1-10. Each split task processes only a smaller amount of data, and processor 1, processor 2, ..., processor 10 perform the data operations respectively, which increases the parallelism of data processing, makes full use of system resources, and improves performance.
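One way to read "each split task processes a smaller amount of data" is to give every subtask a contiguous range of lines. The sketch below assumes that the total line count is known and that subtasks are addressed by half-open ranges; neither detail is specified in the source, and the line count used is an arbitrary figure.

```python
from typing import List, Tuple


def split_rows(total_rows: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, total_rows) into `parts` contiguous, near-equal ranges."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(total_rows, parts)
    ranges, start = [], 0
    for i in range(parts):
        # The first `extra` ranges take one additional line each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


# Task 1 covers 1,000,003 lines (an assumed figure) and is split ten ways,
# yielding task 1-1 ... task 1-10 for processor 1 ... processor 10.
subtasks = {f"task 1-{i + 1}": r
            for i, r in enumerate(split_rows(1_000_003, 10))}
```

Near-equal ranges keep the subtasks balanced, so no single processor becomes the straggler that determines the overall query time.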

For an ordinary file, tasks are not split deliberately, because blind splitting only increases the scheduling overhead of the system and brings no benefit.

Before splitting, the resource usage of the scheduling system is checked (that is, whether the scheduling system has an idle processor), and idle processors are then put to work on the task computation.

For example, if all processors are busy, task 1 is not split and processor 5 continues to complete the data computation of task 1;

if processors 9 and 10 are busy, the task is split only into tasks 1-1, 1-2, ..., 1-8, each of which processes a smaller amount of data, and processors 1, 2, ..., 8 perform the data operations respectively;

if further processors 11-100 are also idle, the task is still split only into tasks 1-1, 1-2, ..., 1-10, each of which processes a smaller amount of data, and processors 1, 2, ..., 10 perform the data operations respectively.
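The three cases above are consistent with a rule of the form "one subtask for the processor already running the task plus one per idle processor, up to a ceiling". That rule and the `max_subtasks` ceiling of 10 are inferred from the examples and are not stated in the source; the function below is a sketch of that reading only.

```python
def choose_processors(current, idle, max_subtasks=10):
    """Pick the processors that will run the subtasks of one task.

    current      -- id of the processor already running the task
    idle         -- ids of processors that are idle right now
    max_subtasks -- assumed upper bound on the split (hypothetical)
    Returns a sorted list; a single entry means "do not split".
    """
    candidates = sorted(set(idle) | {current})
    if len(candidates) <= max_subtasks:
        return candidates
    # Keep the current processor and fill the rest with idle ones.
    others = [p for p in candidates if p != current]
    return sorted([current] + others[:max_subtasks - 1])


# Case 1: every other processor is busy, so task 1 stays on processor 5.
all_busy = choose_processors(5, idle=[])
# Case 2: processors 9 and 10 are busy, so eight subtasks are produced.
two_busy = choose_processors(5, idle=[1, 2, 3, 4, 6, 7, 8])
# Case 3: processors up to 100 are idle, but the split stays at ten.
many_idle = choose_processors(5, idle=[p for p in range(1, 101) if p != 5])
```

A ceiling of this kind would bound the scheduling overhead that the text warns about when tasks are split too finely.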

The method for improving the query performance of the high-line number scene of the small file acquires the high-line number small file, and builds a task based on the high-line number small file; judging whether an idle processor exists in a scheduling system, if so, splitting the tasks based on the number of the idle processors to obtain a corresponding number of subtasks; and issuing the subtasks to the corresponding idle processors. The method solves the problem of poor inquiry performance of a small file high-line number scene in the prior art.

Through a scheduling system implemented in software, the method can automatically identify the small-file high-line-number scene, keep splitting the file data being read at run time, and decompose the originally limited number of tasks into more subtasks. This increases the parallelism of data processing, makes full use of the idle resources of the scheduling system, and thereby improves performance.
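To show the control flow of run-time splitting, the sketch below runs one subtask per chunk of lines on a thread pool and merges the partial results. The in-memory list standing in for file data, the sum used as a stand-in query and the choice of `concurrent.futures` are illustrative assumptions, not part of the source.

```python
from concurrent.futures import ThreadPoolExecutor


def run_split(rows, idle_workers):
    """Aggregate `rows`, using one subtask per idle worker when possible."""
    if idle_workers < 2 or not rows:
        # No spare capacity (or nothing to split): run the task unsplit.
        return sum(rows)
    step = -(-len(rows) // idle_workers)  # ceiling division
    chunks = [rows[i:i + step] for i in range(0, len(rows), step)]
    with ThreadPoolExecutor(max_workers=idle_workers) as pool:
        partials = list(pool.map(sum, chunks))
    # Merge step: the partial sums combine into the final answer.
    return sum(partials)


rows = list(range(1, 100_001))
result = run_split(rows, idle_workers=8)
```

In CPython, threads mainly illustrate the scheduling pattern here, since CPU-bound work is serialized by the global interpreter lock; a real engine would hand the subtasks to separate processes or cluster executors.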

FIGS. 2-3 are architecture diagrams of a system embodiment of the present invention for improving query performance in a small-file high-line-number scene. As shown in FIGS. 2-3, the system provided by an embodiment of the present invention includes the following modules:

an acquisition module 10, configured to acquire a high-line number small file;

a task construction module 20, configured to construct a task based on the high-line-count small file;

the subtask splitting module 30 is configured to determine whether an idle processor exists in the scheduling system, if yes, split the task based on the number of the idle processors, and obtain a corresponding number of subtasks;

and a subtask issuing module 40, configured to issue the subtask to a corresponding idle processor.

The task construction module 20 is further configured to read a file size and a file number of the high-line small file, and construct a task based on the file size and the file number of the high-line small file.

The system for improving query performance in a small-file high-line-number scene further comprises a task issuing module 50;

the task issuing module 50 is configured to assign the task to a single processor for data operation when no idle processor exists in the scheduling system.

The system for improving query performance in a small-file high-line-number scene further comprises a judging module 60;

the judging module 60 is configured to determine whether all subtasks have been issued to the corresponding idle processors, and to complete task scheduling when they have.

In the system for improving query performance in a small-file high-line-number scene, the acquisition module 10 acquires the high-line-number small file; the task construction module 20 constructs a task based on the high-line-number small file; the subtask splitting module 30 determines whether an idle processor exists in the scheduling system and, if so, splits the task based on the number of idle processors to obtain a corresponding number of subtasks; and the subtask issuing module 40 issues the subtasks to the corresponding idle processors. The system thereby addresses the poor query performance of small-file high-line-number scenes in the prior art.
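The module layout could be mirrored in code roughly as follows. The class names follow the modules named in the text, while every method name, the dictionary-based task representation and the `schedule` helper with its arguments are hypothetical.

```python
class AcquisitionModule:
    def acquire(self, paths):
        # Stand-in for locating the high-line-number small files.
        return list(paths)


class TaskConstructionModule:
    def construct(self, files):
        return {"name": "task 1", "files": files}


class SubtaskSplittingModule:
    def split(self, task, idle_processors):
        # With no idle processor the list is empty and the task stays whole.
        return [{"name": f"{task['name']}-{i}", "files": task["files"],
                 "part": i, "parts": len(idle_processors), "processor": p}
                for i, p in enumerate(idle_processors, start=1)]


class SubtaskIssuingModule:
    def issue(self, subtasks):
        return {s["processor"]: s["name"] for s in subtasks}


class TaskIssuingModule:
    def issue(self, task, processor):
        return {processor: task["name"]}


def schedule(paths, idle_processors, fallback_processor):
    """Wire the modules together and return a processor-to-work mapping."""
    files = AcquisitionModule().acquire(paths)
    task = TaskConstructionModule().construct(files)
    subtasks = SubtaskSplittingModule().split(task, idle_processors)
    if subtasks:
        return SubtaskIssuingModule().issue(subtasks)
    return TaskIssuingModule().issue(task, fallback_processor)


split_plan = schedule(["f1", "f2", "f3"], [1, 2, 3], fallback_processor=5)
whole_plan = schedule(["f1", "f2", "f3"], [], fallback_processor=5)
```

Keeping the unsplit path in its own module matches the text's separation between the subtask issuing module 40 and the task issuing module 50.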

Fig. 5 is a schematic diagram of an entity structure of an electronic device according to an embodiment of the present invention, as shown in fig. 5, an electronic device 70 includes: a processor 701, a memory 702, and a bus 703;

wherein, the processor 701 and the memory 702 complete communication with each other through the bus 703;

the processor 701 is configured to invoke program instructions in the memory 702 to perform the methods provided by the above-described method embodiments, for example, including: acquiring a high-line-number small file, and constructing a task based on the high-line-number small file; judging whether an idle processor exists in a scheduling system, if so, splitting the tasks based on the number of the idle processors to obtain a corresponding number of subtasks; and issuing the subtasks to the corresponding idle processors.

The present embodiment provides a non-transitory computer readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above-described method embodiments, for example, including: acquiring a high-line-number small file, and constructing a task based on the high-line-number small file; judging whether an idle processor exists in a scheduling system, if so, splitting the tasks based on the number of the idle processors to obtain a corresponding number of subtasks; and issuing the subtasks to the corresponding idle processors.

Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks or optical disks.

The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the embodiments or the methods of some parts of the embodiments.

While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims

1. The method for improving the inquiry performance of the high-line number scene of the small file is characterized by comprising the following steps:

and issuing the subtasks to the corresponding idle processors.

2. The method for improving the performance of the high-line number scene query of the small file according to claim 1, wherein the obtaining the high-line number small file and constructing a task based on the high-line number small file comprises the following steps of;

3. The method for improving the performance of the small file high line number scene query according to claim 1, wherein the determining whether an idle processor exists in the scheduling system is characterized in that if yes, splitting the task based on the number of the idle processors to obtain a corresponding number of subtasks, including;

4. The method for improving the performance of the small file high line number scene query according to claim 1, wherein the issuing the subtask to the corresponding idle processor comprises the following steps of;

5. A system for improving query performance in a small-file high-line-number scene, characterized by comprising:

the acquisition module is used for acquiring the high-line number small file;

6. The system for improving the performance of high-line-count scene query of small files according to claim 5, wherein said task construction module is further configured to read the file size and the number of files of said high-line-count small files, and construct a task based on the file size and the number of files of the high-line-count small files.

7. The system for improving the performance of the small file high line number scene query according to claim 5, wherein the system for improving the performance of the small file high line number scene query further comprises a task issuing module;

8. The system for improving the performance of the small file high line number scene query according to claim 5, wherein the system for improving the performance of the small file high line number scene query further comprises a judging module;

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 4 when the computer program is executed.

10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1 to 4.