CN118132542A - Full-quantity calculation detection method and device - Google Patents
- Publication number
- CN118132542A (application CN202211497396.6A)
- Authority
- CN
- China
- Prior art keywords
- partition
- target
- target table
- big data
- field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a full-quantity calculation detection method and device, which reduce the cost of handling problems and improve the quality and efficiency of big data jobs. The method comprises: acquiring a big data job request, wherein the big data job request comprises an identifier of a target table and a value of a target field, and is used for querying, in a big data system, the data in the target table indicated by the value of the target field; and, upon determining that the target table is a partition table and that the target field does not belong to the partition fields of the target table, generating reminding information indicating that the big data job request involves full-quantity calculation.
Description
Technical Field
The present application relates to the field of computers, and in particular to a full-quantity calculation detection method and device.
Background
With the rapid development and popularization of computer and information technology, enterprise application systems have grown rapidly in scale, the data they generate has grown explosively, and more and more enterprises run big data jobs on big data platforms. A user may execute custom big data jobs through the big data platform, but the jobs submitted by users can exhibit a variety of quality problems. A typical problem affecting computing performance is full-quantity calculation on a partition table: when a user-defined big data job queries a partition table, it may scan all the data of the table. Over time the computing load becomes larger and larger, job performance keeps deteriorating, and the job may eventually fail.
Meanwhile, when quality problems occur in a big data job, users mainly rely on after-the-fact handling: once a problem has appeared, they analyze the resources and time consumed while the task ran to judge whether a performance-affecting problem exists. By the time the problem is found, however, the user has already been adversely affected, and the cost of handling the problem has increased.
Disclosure of Invention
The embodiments of the present application provide a full-quantity calculation detection method and device, which determine, after a user submits a big data job, whether the job involves full-quantity calculation, so that quality problems affecting computing performance are found in advance, the cost of handling problems is reduced, and the quality and efficiency of big data jobs are improved.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical scheme:
In a first aspect, a full-quantity calculation detection method is provided, which is applied to a big data system. At least one partition table is established in the big data system, each partition table is configured with one or more partition fields, each partition field is used to indicate a partition feature, and data in the big data system forms the partition tables according to the partition fields. The full-quantity calculation detection method comprises: acquiring a big data job request, wherein the big data job request comprises an identifier of a target table and a value of a target field, and is used for querying, in the big data system, the data in the target table indicated by the value of the target field; determining that the target table is a partition table; determining that the target field does not belong to the partition fields of the target table; and generating reminding information, which is used to indicate that the big data job request involves full-quantity calculation.
According to the full-quantity calculation detection method provided by the application, when a big data job request submitted by a user is acquired and it is determined that the target table in the request is a partition table and that the target field does not belong to the partition fields of the target table, reminding information is generated to indicate that the request involves full-quantity calculation. Quality problems affecting computing performance are thus found in advance, which reduces the cost of handling problems and improves the quality and efficiency of big data jobs.
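The check described in the first aspect comes down to two tests on the request. The following is a minimal Python sketch of that logic, not the implementation claimed by the application; the `JobRequest` record, the function name, and the table and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical request shape: a target table plus one target field and its value."""
    table_id: str
    target_field: str
    target_value: str


def detect_full_calculation(
    request: JobRequest,
    is_partition_table: bool,
    partition_fields: frozenset,
) -> Optional[str]:
    """Return reminding text if the request would read every partition, else None."""
    if not is_partition_table:
        return None  # non-partition table: nothing to detect
    if request.target_field in partition_fields:
        return None  # the filter uses a partition field, so partitions can be skipped
    return (
        f"Full-quantity calculation detected: field '{request.target_field}' "
        f"is not a partition field of table '{request.table_id}'."
    )


reminder = detect_full_calculation(
    JobRequest("employee_info", "name", "Zhang San"),
    is_partition_table=True,
    partition_fields=frozenset({"entry_date"}),
)
print(reminder)
```

Returning `None` for the "no problem" cases keeps the caller free to decide what happens next, for example executing the request.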
In a possible implementation, the full-quantity calculation detection method provided by the application further comprises: obtaining table metadata of the target table according to the identifier of the target table, wherein the table metadata comprises the attribute of the target table and the partition fields of the target table. Determining that the target table is a partition table comprises: acquiring the attribute of the target table according to the identifier of the target table, and judging from the attribute whether the target table is a partition table. Determining that the target field does not belong to the partition fields of the target table comprises: acquiring the partition fields of the target table according to the identifier of the target table, and judging whether the target field belongs to them.
In one possible implementation, the table metadata of the target table further includes a data amount of the target table. Before generating the reminding information, the detection method of the full-quantity calculation further comprises the following steps: it is determined that the amount of data of the target table is greater than the threshold.
In this possible implementation, a partition table with a small data volume may exist in the big data system. Running a big data job on such a table does not affect the computing performance of the computing device even if full-quantity calculation is performed directly, so there is no need to further judge whether the target table is a partition table or whether the target field belongs to its partition fields, which improves the efficiency of the big data job.
In a possible implementation, after generating the reminding information, the full-quantity calculation detection method provided by the application further comprises: generating suggestion information corresponding to the big data job request.
In this possible implementation, the computing device may generate suggestion information for the big data job request after generating the reminding information, so that the user can use the suggestion information to modify the request, which improves the efficiency of the big data job.
In one possible implementation, the suggestion information is used to provide the partition fields of the target table.
In this possible implementation manner, the quality and efficiency of the big data job are improved by providing the partition field of the target table for the user, so that the user can modify the target field in the big data job request according to the partition field of the target table.
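As an illustration of suggestion information that provides the partition fields of the target table, the sketch below composes a short message. The wording of the message, the function name `build_suggestion`, and the example table and field names are assumptions; the application does not prescribe a format.

```python
def build_suggestion(table_id: str, target_field: str, partition_fields: list) -> str:
    """Compose suggestion text that lists the partition fields of the target table."""
    fields = ", ".join(sorted(partition_fields))
    return (
        f"Table '{table_id}' is partitioned by: {fields}. "
        f"Consider filtering on one of these fields instead of, "
        f"or in addition to, '{target_field}'."
    )


suggestion = build_suggestion("employee_info", "name", ["entry_date"])
print(suggestion)
```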
In a possible implementation, the full-quantity calculation detection method provided by the application further comprises: determining that the target table is a non-partition table, and executing the big data job request; or determining that the target field belongs to the partition fields of the target table, and executing the big data job request.
In a possible implementation, the full-quantity calculation detection method provided by the application further comprises: determining that the data volume of the target table is smaller than the threshold, and executing the big data job request.
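Taken together, the implementations above describe a small decision procedure. The following Python sketch is one possible reading of it. The `Decision` enum and the parameter names are hypothetical, and treating a data volume exactly equal to the threshold as "small" is an assumption, since the text only covers the "greater than" and "smaller than" cases.

```python
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute the big data job request"
    REMIND = "generate reminding information"


def decide(data_volume, threshold, is_partition_table, target_field, partition_fields):
    """Combine the data-volume, partition-table, and partition-field checks."""
    # Small table: execute directly without further checks.
    # Assumption: a volume equal to the threshold is treated as small.
    if data_volume <= threshold:
        return Decision.EXECUTE
    # Non-partition table: there are no partitions to prune, so execute.
    if not is_partition_table:
        return Decision.EXECUTE
    # The filter is on a partition field: only matching partitions are read.
    if target_field in partition_fields:
        return Decision.EXECUTE
    # Large partition table filtered on a non-partition field.
    return Decision.REMIND


print(decide(1000, 100, True, "name", {"entry_date"}))
```

Because the first three branches all lead to execution, their order does not change the outcome; placing the data-volume check first simply mirrors the point that small tables need no further judgment.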
In a second aspect, the present application provides a full-quantity calculation detection device, applied to a big data system in which at least one partition table is established, each partition table is configured with one or more partition fields, each partition field is used to indicate a partition feature, and data in the big data system forms the partition tables according to the partition fields. The full-quantity calculation detection device comprises an acquisition module, a processing module and a generation module, wherein:
The acquisition module is used for acquiring a big data operation request; the big data job request comprises the identification of the target table and the value of the target field; big data job requests are used to query a target table for data indicated by the value of the target field in a big data system.
The processing module is used for determining the target table as a partition table; it is determined that the target field does not belong to the partition field of the target table.
The generation module is used for generating reminding information, wherein the reminding information is used for reminding that the big data operation request has full calculation.
In one possible implementation manner, the acquiring module is specifically configured to: according to the identification of the target table, obtaining the table metadata of the target table, wherein the table metadata comprises: attributes of the target table, partition fields of the target table. The processing module is specifically used for: acquiring the attribute of the target table according to the identification of the target table; judging whether the target table is a partition table or not by utilizing the attribute of the target table; and acquiring the partition field of the target table according to the identification of the target table, and judging whether the target field belongs to the partition field of the target table.
In one possible implementation, the table metadata of the target table further includes a data amount of the target table. The processing module is specifically used for: it is determined that the amount of data of the target table is greater than the threshold.
In a possible implementation, the generation module is further configured to generate suggestion information corresponding to the big data job request after generating the reminding information.
In one possible implementation, the suggestion information is used to provide the partition field of the target table.
In a possible implementation, the full-quantity calculation detection device provided by the application further comprises an execution module, which is configured to execute the big data job request if the processing module determines that the target table is a non-partition table, or if the processing module determines that the target field belongs to the partition fields of the target table.
In one possible implementation, the execution module is specifically configured to execute the big data job request if the processing module determines that the data volume of the target table is smaller than the threshold.
In a third aspect, a computing device is provided, which has the function of implementing the full-quantity calculation detection method described in the first aspect or any one of its possible implementations. The function can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function described above.
In a fourth aspect, a computer readable storage medium is provided, in which instructions are stored which, when run on a computer, enable the computer to perform the full-quantity calculation detection method described in the first aspect or any one of its possible implementations.
In a fifth aspect, a computer program product comprising instructions is provided which, when run on a computer, enables the computer to perform the full-quantity calculation detection method described in the first aspect or any one of its possible implementations.
The technical effects of any one of the design manners of the third aspect to the fifth aspect may be referred to technical effects of different possible implementation manners of the first aspect, which are not described herein.
Drawings
FIG. 1 is a schematic diagram of a big data system according to an embodiment of the present application;
FIG. 2 is a device architecture diagram of a computing device provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart of a full-scale calculation detection method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a syntax tree parsing module according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a metadata collection module according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a full-scale calculation detection method according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of a full-scale calculation detection method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a full-scale calculation detecting device according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another configuration of a full-scale computing device according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. In the description of the present application, unless otherwise specified, "/" indicates an "or" relationship between the associated objects; for example, A/B may mean A or B. "And/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone, where A and B may be singular or plural. Unless otherwise indicated, "a plurality" means two or more. "At least one of" or a similar expression means any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural.
In addition, to describe the technical solutions of the embodiments of the present application clearly, the words "first", "second", and the like are used to distinguish identical or similar items having substantially the same function and effect. Those skilled in the art will appreciate that the words "first", "second", and the like do not limit quantity or order of execution, and that items so labeled are not necessarily different. Meanwhile, in the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, such words are intended to present related concepts in a concrete and easily understood manner.
In addition, the network architecture and the service scenario described in the embodiments of the present application are for more clearly describing the technical solution of the embodiments of the present application, and do not constitute a limitation on the technical solution provided by the embodiments of the present application, and as a person of ordinary skill in the art can know, with evolution of the network architecture and appearance of a new service scenario, the technical solution provided by the embodiments of the present application is also applicable to similar technical problems.
For ease of understanding, the terms involved in the present application are explained first.
Data warehouse tool (HIVE): a data warehouse tool based on the distributed system infrastructure Hadoop, used for data extraction, transformation, and loading. HIVE provides a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop.
Big data system refers to a system for scheduling and managing data warehouses. For example, the big data system may schedule data in the data warehouse based on the obtained big data job request and export the data to complete the big data job. Wherein at least one partition table may be established in the big data system. The big data system can partition the data in the data warehouse according to a certain partition condition (such as a partition field), and the partition condition is added when the data is scheduled, so that the data corresponding to the partition condition can be queried, all the data in the data warehouse are prevented from being queried, and the query efficiency is improved.
Partition table: a table formed by partitioning the data in the big data system according to partition fields; one or more partition fields can be configured for each partition table. For example, the partition table may be a partition table in the data warehouse HIVE. A table in HIVE corresponds to a specified directory on the Hadoop Distributed File System (HDFS); by default a query scans the full table, which is very costly in time and performance. A partition of a partition table corresponds to a separate folder on HDFS, under which all the data files of that partition are located. Partitioning in HIVE divides a large data set into smaller data sets by directory, according to service requirements.
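To make the directory correspondence concrete, the sketch below builds the folder path of one partition using the `field=value` naming that HIVE uses for partition directories. The warehouse root, table name, and partition field shown here are illustrative assumptions, not values taken from the application.

```python
from pathlib import PurePosixPath


def partition_dir(warehouse: str, table: str, partition: dict) -> PurePosixPath:
    """Build the directory holding one partition's data files (one level per field)."""
    path = PurePosixPath(warehouse) / table
    for field, value in partition.items():
        path = path / f"{field}={value}"
    return path


# Hypothetical table partitioned by month of job entry.
p = partition_dir("/user/hive/warehouse", "employee_info", {"entry_month": "2022-03"})
print(p)
```

A query that filters on `entry_month` only needs to read the files under the matching folder, whereas a query without such a filter has to read every folder under the table directory.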
Full-quantity calculation: taking out all data of all partitions of a partition table to participate in the calculation.
Partition field: a field describing a partition feature of a partition table. For example, in an employee statistics database, "entry time" is a common attribute of all rows in a table, and the rows can be partitioned by time according to service needs, so "entry time" may be configured as a partition field. When a query supplies a partition field, only the partitions corresponding to that field need to be searched; full-quantity calculation is not needed, so the query range is reduced and query efficiency is improved.
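The difference between a query that supplies a partition field and one that does not can be illustrated with a toy in-memory model. This is only a sketch: the `partitions` dictionary, the `query` function, and the sample rows are hypothetical stand-ins for a partitioned table, not how a data warehouse stores data.

```python
# Hypothetical partitioned table: partition value -> rows stored in that partition.
partitions = {
    "2021-05": [{"name": "Wangdi", "entry_date": "2021-05-04"}],
    "2022-03": [{"name": "Zhang San", "entry_date": "2022-03-04"}],
    "2022-04": [{"name": "Liwu", "entry_date": "2022-04-06"}],
}


def query(partition_value=None, predicate=lambda row: True):
    """Return (matching rows, number of partitions scanned)."""
    if partition_value is not None:
        # Partition field supplied: only the matching partition is read.
        scanned = {partition_value: partitions.get(partition_value, [])}
    else:
        # No partition field: every partition is read (full-quantity calculation).
        scanned = partitions
    rows = [row for part in scanned.values() for row in part if predicate(row)]
    return rows, len(scanned)


pruned_rows, pruned_scans = query(partition_value="2022-03")
full_rows, full_scans = query(predicate=lambda row: row["name"] == "Zhang San")
print(pruned_scans, full_scans)
```

Both queries return the same row, but the second one has to read all three partitions to find it.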
Big data operation refers to the process of processing big data. Big data operation can be mainly divided into big data calculation and big data transmission, for example, table a and table b exist, the big data calculation can be used for carrying out logic calculation processing on table a to generate table b, and the big data transmission can be used for transmitting table a to table b.
At present, after a user submits a big data job on a big data platform, the platform executes the job directly and cannot judge whether the job has a quality problem affecting computing performance. In view of this, in the present application, after a big data job request submitted by a user is acquired, if it is determined that the target table in the request is a partition table and the target field does not belong to the partition fields of the target table, it can be determined that the request involves full-quantity calculation, and reminding information is generated to indicate this. The quality problem affecting the computing performance of the request is thereby flagged in advance, which reduces the cost of handling problems and improves the quality and efficiency of big data jobs.
The following describes in detail the implementation of the embodiment of the present application with reference to the drawings.
The scheme provided by the application can be applied to the computing device 100 illustrated in FIG. 1. The computing device may be a computer or a product in another form, which is not limited here. As shown in FIG. 1, the computing device 100 includes a client 101 and a server 102.
The client 101 is configured to provide an interaction channel to a user, where the interaction channel may take the form of the World Wide Web (Web), an application (App), a sensor, or the like. Through the interaction channel provided by the client 101, the user may store input data on the server 102 and may query and process the data on the server 102.
For example, a user may input data through an interaction channel provided by the client 101, establish at least one partition table in the server 102, and configure one or more partition fields for each partition table. Meanwhile, the user can send an instruction to the server 102 through the client 101, and the instruction can be a big data job request. The server 102 may receive and store data transmitted from the client 101 using a plurality of databases, and query and process the stored data according to instructions input by a user at the client 101.
For example, the server 102 may receive a big data job request sent by a user through the client 101, and parse the request to obtain the identifier of the target table and the value of the target field that it contains, where the big data job request is used to query the data indicated by the target field in the target table. After determining that the target table is a partition table and the target field does not belong to the partition fields of the target table, the server 102 generates reminding information indicating that the big data job request involves full-quantity calculation, and displays the reminding information to the user through the client 101.
The solution provided by the present application may be applied to the big data system illustrated in FIG. 2, which may be deployed on the server 102 of the computing device 100 illustrated in FIG. 1. At least one partition table is established in the big data system, each partition table is configured with one or more partition fields, the data in the big data system forms the partition tables according to the partition fields, and each partition field is used to indicate a partition feature. As shown in FIG. 2, the big data system may include a metadata collection module 201, a syntax tree parsing module 202, and a full-quantity calculation detection module 203.
The metadata collection module 201 is configured to collect table metadata of the tables stored in the big data system. The table metadata may include the attributes of a table and the partition fields of the table.
For example, the attributes of a table may be used to indicate whether the table is a partition table, and the attributes of a table may be parameters configured at build time of the table.
The syntax tree parsing module 202 is configured to obtain a big data job request from the client 101, parse the obtained big data job request, and determine an identifier of a target table and a value of a target field in the big data job request.
The full-quantity calculation detection module 203 is configured to generate, after determining that the target table in the big data job request is a partition table and the target field does not belong to the partition fields of the target table, reminding information indicating that the big data job request involves full-quantity calculation.
It should be noted that, the big data system illustrated in fig. 2 is merely illustrative of the application scenario of the present application, and is not limited to the application scenario of the present application.
The scheme provided by the application can be applied to an ad hoc query system to check HiveQL statements submitted by a user: if the target table in a HiveQL statement is a partition table and the target field does not belong to the partition fields of the target table, reminding information is generated to indicate that the HiveQL statement involves full-quantity calculation.
In one aspect, the present application discloses a full-quantity calculation detection method, which may be performed by a computing device, such as the computing device 100 illustrated in FIG. 1. As shown in FIG. 3, the method may include the following steps:
S301, the computing device acquires a big data job request.
Wherein the big data job request may include an identification of the target table, a value of the target field. The big data job request is for querying, in a big data system, data indicated by a value of a target field in a target table.
Illustratively, as an optional embodiment of the present application, FIG. 4 is a schematic structural diagram of a syntax tree parsing module. By parsing the Hive structured query language (HiveQL) statement in the big data job request submitted by the client, the module may obtain the identifier of the target table and the value of the target field in the statement. The identifier of the target table may be the table name of the target table, and the value of the target field may be determined by analyzing the HiveQL statement submitted by the user, for example from the WHERE condition field corresponding to the identifier of the target table in the statement.
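As a rough illustration of what this step extracts, the sketch below pulls the table name and the WHERE-clause field names out of a simple statement using regular expressions. This is a deliberately simplified stand-in and not a syntax tree parser: it assumes a single table, conditions joined by AND/OR, plain comparison operators, and no string literal that contains AND or OR. The function name and the example statement are hypothetical.

```python
import re


def extract_table_and_fields(hiveql: str):
    """Return (table name, field names used in the WHERE clause, in order, deduplicated)."""
    table_match = re.search(r"\bFROM\s+([A-Za-z_][\w.]*)", hiveql, re.IGNORECASE)
    if table_match is None:
        raise ValueError("no FROM clause found")
    where_match = re.search(r"\bWHERE\s+(.*)$", hiveql, re.IGNORECASE | re.DOTALL)
    fields = []
    if where_match:
        # Split the WHERE clause into individual conditions.
        conditions = re.split(
            r"\s+(?:AND|OR)\s+", where_match.group(1), flags=re.IGNORECASE
        )
        for condition in conditions:
            # A condition is assumed to start with "<field> <comparison operator>".
            field = re.match(r"\s*([A-Za-z_]\w*)\s*(?:=|!=|<>|<=|>=|<|>)", condition)
            if field and field.group(1) not in fields:
                fields.append(field.group(1))
    return table_match.group(1), fields


table, fields = extract_table_and_fields(
    "SELECT * FROM employee_info "
    "WHERE entry_date >= '2022-01-01' AND entry_date <= '2022-06-30'"
)
print(table, fields)
```

A production system would rely on the query engine's own parser, because regular expressions cannot handle subqueries, joins, or nested expressions reliably.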
Illustratively, the computing device obtains a big data job request in which the identifier of the target table is the employee information statistics table, the target field is the job entry time, and the value of the target field is 2022.1-2022.6. The big data job request is therefore used to query the employee information statistics table for records whose job entry time falls within 2022.1-2022.6, for subsequent job processing.
S302, the computing device determines the target table as a partition table.
Specifically, the computing device may first obtain, according to the identifier of the target table, table metadata of the target table. The table metadata of the target table may be stored in the big data system in advance. For example, the table metadata may be stored in the metadata collection module 201 of the big data system illustrated in FIG. 2.
For example, as shown in fig. 5, which is a schematic structural diagram of a metadata collection module, the metadata collection module may collect metadata of all tables in the big data system, where the metadata may include: the identity of the table, the attributes of the table, the partition fields of the table, the data amount of the table.
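The four metadata items listed above map naturally onto a small record type. The sketch below shows one possible in-memory representation with a lookup by table identifier; the class names, the `is_partitioned` flag standing in for the table attribute, and the unit of `data_volume` (bytes) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMetadata:
    table_id: str                 # identifier of the table, e.g. the table name
    is_partitioned: bool          # attribute indicating whether it is a partition table
    partition_fields: tuple = ()  # partition fields of the table
    data_volume: int = 0          # data amount of the table (unit assumed: bytes)


class MetadataCatalog:
    """Hypothetical store of collected table metadata, keyed by table identifier."""

    def __init__(self):
        self._tables = {}

    def register(self, meta: TableMetadata) -> None:
        self._tables[meta.table_id] = meta

    def lookup(self, table_id: str) -> TableMetadata:
        return self._tables[table_id]  # raises KeyError for an unknown table


catalog = MetadataCatalog()
catalog.register(
    TableMetadata("employee_info", True, ("entry_date",), data_volume=5_000_000)
)
meta = catalog.lookup("employee_info")
print(meta.is_partitioned, meta.partition_fields)
```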
Illustratively, the identity of the table is used to uniquely indicate the table, which may be a table name.
Further, in S302, the computing device may obtain the attribute of the target table according to the identifier of the target table in the big data job request, and further determine whether the target table is a partition table according to the attribute of the target table. If the attribute of the target table indicates that the target table is a partition table, determining that the target table is a partition table.
For example, the identifier of the target table in the big data job request obtained by the computing device is the employee information statistics table. If, in the table metadata of the employee information statistics table, the attribute of the table indicates that it is a partition table, the computing device can determine that the employee information statistics table is a partition table.
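As a hedged illustration of S302, the table metadata can be modelled as a small record keyed by table name. The names `TableMetadata`, `METADATA_STORE`, and `is_partition_table`, as well as the sample tables and sizes, are assumptions made for this sketch and do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """The four metadata items listed for fig. 5 (field names are hypothetical)."""
    name: str                      # identifier of the table
    is_partitioned: bool           # attribute: partition table or not
    partition_fields: list = field(default_factory=list)
    data_size_bytes: int = 0       # data amount of the table


# Stand-in for the metadata collection module; the entries are invented samples.
METADATA_STORE = {
    "employee_info": TableMetadata(
        "employee_info", True, ["gender", "age", "entry_date"], 5 * 1024**4
    ),
    "department": TableMetadata("department", False, [], 10 * 1024**2),
}


def is_partition_table(table_name, store=METADATA_STORE):
    """S302: look up the table attribute by identifier and report it."""
    meta = store.get(table_name)
    if meta is None:
        raise KeyError(f"no metadata for table {table_name!r}")
    return meta.is_partitioned
```

Keeping the attribute, partition fields, and data amount in one record means a single lookup by table identifier serves S302, S303, and S305.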
S303, the computing device determines that the target field does not belong to the partition field of the target table.
Specifically, the computing device may obtain the partition field of the target table according to the identifier of the target table in the big data job request, and determine whether the target field in the big data job request belongs to the partition field of the target table.
For example, table 1 illustrates an employee statistics table, all of the fields of which include: serial number, name, gender, age, date of job entry.
TABLE 1
Serial number | Name | Gender | Age | Date of job entry |
---|---|---|---|---|
1 | Wangdi | Female | 24 | 2021-05-04 |
2 | Zhang San | Male | 23 | 2022-03-04 |
3 | Liwu four-element bag | Male | 20 | 2022-04-06 |
4 | Zhang Wei A | Male | 23 | 2022-05-03 |
…… | …… | …… | …… | …… |
For example, the identifier of the target table in the big data job request acquired by the computing device is the employee information statistics table, and the target field in the request is gender, with the value "male". The computing device acquires the partition fields of the target table from its table metadata: gender, age, and time of job entry. In this case, the computing device can determine that the target field belongs to the partition fields of the target table.
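The membership check of S303 can be sketched as follows. The function name `belongs_to_partition_fields` is hypothetical, and two details are assumptions of this sketch rather than statements of the application: field names are compared case-insensitively, and a request with several where condition fields is treated as using a partition field if at least one of them is a partition field.

```python
def belongs_to_partition_fields(target_fields, partition_fields):
    """S303: return True if any target field is a partition field of the table."""
    partition_set = {name.lower() for name in partition_fields}
    return any(name.lower() in partition_set for name in target_fields)
```

With the partition fields of the example above (gender, age, time of job entry), a request that filters on gender passes the check, whereas a request that filters only on name does not.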
S304, the computing device generates reminding information.
The reminding information is used for indicating that the big data job request has full calculation.
After determining that the target table in the big data job request is a partition table and that the target field does not belong to the partition fields of the target table, the computing device generates the reminding information. The computing device may then send the reminding information to a client communicatively connected to it, so as to remind the user that the big data job request involves full calculation.
According to the scheme provided by the application, after the big data job request submitted by the user is obtained, if the target table in the request is determined to be a partition table and the target field does not belong to the partition fields of the target table, the big data job request can be determined to involve full calculation, and the reminding information is generated. The user is thereby reminded that the big data job request has a quality problem that affects computing performance, which reduces the cost of handling the problem and improves the quality and efficiency of big data jobs.
Further, the computing device may execute the big data job request in the following cases:
Case a: in S302, the computing device determines that the target table is not a partition table.
Case b: in S302, the computing device determines that the target table is a partition table, and in S303, the computing device determines that the target field belongs to the partition fields of the target table.
Further, the big data system may contain partition tables with a small data amount. When a big data job is run on such a table, even full calculation does not affect the computing performance of the computing device. Therefore, as shown in fig. 6, the method for detecting full calculation provided by the embodiment of the application may further include S305. For example, S305 may be executed before S302, or its position in the execution sequence may be adjusted according to actual needs; fig. 6 illustrates only one possible execution sequence, which is not specifically limited.
S305, the computing device determines that the data amount of the target table is greater than the threshold.
Wherein, the table metadata of the table in the big data system can further comprise: the data amount of the table.
Illustratively, in S305, the computing device may obtain, from the table metadata of the target table, the data amount of the target table according to the identification of the target table in the big data job request, and determine whether the data amount of the target table is greater than a threshold.
If the data amount of the target table is greater than the threshold, execution continues with S302 to determine whether the target table is a partition table, and with S303 to determine whether the target field belongs to the partition fields of the target table. If the data amount of the target table is smaller than the threshold, even full calculation can be considered not to affect computing performance, and no further judgment is needed.
It should be noted that, when the data amount of the target table is equal to the threshold, either S302 and S303 may be executed to continue the judgment, or no further judgment may be made; this is not limited in the embodiment of the present application.
Executing S305 avoids flagging big data jobs on partition tables whose data amount is small, since full calculation on such tables does not cause a quality problem that affects computing performance. Full-calculation detection is therefore performed only on tables whose data amount is greater than the threshold.
The threshold may be the data amount above which full calculation may cause a quality problem that affects computing performance. The specific value of the threshold may be determined according to actual requirements, which is not limited in the embodiment of the present application.
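A minimal sketch of the S305 check follows. Because the application leaves the behaviour at exact equality open, the sketch exposes it as a flag; the function name `exceeds_threshold` and the parameter `check_when_equal` are hypothetical, and measuring the data amount in bytes is an assumption.

```python
def exceeds_threshold(data_size_bytes, threshold_bytes, check_when_equal=True):
    """S305: decide whether full-calculation detection should continue.

    The application does not fix the result when the data amount equals the
    threshold, so the caller chooses via check_when_equal.
    """
    if data_size_bytes == threshold_bytes:
        return check_when_equal
    return data_size_bytes > threshold_bytes
```

Making the equality case an explicit parameter keeps either of the two permitted behaviours available without changing the comparison itself.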
Further, as shown in fig. 6, after generating the reminding information, the method for detecting full-scale calculation provided in the embodiment of the present application may further include S306.
S306, the computing device generates suggestion information corresponding to the big data job request.
Wherein the suggestion information may be used to provide a partition field of the target table.
For example, the suggestion information to be output may include the partition fields of the target table, so as to prompt the user to modify the big data job request to use one of the provided partition fields as the target field.
For example, when the where condition field in the HiveQL statement of the big data job request submitted by the user does not belong to the partition fields of the target table, the partition fields of the target table may be retrieved and provided to the user. When a user submits a job with a full calculation problem, providing the partition fields of the target table prompts the user to modify the big data job request to use a provided partition field as the target field, which makes it easier for the user to eliminate the full calculation and improves efficiency.
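The suggestion information of S306 might be assembled as in the following sketch. The function name `build_suggestion` and the wording of the message are assumptions; the application only states that the suggestion information provides the partition fields of the target table.

```python
def build_suggestion(table_name, target_fields, partition_fields):
    """S306: build a message that lists the partition fields of the target table."""
    used = ", ".join(target_fields) or "(none)"
    available = ", ".join(partition_fields)
    return (
        f"Table '{table_name}' is partitioned by: {available}. "
        f"The request filters on: {used}. "
        "Consider adding a condition on one of the partition fields."
    )
```

The returned string could be sent to the client together with the reminding information generated in S304.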
Further, when S305 is performed in the method for detecting full calculation, the computing device may execute the big data job request in the following cases:
Case 1: in S305, the computing device determines that the data amount of the target table is smaller than the threshold.
Case 2: in S305, the computing device determines that the data amount of the target table is greater than the threshold, and in S302, the computing device determines that the target table is not a partition table.
Case 3: in S305, the computing device determines that the data amount of the target table is greater than the threshold; in S302, the computing device determines that the target table is a partition table; and in S303, the computing device determines that the target field belongs to the partition fields of the target table.
Fig. 7 is a flowchart illustrating a judgment logic after a computing device obtains a big data job request in the full-scale computing detection method provided by the present application. As shown in fig. 7, the flow of the determination logic may include:
S701, the computing device determines whether the data amount of the target table is greater than a threshold.
Specifically, when the data amount of the target table is greater than the threshold, S702 is executed to determine whether the target table is a partition table. When the data amount of the target table is smaller than the threshold, S705 is executed to execute the big data job request.
S702, the computing device determines whether the target table is a partition table.
Specifically, when it is determined that the target table is a partition table, S703 is executed to determine whether the target field belongs to the partition fields of the target table; when it is determined that the target table is not a partition table, S705 is executed to execute the big data job request.
S703, the computing device determines whether the target field belongs to the partition field of the target table.
Specifically, when the target field does not belong to the partition field of the target table, S704 is executed to generate a reminder for prompting that there is a full computation in the big data job request, and when the target field belongs to the partition field of the target table, S705 is executed to execute the big data job request.
It should be noted that the execution sequence of S701 to S703 may be adjusted according to actual requirements, and fig. 7 illustrates only one possible execution sequence, and is not limited in particular.
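The judgment flow of fig. 7 can be summarized in a short Python sketch under stated assumptions. The names `Decision` and `decide` are hypothetical; a table whose data amount equals the threshold is treated here as small, which is only one of the two options the application permits; and a request is treated as using a partition field if any of its target fields is one.

```python
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"  # S705: execute the big data job request
    REMIND = "remind"    # S704: generate the reminding information


def decide(data_size, threshold, is_partitioned, partition_fields, target_fields):
    """Walk S701 -> S702 -> S703 in the order shown in fig. 7."""
    # S701: small tables are executed directly (equality treated as small here).
    if data_size <= threshold:
        return Decision.EXECUTE
    # S702: full calculation on a non-partition table is not flagged.
    if not is_partitioned:
        return Decision.EXECUTE
    # S703: a request that filters on a partition field is executed.
    partition_set = {name.lower() for name in partition_fields}
    if any(name.lower() in partition_set for name in target_fields):
        return Decision.EXECUTE
    # S704: large partition table queried without a partition field.
    return Decision.REMIND
```

Because each check returns early, reordering S701 to S703 as the text allows changes only the order of the `if` statements, not the final decision.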
The process of executing the big data job request in the embodiment of the present application is not described in detail.
The above description has been presented with respect to the solution provided by the embodiment of the present application mainly from the point of view of the working principle of the device. It is to be appreciated that the computing device, in order to implement the functionality described above, includes corresponding hardware structures and/or software modules that perform the various functions. Those of skill in the art will readily appreciate that the various illustrative algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application can divide the functional modules of the computing device according to the method example, for example, each functional module can be divided corresponding to each function, or two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
In the case of dividing the respective functional modules with the respective functions, fig. 8 shows a schematic diagram of one possible composition of the full-quantity-calculation detecting apparatus referred to above and in the embodiment, as shown in fig. 8, the full-quantity-calculation detecting apparatus 80 may include: an acquisition module 81, a processing module 82, and a generation module 83.
The acquisition module 81 is configured to support the full-calculation detection device 80 in executing S301 in the full-calculation detection method shown in fig. 3 or fig. 6.
The processing module 82 is configured to support the full-calculation detection device 80 in executing S302 and S303 in the full-calculation detection method shown in fig. 3, or S302, S303, and S305 in the full-calculation detection method shown in fig. 6.
The generation module 83 is configured to support the full-calculation detection device 80 in executing S304 in the full-calculation detection method shown in fig. 3, or S304 and S306 in the full-calculation detection method shown in fig. 6.
In an embodiment of the present application, further, as shown in fig. 9, the full-calculation detection device 80 may further include an execution module 84.
The execution module 84 is configured to support the full-calculation detection device 80 in executing S705 in the full-calculation detection method shown in fig. 7.
It should be noted that, all relevant contents of each step related to the above method embodiment may be cited to the functional description of the corresponding functional module, which is not described herein.
The detection device 80 for full-scale calculation provided in the embodiment of the present application is used for executing the detection method for full-scale calculation, so that the same effects as those of the detection method for full-scale calculation can be achieved.
Embodiments of the present application also provide a computing device 100, where the computing device 100 may include a memory 1001, a processor 1002, and a transceiver 1003, where the memory 1001 and the processor 1002 may be connected by a bus or network or other means, as shown in fig. 10, where the connection is exemplified by a bus.
The processor 1002 may be a central processing unit (CPU). The processor 1002 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof. For example, the processor 1002 may run the syntax tree parsing module, the full-calculation detection module, and the like in the embodiments of the application.
The memory 1001 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memory. The memory 1001 is used for storing application code, configuration files, data information, or other content with which the methods of the application may be implemented.
The memory 1001 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as metadata collection modules and the like in the embodiments of the present application. The processor 1002 executes various functional applications of the processor and data processing by running non-transitory software programs, instructions, and modules stored in the memory 1001.
The memory 1001 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created by the processor 1002, etc. In addition, memory 1001 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 1001 optionally includes memory remotely located with respect to processor 1002, which may be connected to processor 1002 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transceiver 1003 is used for information interaction of the computing device 100 with other devices.
The one or more modules are stored in the memory 1001 and when executed by the processor 1002 perform the detection method of the full-scale computation in the embodiment shown in fig. 3 or 6 or 7.
Embodiments of the present application also provide a computer readable storage medium having instructions stored thereon that, when executed, perform the detection method and related steps of the full-scale computation in the method embodiments described above.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the relevant steps of the full-calculation detection method in the method embodiments described above.
It will be apparent to those skilled in the art from this description that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, or a contributing part or all or part of the technical solution, may be embodied in the form of a software product, where the software product is stored in a storage medium, and includes several instructions for causing a device (may be a single-chip microcomputer, a chip or the like) or a processor (processor) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read Only Memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (17)
1. The detection method of the full-scale calculation is characterized by being applied to a big data system, wherein at least one partition table is established in the big data system, each partition table is provided with one or more partition fields, each partition field is used for indicating partition characteristics, and data in the big data system form the partition table according to the partition fields; the method comprises the following steps:
Acquiring a big data job request, wherein the big data job request comprises an identifier of a target table and a value of a target field, and the big data job request is used for inquiring data indicated by the value of the target field in the target table in the big data system;
Determining the target table as a partition table;
Determining that the target field does not belong to a partition field of the target table;
And generating reminding information, wherein the reminding information is used for reminding that the big data job request has full calculation.
2. The method according to claim 1, wherein the method further comprises:
according to the identification of the target table, obtaining the table metadata of the target table, wherein the table metadata comprises: the attribute of the target table and the partition field of the target table;
The determining that the target table is a partition table includes:
Acquiring the attribute of the target table according to the identification of the target table;
judging whether the target table is a partition table or not by utilizing the attribute of the target table;
the determining that the target field does not belong to the partition field of the target table includes:
and acquiring the partition field of the target table according to the identification of the target table, and judging whether the target field belongs to the partition field of the target table.
3. The method of claim 1 or 2, wherein the table metadata further comprises a data amount of the target table; before the reminding information is generated, the method further comprises the following steps:
and determining that the data amount of the target table is greater than a threshold.
4. A method according to any one of claims 1-3, wherein after generating the alert information, the method further comprises:
and generating suggestion information corresponding to the big data job request.
5. The method of claim 4, wherein the suggestion information is used to provide a partition field of the target table.
6. The method according to claim 1 or 2, characterized in that the method further comprises:
determining the target table as a non-partition table, and executing the big data job request;
or
And determining that the target field belongs to the partition field of the target table, and executing the big data job request.
7. A method according to claim 3, characterized in that the method further comprises:
And determining that the data volume of the target table is smaller than the threshold value, and executing the big data job request.
8. The full-quantity computing detection device is characterized by being applied to a big data system, wherein at least one partition table is established in the big data system, each partition table is provided with one or more partition fields, each partition field is used for indicating partition characteristics, and data in the big data system form the partition table according to the partition fields; the device comprises:
an acquisition module, configured to acquire a big data job request, wherein the big data job request comprises an identifier of a target table and a value of a target field, and the big data job request is used for querying, in the big data system, data indicated by the value of the target field in the target table;
a processing module, configured to determine that the target table is a partition table, and to determine that the target field does not belong to a partition field of the target table;
a generating module, configured to generate reminding information after the processing module determines that the target table is a partition table and the target field does not belong to the partition field of the target table, wherein the reminding information is used for reminding that the big data job request has full calculation.
9. The apparatus according to claim 8, characterized in that,
The acquisition module is specifically configured to: according to the identification of the target table, obtaining the table metadata of the target table, wherein the table metadata comprises: the attribute of the target table and the partition field of the target table;
The processing module is specifically configured to: acquiring the attribute of the target table according to the identification of the target table; judging whether the target table is a partition table or not by utilizing the attribute of the target table; and acquiring the partition field of the target table according to the identification of the target table, and judging whether the target field belongs to the partition field of the target table.
10. The apparatus according to claim 8 or 9, wherein the table metadata further comprises a data amount of the target table;
The processing module is specifically configured to: determining that the data amount of the target table is greater than a threshold; the generating module is specifically configured to generate the reminder after the processing module determines that the target table is a partition table, the target field does not belong to the partition field of the target table, and it is determined that the data size of the target table is greater than a threshold.
11. The device according to any one of claims 8-10, wherein,
The generating module is further configured to: and generating suggestion information corresponding to the big data job request after generating the reminding information.
12. The apparatus of claim 11, wherein the suggestion information is to provide a partition field of the target table.
13. The apparatus according to claim 8 or 9, characterized in that the apparatus further comprises:
an execution module, configured to execute the big data job request if the processing module determines that the target table is a non-partition table; or to execute the big data job request if the processing module determines that the target field belongs to the partition field of the target table.
14. The apparatus according to claim 10, characterized in that,
The execution module is specifically configured to: and if the processing module determines that the data volume of the target table is smaller than the threshold value, executing the big data job request.
15. A computing device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the full-scale computational detection method of any one of claims 1-7.
16. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements a full-scale computational detection method according to any one of claims 1-7.
17. A computer program product, characterized in that the computer program product, when run on a computer, causes the computer to perform the full-scale computational detection method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211497396.6A CN118132542A (en) | 2022-11-25 | 2022-11-25 | Full-quantity calculation detection method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118132542A true CN118132542A (en) | 2024-06-04 |
Family
ID=91238189
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211497396.6A Pending CN118132542A (en) | 2022-11-25 | 2022-11-25 | Full-quantity calculation detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118132542A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||