CN113760856A - Database management method and device, computer readable storage medium and electronic device - Google Patents

Database management method and device, computer readable storage medium and electronic device

Info

Publication number
CN113760856A
Authority
CN
China
Prior art keywords
database
directory
capacity
root
target database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010504496.1A
Other languages
Chinese (zh)
Inventor
刘士超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JD Digital Technology Holdings Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd filed Critical JD Digital Technology Holdings Co Ltd
Priority to CN202010504496.1A priority Critical patent/CN113760856A/en
Publication of CN113760856A publication Critical patent/CN113760856A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention relate to a database management method and device, a computer readable storage medium and an electronic device, in the technical field of computers. The database management method comprises the following steps: acquiring a library directory of a target database and a root directory corresponding to the library directory; calculating the number of root directories and the capacity of each root directory, and calculating the number of subdirectories corresponding to each root directory and the capacity of each subdirectory; generating a database table of the target database according to the library directory, the root directory, the number of root directories, the capacity of each root directory, the number of subdirectories and the capacity of each subdirectory; and storing the database table of the target database in a relational database so that a user can conveniently view the database table of the target database by regular expression matching. The embodiments of the invention solve the prior-art problem that current mainstream open-source big data management platforms do not count the used capacity of database tables.

Description

Database management method and device, computer readable storage medium and electronic device
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a database management method, a database management device, a computer readable storage medium and electronic equipment.
Background
To make reasonable use of machine resources, big data clusters are created according to budget, and many of them are shared (public) clusters whose resources are heavily used. If the used capacity of some database tables suddenly increases, the overall usage rate rises, which may even affect writes by other teams, cause task execution to fail, and lead to incidents.
In order to solve the above problem, it is necessary to count the used capacity of each database table in time, and to process the cluster in time when the used capacity is abnormal.
However, the current mainstream open source big data management platform has no function of counting the use capacity of the database table.
Therefore, it is desirable to provide a new database management method and apparatus.
It is to be noted that the information disclosed in the above background section is only for enhancing the understanding of the background of the present invention, and therefore may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present invention is directed to a database management method, a database management apparatus, a computer-readable storage medium, and an electronic device, which overcome, at least to some extent, the problem that the used capacity of a database table cannot be counted due to the limitations and disadvantages of the related art.
According to an aspect of the present disclosure, there is provided a database management method including:
acquiring a library directory of a target database and a root directory corresponding to the library directory;
calculating the number of the root directories and the capacity of the root directories, and calculating the number of subdirectories corresponding to the root directories and the capacity of the subdirectories;
generating a database table of the target database according to the library directory, the root directory, the number of each root directory, the capacity of each root directory, the number of each subdirectory and the capacity of each subdirectory;
and storing the database table of the target database into a relational database so that a user can conveniently check the database table of the target database in a regular expression matching mode.
In an exemplary embodiment of the present disclosure, the database management method further includes:
calculating the total capacity of the target database according to the database table of the target database;
calculating the storage ratio of the target database in the distributed system according to the total capacity of the target database, and judging whether the storage ratio is greater than a first preset threshold value;
and when the storage proportion is determined to be larger than a first preset threshold value, positioning a root directory and/or a subdirectory generating abnormal data according to a database table of the target database.
In an exemplary embodiment of the present disclosure, the database management method further includes:
and acquiring corresponding table data under the root directory and/or the subdirectories which generate the abnormal data, and analyzing the reasons for generating the abnormal data according to the table data.
In an exemplary embodiment of the present disclosure, the database management method further includes:
judging whether the storage ratio is larger than a second preset threshold value or not;
when the storage proportion is determined to be larger than a second preset threshold value, generating alarm information corresponding to the target database according to a database table of the target database;
and storing the alarm information into a target database so as to facilitate the capacity expansion of the distributed system by a user according to the alarm information.
In an exemplary embodiment of the present disclosure, storing the database table of the target database into a relational database includes:
generating a data storage request according to a database table of the target database and a token of the target database;
and sending the data storage request to the relational database, so that the relational database stores the database table when the token is confirmed to pass the verification.
In an exemplary embodiment of the present disclosure, obtaining a library directory of a target database and a root directory corresponding to the library directory includes:
and periodically acquiring, through a statistics script and at preset time intervals, the library directory of the target database and the root directory corresponding to the library directory.
In an exemplary embodiment of the present disclosure, the target database is a Hive database and/or an Hbase database.
According to an aspect of the present disclosure, there is provided a database management apparatus including:
a directory acquisition module, configured to acquire a library directory of a target database and a root directory corresponding to the library directory;
the first calculation module is used for calculating the number of the root directories and the capacity of the root directories, and calculating the number of subdirectories corresponding to the root directories and the capacity of the subdirectories;
a database table generating module, configured to generate a database table of the target database according to the library directory, the root directory, the number of each root directory, the capacity of each root directory, the number of each sub-directory, and the capacity of each sub-directory;
and the database table storage module is used for storing the database table of the target database into a relational database so as to be convenient for a user to check the database table of the target database in a regular expression matching mode.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a database management method as recited in any one of the above.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any of the database management methods described above via execution of the executable instructions.
In the database management method provided by the embodiments of the invention, first, a library directory of a target database and a root directory corresponding to the library directory are acquired; the number of root directories and the capacity of each root directory are calculated, together with the number of subdirectories corresponding to each root directory and the capacity of each subdirectory; a database table of the target database is then generated from the library directory, the root directories, the number and capacity of the root directories, and the number and capacity of the subdirectories; finally, the database table is stored in a relational database so that a user can conveniently view it. This solves the prior-art problem that current mainstream open-source big data management platforms do not count the used capacity of database tables. Second, because the database table of the target database is stored in a relational database, a user can view it by regular expression matching, which improves viewing speed and, in turn, user experience. Third, because the database table contains the library directory, the root directories, the number and capacity of the root directories, and the number and capacity of the subdirectories of the target database, a user can see the capacity usage of each directory at a glance and can conveniently locate directories whose capacity is too large or too small.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 schematically shows a flow chart of a database management method according to an exemplary embodiment of the present invention.
Fig. 2 schematically shows a flow chart of another database management method according to an exemplary embodiment of the present invention.
Fig. 3 schematically shows a flow chart of another database management method according to an exemplary embodiment of the present invention.
FIG. 4 schematically illustrates a flow diagram of another database management method according to an example embodiment of the invention.
FIG. 5 schematically illustrates a library number trend graph according to an exemplary embodiment of the present invention.
FIG. 6 schematically illustrates a table quantity trend graph according to an exemplary embodiment of the present invention.
FIG. 7 schematically illustrates a chart of table capacity variation trends in accordance with an exemplary embodiment of the present invention.
Fig. 8 schematically shows a block diagram of a database management apparatus according to an exemplary embodiment of the present invention.
Fig. 9 schematically illustrates an electronic device for implementing the above-described database management method according to an exemplary embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the invention.
Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the present exemplary embodiment, a database management method is first provided, and the method may be performed in a terminal device, a server cluster or a cloud server; of course, those skilled in the art may also operate the method of the present invention on other platforms as needed, and this is not particularly limited in this exemplary embodiment. Referring to fig. 1, the database management method may include the steps of:
Step S110, acquiring a library directory of a target database and a root directory corresponding to the library directory.
Step S120, calculating the number of the root directories and the capacity of the root directories, and calculating the number of the subdirectories corresponding to the root directories and the capacity of the subdirectories.
Step S130, generating a database table of the target database according to the library directory, the root directory, the number of each root directory, the capacity of each root directory, the number of each subdirectory and the capacity of each subdirectory.
Step S140, storing the database table of the target database into a relational database so that a user can conveniently check the database table of the target database in a regular expression matching mode.
In the database management method described above, first, a library directory of a target database and a root directory corresponding to the library directory are acquired; the number of root directories and the capacity of each root directory are calculated, together with the number of subdirectories corresponding to each root directory and the capacity of each subdirectory; a database table of the target database is then generated from the library directory, the root directories, the number and capacity of the root directories, and the number and capacity of the subdirectories; finally, the database table is stored in a relational database so that a user can conveniently view it. This solves the prior-art problem that current mainstream open-source big data management platforms do not count the used capacity of database tables. Second, because the database table of the target database is stored in a relational database, a user can view it by regular expression matching, which improves viewing speed and, in turn, user experience. Third, because the database table contains the library directory, the root directories, the number and capacity of the root directories, and the number and capacity of the subdirectories of the target database, a user can see the capacity usage of each directory at a glance and can conveniently locate directories whose capacity is too large or too small.
Hereinafter, each step involved in the database management method according to the exemplary embodiment of the present invention will be explained in detail with reference to the drawings.
First, terms related to exemplary embodiments of the present invention are explained.
HBase is a distributed, column-oriented open-source database. The technology derives from the Google paper "Bigtable: A Distributed Storage System for Structured Data". Just as Bigtable takes advantage of the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a subproject of the Apache Hadoop project. Unlike a general relational database, HBase is suited to storing unstructured data; another difference is that HBase uses a column-based rather than a row-based schema.
Hive is a Hadoop-based data warehouse tool used for data extraction, transformation and loading; it provides a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. Hive can map structured data files to database tables, provide an SQL query function, and convert SQL statements into MapReduce tasks for execution. Its advantages are a low learning cost and the ability to implement MapReduce statistics quickly through SQL-like statements, which makes MapReduce simpler to use without developing dedicated MapReduce applications. Hive is well suited to statistical analysis in a data warehouse.
In addition, both HBase and Hive store their data on HDFS (Hadoop Distributed File System). Each has libraries (namespaces; in HBase and Hive, a library is a collection of tables) and tables (the structures in which HBase and Hive store data), and each library and table corresponds to a distinct storage directory on HDFS.
Next, the purpose of the exemplary embodiments of the present invention is explained. Specifically, because changes to the libraries and tables of HBase and Hive databases are not aggregated and counted, when the used capacity of a big data cluster increases sharply it is impossible to tell which library or table has suddenly grown; the problem therefore cannot be located quickly, and services are likely to be affected. The present scheme collects, every day, the capacity, the number and the file count of the libraries and tables of HDFS-based databases such as HBase and Hive, so that changes can be viewed conveniently, large tables can be identified, and whether storage needs to be optimized can be decided from the file count. According to a threshold set by the platform, when the capacity of a library or table suddenly increases or decreases, administrators are proactively notified to check whether the change is abnormal, so that problems are found in time and services are not affected.
Hereinafter, steps S110 to S140 will be explained in detail.
In step S110, a library directory of the target database and a root directory corresponding to the library directory are obtained.
In this example, the target database may be a Hive database and an Hbase database, or may be other types of databases stored based on an HDFS, which is not limited in this example. Specifically, the library directory of the target database and the root directory corresponding to the library directory can be obtained regularly through a statistical script at preset time intervals; the preset time may be, for example, one day or 12 hours.
For example, the data management platform may embed a statistics script and trigger it on a schedule, in the same time period each day, to execute a statistics task. According to the cluster to be counted, the script may run the task on any one of the HBase and/or Hive machines and obtain the library directory stored on HDFS and the root directories corresponding to it. The library directory may be, for example, /hbase/data or /hive/data, and the root directories corresponding to the library directory may be, for example, /hbase/data/default and /hbase/data/test1. Of course, other forms of root directories may be included, and this example is not particularly limited thereto.
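As an illustration of step S110, the following minimal sketch shows how a statistics script might extract root directories from a directory listing. It assumes the listing comes from `hdfs dfs -ls` and that paths contain no spaces; the function names and the sample text are hypothetical and are not taken from the patent.

```python
import subprocess
from typing import List

# Hypothetical sample of `hdfs dfs -ls /hbase/data` output; real output varies by cluster.
SAMPLE_LS = """\
Found 3 items
drwxr-xr-x   - hbase supergroup          0 2020-06-01 10:00 /hbase/data/default
drwxr-xr-x   - hbase supergroup          0 2020-06-01 10:00 /hbase/data/test1
-rw-r--r--   3 hbase supergroup        120 2020-06-01 10:00 /hbase/data/README
"""


def parse_root_dirs(ls_output: str) -> List[str]:
    """Return the paths of the directory entries in an `hdfs dfs -ls` listing."""
    roots = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Directory entries have 8 columns and a permission string starting with 'd'.
        if len(fields) >= 8 and fields[0].startswith("d"):
            roots.append(fields[-1])  # assumption: the path contains no spaces
    return roots


def list_root_dirs(library_dir: str) -> List[str]:
    """Run the listing on a machine with a Hadoop client (not exercised here)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", library_dir],
        capture_output=True, text=True, check=True,
    )
    return parse_root_dirs(result.stdout)


print(parse_root_dirs(SAMPLE_LS))
```

Keeping the parsing separate from the command invocation lets the script be checked without access to a cluster.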
In step S120, the number of the root directories and the capacity of the root directories are calculated, and the number of the subdirectories corresponding to the root directories and the capacity of the subdirectories are calculated.
In this exemplary embodiment, after the library directory and the root directory corresponding to the library directory are obtained, the number of the root directory and the number of the subdirectories under each library directory may be counted by the counting script, and then the capacity of each root directory and the capacity of the subdirectory corresponding to the root directory may be counted by the du command, where the capacity refers to the capacity that has been used.
For example, the directory in which HBase is stored on HDFS is /hbase/data, and the HBase libraries and tables are stored under it. The statistics script counts the number of directories under this directory and uses the du command to measure the capacity of each; these figures are the number of HBase libraries and the capacity of each library. For example, if there are two directories, default and test1, under /hbase/data, these are the names of the two libraries in the current HBase database, and the script measures their capacity through the Hadoop du command. After the number and capacity of the libraries have been counted, the script counts the number and capacity of the directories under each library, that is, the number and capacity of the tables in that library. This also uses Hadoop commands: count gives the number of directories and du gives the capacity. For example, if there are 100 directories under the test1 directory, the test1 library of the HBase database contains 100 tables, and du measures the capacity of each table. The number of files under each library and table is obtained through the Hadoop count parameter.
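The counting described above can be sketched as follows. This is a minimal illustration, assuming the script reads the text output of `hdfs dfs -du` (size in the first column, path in the last) and `hdfs dfs -count` (directory count, file count, content size, path); the sample output and function names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical sample of `hdfs dfs -du /hbase/data` output.
SAMPLE_DU = """\
1024  3072  /hbase/data/default
4096  12288  /hbase/data/test1
"""

# Hypothetical sample of `hdfs dfs -count /hbase/data/test1` output.
SAMPLE_COUNT = "           3           20            4096 /hbase/data/test1\n"


def parse_du(du_output: str) -> Dict[str, int]:
    """Map each listed path to its used size in bytes (first column)."""
    sizes = {}
    for line in du_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[-1]] = int(fields[0])
    return sizes


def parse_count(count_output: str) -> Tuple[int, int, int]:
    """Return (directory count, file count, content size) from one -count line."""
    fields = count_output.split()
    return int(fields[0]), int(fields[1]), int(fields[2])


library_sizes = parse_du(SAMPLE_DU)
print(len(library_sizes), library_sizes)  # number of libraries and capacity of each
print(parse_count(SAMPLE_COUNT))
```

Running the same two parsers on each library directory would give the number of tables, the capacity of each table, and the file count, as the paragraph above describes.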
In step S130, a database table of the target database is generated according to the library directory, the root directory, the number of each root directory, the capacity of each root directory, the number of each sub-directory, and the capacity of each sub-directory.
In the present exemplary embodiment, after the library directory, the root directory, the number of root directories, the capacity of root directories, the number of subdirectories, and the capacity of subdirectories are obtained, the database table of the target database may be generated from them. In this way, it is possible to quickly locate which library or table has suddenly grown or shrunk, and to see how much space and how many files each occupies, so that services can be adjusted in time and the HBase and Hive services can be used reasonably.
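One possible shape for the generated database table is sketched below. The patent does not specify a schema, so the row layout, the field names and the `build_table` helper are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def build_table(
    library_dir: str,
    root_sizes: Dict[str, int],
    sub_sizes: Dict[str, Dict[str, int]],
) -> List[Dict[str, Any]]:
    """Build one row per root directory and one row per subdirectory.

    root_sizes maps root directory -> capacity in bytes.
    sub_sizes maps root directory -> {subdirectory -> capacity in bytes}.
    """
    rows: List[Dict[str, Any]] = []
    for root, size in sorted(root_sizes.items()):
        subs = sub_sizes.get(root, {})
        rows.append({
            "library_dir": library_dir,
            "path": root,
            "level": "root",
            "sub_count": len(subs),  # number of subdirectories (tables) under this root
            "capacity_bytes": size,
        })
        for sub, sub_size in sorted(subs.items()):
            rows.append({
                "library_dir": library_dir,
                "path": sub,
                "level": "sub",
                "sub_count": 0,
                "capacity_bytes": sub_size,
            })
    return rows


table = build_table(
    "/hbase/data",
    {"/hbase/data/default": 1024, "/hbase/data/test1": 4096},
    {"/hbase/data/test1": {"/hbase/data/test1/t1": 4096}},
)
for row in table:
    print(row)
```

A flat row-per-directory layout is chosen here only because it maps directly onto a relational table; other layouts would serve equally well.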
In step S140, the database tables of the target database are stored in a relational database, so that the user can conveniently check the database tables of the target database in a regular expression matching manner.
In the present exemplary embodiment, first, the database table of the target database is stored into the relational database. Wherein storing the database table of the target database into the relational database may include: firstly, generating a data storage request according to a database table of the target database and a token of the target database; secondly, the data storage request is sent to the relational database, so that the relational database stores the database table when the token is confirmed to pass the verification. The token is provided to the target database after the target database is authenticated by the relational database. Therefore, a data storage request can be generated according to the token and the database table, and after the relational database receives the data storage request, the database table can be stored after the token is verified; also, in the relational database, the stored key may be a database name of the target database, and the value may be a database table corresponding to the target database.
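The token-checked storage request described above might look like the following sketch. The `RelationalStore` class is a hypothetical in-memory stand-in for the relational database's interface, and the request format is an assumption; the patent only states that the request carries the database table and a token, and that the table is stored once the token is verified.

```python
import hmac
import secrets
from typing import Any, Dict, List, Optional


class RelationalStore:
    """Hypothetical stand-in for the relational database's storage interface."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}
        self._tables: Dict[str, List[Dict[str, Any]]] = {}

    def issue_token(self, db_name: str) -> str:
        """Issue a token to a target database after it has been authenticated."""
        token = secrets.token_hex(16)
        self._tokens[db_name] = token
        return token

    def handle(self, request: Dict[str, Any]) -> bool:
        """Store the table only if the token in the request passes verification."""
        expected = self._tokens.get(request["db_name"])
        if expected is None or not hmac.compare_digest(expected, request["token"]):
            return False
        # Key: database name of the target database; value: its database table.
        self._tables[request["db_name"]] = request["table"]
        return True

    def get(self, db_name: str) -> Optional[List[Dict[str, Any]]]:
        return self._tables.get(db_name)


def make_storage_request(db_name: str, token: str, table: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Bundle the database table and the token into a data storage request."""
    return {"db_name": db_name, "token": token, "table": table}


store = RelationalStore()
token = store.issue_token("hbase")
ok = store.handle(make_storage_request("hbase", token, [{"path": "/hbase/data/default"}]))
print(ok)
```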
Secondly, after the database table has been stored successfully, a user who needs to view the database table of the target database can do so directly by regular expression matching. For example, the subdirectories under the root directory default, under the library directory data of the Hbase database, can be viewed directly with a pattern of the form Hbase + data + default. This improves viewing speed.
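To illustrate viewing by regular expression matching, the sketch below uses an in-memory SQLite database as a stand-in for the relational database. The table name `dir_stats`, its columns and the sample rows are assumptions; SQLite has no built-in REGEXP implementation, so one is registered from Python's `re` module.

```python
import re
import sqlite3
from typing import List, Tuple


def _regexp(pattern: str, value: str) -> int:
    """Backing function for SQLite's `value REGEXP pattern` operator."""
    return int(value is not None and re.search(pattern, value) is not None)


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, _regexp)
conn.execute("CREATE TABLE dir_stats (path TEXT PRIMARY KEY, capacity_bytes INTEGER)")
conn.executemany(
    "INSERT INTO dir_stats VALUES (?, ?)",
    [
        ("/hbase/data/default/t1", 100),
        ("/hbase/data/default/t2", 200),
        ("/hbase/data/test1/t1", 300),
    ],
)


def find_dirs(connection: sqlite3.Connection, pattern: str) -> List[Tuple[str, int]]:
    """Return (path, capacity) rows whose path matches the regular expression."""
    cursor = connection.execute(
        "SELECT path, capacity_bytes FROM dir_stats WHERE path REGEXP ? ORDER BY path",
        (pattern,),
    )
    return cursor.fetchall()


# Subdirectories under the root directory `default` of the HBase library directory.
print(find_dirs(conn, r"^/hbase/data/default/"))
```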
Fig. 2 schematically illustrates another database management method according to an exemplary embodiment of the present invention. Referring to fig. 2, the database management method may further include steps S210 to S230. Wherein:
in step S210, the total capacity of the target database is calculated according to the database table of the target database.
In step S220, a storage ratio of the target database in the distributed system is calculated according to the total capacity of the target database, and it is determined whether the storage ratio is greater than a first preset threshold.
In step S230, when it is determined that the storage proportion is greater than the first preset threshold, a root directory and/or a sub-directory that generates abnormal data is located according to the database table of the target database.
Steps S210 to S230 are explained below. First, the total capacity of the target database can be calculated from the capacity of each directory in its database table (the capacity of the root directories and of the subdirectories). Next, the storage proportion of the target database in the distributed system is calculated from the total capacity of the target database (its used capacity) and the total capacity of the distributed system (its total available capacity), and it is judged whether the storage proportion is greater than a first preset threshold. The first preset threshold can be determined from the average daily storage proportion of the target database over a period of time. If the storage proportion is greater than the first preset threshold, the data in the database is abnormal, and the root directory and/or subdirectory generating the abnormal data can be located directly from the database table. This speeds up locating the problem, so that administrators can analyze the cause of the abnormality as quickly as possible.
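A minimal sketch of steps S210 to S230 follows. Two points are assumptions rather than statements from the patent: the total is taken over root directory capacities only (so that subdirectories are not counted twice), and abnormal directories are located by comparing today's capacities with a previous snapshot. All names and figures are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def storage_ratio(root_sizes: Dict[str, int], cluster_capacity: int) -> float:
    """Used capacity of the target database divided by the cluster's total capacity."""
    return sum(root_sizes.values()) / cluster_capacity


def first_threshold(daily_ratios: List[float]) -> float:
    """First preset threshold: average daily storage proportion over a period."""
    return mean(daily_ratios)


def locate_growth(today: Dict[str, int], previous: Dict[str, int]) -> List[Tuple[str, int]]:
    """List directories that grew since the previous snapshot, largest growth first."""
    growth = [(path, size - previous.get(path, 0)) for path, size in today.items()]
    return sorted((g for g in growth if g[1] > 0), key=lambda g: g[1], reverse=True)


today = {"/hbase/data/default": 100, "/hbase/data/test1": 700}
yesterday = {"/hbase/data/default": 100, "/hbase/data/test1": 200}

ratio = storage_ratio(today, cluster_capacity=1000)
threshold = first_threshold([0.3, 0.3, 0.3])
if ratio > threshold:
    print("abnormal growth in:", locate_growth(today, yesterday))
```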
Meanwhile, to further prevent abnormal data from affecting the distributed system and, in turn, other users, the database management method may further include: acquiring the table data under the root directory and/or subdirectory generating the abnormal data, and analyzing the cause of the abnormal data from that table data. If the cause is a surge in data due to a normal increase in traffic, the capacity can simply be expanded or no action taken; if the cause is malicious, corresponding measures can be taken against the data producer generating the abnormal data, which improves the security of the distributed system.
It should be added here that if the storage proportion is too small, the data may also be considered abnormal, and subdirectories and/or root directories whose capacity is below a certain value may be deleted, so that the database table does not contain too many directories to view conveniently.
FIG. 3 schematically illustrates another database management method according to an example embodiment of the invention. Referring to fig. 3, the database management method may further include steps S310 to S330. Wherein:
in step S310, it is determined whether the storage ratio is greater than a second preset threshold.
In step S320, when it is determined that the storage proportion is greater than a second preset threshold, generating alarm information corresponding to the target database according to a database table of the target database.
In step S330, the alarm information is stored in a target database, so that a user can expand the distributed system according to the alarm information.
Steps S310 to S330 are explained below. First, it is judged whether the storage proportion is greater than a second preset threshold; the second preset threshold may be ninety percent or another value and may be set as needed, which is not limited in this example. Second, if the storage proportion is greater than ninety percent, corresponding alarm information is generated, so that managers can expand the capacity of the distributed system in time and the use of other users is not affected. Of course, if the storage proportion is between seventy percent and ninety percent, a yellow warning can be set, and the manager can judge from the actual situation whether capacity expansion is needed.
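The two-level judgment described for steps S310 to S330 can be sketched as below. The function names, the level labels `"alarm"` and `"yellow"`, and the layout of the returned record are assumptions for illustration; the seventy and ninety percent values are the example values given above and may be set differently.

```python
def alarm_level(proportion, warn=0.70, critical=0.90):
    """Map a storage proportion (0.0 to 1.0) to an alarm level, or None."""
    if proportion > critical:
        return "alarm"    # capacity expansion is needed
    if proportion >= warn:
        return "yellow"   # manager decides whether to expand
    return None


def build_alarm(database, proportion):
    """Build a hypothetical alarm record, or None when no alarm applies."""
    level = alarm_level(proportion)
    if level is None:
        return None
    return {"database": database, "level": level, "proportion": proportion}
```

A proportion of exactly ninety percent yields the yellow level here, because the step only raises the alarm when the proportion is greater than the second preset threshold.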
The database management method according to the exemplary embodiment of the present invention is further explained below with reference to fig. 4. Referring to fig. 4, the database management method may include the following steps:
Step S410: a scheduled task is built into the program, and statistics are collected every day.
Step S420: statistics are collected for the specified Hbase and Hive according to the statistical script parameters, and the results are stored in a relational database. Specifically, a client machine of Hbase or Hive is selected, and Hadoop commands that count sizes and numbers are executed on it; the paths are recorded by the self-developed platform and provided to the statistical script. For example, the directory in which Hbase is stored on HDFS is /hbase/data, and the libraries and tables of Hbase are stored under this directory. The statistical script counts the directories under this directory and uses the du command to obtain the capacity of each, which gives the number of Hbase libraries and the capacity of each library. For example, if the Hbase database contains the two directories default and test1, the script obtains the capacity of both through the Hadoop du command. After the number and capacity of the libraries have been counted, the script counts the number and capacity of the directories under each library, that is, the number and capacity of the tables in that library, again using the Hadoop count and du commands. For example, if there are 100 directories under the test1 directory, the test1 library of the Hbase database contains 100 tables, and the capacity of each table is calculated by du. The number of files under each table is obtained through the Hadoop count command. Finally, the data is transmitted to the database of the self-developed platform through an interface provided by the platform, using token authentication.
Step S430: the library and table information is summarized according to conditions, so that the user can view it as required. Specifically, to view the library and table statistics, the user clicks the library and table statistics on the detail page for Hbase, Hive, and so on in the platform. Taking Hbase as an example, refer to FIG. 5, which shows a trend chart of the number of libraries under Hbase, so that the user can view the trend of library capacity.
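The collection in step S420 can be sketched as follows, assuming the script shells out to `hadoop fs -du` and `hadoop fs -count` and parses their text output. The function names are hypothetical. The parser takes the first column of `-du` output as the size and the last column as the path, which tolerates Hadoop versions that print an extra disk-space column; paths containing spaces are not handled.

```python
import subprocess


def run_hadoop(*args):
    """Run `hadoop fs <args>` on the client machine and return its stdout."""
    result = subprocess.run(
        ["hadoop", "fs", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_du(output):
    """Parse `hadoop fs -du <dir>` output into {path: size_in_bytes}."""
    sizes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank lines and headers such as "Found N items"
        sizes[fields[-1]] = int(fields[0])
    return sizes


def parse_count(output):
    """Parse one `hadoop fs -count <path>` line:
    DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME."""
    fields = output.split()
    dirs, files, size = (int(x) for x in fields[:3])
    return {"dirs": dirs, "files": files, "bytes": size, "path": fields[3]}


def collect(root):
    """Collect per-library and per-table statistics under `root`."""
    stats = []
    for library, library_bytes in parse_du(run_hadoop("-du", root)).items():
        tables = parse_du(run_hadoop("-du", library))
        for table, table_bytes in tables.items():
            files = parse_count(run_hadoop("-count", table))["files"]
            stats.append((library, library_bytes, table, table_bytes, files))
    return stats
```

With this layout the number of libraries is the number of entries returned for the root, and the number of tables in a library is the number of entries returned for that library directory. `collect("/hbase/data")` would need a working Hadoop client, whereas the two parsers can be exercised on captured text alone.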
Further, by clicking on the details, the user can enter a library and view the number, capacity, and file count of all tables in it. A bar graph shows the trend of the number of tables in the library (see fig. 6), and a list shows the current details of each table. Similarly, clicking the trend link after a table name shows the trend of that table's capacity (see fig. 7).
Furthermore, after the library and table information has been collected each day, the self-developed platform calculates the increase or decrease for each library and each table. If the increase exceeds the threshold set by the platform, alarm information is sent to an administrator to prompt the administrator to check for abnormal conditions. If the capacity of a table has not changed recently, the service owner can be contacted to confirm whether the table is still in use and whether it can be cleaned up, which reduces the number of unused tables in the library.
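A minimal sketch of the daily comparison follows. The function names, the fifty percent default threshold and the seven-day window are assumptions, since the embodiment only says that the threshold is set by the platform and that the capacity has not changed "recently".

```python
def growth_alerts(yesterday, today, threshold=0.5):
    """Return (name, relative_change) for entries whose capacity rose or fell
    by more than `threshold` between two daily snapshots ({name: bytes})."""
    alerts = []
    for name, new in today.items():
        old = yesterday.get(name)
        if not old:
            continue  # new or previously empty entry: no baseline to compare
        change = (new - old) / old
        if abs(change) > threshold:
            alerts.append((name, change))
    return alerts


def unchanged_tables(history, days=7):
    """Return tables whose capacity was identical over the last `days` snapshots.
    `history` maps a table name to its daily capacities, oldest first."""
    return [
        name
        for name, capacities in history.items()
        if len(capacities) >= days and len(set(capacities[-days:])) == 1
    ]
```

The first function feeds the alarm sent to the administrator; the second produces the list of tables to raise with the service owner.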
The database management method provided by the embodiment of the invention at least has the following advantages:
First, using configuration information collected during cluster deployment, such as the locations where libraries are stored on HDFS and the client information, a Hadoop command is executed on the configured storage directories through a client at a fixed time every day. Statistics including the capacity, number, and file count of the Hbase and Hive libraries and tables are obtained and stored in the platform database, which facilitates troubleshooting; an alarm notification can also be triggered when a sudden increase or decrease occurs.
Second, when an alarm is raised on the storage capacity used by the Hbase and Hive libraries and tables, it can be quickly determined which library or table caused it, or whether a service has a large number of tables. The changes in capacity, number, and file count of the Hbase and Hive libraries and tables can also be viewed conveniently, making it easy to locate the library or table that occupies a large amount of space or holds a large number of files, so that services can be adjusted in time and Hbase and Hive are used reasonably.
Third, based on the self-developed big data management platform, a statistical script is executed at a fixed time every day to obtain the library and table information of the databases on the platform that are stored on HDFS (Hadoop Distributed File System), such as Hbase and Hive, and the collected information is stored in the relational database of the self-developed platform.
Further, the number of libraries, library capacity, number of tables, table capacity, and file count of Hbase and Hive in the existing big data cluster are counted every day, so recent trends can be viewed, libraries and tables with large recent changes are easy to identify, and problems are easy to locate. During statistics, library names and table names can be matched against regular expressions, which avoids the long statistics cycle that would result from counting a large number of unimportant libraries or tables.
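The regular-expression matching of library and table names mentioned above might look like the following sketch. `select_names` is a hypothetical helper, and whole-name matching is an assumption; the embodiment does not say whether partial matches count.

```python
import re


def select_names(names, include_pattern):
    """Keep only the library or table names that fully match the pattern."""
    pattern = re.compile(include_pattern)
    return [name for name in names if pattern.fullmatch(name)]
```

For example, `select_names(["default", "test1", "tmp_x"], r"(default|test\d+)")` keeps `default` and `test1`, so the statistical script never runs its Hadoop commands on `tmp_x`.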
The embodiment of the invention also provides a database management device. Referring to fig. 8, the database management apparatus may include a directory acquiring module 810, a first calculating module 820, a database table generating module 830, and a database table storing module 840. Wherein:
the directory acquiring module 810 may be configured to acquire a library directory of a target database and a root directory corresponding to the library directory.
The first calculating module 820 may be configured to calculate the number of each root directory and the capacity of each root directory, and calculate the number of subdirectories corresponding to each root directory and the capacity of each subdirectory.
The database table generating module 830 may be configured to generate a database table of the target database according to the library directory, the root directory, the number of each root directory, the capacity of each root directory, the number of each sub-directory, and the capacity of each sub-directory.
The database table storage module 840 may be configured to store the database table of the target database into a relational database, so that a user may view the database table of the target database in a regular expression matching manner.
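As a sketch of what the database table storage module 840 enables, the example below stores rows in SQLite and queries them by regular expression. SQLite stands in for the unspecified relational database, and the table and column names are hypothetical. SQLite has no built-in `REGEXP` implementation, so one is registered from Python; SQLite evaluates `X REGEXP Y` as a call to `regexp(Y, X)`, hence the argument order.

```python
import re
import sqlite3


def _regexp(pattern, value):
    """User function backing the SQL REGEXP operator; returns 1 or 0."""
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def open_store(path=":memory:"):
    """Open the store, register REGEXP and create the statistics table."""
    conn = sqlite3.connect(path)
    conn.create_function("REGEXP", 2, _regexp)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS table_stats ("
        "stat_date TEXT, library TEXT, table_name TEXT, "
        "capacity INTEGER, file_count INTEGER)"
    )
    return conn


def save_rows(conn, rows):
    """Insert (stat_date, library, table_name, capacity, file_count) tuples."""
    conn.executemany("INSERT INTO table_stats VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def find_tables(conn, pattern):
    """Return (library, table_name, capacity) for table names matching `pattern`."""
    cursor = conn.execute(
        "SELECT library, table_name, capacity FROM table_stats "
        "WHERE table_name REGEXP ? ORDER BY table_name",
        (pattern,),
    )
    return cursor.fetchall()
```

Other relational databases ship their own regular-expression operators, in which case the registration step would not be needed.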
In an exemplary embodiment of the present disclosure, the database management apparatus further includes:
the second calculation module can be used for calculating the total capacity of the target database according to the database table of the target database;
the third calculation module may be configured to calculate a storage proportion of the target database in the distributed system according to a total capacity of the target database, and determine whether the storage proportion is greater than a first preset threshold;
and the directory positioning module can be used for positioning the root directory and/or the subdirectory generating abnormal data according to the database table of the target database when the storage proportion is determined to be greater than a first preset threshold value.
In an exemplary embodiment of the present disclosure, the database management apparatus further includes:
and the abnormal reason analysis module can be used for acquiring corresponding table data under the root directory and/or the subdirectory which generate the abnormal data and analyzing the reason of generating the abnormal data according to the table data.
In an exemplary embodiment of the present disclosure, the database management apparatus further includes:
the storage ratio judging module can be used for judging whether the storage ratio is larger than a second preset threshold value;
the warning information generating module may be configured to generate warning information corresponding to the target database according to a database table of the target database when it is determined that the storage proportion is greater than a second preset threshold;
and the alarm information storage module can be used for storing the alarm information into a target database so as to facilitate a user to expand the distributed system according to the alarm information.
In an exemplary embodiment of the present disclosure, storing the database table of the target database into a relational database includes:
generating a data storage request according to a database table of the target database and a token of the target database;
and sending the data storage request to the relational database, so that the relational database stores the database table when the token is confirmed to pass the verification.
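The token-verified storage request can be illustrated with the sketch below. The JSON layout, the class `RelationalStore` and the shared-secret comparison are assumptions; the embodiment only states that the request is generated from the database table and a token, and that the table is stored once the token passes verification.

```python
import hmac
import json


def build_storage_request(rows, token):
    """Serialize the database table rows together with the token."""
    return json.dumps({"token": token, "rows": rows})


class RelationalStore:
    """Stand-in for the relational database side of the interface."""

    def __init__(self, expected_token):
        self._expected = expected_token.encode()
        self.saved = []

    def handle(self, request):
        """Store the rows only when the token passes verification."""
        payload = json.loads(request)
        token = str(payload.get("token", "")).encode()
        # Constant-time comparison avoids leaking the token through timing.
        if not hmac.compare_digest(token, self._expected):
            return False
        self.saved.extend(payload["rows"])
        return True
```

A request carrying the wrong token is rejected and leaves `saved` unchanged.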
In an exemplary embodiment of the present disclosure, obtaining a library directory of a target database and a root directory corresponding to the library directory includes:
and acquiring, at preset time intervals, the library directory of the target database and the root directory corresponding to the library directory through the statistical script.
In an exemplary embodiment of the present disclosure, the target database is a Hive database and/or an Hbase database.
The specific details of each module in the database management apparatus have been described in detail in the corresponding database management method, and therefore are not described herein again.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Moreover, although the steps of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
In an exemplary embodiment of the present invention, there is also provided an electronic device capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module" or "system."
An electronic device 900 according to this embodiment of the invention is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.
As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. Components of electronic device 900 may include, but are not limited to: the at least one processing unit 910, the at least one storage unit 920, a bus 930 connecting different system components (including the storage unit 920 and the processing unit 910), and a display unit 940.
Wherein the storage unit stores program code that is executable by the processing unit 910 to cause the processing unit 910 to perform steps according to various exemplary embodiments of the present invention described in the above section "exemplary methods" of the present specification. For example, the processing unit 910 may execute step S110 as shown in fig. 1: acquiring a library directory of a target database and a root directory corresponding to the library directory; step S120: calculating the number of the root directories and the capacity of the root directories, and calculating the number of subdirectories corresponding to the root directories and the capacity of the subdirectories; step S130: generating a database table of the target database according to the library directory, the root directory, the number of each root directory, the capacity of each root directory, the number of each subdirectory and the capacity of each subdirectory; step S140: and storing the database table of the target database into a relational database so that a user can conveniently check the database table of the target database in a regular expression matching mode.
The storage unit 920 may include a readable medium in the form of a volatile storage unit, such as a random access memory unit (RAM) 9201 and/or a cache memory unit 9202, and may further include a read only memory unit (ROM) 9203.
Storage unit 920 may also include a program/utility 9204 having a set (at least one) of program modules 9205, such program modules 9205 including but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 930 can be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 1000 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 900 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interface 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 960. As shown, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiment of the present invention.
In an exemplary embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
The program product for implementing the above method may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited in this regard; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims (10)

1. A database management method, comprising:
acquiring a library directory of a target database and a root directory corresponding to the library directory;
calculating the number of the root directories and the capacity of the root directories, and calculating the number of subdirectories corresponding to the root directories and the capacity of the subdirectories;
generating a database table of the target database according to the library directory, the root directory, the number of each root directory, the capacity of each root directory, the number of each subdirectory and the capacity of each subdirectory;
and storing the database table of the target database into a relational database so that a user can conveniently check the database table of the target database in a regular expression matching mode.
2. The database management method according to claim 1, further comprising:
calculating the total capacity of the target database according to the database table of the target database;
calculating the storage ratio of the target database in the distributed system according to the total capacity of the target database, and judging whether the storage ratio is greater than a first preset threshold value;
and when the storage proportion is determined to be larger than a first preset threshold value, positioning a root directory and/or a subdirectory generating abnormal data according to a database table of the target database.
3. The database management method according to claim 2, further comprising:
and acquiring corresponding table data under the root directory and/or the subdirectories which generate the abnormal data, and analyzing the reasons for generating the abnormal data according to the table data.
4. The database management method according to claim 2, further comprising:
judging whether the storage ratio is larger than a second preset threshold value or not;
when the storage proportion is determined to be larger than a second preset threshold value, generating alarm information corresponding to the target database according to a database table of the target database;
and storing the alarm information into a target database so as to facilitate the capacity expansion of the distributed system by a user according to the alarm information.
5. The database management method of claim 1, wherein storing the database table of the target database into a relational database comprises:
generating a data storage request according to a database table of the target database and a token of the target database;
and sending the data storage request to the relational database, so that the relational database stores the database table when the token is confirmed to pass the verification.
6. The database management method according to claim 1, wherein obtaining the library directory of the target database and the root directory corresponding to the library directory comprises:
and acquiring, at preset time intervals, the library directory of the target database and the root directory corresponding to the library directory through the statistical script.
7. The database management method according to any one of claims 1 to 5, wherein the target database is a Hive database and/or an Hbase database.
8. A database management apparatus, comprising:
a directory acquisition module, configured to acquire a library directory of a target database and a root directory corresponding to the library directory;
the first calculation module is used for calculating the number of the root directories and the capacity of the root directories, and calculating the number of subdirectories corresponding to the root directories and the capacity of the subdirectories;
a database table generating module, configured to generate a database table of the target database according to the library directory, the root directory, the number of each root directory, the capacity of each root directory, the number of each sub-directory, and the capacity of each sub-directory;
and the database table storage module is used for storing the database table of the target database into a relational database so as to be convenient for a user to check the database table of the target database in a regular expression matching mode.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the database management method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the database management method of any of claims 1-7 via execution of the executable instructions.
CN202010504496.1A 2020-06-05 2020-06-05 Database management method and device, computer readable storage medium and electronic device Pending CN113760856A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010504496.1A CN113760856A (en) 2020-06-05 2020-06-05 Database management method and device, computer readable storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN113760856A true CN113760856A (en) 2021-12-07

Family

ID=78783949

Country Status (1)

Country Link
CN (1) CN113760856A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610279A (en) * 2021-07-20 2021-11-05 中国石油大学(华东) Accident prediction method based on data set regularity

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5642510A (en) * 1992-05-28 1997-06-24 Nec Corporation Catalog management system using a catalog pool, pool management table and pool number data table
CN102722487A (en) * 2011-03-30 2012-10-10 腾讯科技(深圳)有限公司 Method and apparatus for file management
US20140156609A1 (en) * 2011-12-06 2014-06-05 International Business Machines Corporation Database table compression
CN103870603A (en) * 2014-04-03 2014-06-18 联想(北京)有限公司 Directory management method and electronic device
CN104239493A (en) * 2014-09-09 2014-12-24 北京京东尚科信息技术有限公司 Cross-cluster data migration method and system
CN105335405A (en) * 2014-07-29 2016-02-17 北京奇虎科技有限公司 System file detection method and device
US20170316017A1 (en) * 2016-04-28 2017-11-02 inXtron. Inc. Multi hard-disk file management system and method thereof
CN109117083A (en) * 2017-06-26 2019-01-01 深圳回收宝科技有限公司 Mobile terminal, built-in memory capacity detection method and computer readable storage medium
TWI656451B (en) * 2018-03-23 2019-04-11 中華電信股份有限公司 Method of managing content structure configuration and apparatus using the same
CN111209259A (en) * 2018-11-22 2020-05-29 杭州海康威视系统技术有限公司 NAS distributed file system and data processing method


Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant after: Jingdong Technology Holding Co.,Ltd.
Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant after: Jingdong Digital Technology Holding Co.,Ltd.
Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

CB02 Change of applicant information
SE01 Entry into force of request for substantive examination