CN114491197A

CN114491197A - Database expansion method and device based on big data

Info

Publication number: CN114491197A
Application number: CN202210401409.9A
Authority: CN
Inventors: 汪明; 陆永辉; 王广军; 张勇; 吉艳; 李友冰
Original assignee: Kongzhi Technology Xuahzou Co ltd
Current assignee: Kongzhi Technology Xuahzou Co ltd
Priority date: 2022-04-18
Filing date: 2022-04-18
Publication date: 2022-05-13
Anticipated expiration: 2042-04-18
Also published as: CN114491197B

Abstract

The invention relates to the technical field of data processing, and in particular to a big-data-based database expansion method and device. The method comprises: classifying the stored data to obtain a data classification result; dividing the original database into regions to generate an original database region division result; generating a region division result for the newly added database; storing the stored data by partition; and, after a read-write request is received, parsing the request, determining the data change area, and completing the corresponding read-write operation in that area. The method classifies the data in the original database, stores it by category, and divides the newly added database into regions using the same classification scheme, so that the newly added database and the original database share the same classification. Data is identified by category whenever an operation is performed, so a full scan of the database is avoided and the response speed of the system is improved.

Description

Database expansion method and device based on big data

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a database expansion method and device based on big data.

Background

A database is a repository in which data is stored. Its storage space is large and can hold millions, tens of millions, or even hundreds of millions of records. However, a database does not store data at random; it follows certain rules, otherwise query efficiency would be low. Today's internet world is full of data and flooded with large amounts of it; in other words, the internet world is a data world. Data comes from many sources, such as travel records, consumption records, web pages viewed, and messages sent. Besides text, images, music, and sounds are also data.

The capacity of a database is limited. During long-term use, the data in the database gradually grows and the available capacity shrinks, so the database must be expanded when it is about to become full. Existing expansion approaches mainly manage several databases as a whole through an intermediate server: all data reads and writes are received by the intermediate server, processed, and then written to or read from the databases by that server. The intermediate server and the databases thus together form a virtual database, and when a database is added later, it only needs to be brought under the management of the intermediate server.

However, because the data in the original database and the data in the newly added database are not directly related, read and write operations are complicated and the response time increases.

Disclosure of Invention

An embodiment of the present invention aims to provide a big-data-based database expansion method that solves the problem identified in the third paragraph of the background art, namely complicated read-write operations and prolonged response time.

In an embodiment of the invention, a big-data-based database expansion method comprises the following steps:

reading stored data in an original database, and performing data classification on the stored data to obtain a data classification result;

performing region division on the original database according to the data classification result to generate an original database region division result, wherein the original database region division result at least comprises divided-region mark information and divided-region memory information;

performing region division on the newly added database according to the region division result of the original database, and generating a region division result of the newly added database;

storing the stored data by partition according to the region division result of the newly added database and the region division result of the original database;

and after receiving the read-write request, analyzing the read-write request, judging a data change area, and completing corresponding read-write operation in the data change area.

Preferably, the step of performing region division on the original database according to the data classification result to generate a region division result of the original database specifically includes:

analyzing the stored data to determine the number of types of the stored data;

dividing the large-class area of the original database according to the number of the types of the stored data to obtain a plurality of large-class storage areas;

analyzing the stored data belonging to each large-class storage area, dividing each large-class storage area into a plurality of small-class storage areas, establishing a link relation between the same small-class storage areas in different large-class storage areas, and generating an original database area division result.

Preferably, the step of performing region division on the newly added database according to the region division result of the original database and generating the region division result of the newly added database specifically includes:

performing region division on the newly added database according to the region division result of the original database to obtain a plurality of large-class storage regions;

establishing a link relation between the same large-class storage areas in the original database and the newly added database;

performing area division on all large-class storage areas according to the area division result of the original database to obtain a plurality of small-class storage areas;

and establishing a link relation between the same small-class storage areas in different large-class storage areas to generate the region division result of the newly added database.

Preferably, after receiving the read-write request, the step of analyzing the read-write request, determining a data change region, and completing a corresponding read-write operation in the data change region specifically includes:

receiving a read-write request, and analyzing the read-write request to obtain a request analysis result;

judging the data attribution type according to the request analysis result, and determining a data change area;

and searching the data change area, and completing corresponding operation in the data change area.

Preferably, the method further includes generating a data index after the step of performing region division on the original database according to the data classification result.

Preferably, in the step of analyzing the read-write request and determining the data change area after the read-write request is received, if there is no corresponding data change area, a separate new-type data area is divided and the read-write operation is completed in that new-type data area.

Preferably, the memory information of the divided regions is determined according to the size of data to be stored in each divided region.

Another object of an embodiment of the present invention is to provide a device for expanding a database based on big data, where the device includes:

the data reading module is used for reading the stored data in the original database and carrying out data classification on the stored data to obtain a data classification result;

the first region dividing module is used for performing region division on the original database according to the data classification result to generate an original database region dividing result, wherein the original database region dividing result at least comprises divided region mark information and divided region memory information;

the second area division module is used for carrying out area division on the newly added database according to the area division result of the original database and generating an area division result of the newly added database;

the data storage module is used for storing the stored data by partition according to the region division result of the newly added database and the region division result of the original database;

and the data reading and writing module is used for analyzing the reading and writing request after receiving the reading and writing request, judging a data change area and finishing corresponding reading and writing operation in the data change area.

Preferably, the first area dividing module includes:

the data type identification unit is used for analyzing the stored data and determining the number of the types of the stored data;

the first large-class area dividing unit is used for performing large-class area division on the original database according to the number of types of stored data to obtain a plurality of large-class storage areas;

the first subclass data dividing unit is used for analyzing the stored data belonging to each large-class storage area, dividing each large-class storage area into a plurality of small-class storage areas, establishing a link relation between the same small-class storage areas in different large-class storage areas, and generating the original database region division result.

Preferably, the second area dividing module includes:

the second large-class area dividing unit is used for carrying out area division on the newly added database according to the area division result of the original database to obtain a plurality of large-class storage areas;

the large-class data link unit is used for establishing a link relation between the same large-class storage areas in the original database and the newly added database;

the second small-class area dividing unit is used for carrying out area division on all large-class storage areas according to the area division result of the original database to obtain a plurality of small-class storage areas;

and the subclass data linking unit is used for establishing a link relation between the same small-class storage areas in different large-class storage areas and generating the region division result of the newly added database.

According to the big-data-based database expansion method provided by the embodiment of the invention, when a database is added, the data in the original database is classified and stored by category, and the newly added database is divided into regions using the same classification scheme, so that the newly added database and the original database share the same classification.

Drawings

Fig. 1 is a flowchart of a database expansion method based on big data according to an embodiment of the present invention;

fig. 2 is a flowchart illustrating a step of performing area division on an original database according to a data classification result and generating an area division result of the original database according to an embodiment of the present invention;

fig. 3 is a flowchart of a step of performing region partitioning on a newly added database according to a region partitioning result of an original database and generating a region partitioning result of the newly added database according to an embodiment of the present invention;

fig. 4 is a flowchart illustrating a step of analyzing a read/write request to determine a data change area after receiving the read/write request, and completing a corresponding read/write operation in the data change area according to an embodiment of the present invention;

fig. 5 is an architecture diagram of a database expansion apparatus based on big data according to an embodiment of the present invention;

fig. 6 is an architecture diagram of a first region dividing module according to an embodiment of the present invention;

fig. 7 is an architecture diagram of a second region dividing module according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

It will be understood that the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another. For example, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element, without departing from the scope of the present application.

The capacity of a database is limited. During long-term use, the data in the database gradually grows and the available capacity shrinks, so the database must be expanded when it is about to become full. Existing expansion approaches mainly manage several databases as a whole through an intermediate server: all data reads and writes are received by the intermediate server, processed, and then written to or read from the databases by that server, so the intermediate server and the databases together form a virtual database, and a database added later only needs to be brought under the management of the intermediate server. However, because the data in the original database and the data in the newly added database are not directly related, read and write operations are complicated and the response time increases.

When a database is added, the method classifies the data in the original database, stores the data by category, and divides the newly added database into regions using the same classification scheme, so that the newly added database and the original database share the same classification.

As shown in fig. 1, a flowchart of a method for expanding a database based on big data according to an embodiment of the present invention is provided, where the method includes:

S100, reading the stored data in the original database, and performing data classification on the stored data to obtain a data classification result.

In this step, the stored data in the original database is read first, so as to be classified according to the content contained in the data, that is, the stored data is subjected to data classification to obtain a data classification result.
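As a rough illustration of S100, the sketch below groups records by the top-level part of a MIME type string and totals their sizes. The record layout (`id`, `mime`, `size` keys) and the MIME-based rule are assumptions for illustration only; the source does not say how content is classified.

```python
from collections import defaultdict


def classify(records):
    """Group stored records by a coarse data type (hypothetical rule).

    Each record is assumed to be a dict with 'id', 'mime' and 'size' keys.
    Returns {data_type: {'ids': [...], 'size': total_size}}.
    """
    result = defaultdict(lambda: {"ids": [], "size": 0})
    for rec in records:
        data_type = rec["mime"].split("/", 1)[0]  # "video/mp4" -> "video"
        result[data_type]["ids"].append(rec["id"])
        result[data_type]["size"] += rec["size"]
    return dict(result)


records = [
    {"id": 1, "mime": "video/mp4", "size": 400},
    {"id": 2, "mime": "audio/mpeg", "size": 100},
    {"id": 3, "mime": "video/x-msvideo", "size": 200},
]
print(classify(records))
```

The per-type size total is what the later steps use to size the regions.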

S200, performing region division on the original database according to the data classification result to generate an original database region division result, wherein the original database region division result at least comprises division region mark information and division region memory information.

In this step, the data classification result records the types of data contained in the stored data and the amount of storage each type occupies, and the original database region division result is generated according to the memory occupied by each type of data. The original database region division result at least includes divided-region mark information and divided-region memory information: each region obtained by the division is given a piece of divided-region mark information, which is unique and serves as the identifier of that region, and the divided-region memory information is the size of the memory occupied by the region. After the original database has been divided into regions according to the data classification result, a data index is also generated. The memory information of each divided region is determined according to the size of the data to be stored in it.
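A minimal sketch of S200, assuming the classification result has the shape `{type: {'ids': [...], 'size': n}}`. The `R000`-style marks and the `headroom` growth factor are hypothetical choices; the source only requires that marks be unique and that each region's memory follow the size of its data.

```python
import math


def divide_regions(classification, headroom=1.5):
    """Turn a classification result into a region division result.

    classification: {data_type: {'ids': [...], 'size': bytes}}
    headroom: hypothetical growth factor applied to each region's size.
    Returns (regions, index): regions maps a unique mark to the region's
    type and memory; index maps each record id to its region mark.
    """
    regions, index = {}, {}
    for n, (data_type, info) in enumerate(sorted(classification.items())):
        mark = f"R{n:03d}"  # unique divided-region mark information
        regions[mark] = {
            "type": data_type,
            "memory": math.ceil(info["size"] * headroom),  # divided-region memory
        }
        for record_id in info["ids"]:
            index[record_id] = mark  # data index: record -> region
    return regions, index


classification = {"video": {"ids": [1, 3], "size": 600}, "audio": {"ids": [2], "size": 100}}
regions, index = divide_regions(classification)
print(regions)
print(index)
```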

S300, performing region division on the newly added database according to the region division result of the original database, and generating the region division result of the newly added database.

In this step, because the original database region division result records the memory occupied by each type of data, the newly added database is divided into several independent regions according to the same memory occupation ratios, so that the proportions between the memories of the regions match the division of the original database.
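The proportional division in S300 can be sketched with integer arithmetic so that the parts add up exactly to the new database's capacity. The largest-remainder rounding rule is an assumption; the source only states that the ratios must match.

```python
def divide_new_database(original_memory, new_capacity):
    """Split new_capacity so every region keeps its original memory ratio.

    original_memory: {data_type: memory of that region in the original database}
    Integer floor shares are topped up by largest remainder so that the
    parts sum exactly to new_capacity (the rounding rule is an assumption).
    """
    total = sum(original_memory.values())
    alloc = {t: new_capacity * m // total for t, m in original_memory.items()}
    remainder = {t: new_capacity * m % total for t, m in original_memory.items()}
    leftover = new_capacity - sum(alloc.values())
    for t in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[t] += 1
    return alloc


print(divide_new_database({"video": 900, "audio": 300, "picture": 300}, 500))
# {'video': 300, 'audio': 100, 'picture': 100}
```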

S400, storing the stored data by partition according to the region division result of the newly added database and the region division result of the original database.

In this step, the stored data is stored by partition according to the region division results of the newly added database and of the original database. After both databases have been divided, each contains corresponding regions with the same memory ratios, so the data of each type is distributed between the same-type regions of the original database and the newly added database in proportion to the memory those regions occupy. For example, if the ratio between partition A in the original database and partition A' in the newly added database is 2:1, data a in the stored data is split in the ratio 2:1 and stored in partition A and partition A', respectively.
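A minimal sketch of the 2:1 split in S400, assuming the items of one type arrive as a list and are dealt to partitions in a weighted round-robin cycle. The round-robin rule is an illustrative choice; the source only fixes the ratio. In practice the resulting placement would also be recorded in the data index so each item can be found again.

```python
def distribute(items, weights):
    """Split one type's items between same-type partitions by a weight ratio.

    weights: ordered list of (partition, weight), e.g. [("A", 2), ("A'", 1)].
    Items are dealt out in a repeating cycle of sum(weights) slots, so the
    counts follow the ratio exactly whenever len(items) is a multiple of it.
    """
    cycle = sum(w for _, w in weights)
    out = {name: [] for name, _ in weights}
    for i, item in enumerate(items):
        slot = i % cycle
        for name, w in weights:
            if slot < w:
                out[name].append(item)
                break
            slot -= w
    return out


print(distribute(list(range(6)), [("A", 2), ("A'", 1)]))
# {'A': [0, 1, 3, 4], "A'": [2, 5]}
```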

S500, after receiving the read-write request, analyzing the read-write request, determining the data change area, and completing the corresponding read-write operation in the data change area.

In this step, the read-write request is parsed after it is received, which reveals the target data of the operation, and the area in which the read-write operation will take place, that is, the data change area, is determined according to the type of the target data. If no corresponding data change area exists, a separate new-type data area is divided and the read-write operation is completed in that new-type data area.
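The routing in S500, including the fallback to a new-type data area, might look like the sketch below. The class name, the type-to-mark mapping, and the `NEW-<type>` naming scheme are hypothetical; the source does not define a request format or data structures.

```python
class RequestRouter:
    """Routes a read-write request to the region for its data type.

    regions maps data type -> region mark; store maps mark -> {key: value}.
    Both structures and the "NEW-" naming scheme are hypothetical.
    """

    def __init__(self, regions):
        self.regions = dict(regions)
        self.store = {mark: {} for mark in self.regions.values()}

    def locate(self, data_type):
        # No matching data change area: divide a separate new-type area.
        if data_type not in self.regions:
            mark = f"NEW-{data_type}"
            self.regions[data_type] = mark
            self.store[mark] = {}
        return self.regions[data_type]

    def handle(self, op, data_type, key, value=None):
        area = self.store[self.locate(data_type)]  # only this area is touched
        if op == "write":
            area[key] = value
            return value
        return area.get(key)


router = RequestRouter({"video": "R001", "audio": "R000"})
router.handle("write", "video", "clip1", b"...")
print(router.handle("read", "video", "clip1"))  # b'...'
router.handle("write", "text", "note1", "hello")
print(router.regions["text"])  # NEW-text
```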

As shown in fig. 2, as a preferred embodiment of the present invention, the step of performing region partition on the original database according to the data classification result to generate a region partition result of the original database specifically includes:

S201, analyzing the stored data and determining the number of types of the stored data.

In this step, the stored data is analyzed, and the number of data types included in the stored data is determined through analysis, specifically, the stored data may be analyzed according to a preset data type database, all types of data are recorded in the data type database, and the number of data types included in the stored data is quickly determined through query and comparison.
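A sketch of the lookup against a preset data type database in S201. Here the "database" is assumed to be a simple mapping from file extension to type, and files with no match are counted as one extra `unknown` type; both are illustrative assumptions.

```python
import os

# Hypothetical preset data type database: file extension -> data type.
TYPE_DATABASE = {
    ".mp4": "video", ".avi": "video",
    ".mp3": "audio", ".wav": "audio",
    ".jpg": "picture", ".png": "picture",
}


def count_types(filenames, type_db=TYPE_DATABASE):
    """Return (number of types, set of types) found among the stored files."""
    found = set()
    for name in filenames:
        ext = os.path.splitext(name)[1].lower()
        found.add(type_db.get(ext, "unknown"))  # unmatched files form their own type
    return len(found), found


print(count_types(["a.mp4", "b.MP3", "c.avi"]))
```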

S202, performing large-class area division on the original database according to the number of the types of the stored data to obtain a plurality of large-class storage areas.

In this step, the original database is divided into large-class areas according to the number of types of stored data, one large-class area per type. For example, if the stored data consists of video, audio, and pictures, three large-class storage areas are obtained: a video area, an audio area, and a picture area.
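One way to picture S202 is to lay the large-class areas out as contiguous ranges, one per type. The back-to-back layout in sorted type order and the `(start, end)` representation are assumptions; the source does not describe a physical layout.

```python
def large_class_areas(type_sizes):
    """Lay out one contiguous large-class area per data type.

    type_sizes: {data_type: size}; areas are placed back to back in sorted
    type order (a hypothetical layout). Returns {data_type: (start, end)}
    with end exclusive.
    """
    areas, offset = {}, 0
    for data_type in sorted(type_sizes):
        areas[data_type] = (offset, offset + type_sizes[data_type])
        offset += type_sizes[data_type]
    return areas


print(large_class_areas({"video": 600, "audio": 100, "picture": 300}))
# {'audio': (0, 100), 'picture': (100, 400), 'video': (400, 1000)}
```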

S203, analyzing the stored data belonging to each large-class storage area, dividing each large-class storage area into a plurality of small-class storage areas, establishing a link relation between the same small-class storage areas in different large-class storage areas, and generating the original database region division result.

In this step, because the large-class division has already been performed, the kind of data stored in each large-class area is known. To improve data retrieval efficiency, each large-class storage area is further divided into a plurality of small-class storage areas. For example, the video large-class storage area may be subdivided by video content into people, landscapes, articles, and so on, and the picture large class has the same or similar subclasses. A link relation is then established between identical small-class storage areas in different large-class storage areas to generate the original database region division result.
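The subdivision and linking in S203 can be sketched as follows, assuming the subclasses of each large class are already known as a mapping. Identifying an area by the pair `(large_class, subclass)` and representing links as groups keyed by subclass name are illustrative choices, not the source's data model.

```python
from collections import defaultdict


def small_class_areas(tree):
    """Split large-class areas into small-class areas and link shared subclasses.

    tree: {large_class: [subclass, ...]}. Each small-class area is identified
    by (large_class, subclass); areas with the same subclass name in different
    large classes are linked to one another.
    """
    areas = [(lc, sc) for lc, subclasses in tree.items() for sc in subclasses]
    by_subclass = defaultdict(list)
    for lc, sc in areas:
        by_subclass[sc].append((lc, sc))
    links = {sc: members for sc, members in by_subclass.items() if len(members) > 1}
    return areas, links


tree = {
    "video": ["people", "landscape", "article"],
    "picture": ["people", "landscape"],
    "audio": ["music"],
}
areas, links = small_class_areas(tree)
print(links["people"])  # [('video', 'people'), ('picture', 'people')]
```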

As shown in fig. 3, as a preferred embodiment of the present invention, the step of performing region division on the newly added database according to a region division result of the original database, and generating a region division result of the newly added database specifically includes:

S301, performing region division on the newly added database according to the region division result of the original database to obtain a plurality of large-class storage regions.

S302, a link relation is established between the same large-class storage areas in the original database and the newly added database.

In the step, the newly added database is subjected to region division according to the region division result of the original database to obtain a plurality of large-class storage regions, and a link relation is established between the same large-class storage regions in the original database and the newly added database.
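S301 and S302 can be sketched as creating one same-type large-class area in the newly added database for each area of the original database and recording a link between the two. The `<prefix>/<type>` id scheme is hypothetical, and storing each link in both directions is an assumption so that either database can find its counterpart.

```python
def mirror_large_classes(original_areas, prefix="new"):
    """Create same-type large-class areas in the new database and link them.

    original_areas: {data_type: area id in the original database}.
    The id scheme "<prefix>/<type>" is hypothetical. Links are stored in both
    directions so either database can find its counterpart area.
    """
    new_areas = {t: f"{prefix}/{t}" for t in original_areas}
    links = {}
    for t, orig_id in original_areas.items():
        links[orig_id] = new_areas[t]
        links[new_areas[t]] = orig_id
    return new_areas, links


new_areas, links = mirror_large_classes({"video": "orig/video", "audio": "orig/audio"})
print(links["orig/video"], links["new/audio"])  # new/video orig/audio
```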

S303, performing area division on all the large-class storage areas according to the region division result of the original database to obtain a plurality of small-class storage areas.

S304, establishing a link relation between the same small-class storage areas in different large-class storage areas, and generating the region division result of the newly added database.

In the step, all the large-class storage regions are subjected to region division according to the region division result of the original database, the newly added database is subjected to region subdivision in the same way to obtain small-class storage regions, and a link relation is established between the same small-class storage regions in different large-class storage regions so as to improve the interoperability between data and finally generate the region division result of the newly added database.
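A sketch of S303 and S304, assuming the original database's large/small-class layout is available as a mapping. The layout is copied unchanged into the newly added database and same-name small-class areas are grouped as links; the `<db>/<large>/<small>` id format is a hypothetical choice.

```python
def replicate_small_classes(original_tree, db="new"):
    """Copy the original large/small-class layout into the new database.

    original_tree: {large_class: [subclass, ...]} taken from the original
    division result. Returns the new database's small-class area ids and,
    per subclass name, the new-database areas that share it (the links).
    The "<db>/<large>/<small>" id format is hypothetical.
    """
    areas = {
        (lc, sc): f"{db}/{lc}/{sc}"
        for lc, subclasses in original_tree.items()
        for sc in subclasses
    }
    grouped = {}
    for (_, sc), area_id in areas.items():
        grouped.setdefault(sc, []).append(area_id)
    links = {sc: ids for sc, ids in grouped.items() if len(ids) > 1}
    return areas, links


areas, links = replicate_small_classes({"video": ["people", "landscape"], "picture": ["people"]})
print(links)  # {'people': ['new/video/people', 'new/picture/people']}
```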

As shown in fig. 4, as a preferred embodiment of the present invention, the step of analyzing the read-write request after receiving it, determining the data change area, and completing the corresponding read-write operation in the data change area specifically includes:

S501, receiving the read-write request, and analyzing the read-write request to obtain a request analysis result.

In this step, a read-write request is first received; the request identifies the data to be read or written and is parsed to obtain a request analysis result.

S502, determining the data attribution type according to the request analysis result, and determining the data change area.

S503, searching the data change area, and completing the corresponding operation in the data change area.

In this step, the data attribution type is determined from the request analysis result, and the area in which the read-write operation will take place, that is, the data change area, is determined according to the type of the target data. The search is then carried out only within that data change area rather than across the whole database, which greatly improves data search speed, and the corresponding read-write operation is completed in that area.
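S501 to S503 can be sketched end to end with a toy request string. The format `<OP> <type>/<subclass>/<key> [value]` is purely hypothetical, since the source does not define one; the point shown is that only the area matching the parsed type and subclass is searched, and a missing area is created on demand.

```python
def parse_request(raw):
    """Parse a hypothetical request format: '<OP> <type>/<subclass>/<key> [value]'."""
    parts = raw.split(maxsplit=2)
    op, path = parts[0].upper(), parts[1]
    value = parts[2] if len(parts) > 2 else None
    data_type, subclass, key = path.split("/", 2)
    return {"op": op, "type": data_type, "subclass": subclass, "key": key, "value": value}


def execute(raw, areas):
    """Run a request against areas = {(type, subclass): {key: value}}.

    Only the area matching the request's type and subclass is searched;
    a missing area is created on demand, as for a new data type.
    """
    req = parse_request(raw)
    area = areas.setdefault((req["type"], req["subclass"]), {})
    if req["op"] == "WRITE":
        area[req["key"]] = req["value"]
        return req["value"]
    return area.get(req["key"])


areas = {("video", "people"): {"clip1": "data1"}}
print(execute("READ video/people/clip1", areas))  # data1
print(execute("WRITE audio/music/song1 xyz", areas))  # xyz
```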

As shown in fig. 5, an embodiment of the present invention provides a big-data-based database expansion device, which includes:

the data reading module 100 is configured to read stored data in an original database, and perform data classification on the stored data to obtain a data classification result.

In the present system, the data reading module 100 reads the stored data in the original database and classifies it according to the content of the data, that is, performs data classification on the stored data to obtain the data classification result, which serves as the basis for the subsequent partitioning.

The first region dividing module 200 is configured to perform region division on the original database according to the data classification result to generate the original database region division result, where the original database region division result at least includes divided-region flag information and divided-region memory information.

In the present system, the first region dividing module 200 generates an original database region division result according to the size of the memory occupied by each type of data, where the original database region division result at least includes divided region flag information and divided region memory information, that is, each divided region is marked and given a piece of divided region flag information, the divided region flag information is unique and is used as a specific mark of the region, and the divided region memory information refers to the size of the memory occupied by the region.

The second region dividing module 300 is configured to perform region division on the newly added database according to the region division result of the original database, and generate a region division result of the newly added database.

In the present system, because the original database region division result records the memory occupied by each type of data, the second region dividing module 300 divides the newly added database into several independent regions according to the same memory occupation ratios, so that the proportions between the regions match the division of the original database. Since the capacity of the newly added database may differ from that of the original database, only the memory occupation ratios between the regions, not their absolute sizes, need to be the same in both databases.

The data storage module 400 is configured to store the stored data by partition according to the region division result of the newly added database and the region division result of the original database.

In the present system, the data storage module 400 performs partition storage on the stored data according to the partition result of the newly added database and the partition result of the original database, and after the partition of the original database and the newly added database, a plurality of regions with the same memory ratio are partitioned in the original database and the newly added database, so that the corresponding data in the stored data is distributed according to the memory ratio occupied by the same partition in the original database and the newly added database.

The data read-write module 500 is configured to, after receiving the read-write request, analyze the read-write request, determine a data change area, and complete a corresponding read-write operation in the data change area.

In the system, after receiving the read-write request, the data read-write module 500 analyzes the read-write request, and by analyzing the read-write request, the target data of the read-write operation at this time can be known, and the area where the read-write operation is to occur is determined according to the type of the target data to determine the data change area.
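The five modules can be pictured as a pipeline. The class below is a hypothetical skeleton, not the patented device: the method names, the `(record id, type, payload)` record layout, and the least-full placement rule in `store` are assumptions made so that the sketch runs end to end.

```python
class ExpansionDevice:
    """Hypothetical skeleton mapping modules 100-500 onto methods."""

    def __init__(self, capacities):
        self.capacities = capacities  # e.g. {"orig": 1200, "new": 600}
        self.layout = {}  # data type -> {db: region memory}
        self.used = {}    # data type -> {db: bytes stored so far}
        self.index = {}   # record id -> (db, data type)
        self.data = {}    # (db, data type) -> {record id: payload}

    def classify(self, records):  # data reading module 100
        by_type = {}
        for record_id, data_type, payload in records:
            by_type.setdefault(data_type, []).append((record_id, payload))
        return by_type

    def divide(self, by_type):  # region dividing modules 200 and 300
        sizes = {t: sum(len(p) for _, p in recs) for t, recs in by_type.items()}
        total = sum(sizes.values())
        for t, size in sizes.items():
            # Same ratio in every database, scaled to each database's capacity.
            self.layout[t] = {db: cap * size // total for db, cap in self.capacities.items()}
            self.used[t] = {db: 0 for db in self.capacities}

    def store(self, by_type):  # data storage module 400
        for t, recs in by_type.items():
            for record_id, payload in recs:
                # Put the record in the same-type region that is least full
                # relative to its allotted memory.
                db = min(self.used[t], key=lambda d: self.used[t][d] / max(self.layout[t][d], 1))
                self.used[t][db] += len(payload)
                self.index[record_id] = (db, t)
                self.data.setdefault((db, t), {})[record_id] = payload

    def read(self, record_id):  # data read-write module 500, read path
        db, t = self.index[record_id]
        return self.data[(db, t)][record_id]


device = ExpansionDevice({"orig": 1200, "new": 600})
records = [(i, "video", b"x" * 10) for i in range(6)] + [(100, "audio", b"y" * 5)]
grouped = device.classify(records)
device.divide(grouped)
device.store(grouped)
print(device.read(100))
```

With capacities in a 2:1 ratio, the six video records end up split four to two between the original and newly added databases.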

As shown in fig. 6, as a preferred embodiment of the present invention, the first area division module 200 includes:

The data type identification unit 201 is configured to analyze the stored data and determine the number of types of the stored data.

In this module, the data type identification unit 201 analyzes the stored data, and determines the number of data types included in the stored data through analysis, specifically, the stored data may be analyzed according to a preset data type database, all types of data are recorded in the data type database, and the number of data types included in the stored data is quickly determined through query and comparison.

The first large-class area dividing unit 202 is configured to perform large-class area division on the original database according to the number of types of stored data to obtain a plurality of large-class storage areas.

In this module, the first large-class area dividing unit 202 divides the original database into large-class areas according to the number of types of stored data, one area per major data class. For example, for video, audio, and pictures, the division yields three large-class storage areas: a video area, an audio area, and a picture area.

The first subclass data dividing unit 203 is configured to analyze the stored data belonging to each large-class storage area, divide each large-class storage area into a plurality of subclass storage areas, establish a link relationship between the same subclass storage areas in different large-class storage areas, and generate the original database region division result.

In this module, because the large-class division has already been performed, the kind of data stored in each large-class region is known. To improve data retrieval efficiency, the first subclass data dividing unit 203 further divides each large-class storage region into a plurality of subclass storage regions. For example, the video large-class storage region is subdivided by video content into people, scenery, articles, and so on, and the picture large class has the same or similar subclasses. A link relationship is established between identical subclass storage regions in different large-class storage regions to generate the original database region division result.

As shown in fig. 7, as a preferred embodiment of the present invention, the second area division module 300 includes:

the second large-class region dividing unit 301 is configured to perform region division on the newly added database according to the region division result of the original database, so as to obtain a plurality of large-class storage regions.

The large-class data link unit 302 is configured to establish a link relationship between the same large-class storage areas in the original database and the newly added database.

In the module, the newly added database is subjected to region division according to the region division result of the original database to obtain a plurality of large-class storage regions, and a link relation is established between the same large-class storage regions in the original database and the newly added database.

The second subclass region dividing unit 303 is configured to perform region division on every large-class storage region according to the original database region division result, so as to obtain a plurality of subclass storage regions.

The subclass data linking unit 304 is configured to establish a link relationship between identical subclass storage regions in different large-class storage regions and generate the newly added database region division result.

In this module, every large-class storage region of the newly added database is subdivided in the same way as in the original database region division result to obtain subclass storage regions, and a link relationship is established between identical subclass storage regions in different large-class storage regions to improve interoperability between data. Finally, the newly added database region division result is generated.
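Units 303 and 304 can likewise be sketched in Python. The sketch assumes, hypothetically, that both region layouts are nested dictionaries (major class to subclass to records) and that the newly added database already holds the empty major-class regions created by unit 301; the function name `divide_new_subclass_regions` is illustrative only.

```python
def divide_new_subclass_regions(original_regions, new_regions):
    """Sketch of units 303 and 304: copy the original subclass layout into each
    major-class region of the newly added database, then link identical
    subclass regions across different major classes.

    Returns a table mapping a shared subclass name to its major classes.
    """
    # Unit 303: create empty subclass regions with the same names as the original.
    for major, subs in original_regions.items():
        target = new_regions.setdefault(major, {})
        for sub in subs:
            target.setdefault(sub, [])

    # Unit 304: collect, for each subclass, the major classes that contain it.
    sub_links = {}
    for major, subs in new_regions.items():
        for sub in subs:
            sub_links.setdefault(sub, []).append(major)
    # Keep links only where a subclass occurs in more than one major class.
    return {sub: majors for sub, majors in sub_links.items() if len(majors) > 1}


original_regions = {
    "video": {"people": [1], "scenery": [2]},
    "photo": {"people": [3], "articles": [4]},
}
new_regions = {"video": {}, "photo": {}}
sub_links = divide_new_subclass_regions(original_regions, new_regions)
print(sub_links)  # {'people': ['video', 'photo']}
```

Because the newly added database receives exactly the same major-class and subclass names as the original, a later read-write request can be routed by its classification to matching regions in either database.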

It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown and described, and they may be performed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

The technical features of the embodiments described above can be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; however, any such combination should be considered within the scope of this specification as long as it contains no contradiction.

The above embodiments express only several implementations of the present invention. Although they are described in specific detail, they are not to be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and all of these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

The above descriptions are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A big-data-based database expansion method, characterized by comprising the following steps:

2. The big-data-based database expansion method according to claim 1, wherein the step of performing region division on the original database according to the data classification result to generate the region division result of the original database specifically comprises:

analyzing the stored data to determine the number of types of the stored data;

3. The big-data-based database expansion method according to claim 1, wherein the step of performing region division on the newly added database according to the region division result of the original database to generate the region division result of the newly added database specifically comprises:

4. The big-data-based database expansion method according to claim 1, wherein the step of, after a read-write request is received, analyzing the read-write request, determining a data change area, and completing the corresponding read-write operation in the data change area specifically comprises:

5. The big-data-based database expansion method according to claim 1, wherein the step of performing region division on the original database according to the data classification result further comprises generating a data index.

6. The big-data-based database expansion method according to claim 1, wherein, in the step of analyzing the read-write request and determining the data change area after the read-write request is received, if no corresponding data change area exists, a new data type area is separately divided and the read-write operation is completed in that new data type area.

7. The big-data-based database expansion method according to claim 1, wherein the memory information of each divided region is determined according to the size of the data to be stored in that region.

8. A big-data-based database expansion device, the device comprising:

9. The big data based database expansion device according to claim 8, wherein the first region division module comprises:

10. The big data based database expansion device according to claim 8, wherein the second region division module comprises: