CN110955704A

CN110955704A - Data management method, device, equipment and storage medium

Info

Publication number: CN110955704A
Application number: CN201911222661.8A
Authority: CN
Inventors: 孟宪奎; 刘振华; 谢永恒; 万月亮
Original assignee: Beijing Ruian Technology Co Ltd
Current assignee: Beijing Ruian Technology Co Ltd
Priority date: 2019-12-03
Filing date: 2019-12-03
Publication date: 2020-04-03

Abstract

The embodiment of the invention discloses a data management method, a data management device, data management equipment and a storage medium. The method comprises the following steps: receiving data to be managed uploaded by a service end, and determining access time and data type of the data to be managed; when the access time meets a real-time storage condition, determining a first target storage area to which the data to be managed belongs according to the data type, and sending the data to be managed to the first target storage area in a data segment form for storage, wherein the first target storage area is a pre-divided storage area. The problem that data cannot be normally input due to overlarge data amount caused by inputting all data in the data storage process is solved, and the success rate of data input is improved; the storage area is determined according to the data type, the data of different data types are stored in the storage area, the problem of difficult query caused by overlarge data amount stored by using a single table in the data storage process is solved, and the query efficiency is improved.

Description

Data management method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of data management, in particular to a data management method, a data management device, data management equipment and a storage medium.

Background

With the gradual application of big data in various industries, querying massive data faces unprecedented challenges. In the field of big data, multiple requirements such as high concurrency, high performance and large storage capacity must be satisfied. In particular, in business scenarios that involve querying personnel information along a time series, a relational database is generally used to store and process the time-series big data; however, owing to the inherent limitations of relational databases, time-series big data cannot be stored and queried efficiently, which affects the business use of the time-series big data.

Currently, practitioners use Druid, a dedicated time-series database for massive data, to manage time-series big data, so that the data can be stored efficiently and processed rapidly. However, in the current scenario, if the time-series data is highly discrete and delayed, serious data ingestion scheduling problems occur while Druid consumes the data: for example, segment data cannot be loaded from the distributed system infrastructure Hadoop, or the MySQL metadata store comes under heavy load, so that data ingestion tasks fail with high probability. In addition, at the data query stage, query performance is poor because single-table storage is currently used and the data volume is large.

Disclosure of Invention

The embodiment of the invention provides a data management method, a data management device, data management equipment and a storage medium, which are used for realizing rapid data query.

In a first aspect, an embodiment of the present invention provides a data management method, where the data management method includes:

receiving data to be managed uploaded by a service end, and determining access time and data type of the data to be managed;

when the access time meets a real-time storage condition, determining a first target storage area to which the data to be managed belongs according to the data type, and sending the data to be managed to the first target storage area in a data segment form for storage, wherein the first target storage area is a pre-divided storage area.

In a second aspect, an embodiment of the present invention further provides a data management apparatus, where the data management apparatus includes:

the receiving module is used for receiving the data to be managed uploaded by the service terminal and determining the access time and the data type of the data to be managed;

and the storage module is used for determining a first target storage area to which the data to be managed belongs according to the data type when the access time meets a real-time storage condition, and sending the data to be managed to the first target storage area in a data segment form for storage, wherein the first target storage area is a pre-divided storage area.

In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method as in any one of the embodiments of the invention.

In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method according to any one of the embodiments of the present invention.

The embodiment of the invention receives the data to be managed uploaded by a service terminal, and determines the access time and the data type of the data to be managed; when the access time meets a real-time storage condition, determining a first target storage area to which the data to be managed belongs according to the data type, and sending the data to be managed to the first target storage area in a data segment form for storage, wherein the first target storage area is a pre-divided storage area. The data with the access time meeting the real-time storage condition is stored, so that the problems that the data volume is too large and the data cannot be normally input due to the fact that all data are input in the data storage process are solved, and the success rate of data input is improved; the storage area is determined according to the data type, the data of different data types are stored in the storage area, the problem of difficult query caused by overlarge data amount stored by using a single table in the data storage process is solved, and the query efficiency is improved.

Drawings

FIG. 1 is a flow chart of a data management method according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a data management method according to a second embodiment of the present invention;

FIG. 3 is a flowchart illustrating a data storage implementation according to a second embodiment of the present invention;

FIG. 4 is a flowchart illustrating an implementation of data query according to a second embodiment of the present invention;

FIG. 5 is a flowchart of a data management method according to a third embodiment of the present invention;

FIG. 6 is a flowchart illustrating a data management method according to a third embodiment of the present invention;

FIG. 7 is a structural diagram of a data management apparatus in a fourth embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an apparatus in the fifth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of a data management method according to a first embodiment of the present invention. The method provided in this embodiment is applicable to a client that manages a time-series database for massive data, and to the case of managing big data. The method may be executed by a data management device, and specifically includes the following steps:

Step 11, receiving the data to be managed uploaded by the service end, and determining the access time and the data type of the data to be managed.

The service end can be specifically understood as a user end that collects and uploads data; the data to be managed can be specifically understood as collected data that needs to be stored in Druid for regular management; the access time can be specifically understood as the time at which the data to be managed is uploaded to the management device; the data type can be specifically understood as attribute information used to distinguish different kinds of data.

Specifically, the data to be managed uploaded by the service end may be received through a wireless network; data types may be divided into types such as identity card number, MAC address, IP address and mobile phone number; the data type may be determined by identifying whether the data to be managed is identity card data, MAC address data, IP address data, mobile phone number data, and so on, and the access time may be determined according to the generation time carried by the data to be managed.

Step 12, when the access time meets a real-time storage condition, determining a first target storage area to which the data to be managed belongs according to the data type, and sending the data to be managed to the first target storage area in a data segment form for storage, wherein the first target storage area is a pre-divided storage area.

The real-time storage condition can be specifically understood as a preset time threshold, and when the time difference between the access time and the data generation time is smaller than the preset time threshold, the real-time storage condition is met; the first target storage area may be specifically understood as a storage area matched with the data to be managed and used for storing the data to be managed, and different storage areas are divided in advance according to different data types.

Specifically, since data is divided into different types, each type has a corresponding storage area. When the access time of the data meets the real-time storage condition, the first target storage area to which the data to be managed belongs is determined according to its data type, and the data to be managed is sent to the first target storage area in the form of data segments for storage.
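The real-time storage check and the type-based choice of storage area described above can be sketched as follows. This is only an illustration under assumptions: the threshold value, the type labels and the area names (`REAL_TIME_THRESHOLD`, `STORAGE_AREAS`, `area_phone` and so on) are hypothetical and are not taken from the embodiment.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: the embodiment only states that it is preset.
REAL_TIME_THRESHOLD = timedelta(hours=4)

# Hypothetical pre-divided storage areas, one per data type.
STORAGE_AREAS = {
    "id_card": "area_id_card",
    "mac": "area_mac",
    "ip": "area_ip",
    "phone": "area_phone",
}


def meets_real_time_condition(access_time, generation_time,
                              threshold=REAL_TIME_THRESHOLD):
    """True when the gap between access and generation time is below the threshold."""
    return access_time - generation_time < threshold


def first_target_storage_area(data_type):
    """Look up the pre-divided storage area for a data type."""
    return STORAGE_AREAS[data_type]


generated = datetime(2019, 12, 3, 8, 0)
accessed = datetime(2019, 12, 3, 9, 30)
area = None
if meets_real_time_condition(accessed, generated):
    area = first_target_storage_area("phone")
```

Data that fails the check is simply not routed here; how such data is handled is a separate matter.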

The method comprises the steps of receiving data to be managed uploaded by a service end, and determining access time and data type of the data to be managed; when the access time meets a real-time storage condition, determining a first target storage area to which the data to be managed belongs according to the data type, and sending the data to be managed to the first target storage area in a data segment form for storage, wherein the first target storage area is a pre-divided storage area. The data with the access time meeting the real-time storage condition is stored, so that the problems that the data volume is too large and the data cannot be normally input due to the fact that all data are input in the data storage process are solved, and the success rate of data input is improved; the storage area is determined according to the data type, the data of different data types are stored in the storage area, the problem of difficult query caused by overlarge data amount stored by using a single table in the data storage process is solved, and the query efficiency is improved.

Example two

Fig. 2 is a flowchart of a data management method according to a second embodiment of the present invention. The technical solution of this embodiment is a further refinement of the foregoing technical solution, and mainly includes the following steps:

Step 21, receiving the data to be managed uploaded by the service end, and determining the access time and the data type of the data to be managed.

Step 22, when the access time meets the real-time storage condition, extracting key attribute information in the data type, and determining a target physical storage table matched with the key attribute information from a pre-divided physical storage table set.

The key attribute information can be specifically understood as information which is carried in data and can be used for distinguishing data types, and the information can be an identity card, an MAC address, an IP address, a mobile phone number and the like; a physical storage table set may be understood in particular as a set of different tables storing different types of data; the target physical storage table may be specifically understood as a storage table corresponding to each data type.

Specifically, the key attribute information of different types of data takes different forms; for example, a mobile phone number and an identity card number differ in their number of digits, so the attribute item can be determined from the number of digits of the obtained data. When the access time meets the real-time storage condition, the key attribute information in the data type is extracted, and a target physical storage table matched with the key attribute information is determined from the pre-divided physical storage table set. The data storage range can thus be divided by data type, and an area storing a large amount of data can be divided again so that each divided area is smaller, which facilitates the management of massive data.
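Distinguishing data types by the form of the key attribute can be sketched as below. The concrete formats are assumptions made for illustration (11-digit mobile numbers, 18-character identity card numbers, colon- or hyphen-separated MAC addresses); the embodiment only states that the number of digits differs between types. The function name `classify` and the type labels are hypothetical.

```python
import ipaddress
import re

# Assumed formats, for illustration only.
_PHONE = re.compile(r"\d{11}")
_ID_CARD = re.compile(r"\d{17}[\dXx]")
_MAC = re.compile(r"[0-9A-Fa-f]{2}([:-][0-9A-Fa-f]{2}){5}")


def classify(key_attribute):
    """Return a data type label inferred from the form of the key attribute."""
    if _PHONE.fullmatch(key_attribute):
        return "phone"
    if _ID_CARD.fullmatch(key_attribute):
        return "id_card"
    if _MAC.fullmatch(key_attribute):
        return "mac"
    try:
        # ip_address raises ValueError for anything that is not IPv4 or IPv6.
        ipaddress.ip_address(key_attribute)
        return "ip"
    except ValueError:
        return "unknown"
```

The resulting label could then be used to select the matching physical storage table from the pre-divided set.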

Step 23, determining the storage area to which the target physical storage table belongs as a first target storage area to which the data to be managed belongs.

Specifically, each target physical storage table resides in a storage area, and one storage area may contain a plurality of physical storage tables. Once the target physical storage table for the data to be managed is determined, the storage area to which that table belongs is taken as the target storage area of the data to be managed, that is, the first target storage area.

Step 24, establishing mapping information between the data type of the data to be managed and the first target storage area, and storing the mapping information into an area mapping relation table.

The mapping information can be specifically understood as a corresponding relation between the data type and the first target storage area; the area mapping relationship table may be specifically understood as a table containing a plurality of kinds of mapping information.

Specifically, once the mapping information between the data type of the data to be managed and the first target storage area is stored in the area mapping relation table, the target storage area can be determined from the data type by looking up the table.
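A minimal sketch of such an area mapping relation table is given below, assuming a simple in-memory dictionary. The class name `AreaMappingTable`, its methods and the area names are hypothetical; the embodiment does not specify how the table is implemented or persisted.

```python
class AreaMappingTable:
    """Hypothetical in-memory stand-in for the area mapping relation table."""

    def __init__(self):
        self._mapping = {}

    def register(self, data_type, storage_area):
        """Record that data of `data_type` is stored in `storage_area`."""
        self._mapping[data_type] = storage_area

    def lookup(self, data_type):
        """Return the storage area for `data_type`, or None if unmapped."""
        return self._mapping.get(data_type)


table = AreaMappingTable()
table.register("phone", "area_phone")
table.register("mac", "area_mac")
```

With the mapping stored once at write time, both later writes and queries can resolve the storage area from the data type alone.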

Step 25, sending the data to be managed to the first target storage area in a data segment form for storage.

The data segment may be specifically understood as file data.

Specifically, the data to be managed is stored in the form of data segments when it is stored in the first target storage area.

Further, the sending the data to be managed to the first target storage area in the form of data segments for storage includes: and sending the data to be managed to the first target storage area in a data segment form, and storing the data to be managed into a target physical storage table matched with the key attribute information of the data to be managed in the first target storage area.

Specifically, the final storage location of the data to be managed may be a target physical storage table within a target storage area. When the data to be managed is sent to the first target storage area in the form of data segments, it is stored, according to the mapping information, in the target physical storage table in the first target storage area that matches its key attribute information.

Illustratively, FIG. 3 provides a flowchart of a data storage implementation. FIG. 3 shows a first target physical storage table 203, a second target physical storage table 204, a first target storage area 205 and a second target storage area 206, but it is understood that the number of target physical storage tables and target storage areas is not limited to two. When data needs to be stored in Druid, the service end 201 sends the data to the client 202 for writing, for example by sending a data write request. On receiving the data write request, the client 202 receives the data uploaded by the service end 201 and determines its data type; based on the data type, it performs data indexing according to the area mapping relation table to determine the target physical storage table and the target storage area of the data, and then stores the data in the corresponding storage area based on that information.

Step 26, when a data query request sent by a service end is received, querying a region mapping relation table according to the data type in the data query request, and determining a target storage region to be searched.

Specifically, the manner of receiving the data query request sent by the service end may be request information transmitted through a wireless network.

When the service end wants to query data, it sends a data query request that specifies the data to be queried; the data type is determined from that data, and the target storage area to which the data belongs is determined according to the area mapping relation table.

Step 27, searching a target data segment corresponding to the data query request from the target storage area to be searched according to the key attribute information in the data query request.

Specifically, the target data segment corresponding to the data query request may be searched by traversing the target storage area, comparing the data in the target storage area one by one, and determining the target data segment corresponding to the data query request.
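The one-by-one comparison within a target storage area can be illustrated with the short sketch below. The segment layout (dictionaries with `key` and `payload` fields) and the function name `find_segments` are assumptions for illustration only.

```python
def find_segments(storage_area, key_attribute):
    """Scan every stored segment and keep those whose key matches the query."""
    return [seg for seg in storage_area if seg["key"] == key_attribute]


# Hypothetical contents of one target storage area.
area_phone = [
    {"key": "13800138000", "payload": "record A"},
    {"key": "13900139000", "payload": "record B"},
    {"key": "13800138000", "payload": "record C"},
]
matches = find_segments(area_phone, "13800138000")
```

Because the scan is confined to one storage area, its cost depends on the size of that area rather than on the total data volume.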

For example, FIG. 4 provides a flowchart of a data query implementation. When data stored in Druid needs to be queried, the service end 201 sends a data query request to the client 202. On receiving the request, the client 202 determines the data type according to the data to be queried, performs data indexing according to the area mapping relation table based on the data type to determine the table and the partition where the data is located, and retrieves the data from the corresponding storage area. A consistent hash algorithm is used for both data storage and data query; it mainly solves the problem of locating data on a specific physical table according to data characteristics once the data has been identified, which avoids querying every physical table, reduces the occupation of cluster system resources, and improves system concurrency.
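One common way to build a consistent hash is a hash ring with virtual nodes, sketched below. The embodiment does not describe its own construction, so this is an assumed illustration: the class `ConsistentHashRing`, the use of MD5, the replica count and the table names are all hypothetical.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to physical tables; virtual nodes smooth the distribution."""

    def __init__(self, tables, replicas=100):
        self._ring = []  # sorted list of (hash, table) points on the ring
        for table in tables:
            for i in range(replicas):
                self._ring.append((self._hash(f"{table}#{i}"), table))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def locate(self, key):
        """Return the table owning the first ring point at or after hash(key)."""
        idx = bisect.bisect_left(self._hashes, self._hash(key))
        # Wrap around to the first point when the key hashes past the last one.
        return self._ring[idx % len(self._ring)][1]


# Hypothetical physical table names.
ring = ConsistentHashRing(["phone_t0", "phone_t1", "phone_t2"])
table_for_write = ring.locate("13800138000")
table_for_query = ring.locate("13800138000")
```

Since writes and queries hash the same key to the same table, a query touches a single physical table. When a table is added, only keys that now fall on the new table's ring points move.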

Example three

Fig. 5 is a flowchart of a data management method according to a third embodiment of the present invention. The technical solution of this embodiment is a further refinement of the foregoing technical solution, and mainly includes the following steps:

Step 31, receiving the data to be managed uploaded by the service end, and determining the access time and the data type of the data to be managed.

Step 32, when the access time meets a real-time storage condition, determining a first target storage area to which the data to be managed belongs according to the data type, and sending the data to be managed to the first target storage area in a data segment form for storage, wherein the first target storage area is a pre-divided storage area.

Step 33, when the access time meets the offline storage condition, storing the data to be managed in a candidate supplementary recording data table of the Hadoop distributed storage system.

The offline storage condition can be specifically understood as a preset time value: when the data access time exceeds the preset time value, the offline storage condition is considered to be met and the data is stored offline. The candidate supplementary recording data table can be specifically understood as a table that stores data meeting the offline storage condition. For example, if the offline storage time is set to more than 4 hours, data uploaded by the service end whose delay exceeds 4 hours meets the offline storage condition and is stored in Hadoop.

Specifically, it is judged whether the access time meets the offline storage condition, and when it does, the data to be managed is stored in the candidate supplementary recording data table on the Hadoop distributed storage system.
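The split between the real-time path and the offline path can be sketched as follows. The 4-hour value comes from the example above; everything else is an assumption for illustration, in particular the use of a single threshold for both conditions, the delay being measured from the generation time, and the two lists standing in for Druid and for the candidate table on Hadoop.

```python
from datetime import datetime, timedelta

OFFLINE_THRESHOLD = timedelta(hours=4)  # the 4-hour figure from the example

realtime_queue = []      # hypothetical stand-in for the real-time path into Druid
candidate_backfill = []  # hypothetical stand-in for the candidate table on Hadoop


def dispatch(record, access_time):
    """Send a record down the real-time or the offline path, based on its delay."""
    delay = access_time - record["generated_at"]
    if delay > OFFLINE_THRESHOLD:
        candidate_backfill.append(record)
        return "offline"
    realtime_queue.append(record)
    return "real-time"


now = datetime(2019, 12, 3, 12, 0)
fresh = {"key": "13800138000", "generated_at": datetime(2019, 12, 3, 11, 0)}
late = {"key": "13900139000", "generated_at": datetime(2019, 12, 3, 6, 0)}
results = [dispatch(fresh, now), dispatch(late, now)]
```

Keeping delayed records out of the real-time path is what prevents them from disturbing ongoing ingestion.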

Step 34, when a data supplementary recording task is received, grouping and sorting the data to be managed in the candidate supplementary recording data table through Hadoop, to obtain a sorted data table to be recorded, which is stored on Hadoop.

The supplementary recording task can be specifically understood as supplementary recording of data at intervals set according to requirements; the data table to be recorded can be understood as a table storing the sorted data to be managed.

Specifically, the data may be grouped and sorted by dividing and ordering it according to data type, time interval, and the division rules of the target physical storage tables and target storage areas; the sorted data is stored in the data table to be recorded on Hadoop.
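The grouping and sorting step can be sketched in plain Python as below. In the embodiment this work is done through Hadoop; the sketch only illustrates the grouping logic. The hourly time interval, the record fields and the function name `group_and_sort` are assumptions, and the table and area division rules are omitted for brevity.

```python
from collections import defaultdict
from datetime import datetime


def group_and_sort(records):
    """Group records by (data type, hour of generation) and sort each group by time."""
    groups = defaultdict(list)
    for rec in records:
        hour = rec["generated_at"].replace(minute=0, second=0, microsecond=0)
        groups[(rec["type"], hour)].append(rec)
    for bucket in groups.values():
        bucket.sort(key=lambda r: r["generated_at"])
    return dict(groups)


records = [
    {"type": "phone", "generated_at": datetime(2019, 12, 3, 6, 40)},
    {"type": "mac", "generated_at": datetime(2019, 12, 3, 6, 15)},
    {"type": "phone", "generated_at": datetime(2019, 12, 3, 6, 5)},
    {"type": "phone", "generated_at": datetime(2019, 12, 3, 7, 20)},
]
grouped = group_and_sort(records)
```

Each resulting group corresponds to one batch that can later be merged into its storage area.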

Step 35, determining a second target storage area to which the grouped data to be managed in the data table to be recorded belongs, and merging the grouped data to be managed in the data table to be recorded into the corresponding second target storage area.

The second target storage area may be specifically understood as an area for storing data, and different storage areas are divided according to different data types.

Specifically, a second target storage area to which the grouped data to be managed belongs is determined according to the data type, and the data to be managed is merged into the corresponding second target storage area for storage management.

Step 36, determining the generation time of the data stored in each storage area divided in advance.

Specifically, the generation time of the data may be time-stamped when the data is generated, and the generation time of the data is determined by the time-stamping of the data.

Step 37, deleting from the corresponding storage area the stored data whose generation time exceeds the set time threshold.

The set time threshold may be specifically understood as a time value preset according to actual conditions and requirements.

Specifically, when the generation time of the data exceeds the set time threshold, the expired data is deleted through the basic segment deletion interface provided by Druid.

By deleting expired data in a timely manner, the storage space occupied on the underlying Hadoop storage is reduced.
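Selecting the expired data can be sketched as below. The sketch only decides which segments have expired; the actual deletion would go through the segment deletion interface of Druid, which is not shown here. The 30-day retention period, the segment fields and the function name `split_expired` are hypothetical.

```python
from datetime import datetime, timedelta


def split_expired(segments, now, retention):
    """Return (kept, expired) lists based on each segment's generation time."""
    kept, expired = [], []
    for seg in segments:
        age = now - seg["generated_at"]
        (expired if age > retention else kept).append(seg)
    return kept, expired


segments = [
    {"id": "seg-1", "generated_at": datetime(2019, 9, 1)},
    {"id": "seg-2", "generated_at": datetime(2019, 11, 30)},
]
kept, expired = split_expired(segments, datetime(2019, 12, 3), timedelta(days=30))
```

Here the age of a segment is measured from its generation timestamp to the current time, which is one reading of the threshold comparison described above.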

For example, FIG. 6 provides an exemplary flowchart of a data management method. When the data uploaded by the service end is real-time data, the data is first indexed to determine the storage area to which it belongs, and is then stored in the corresponding storage area on Druid. When the data uploaded by the service end is offline storage data, the data is stored in Hadoop, and a data supplementary recording task is started to group and sort it according to time granularity, partition, and the division rules of the sub-tables; the grouped data is written back to Hadoop group by group, and a Druid merge task is then started to merge the grouped data into the existing Druid data segments, thereby supplementarily recording the delayed data.

The method comprises the steps of receiving data to be managed uploaded by a service end, and determining access time and data type of the data to be managed; when the access time meets a real-time storage condition, determining a first target storage area to which the data to be managed belongs according to the data type, and sending the data to be managed to the first target storage area in a data segment form for storage, wherein the first target storage area is a pre-divided storage area. The data with the access time meeting the real-time storage condition is stored, so that the problems that the data volume is too large and the data cannot be normally input due to the fact that all data are input in the data storage process are solved, and the success rate of data input is improved; the storage area is determined according to the data type, the data of different data types are stored in the storage area, the problem of difficult query caused by overlarge data amount stored by using a single table in the data storage process is solved, and the query efficiency is improved. By supplementing the data, the integrity of the data is ensured; by deleting the expired data, the expired data can be timely deleted, and the storage space on the bottom-layer storage hadoop is reduced.

Example four

Fig. 7 is a structural diagram of a data management apparatus according to a fourth embodiment of the present invention, where the apparatus includes: a receiving module 41 and a storing module 42.

The receiving module 41 is configured to receive data to be managed uploaded by a service end, and determine access time and a data type of the data to be managed; and the storage module 42 is configured to, when the access time meets a real-time storage condition, determine a first target storage area to which the data to be managed belongs according to the data type, and send the data to be managed to the first target storage area in a data segment form for storage, where the first target storage area is a pre-divided storage area.

Further, the storage module 42 includes:

an extracting unit, used for extracting the key attribute information in the data type and determining a target physical storage table matched with the key attribute information from a pre-divided physical storage table set; and

a determining unit, used for determining the storage area to which the target physical storage table belongs as a first target storage area to which the data to be managed belongs.

Further, the manner of sending the data to be managed to the first target storage area in the form of data segments for storage may be: and sending the data to be managed to the first target storage area in a data segment form, and storing the data to be managed into a target physical storage table matched with the key attribute information of the data to be managed in the first target storage area.

Further, the storage module 42 further includes:

a mapping unit, used for establishing mapping information between the data type of the data to be managed and the first target storage area, and storing the mapping information into an area mapping relation table;

a receiving unit, used for querying the area mapping relation table according to the data type in a data query request when the data query request sent by the service end is received, and determining a target storage area to be searched; and

a searching unit, used for searching the target data segment corresponding to the data query request from the target storage area to be searched according to the key attribute information in the data query request.

Further, the apparatus further comprises:

a candidate storage module, used for storing the data to be managed in a candidate supplementary recording data table of the Hadoop distributed storage system when the access time meets an offline storage condition.

A to-be-recorded module, used for grouping and sorting the data to be managed in the candidate supplementary recording data table through Hadoop when a data supplementary recording task is received, to obtain a sorted data table to be recorded, which is stored on Hadoop.

And the area determining module is used for determining a second target storage area to which the grouped data to be managed in the data table to be recorded belongs, and merging the grouped data to be managed in the data table to be recorded into the corresponding second target storage area.

And the time determining module is used for determining the stored time of the data stored in each storage area which is divided in advance.

And the deleting module is used for deleting the stored data of which the stored time is greater than the set time threshold from the corresponding storage area.

The data management device provided by the embodiment of the invention can execute the data management method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

Example five

Fig. 8 is a schematic structural diagram of an apparatus according to a fifth embodiment of the present invention. As shown in FIG. 8, the apparatus includes a processor 50, a memory 51, an input device 52 and an output device 53. The number of processors 50 in the apparatus may be one or more, and one processor 50 is taken as an example in FIG. 8. The processor 50, the memory 51, the input device 52 and the output device 53 may be connected by a bus or by other means; connection by a bus is taken as an example in FIG. 8.

The memory 51, as a computer-readable storage medium, may be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to a data management method in an embodiment of the present invention (for example, the receiving module 41 and the storage module 42 in the data management apparatus). The processor 50 executes various functional applications of the device and data processing by executing software programs, instructions and modules stored in the memory 51, that is, implements the above-described data management method.

The memory 51 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 51 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 51 may further include memory located remotely from the processor 50, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 52 is operable to receive input numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 53 may include a display device such as a display screen.

Embodiment Six

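An embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a data management method, the method including:

receiving data to be managed uploaded by a service end, and determining the access time and data type of the data to be managed;

when the access time meets a real-time storage condition, determining a first target storage area to which the data to be managed belongs according to the data type, and sending the data to be managed to the first target storage area in a data segment form for storage, wherein the first target storage area is a pre-divided storage area.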

Of course, the computer-executable instructions contained in the storage medium provided by the embodiment of the present invention are not limited to the method operations described above, and may also perform related operations in the data management method provided by any embodiment of the present invention.

From the above description of the embodiments, it will be clear to those skilled in the art that the present invention can be implemented by software together with the necessary general-purpose hardware, and can of course also be implemented by hardware alone, although the former is the better implementation in many cases. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk of a computer, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.

It should be noted that, in the embodiment of the data management apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

It is to be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made by those skilled in the art without departing from the scope of protection of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims

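1. A method for managing data, comprising:

receiving data to be managed uploaded by a service end, and determining access time and data type of the data to be managed;

when the access time meets a real-time storage condition, determining a first target storage area to which the data to be managed belongs according to the data type, and sending the data to be managed to the first target storage area in a data segment form for storage, wherein the first target storage area is a pre-divided storage area.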

2. The method according to claim 1, wherein the determining the first target storage area to which the data to be managed belongs according to the data type comprises:

extracting key attribute information in the data type, and determining a target physical storage table matched with the key attribute information from a pre-divided physical storage table set;

and determining the storage area to which the target physical storage table belongs as a first target storage area to which the data to be managed belongs.

3. The method of claim 2, wherein sending the data to be managed to the first target storage area for storage in data segments comprises:

and sending the data to be managed to the first target storage area in a data segment form, and storing the data to be managed into a target physical storage table matched with the key attribute information of the data to be managed in the first target storage area.
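The routing described in claims 2 and 3 can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the embodiments: the table set `TABLE_SET`, the area and table names, the convention that the key attribute is the text before the first dot of the data type, and the fixed segment size. An in-memory dictionary stands in for the pre-divided storage areas.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical pre-divided physical storage table set:
# key attribute -> (storage area, physical storage table).
TABLE_SET: Dict[str, Tuple[str, str]] = {
    "log": ("area_1", "t_log"),
    "event": ("area_2", "t_event"),
}


def extract_key_attribute(data_type: str) -> str:
    # Assumed convention: the key attribute is the text before the first dot.
    return data_type.split(".", 1)[0]


def resolve_target(data_type: str) -> Tuple[str, str]:
    """Return (first target storage area, target physical storage table)."""
    key = extract_key_attribute(data_type)
    if key not in TABLE_SET:
        raise KeyError(f"no physical storage table matches {key!r}")
    return TABLE_SET[key]


def to_segments(records: List[dict], size: int) -> Iterator[List[dict]]:
    """Split the data to be managed into data segments of at most `size` rows."""
    if size <= 0:
        raise ValueError("segment size must be positive")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def store(records: List[dict], data_type: str,
          storage: Dict[str, Dict[str, List[dict]]],
          segment_size: int = 2) -> Tuple[str, str]:
    """Send the records, segment by segment, to the matched table."""
    area, table = resolve_target(data_type)
    rows = storage.setdefault(area, {}).setdefault(table, [])
    for segment in to_segments(records, segment_size):
        rows.extend(segment)  # stands in for one write request per segment
    return area, table
```

Writing in segments rather than in one batch reflects the stated aim of avoiding failed writes caused by an overly large single input; the segment size itself is an arbitrary choice in this sketch.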

4. The method of claim 1, wherein before sending the data to be managed to the first target storage area in data segments for storage, the method further comprises:

and establishing mapping information of the data type of the data to be managed and the first target storage area, and storing the mapping information into an area mapping relation table.

5. The method of claim 4, further comprising:

when a data query request sent by a service end is received, querying the area mapping relation table according to a data type in the data query request, and determining a target storage area to be searched;

and searching a target data segment corresponding to the data query request from the target storage area to be searched according to the key attribute information in the data query request.
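As an illustration of claims 4 and 5, the following sketch records the mapping from data type to storage area before each write and uses that mapping to narrow a query to a single area. The class name `AreaIndex`, its methods, and the in-memory dictionaries are assumptions made for this example only; a real deployment would presumably keep the mapping table in persistent storage.

```python
from typing import Any, Dict, List


class AreaIndex:
    """In-memory stand-in for the area mapping relation table and the areas."""

    def __init__(self) -> None:
        self.mapping: Dict[str, str] = {}  # data type -> storage area
        self.areas: Dict[str, List[Dict[str, Any]]] = {}

    def store(self, data_type: str, area: str,
              segment: List[Dict[str, Any]]) -> None:
        # The mapping is established before the segment is written.
        self.mapping[data_type] = area
        self.areas.setdefault(area, []).extend(segment)

    def query(self, data_type: str,
              key_attributes: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Step 1: look up the target storage area by data type.
        area = self.mapping.get(data_type)
        if area is None:
            return []
        # Step 2: search only that area by the key attribute information.
        return [row for row in self.areas.get(area, [])
                if all(row.get(k) == v for k, v in key_attributes.items())]
```

Because the lookup by data type selects one area first, the search by key attributes never scans data of other types, which is the query-efficiency benefit the abstract describes.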

6. The method of claim 1, further comprising:

when the access time meets an offline storage condition, storing the data to be managed in a candidate additional recording data table of the distributed storage system Hadoop;

when a data additional recording task is received, grouping and sorting the data to be managed in the candidate additional recording data table through Hadoop, to obtain a sorted data table to be additionally recorded that is stored on Hadoop;

and determining a second target storage area to which the grouped data to be managed in the data table to be additionally recorded belongs, and merging the grouped data to be managed in the data table to be additionally recorded into the corresponding second target storage area.
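The group, sort, and merge flow of claim 6 can be sketched in plain Python. This is not Hadoop code: a list stands in for the candidate additional recording data table, and the function name `run_backfill`, the record fields `data_type` and `access_time`, and the `area_of` callback are all hypothetical. Grouping by data type and ordering by access time within each group is likewise an assumption, since the claim does not fix the grouping or sort keys.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Dict, List

Record = Dict[str, object]


def run_backfill(candidates: List[Record],
                 areas: Dict[str, List[Record]],
                 area_of: Callable[[str], str]) -> Dict[str, List[Record]]:
    """Group and sort candidate records, then merge each group into its area."""
    # Sort by data type, then by access time, so groupby sees contiguous groups.
    ordered = sorted(candidates, key=itemgetter("data_type", "access_time"))
    for data_type, group in groupby(ordered, key=itemgetter("data_type")):
        # area_of picks the second target storage area for this group.
        areas.setdefault(area_of(data_type), []).extend(group)
    candidates.clear()  # the candidate table is emptied once merged
    return areas
```

For example, calling `run_backfill(rows, areas, lambda t: "area_" + t)` appends the sorted `log` rows to `areas["area_log"]` and leaves `rows` empty.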

7. The method of claim 1, further comprising:

determining the generation time of data stored in each storage area divided in advance;

and deleting the stored data with the generation time larger than the set time threshold from the corresponding storage area.
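A minimal sketch of the clean-up in claim 7 follows. It assumes one possible reading of the claim, namely that data is deleted once its age (the current time minus its generation time) exceeds the set threshold. The function name `purge_expired` and the `generated_at` field are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def purge_expired(areas: Dict[str, List[dict]],
                  max_age: timedelta,
                  now: datetime) -> int:
    """Delete rows older than max_age from every area; return the count."""
    removed = 0
    for name, rows in areas.items():
        # Assumed reading: a row expires when its age exceeds the threshold.
        kept = [row for row in rows if now - row["generated_at"] <= max_age]
        removed += len(rows) - len(kept)
        areas[name] = kept  # reassigning an existing key is safe while iterating
    return removed
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the clean-up deterministic and easy to check.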

8. A data management apparatus, comprising:

9. An apparatus, characterized in that the apparatus comprises:

one or more processors;

a storage device for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

10. A storage medium containing computer-executable instructions for performing the method of any one of claims 1-7 when executed by a computer processor.