CN109426438B

CN109426438B - Real-time big data mirror image storage method and device

Info

Publication number: CN109426438B
Application number: CN201710771908.6A
Authority: CN
Inventors: 涂锋; 尹启禄; 顾学伟; 王建宏; 刘钰柏; 黄志豪; 刘忱
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Guangdong Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Guangdong Co Ltd
Priority date: 2017-08-31
Filing date: 2017-08-31
Publication date: 2021-09-21
Anticipated expiration: 2037-08-31
Also published as: CN109426438A

Abstract

The embodiments of the invention provide a real-time big data mirror image storage method and device. The method splits the original data acquired from a real-time data source, applies mirror processing such as rearrangement, screening and deletion to the split data according to actual service requirements, and finally stores the mirrored data, thereby reducing data redundancy and improving data availability. In addition, the method checks both the cache data produced by mirror processing and the storage data finally written to the specified path. Storage is completed only when the error in record counts between the original data and the cache data, and between the cache data and the stored data, is small. This increases the accuracy of the stored data and provides strong support for later data analysis.

Description

Real-time big data mirror image storage method and device

Technical Field

The embodiment of the invention relates to the technical field of software, in particular to a real-time big data mirror image storage method and device.

Background

With the rapid development of internet technology, big data has become a hot topic. For operators and large internet companies in particular, data grows on the order of petabytes every day. In response to calls from the Party and the government, the relevant enterprises are vigorously developing the big data application industry, building their own big data analysis and processing platforms, and carrying out the storage, analysis and application of big data. In practical big data applications, data acquisition is highly real-time. For example, the latency of signaling data acquisition by an operator, or of log data acquisition by an internet company, ranges from minutes to seconds. Such real-time data can improve the accuracy and quality of big data applications with strict real-time requirements, such as urban heat maps. How to better store and analyze the acquired data, shorten the time from acquisition through storage to application, and ensure the accuracy of the data is therefore a problem to be solved urgently.

The currently popular big data platforms are mainly based on the open-source Hadoop platform, which stores big data in the Hadoop Distributed File System (HDFS). For real-time big data, the data is generally received, serialized and compressed, and then stored sequentially in a local file system as small files. After the absolute position of each small file is determined, its relative position is recalculated so that it can be appended to a large file, which preserves file integrity while keeping the large file splittable; the small files are then asynchronously appended to HDFS.

However, in the process of implementing the invention, the inventor finds that the existing scheme has the following problems:

1. Data redundancy is high. After storage is completed, subsequent data analysis applications must perform a large amount of preprocessing on the original data to remove useless information before the data can be analyzed, which consumes a large amount of computing resources;

2. The probability of missing data is high. Because the data content is not checked after being stored, part of the data may be lost without being detected, which makes later data analysis inaccurate.

Disclosure of Invention

The embodiment of the invention provides a real-time big data mirror image storage method and device, which are used for overcoming the defects of large data redundancy and easy data loss of the existing big data storage method.

In a first aspect, an embodiment of the present invention provides a real-time big data mirror storage method, including:

receiving a real-time data source;

performing row-column splitting on original data in the real-time data source to obtain the original data record number of the original data; carrying out mirror image processing on the original data according to a preset mirror image algorithm to obtain a data result after mirror image processing, storing the data result into a cache variable, and recording the number of cache data records in the cache variable;

if the size of the cache variable reaches a set value, judging whether the error between the original data record number and the cache data record number is smaller than a preset threshold value;

if the error is smaller than the preset threshold, storing the cache data in the cache variable into a storage file according to a specified configuration path, and recording the number of stored data records in the storage file;

judging whether the error between the number of cache data records and the number of stored data records is smaller than a preset threshold; and if so, sending the storage file to an external distributed storage system for storage.

Optionally, the mirroring processing on the original data according to a preset mirroring algorithm to obtain a mirrored data result includes:

loading a data mirror configuration table;

and mirroring the column data of each row in the original data according to the column data mirror mapping relation configured in the configuration table to obtain a mirrored data result.

Optionally, the method further comprises:

acquiring the resource condition of a local system, and calculating the current resource load value of the local system;

if the resource load value of the local system is greater than a first threshold value, reducing the number of data mirror processing queues;

if the resource load value of the local system is smaller than a second threshold value, increasing the number of data mirror processing queues;

wherein the first threshold is greater than the second threshold.

Optionally, the method further comprises:

acquiring the resource condition of the external distributed storage system, and calculating the current resource load value of the external distributed storage system;

if the resource load value of the external distributed storage system is larger than a third threshold value, reducing the number of data mirror storage queues;

if the resource load value of the external distributed storage system is smaller than a fourth threshold value, increasing the number of data mirror storage queues;

wherein the third threshold is greater than the fourth threshold.

In a second aspect, an embodiment of the present invention provides a real-time big data mirroring storage device, including:

the data receiving module is used for receiving a real-time data source;

the data mirror image processing module is used for splitting rows and columns of original data in the real-time data source to obtain the number of original data records of the original data; carrying out mirror image processing on the original data according to a preset mirror image algorithm to obtain a data result after mirror image processing, storing the data result into a cache variable, and recording the number of cache data records in the cache variable;

the data checking module is used for judging whether the error between the original data record number and the cache data record number is smaller than a preset threshold value or not if the size of the cache variable reaches a set value;

the data mirror image storage module is used for storing the data in the cache variable into a storage file according to a specified configuration path and recording the number of stored data records in the storage file if the error is judged to be smaller than the preset threshold value;

the data checking module is further configured to determine whether the error between the number of cache data records and the number of stored data records is smaller than a preset threshold; and if so, to send the storage file to an external distributed storage system for storage.

Optionally, the data mirroring processing module is further configured to:

loading a data mirror configuration table;

Optionally, the apparatus further comprises a computing resource monitoring module configured to:

wherein the first threshold is greater than the second threshold.

wherein the third threshold is greater than the fourth threshold.

In a third aspect, a further embodiment of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect when executing the program.

In a fourth aspect, a further embodiment of the invention provides a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the method according to the first aspect.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a flowchart of a real-time big data mirror storage method according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for storing a real-time big data mirror image according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of splitting and mirroring the original data according to the embodiment of the present invention;

FIG. 4 is a flowchart of a method for monitoring a local system and an external distributed storage system according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an embodiment of a real-time big data mirroring storage device according to the present invention;

FIG. 6 is a schematic structural diagram of an embodiment of a real-time big data mirroring storage device according to the present invention;

FIG. 7 is a block diagram of an embodiment of a computer device provided in the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In a first aspect, an embodiment of the present invention provides a real-time big data mirror storage method, as shown in fig. 1, including:

S101, receiving a real-time data source;

S102, performing row-column splitting on original data in the real-time data source to obtain the original data record number of the original data; carrying out mirror image processing on the original data according to a preset mirror image algorithm to obtain a data result after mirror image processing, storing the data result into a cache variable, and recording the number of cache data records in the cache variable;

S103, if the size of the cache variable reaches a set value, judging whether the error between the original data record number and the cache data record number is smaller than a preset threshold value;

S104, if the error is smaller than the preset threshold value, storing the cache data in the cache variable into a storage file according to a specified configuration path, and recording the number of stored data records in the storage file;

S105, judging whether the error between the cache data record number and the stored data record number is smaller than a preset threshold value; and if so, sending the storage file to an external distributed storage system for storage.

The real-time big data mirror image storage method provided by the embodiment of the invention splits the original data acquired from a real-time data source, applies mirror processing such as rearrangement, screening and deletion to the split data according to actual service requirements, and finally stores the mirrored data, thereby reducing data redundancy and improving data availability. In addition, the method checks both the cache data produced by mirror processing and the storage data finally written to the specified path. Storage is completed only when the error in record counts between the original data and the cache data, and between the cache data and the stored data, is small. This increases the accuracy of the stored data and provides strong support for later data analysis.

To facilitate an understanding of the method provided by the above examples, an alternative implementation of the various steps in the method is described in detail below with reference to fig. 2.

S101, receiving a real-time data source.

Specifically, the method may include:

(1) starting N (for example, 10) data receiving thread queues according to the system configuration data;

(2) each thread connects to and authenticates with the data source server according to the data source configuration, so that it can subsequently adapt the received external real-time data, such as Kafka (open-source real-time data transmission software) interface data, FTP interface data or file data; other interface data sources can also be added;

(3) each thread queue receives an external real-time data source;

(4) checkpoint 1 for data verification is marked, and the checkpoint 1 variable is set to CHECKPOINTDATA1.
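As a minimal sketch of the receiving step, assuming each data source can be modeled as an iterable of raw text chunks (real Kafka or FTP adapters and the authentication step are omitted), several receiving threads might feed one shared queue as follows. The names `start_receivers` and `raw_queue` are hypothetical and do not come from the original description.

```python
import queue
import threading


def start_receivers(sources, out_queue):
    """Start one receiving thread per source and return the thread list.

    Each source is assumed to be an iterable that yields raw text chunks.
    """
    def receive(source):
        for chunk in source:
            out_queue.put(chunk)  # hand the raw data to the mirroring stage

    threads = [
        threading.Thread(target=receive, args=(src,), daemon=True)
        for src in sources
    ]
    for t in threads:
        t.start()
    return threads


# Two stand-in sources; in practice there would be N configured adapters.
raw_queue = queue.Queue()
sources = [["A1,A2\r\nB1,B2\r\n"], ["C1,C2\r\n"]]
for t in start_receivers(sources, raw_queue):
    t.join()

received = []
while not raw_queue.empty():
    received.append(raw_queue.get())
```

A shared queue is only one possible hand-off between receiving and mirroring threads; the description does not prescribe a particular mechanism.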

The real-time data source here may be operator network signaling, application system logs of an internet company, and the like. The data content is mainly text and is organized by line, with the fields in each line separated by the same separator, for example:

Line 1: A1,A2,A3,A4,A5\r\n

Line 2: B1,B2,B3,B4,B5\r\n

……

where "\r\n" is a user-definable row delimiter and "," is a user-definable field delimiter within a row.

S102, performing row-column splitting on the original data in the real-time data source to obtain the original data record number of the original data; carrying out mirror image processing on the original data according to a preset mirror image algorithm to obtain a data result after mirror image processing, storing the data result into a cache variable, and recording the number of cache data records in the cache variable.

Specifically, the method may include:

(1) starting N (for example, 10) data mirroring thread queues according to the system configuration data;

(2) referring to FIG. 3, after receiving data, each thread queue first splits the data, first by rows and then by columns.

Row splitting divides the data into rows by the row separator and stores them in a row data array variable RowData[n]. The number of data rows, which is the number of original data records, is recorded and accumulated into CHECKPOINTDATA1. For the example shown in FIG. 3, CHECKPOINTDATA1 is 3.

When column splitting is performed, the row data array RowData[n] is read first, and one group of data (i.e., one row) is taken at a time and split according to the configured column separator. Taking the first group RowData[0] as an example, its content is: A1,A2,A3,A4,A5,A6,A7,A8,A9,A10. Here the column separator is ",", and splitting by this separator yields 10 data items (A1 to A10). An array variable COLDATA[] of 10 elements is set and the items are stored in it in order, so that COLDATA[0] = 'A1', COLDATA[1] = 'A2', COLDATA[2] = 'A3', …, COLDATA[9] = 'A10';

(3) loading a data mirror configuration table, which can be set according to actual conditions. For example, the configuration format is: data interface name: column 1, column 2, column 3, column 4, column 5, column 6, column 7, column 8, column 9. The arrangement of the columns can be set according to service requirements, for example IN1:0,2,1,4,3,5,7,8,9, in which IN1 is the name of the data interface and 0,2,1,4,3,5,7,8,9 is the column data mirror mapping relationship. After the row-column splitting in step (2), the column order of the original data is 0,1,2,3,4,5,6,7,8,9. Through the mirror configuration table, the column data of the original data can be rearranged and screened according to business requirements, and useless data can be removed. Whether data is useless is determined by the actual business situation; for example, certain fields or certain information may be useless for a given business and can be removed in the mirroring step.

The mirror mapping data is then stored in an array variable, namely MIRRORTABLE[] = [0,2,1,4,3,5,7,8,9];

(4) as shown in FIG. 3, data mirroring is performed according to the column mirror mapping to obtain a new mirrored result. Specifically, the mirror mapping array variable MIRRORTABLE[] and the column array COLDATA[] are obtained first, and a mirror data storage variable MIRRORDATA is set. The elements of MIRRORTABLE[] are then read in order and used as indices into COLDATA[], and the selected data are concatenated and stored in MIRRORDATA, that is:

MIRRORDATA = COLDATA[MIRRORTABLE[0]] + ","

+ COLDATA[MIRRORTABLE[1]] + ","

+ ……

+ COLDATA[MIRRORTABLE[8]]

The final mirrored result is:

MIRRORDATA = "A1,A3,A2,A5,A4,A6,A8,A9,A10", where the data "A7" has been removed according to the mirror mapping;

(5) after all data rows have been mirrored according to steps (3) and (4), a new data result MIRRORDATAS is formed;

(6) judging whether a cache variable for the current data interface already exists, and creating one if not; the mirrored data MIRRORDATAS is appended to the cache variable, and the number of cache data records in the cache variable is counted. The checkpoint 2 variable for data verification is set to CHECKPOINTDATA2, and the number of cache data records is stored in CHECKPOINTDATA2. For the example shown in FIG. 3, CHECKPOINTDATA2 is 3.
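Steps (2) to (6) can be illustrated with a short sketch. It follows the IN1 example and the three-row case of FIG. 3; the function names are hypothetical, and the choice to skip rows that have too few columns for the mapping is an assumption made here for illustration, not something the description specifies.

```python
def parse_mirror_config(line):
    """Parse a config line such as 'IN1:0,2,1,4,3,5,7,8,9'."""
    name, _, cols = line.partition(":")
    return name.strip(), [int(c) for c in cols.split(",")]


def mirror_rows(raw, mirror_table, row_sep="\r\n", col_sep=","):
    """Split raw text by rows, then columns, and apply the column mapping.

    Returns (number of original records, list of mirrored records).
    """
    row_data = [r for r in raw.split(row_sep) if r]  # row splitting
    cache = []
    for row in row_data:
        coldata = row.split(col_sep)  # column splitting
        if any(i >= len(coldata) for i in mirror_table):
            continue  # assumption: a malformed row is skipped, not repaired
        cache.append(col_sep.join(coldata[i] for i in mirror_table))
    return len(row_data), cache


# Three rows of ten fields each: A1..A10, B1..B10, C1..C10.
raw = "".join(
    ",".join(f"{prefix}{i}" for i in range(1, 11)) + "\r\n" for prefix in "ABC"
)
interface, mirror_table = parse_mirror_config("IN1:0,2,1,4,3,5,7,8,9")
checkpoint1, cache = mirror_rows(raw, mirror_table)  # CHECKPOINTDATA1
checkpoint2 = len(cache)                             # CHECKPOINTDATA2
print(cache[0])  # A1,A3,A2,A5,A4,A6,A8,A9,A10
```

Because index 6 is absent from the mapping, the seventh field (A7, B7, C7) is dropped from every row, while the two record counts stay equal at 3.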

S103, if the size of the cache variable reaches a set value, judging whether the error between the original data record number and the cache data record number is smaller than a preset threshold value.

The specific checking steps include the following. First, a data check thread is started at system startup and waits to check data. When the size of the cache variable is determined to have reached the set value, CHECKPOINTDATA1 (the number of original data records) and CHECKPOINTDATA2 (the number of cache data records) of the current thread are obtained and compared. If the error between the two record counts is smaller than or equal to the preset threshold, the check is considered successful: little data has been lost, the current accuracy of the data is high, and the next step can proceed. If the error is larger than the preset threshold, the check fails: more data has been lost and the accuracy of the current data is lower. In this case, early-warning information can be issued to notify staff of the data loss, and whether the mirror processing of the original data needs to be executed again is determined according to the configuration; the result is recorded again until the check succeeds.
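The comparison between two checkpoints might look like the sketch below. The description does not define how the "error" is computed, so the relative difference used here is an assumption, as are the function names.

```python
def record_error(expected, actual):
    """Relative difference between two record counts (assumed definition)."""
    if expected == 0:
        return 0.0 if actual == 0 else 1.0
    return abs(expected - actual) / expected


def check_passed(expected, actual, threshold):
    """True when the error is smaller than or equal to the threshold."""
    return record_error(expected, actual) <= threshold


# CHECKPOINTDATA1 = 1000 original records, CHECKPOINTDATA2 = 998 cached.
print(check_passed(1000, 998, threshold=0.01))  # True
print(check_passed(1000, 900, threshold=0.01))  # False
```

An absolute difference would serve equally well for small batches; which definition fits depends on the batch sizes the deployment expects.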

S104, if the error is smaller than the preset threshold value, storing the cache data in the cache variable into a storage file according to a specified configuration path, and recording the number of stored data records in the storage file;

specifically, the method comprises the following steps:

(1) starting N (for example, 10) data storage thread queues according to the system configuration data;

(2) the thread queue acquires the cache data that was successfully checked in S103 and needs to be transmitted, namely the data in the cache variable;

(3) the queue thread stores the cache data at the configured path through the storage interface of the distributed storage system (for example, HDFS);

(4) the queue thread obtains the storage result information, including distributed storage information such as the storage file blocks, path and size, counts the number of stored data records in the storage file, sets the checkpoint 3 variable for data verification to CHECKPOINTDATA3, and stores the number of stored data records in CHECKPOINTDATA3.
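The storage step can be sketched with a local file standing in for the distributed storage interface; this substitution is an assumption made so the example runs without an HDFS cluster. The function name `store_cache` and the path layout are hypothetical.

```python
import os
import tempfile


def store_cache(cache_records, path, row_sep="\r\n"):
    """Write the records to `path` and return the count read back from it."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # newline="" keeps the row delimiter exactly as written on every platform.
    with open(path, "w", encoding="utf-8", newline="") as f:
        for record in cache_records:
            f.write(record + row_sep)
    # Count from the file itself, not from the input list, so that a
    # short write would show up as a lower stored-record count.
    with open(path, "r", encoding="utf-8", newline="") as f:
        return sum(1 for r in f.read().split(row_sep) if r)


cache = ["A1,A3", "B1,B3"]
with tempfile.TemporaryDirectory() as base_dir:
    path = os.path.join(base_dir, "IN1", "part-0000.txt")
    checkpoint3 = store_cache(cache, path)  # CHECKPOINTDATA3
    size = os.path.getsize(path)            # part of the storage result info
```

Reading the file back is what gives the later comparison with CHECKPOINTDATA2 its value; counting the in-memory list again would always match.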

S105, judging whether the error between the number of cache data records and the number of stored data records is smaller than a preset threshold value; and if so, sending the storage file to an external distributed storage system for storage.

The data check thread queue started at system startup can now check the stored data records. The specific checking steps include the following. The queue obtains and compares CHECKPOINTDATA2 (the number of cache data records) and CHECKPOINTDATA3 (the number of stored data records) of the current thread. If the error between the two record counts is smaller than or equal to the preset threshold, the verification is considered successful: little data has been lost, the current accuracy of the data is high, and storage for this thread is finished. If the error is larger than the preset threshold, the verification fails: more data has been lost and the accuracy of the current data is lower. In this case, early-warning information can be issued to notify staff of the data loss, and whether distributed storage of the cache variable data needs to be executed again is determined according to the configuration; the result is recorded again until the verification succeeds. After the verification succeeds, the storage file can be sent to the external distributed storage system for storage.
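A sketch of the verify-and-retry behaviour follows. The retry limit, the relative-error definition and the use of `print` for the early warning are all assumptions for illustration; `store_fn` is a hypothetical callable that stores the records and returns the stored record count.

```python
def store_with_verification(cache_records, store_fn, threshold=0.0,
                            retry=True, max_attempts=3):
    """Store records, compare counts, and optionally retry on failure.

    Returns (verified, attempts_used).
    """
    expected = len(cache_records)  # CHECKPOINTDATA2
    attempt = 0
    for attempt in range(1, max_attempts + 1):
        stored = store_fn(cache_records)  # CHECKPOINTDATA3
        error = abs(expected - stored) / max(expected, 1)
        if error <= threshold:
            return True, attempt
        print(f"warning: stored {stored} of {expected} records")
        if not retry:  # configuration decides whether to store again
            break
    return False, attempt


# A stand-in store that loses one record on its first call only.
calls = {"n": 0}

def flaky_store(records):
    calls["n"] += 1
    return len(records) - 1 if calls["n"] == 1 else len(records)

ok, attempts = store_with_verification(["r1", "r2", "r3"], flaky_store)
```

Bounding the number of attempts avoids retrying forever when the storage system is persistently failing; the description itself only says that re-execution is decided by configuration.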

Through the above steps, the embodiment of the invention performs mirror processing on the original data, which removes redundant data and reduces storage pressure. At the same time, the cache data is checked against the initial original data and the finally stored data is checked against the cache data, and further processing proceeds only when the error is smaller than a preset value. Data quality is thus controlled at multiple stages, heavy loss of stored data is avoided, and the accuracy of the data is effectively improved.

Besides high data redundancy and a high probability of data loss, existing distributed storage methods also suffer from low processing efficiency, because the computing resources of the server are not used rationally. Specifically, server utilization is high when the computing load is heavy and low when the computing load is light, which easily causes delays in data arrival.

Based on this, the method provided by the embodiment of the present invention may further include:

(1) Monitoring and regulation of the computing resources of the local system which, as shown in FIG. 4, may specifically include:

S1, acquiring the resource condition of the local system, and calculating the current resource load value of the local system;

specifically, a resource monitoring thread is started at system startup. Thread 1 acquires the resource condition of the local server once per second and calculates the current resource load value SysLoad;

S2, if the resource load value SysLoad of the local system is greater than a first threshold X1, reducing the number of data mirror processing queues;

S3, if the resource load value SysLoad of the local system is smaller than a second threshold X2, increasing the number of data mirror processing queues;

wherein the first threshold X1 is greater than the second threshold X2. The local system refers to the server that receives the original data, performs mirror processing and mirror storage, and verifies the data.

(2) Monitoring and regulation of the resources of the external distributed storage system which, as shown in FIG. 4, may specifically include:

S1', acquiring the resource condition of the external distributed storage system, and calculating the current resource load value of the external distributed storage system;

specifically, a resource monitoring thread is started at system startup. Thread 2 acquires the resource condition of the external distributed storage system once per second and calculates the current resource load value DFSLoad;

S2', if the resource load value DFSLoad of the external distributed storage system is greater than a third threshold Y1, reducing the number of data mirror storage queues;

S3', if the resource load value DFSLoad of the external distributed storage system is smaller than a fourth threshold Y2, increasing the number of data mirror storage queues;

wherein the third threshold Y1 is greater than the fourth threshold Y2. The external distributed storage system is a file system outside the server that divides data files into blocks for distributed storage; it is typically used for the storage and parallel computation of big data and is characterized by high availability.
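Both regulation loops follow the same two-threshold pattern, which can be sketched once and instantiated twice. The class name, the one-queue step size and the minimum and maximum queue counts are assumptions; the description only states that queues are reduced above the upper threshold and added below the lower one.

```python
class QueueScaler:
    """Adjust a queue count from a load value using two thresholds."""

    def __init__(self, high, low, queues, min_queues=1, max_queues=32):
        if high <= low:
            raise ValueError("upper threshold must exceed lower threshold")
        self.high = high  # X1 for SysLoad, or Y1 for DFSLoad
        self.low = low    # X2 for SysLoad, or Y2 for DFSLoad
        self.queues = queues
        self.min_queues = min_queues
        self.max_queues = max_queues

    def adjust(self, load):
        """Apply one monitoring sample and return the new queue count."""
        if load > self.high and self.queues > self.min_queues:
            self.queues -= 1
        elif load < self.low and self.queues < self.max_queues:
            self.queues += 1
        return self.queues  # unchanged while the load stays between thresholds


# One scaler per monitored system; loads are sampled once per second.
processing = QueueScaler(high=0.8, low=0.3, queues=10)
history = [processing.adjust(load) for load in (0.9, 0.85, 0.5, 0.2)]
print(history)  # [9, 8, 8, 9]
```

Keeping the upper threshold strictly above the lower one leaves a band in which the queue count is stable, which prevents the count from oscillating on every sample.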

It should be noted that FIG. 4 shows only the case in which both the local system and the external distributed storage system are monitored and regulated. In practical applications, only the local system may be monitored, only the external distributed storage system may be monitored, or both may be monitored at the same time.

In a second aspect, an embodiment of the present invention provides a real-time big data mirroring storage device, as shown in fig. 5, including:

a data receiving module 201, configured to receive a real-time data source;

the data mirror image processing module 202 is configured to perform row-column splitting on the original data in the real-time data source to obtain an original data record number of the original data; carrying out mirror image processing on the original data according to a preset mirror image algorithm to obtain a data result after mirror image processing, storing the data result into a cache variable, and recording the number of cache data records in the cache variable;

the data checking module 203 is configured to determine whether an error between the original data record number and the cache data record number is smaller than a preset threshold value if the size of the cache variable reaches a set value;

the data mirror image storage module 204 is configured to, if the error is determined to be smaller than the preset threshold, store the data in the cache variable into a storage file according to a specified configuration path, and record the number of stored data records in the storage file;

the data checking module 203 is further configured to determine whether the error between the number of cache data records and the number of stored data records is smaller than a preset threshold; and if so, to send the storage file to an external distributed storage system for storage.

Optionally, the data mirroring processing module is further configured to:

loading a data mirror configuration table;

Optionally, the apparatus further comprises a computing resource monitoring module 205 configured to:

wherein the first threshold is greater than the second threshold.

wherein the third threshold is greater than the fourth threshold.

Fig. 6 shows a schematic structural diagram of a real-time big data mirroring storage device according to an embodiment of the present invention.

Since the real-time big data mirror storage device described in this embodiment is a device that can execute the real-time big data mirror storage method in the embodiment of the present invention, based on the real-time big data mirror storage method described in the embodiment of the present invention, a person skilled in the art can understand the specific implementation manner and various variations of the real-time big data mirror storage device in this embodiment, so a detailed description of how the real-time big data mirror storage device implements the real-time big data mirror storage method in the embodiment of the present invention is not given here. As long as a person skilled in the art implements the apparatus used in the method for real-time big data mirror storage in the embodiment of the present invention, the apparatus is within the scope of the present application.

Fig. 7 shows a block diagram of a computer device according to an embodiment of the present invention.

Referring to FIG. 7, the computer device includes: a processor 301, a memory 302, a bus 303, and a bus interface 304;

the processor 301 and the memory 302 complete communication with each other through the bus 303, and the bus interface 304 is used for interacting with external devices.

The processor 301 is configured to call program instructions in the memory 302 to perform the methods provided by the above-described method embodiments.

Embodiments of the present invention also disclose a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, which when executed by a computer, enable the computer to perform the methods provided by the above-mentioned method embodiments.

Embodiments of the present invention also provide a non-transitory computer-readable storage medium, which stores computer instructions, and the computer instructions cause the computer to execute the methods provided by the above method embodiments.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this manner of disclosure should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

Some component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components of a gateway, a proxy server, or a system according to embodiments of the present invention. The present invention may also be embodied as device or apparatus programs (e.g., computer programs and computer program products) for performing some or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may take the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering. These words may be interpreted as names.

Claims

1. A real-time big data mirror image storage method, characterized by comprising the following steps:

receiving a real-time data source;

judging whether the error between the number of cached data records and the number of stored data records is smaller than a preset threshold; if so, sending the storage file to an external distributed storage system for storage;

wherein performing mirror processing on the original data according to a preset mirroring algorithm to obtain mirrored result data comprises the following steps:

loading a data mirror configuration table;

2. The method of claim 1, further comprising:

wherein the first threshold is greater than the second threshold.

3. The method of claim 1, further comprising:

wherein the third threshold is greater than the fourth threshold.

4. A real-time big data mirror storage device, comprising:

the data receiving module is used for receiving a real-time data source;

the data checking module is further configured to determine whether an error between the number of cached data records and the number of stored data records is smaller than a preset threshold, and if so, to send the storage file to an external distributed storage system for storage;

wherein the data mirror processing module is further configured to:

loading a data mirror configuration table;

5. The apparatus of claim 4, further comprising a computing resource monitoring module to:

wherein the first threshold is greater than the second threshold.

6. The apparatus of claim 4, further comprising a computing resource monitoring module to:

wherein the third threshold is greater than the fourth threshold.

7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1-3 are implemented when the program is executed by the processor.

8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 3.