WO2017008658A1

WO2017008658A1 - Storage checking method and system for text data

Info

Publication number: WO2017008658A1
Application number: PCT/CN2016/088519
Authority: WO
Inventors: Li Qiang (李强)
Original assignee: Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date: 2015-07-14
Filing date: 2016-07-05
Publication date: 2017-01-19
Also published as: CN106708648A; CN106708648B

Abstract

Disclosed are a storage checking method and system for text data. The method comprises: one or more application devices packaging one or more pieces of generated text data into one or more text data packets (101), each text data packet carrying attribute information; one or more transmission devices storing the one or more text data packets in one or more preset storage devices (102); when storage succeeds, recording the attribute information in a preset statistical table (103); and, when a checking device receives a storage checking request, performing storage checking according to the statistical table (104). Because the text data is packaged, the number of items that must be counted is reduced many times over. This greatly reduces the volume of statistics, the processing load of checking and the performance overhead of the system, greatly improves practicability in big data processing, and enables overall storage checking of big data in the cloud.

Description

Storage verification method and system for text data

The present application claims priority to Chinese Patent Application No. 201510412446.X, filed on July 14, 2015 and entitled "Storage Checking Method and System for Text Data", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of computer processing technologies, and in particular, to a storage verification method for text data and a storage verification system for text data.

Background

With the advent of the cloud era, more and more platforms in social networking, e-commerce, access logging and other fields generate big data, that is, very large amounts of data: for example, between 100 TB and 100 PB per day or even more, produced by a total of 10,000 to 1 million machines or even more.

Data generated by many services in the cloud, such as the pv (page view) logs generated by websites, generally needs to be stored in real time and have its integrity verified, so as to ensure the accuracy of data mining and other processing.

At present, although some systems provide a message acknowledgment mechanism to confirm that a storage operation succeeded, this mechanism performs only local detection and cannot be applied to storage verification of big data in the cloud era.

Summary of the invention

In view of the above problems, embodiments of the present application provide a storage verification method for text data and a corresponding storage verification system for text data that overcome, or at least partially solve, the above problems.

To solve the above problems, an embodiment of the present application discloses a storage verification method for text data, comprising:

One or more application devices package the generated one or more pieces of text data into one or more text data packets; the text data packets carry attribute information;

The one or more transmission devices store the one or more text data packets in one or more preset storage devices and, when storage succeeds, record the attribute information in a preset statistical table;

When the verification device receives a storage verification request, it performs storage verification according to the statistical table.

Preferably, the step of the one or more application devices packaging the generated one or more text data into one or more text data packets comprises:

When the size of the generated text data reaches a preset size threshold, packaging the generated text data into a text data packet;

or,

When the current time exceeds a preset time threshold, packaging the generated text data into a text data packet.

Preferably, the statistical table includes a first statistical table containing a storage time; the transmission device has a transmission device identifier; and the attribute information includes an application device identifier and a generation time;

The step of recording the attribute information in a preset statistical table includes:

Finding a first statistical table corresponding to the application device identifier;

Determining whether there is a feature text data packet that has not been stored successfully, a feature text data packet being one whose generation time is earlier than the generation time of the currently stored text data packet;

If not, updating, in the first statistical table, the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.

Preferably, the statistical table includes a second statistical table containing partition time periods and numbers of storage lines; the attribute information includes a generation time and a number of packet data lines;

The step of recording the attribute information in the preset statistical table includes: searching, in the second statistical table, for the partition time period that corresponds to the first statistical table and to which the generation time belongs;

Accumulating the number of packet data lines into the number of storage lines corresponding to that partition time period.

Preferably, the step of the verification device performing storage verification according to the statistical table comprises:

Searching, in the first statistical table corresponding to the application device identifier, for the minimum storage time;

Confirming that the text data packets whose generation time is earlier than that storage time have been stored;

Counting the numbers of storage lines of the partition time periods that fall within a specified proofreading period to obtain a first total number of lines.

Preferably, the step of the verification device performing storage verification according to the statistical table further includes:

Reading, from the storage device, a second total number of lines of the text data packets stored in the proofreading period;

When the first total number of lines equals the second total number of lines, confirming that no text data packet corresponding to the proofreading period has been lost;

When the first total number of lines differs from the second total number of lines, confirming that the text data packets corresponding to the proofreading period are at least partially lost.

Preferably, the attribute information includes a generation time, and the storage device includes one or more storage partitions;

The step of the one or more transmission devices storing the one or more text data packets in the preset one or more storage devices includes:

Storing the text data packet in the storage partition corresponding to its generation time in the preset storage device.

The embodiment of the present application further discloses a storage verification system for text data, where the system includes one or more application devices, one or more transmission devices, one or more storage devices, and a verification device;

The application device includes:

a text data packaging module, configured to package the generated one or more pieces of text data into one or more text data packets, the text data packets carrying attribute information;

The transmission device includes:

a text data packet storage module, configured to store the one or more text data packets into a preset one or more storage devices;

An attribute information recording module, configured to record the attribute information in a preset statistical table when the storage is successful;

The verification device includes:

The storage verification module is configured to perform storage verification according to the statistical table when receiving the storage verification request.

Preferably, the text data packaging module comprises:

a first packaging submodule, configured to package the generated text data into a text data packet when the size of the generated text data reaches a preset size threshold;

or,

a second packaging submodule, configured to package the generated text data into a text data packet when the current time exceeds a preset time threshold.

The attribute information recording module includes:

a table search submodule, configured to search for a first statistical table corresponding to the application device identifier;

a feature text data packet judging submodule, configured to determine whether there is a feature text data packet that has not been stored successfully and, if not, to invoke the time update submodule, the generation time of a feature text data packet being earlier than that of the currently stored text data packet;

a time update submodule, configured to update, in the first statistical table, the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.

The attribute information recording module includes:

a partition time period searching submodule, configured to search, in the second statistical table, for the partition time period that corresponds to the first statistical table and to which the generation time belongs;

a storage line number accumulation submodule, configured to accumulate the number of packet data lines into the number of storage lines corresponding to the partition time period.

Preferably, the storage verification module comprises:

a storage time search submodule, configured to search for the minimum storage time in the first statistical table corresponding to the application device identifier;

a storage completion confirmation submodule, configured to confirm that the text data packets whose generation time is earlier than that storage time have been stored.

Preferably, the storage verification module further comprises:

a storage line number statistics submodule, configured to count the numbers of storage lines of the partition time periods falling within a specified proofreading period to obtain a first total number of lines.

Preferably, the storage verification module further includes:

a second total line number reading submodule, configured to read, from the storage device, a second total number of lines of the text data packets stored in the proofreading period;

a no-loss confirmation submodule, configured to confirm that no text data packet corresponding to the proofreading period has been lost when the first total number of lines equals the second total number of lines;

a loss confirmation submodule, configured to confirm that the text data packets corresponding to the proofreading period are at least partially lost when the first total number of lines differs from the second total number of lines.

The text packet storage module includes:

a partition storage submodule, configured to store the text data packet in the storage partition corresponding to its generation time in the preset storage device.

Embodiments of the present application include the following advantages:

In the embodiments of the present application, the application devices pack the generated text data into text data packets, which the transmission devices store in the storage devices. When storage succeeds, the attribute information is recorded, and the verification device verifies the storage status according to the recorded attribute information. Packing the text data reduces the number of items to be counted many times over. This greatly reduces the volume of statistics, the processing load of verification and the performance overhead of the system, greatly improves practicability in big data processing, and enables overall storage verification of big data in the cloud.

In the embodiments of the present application, storage times are updated in the first statistical table and the minimum storage time is obtained by comparing them, which enables verification of storage completeness for massive text data.

The embodiments of the present application accumulate numbers of storage lines in the second statistical table and, by summing the storage line numbers of the required partition time periods, enable verification of the stored quantity of massive text data.

By comparing the first total number of lines, counted on the transmission device side, with the second total number of lines, counted on the storage device side, the embodiments of the present application enable verification of storage loss for massive text data.

DRAWINGS

FIG. 1 is a flow chart showing the steps of an embodiment of a storage verification method for text data according to the present application;

FIG. 2 is a structural block diagram of an embodiment of a storage verification system for text data according to the present application.

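Detailed description

To make the above objects, features and advantages of the present application more apparent and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.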

A cloud platform can provide big data processing and storage capabilities to a large number of users. Many services on the cloud platform, such as the pv (page view) logs of a website, need to be written to the network side devices of the cloud platform.

Assume that a website has 1,000 front-end devices, each generating access.log logs in real time, and that the access.log logs of these 1,000 machines are written in real time to the network side device of the cloud platform. The more a service depends on this data, the higher its data quality requirements; many services cannot tolerate losing any data at all, so a mechanism is needed to guarantee data integrity.

During storage, it is generally necessary to confirm that no log has been lost: either ensure that the logs of all devices generated before a certain time have been successfully written to the cloud platform network side device, or report accurately up to what time the access.log logs of all current machines have been successfully written to it.

In current approaches, the guarantee that no data is lost is not end-to-end. For example, although the Flume system provides a message ack mechanism, that mechanism is only a partial confirmation: it cannot confirm how much data the Flume system itself has successfully written, cannot confirm whether data within a certain time period was lost during storage, and cannot determine the total number of data records on the cloud platform network side device.

Performing a count operation over the data after it has been written to cloud storage would obviously be very inefficient.

Accordingly, one of the core concepts of the embodiments of the present application is proposed: the text data is packed into data packets, and counting is performed over the packets for storage verification.

Referring to FIG. 1, which shows a flow chart of the steps of an embodiment of the storage verification method for text data of the present application, the method may specifically include the following steps:

Step 101: One or more application devices package the generated one or more text data into one or more text data packets;

It should be noted that the present application can be applied to a cloud platform, that is, a computer cluster, such as a distributed system.

Taking a distributed system as an example, the distributed system can be divided into the following parts:

Distributed system underlying services: provide the coordination, remote procedure call, security management and resource management services required in a distributed environment. These underlying services support upper-layer modules such as the distributed file system and task scheduling.

Distributed file system: provides a massive, reliable and scalable data storage service that aggregates the storage capacity of every node in the cluster and automatically masks hardware and software failures to give users uninterrupted data access. It supports incremental expansion and automatic data balancing, and provides a user-space file access API (Application Program Interface) that supports random reads and writes as well as append writes.

Task scheduling: provides scheduling services for jobs in the cluster, supporting both online services (Online Service) that emphasize response speed and batch processing jobs (Batch Processing Job) that emphasize data throughput. It automatically detects faults and hotspots in the system and, through measures such as retrying failed jobs and launching concurrent backup jobs for long-tail jobs, ensures that jobs complete stably and reliably.

Cluster monitoring and deployment: monitors the status of the cluster and the running status and performance indicators of upper-layer application services, and raises alarms and keeps records for abnormal events. It provides operation and maintenance staff with deployment and configuration management for the entire distributed system and its upper-layer applications, and supports online cluster expansion and reduction as well as online expansion of application services.

In a cloud platform (such as a distributed system), an application device may be a device, such as a server, that generates text data while running application services.

It should be noted that the text data is time-series data generated in chronological order, such as pv logs, access.log logs and system run logs.

In a specific implementation, the application device can package the text data in several ways:

In one mode, when the size of the generated text data reaches a preset size threshold, the generated text data is packaged into a text data packet;

In this mode, packaging is based on the dimension of data volume.

By packing the generated text data according to the size threshold, the application device reduces the number of items to be counted many times over, greatly reducing the volume of statistics. At the same time, attribute information can be set on the text data packets to support marking operations such as filtering, enabling real-time value-added processing.

For example, suppose the average size of each piece of text data is 1 KB and the size threshold of a text data packet is 512 KB, i.e., each text data packet is 512 KB. If 51.2 billion pieces of text data are generated, the prior art must count 51.2 billion times; after packaging, the 51.2 billion pieces of text data become 100 million text data packets, so the number of items to be counted is reduced by a factor of 512.
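The arithmetic in this example can be checked with a short Python sketch. The record count, the 1 KB average record size and the 512 KB threshold come from the example; the function name `packet_count` is illustrative, and the sketch assumes every record is exactly the average size.

```python
# Figures from the example: 1 KB average record, 512 KB packet size threshold,
# 51.2 billion generated records. Assumes every record is exactly the average size.
AVG_RECORD_KB = 1
PACKET_SIZE_KB = 512
NUM_RECORDS = 51_200_000_000


def packet_count(num_records: int, avg_record_kb: int, packet_size_kb: int) -> int:
    """Number of packets produced when records are packed up to the size threshold."""
    records_per_packet = packet_size_kb // avg_record_kb
    # Ceiling division: a final, partially filled packet still counts as one packet.
    return -(-num_records // records_per_packet)


packets = packet_count(NUM_RECORDS, AVG_RECORD_KB, PACKET_SIZE_KB)
reduction = NUM_RECORDS // packets
print(packets, reduction)  # 100000000 512
```

The reduction factor is simply the number of records that fit in one packet, so it grows linearly with the size threshold.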

In another mode, the generated text data is packaged into a text data packet when the current time exceeds a preset time threshold.

In this mode, packaging is based on the dimension of time.

Since the text data is time-series data, it can also be partitioned by time on the cloud network side device.

For example, if the text data is partitioned by hour, the cloud network side device has 24 partitions named 00, 01, ..., 23: text data generated from 00:00:00 to 00:59:59 is stored in partition 00, text data generated from 01:00:00 to 01:59:59 is stored in partition 01, and text data generated at other times is stored likewise.

After packaging, a text data packet can also be stored in the corresponding partition according to its generation time. In general, it is necessary to ensure that text data packets fall into the correct partition. For example, a text data packet placed in partition 00 should generally not contain text data generated from 01:00:00 to 01:59:59; otherwise that text data would land in partition 00, causing drift of text data (inaccurate partitioning), which is also a data quality fault.

Of course, drift of text data usually need not be avoided 100%, but it must not be excessive. Packaging with a short time threshold, such as 5 minutes, effectively limits drift and keeps mis-partitioning within an acceptable error range.
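The hourly partition naming and the drift condition described above can be sketched as follows. The function names are hypothetical, the dates are made up (only the time of day matters), and the sketch assumes a packet is filed under the partition of its first record and spans less than 24 hours.

```python
from datetime import datetime


def hourly_partition(ts: datetime) -> str:
    """Name of the hourly partition ('00'..'23') that a generation time falls into."""
    return f"{ts.hour:02d}"


def drifts(first_ts: datetime, last_ts: datetime) -> bool:
    """True if a packet's first and last records fall into different hourly partitions.

    Assumes a packet spans less than 24 hours, so comparing hour names is enough.
    """
    return hourly_partition(first_ts) != hourly_partition(last_ts)


# Hypothetical date; only the time of day matters here.
print(hourly_partition(datetime(2015, 7, 14, 0, 59, 59)))  # 00
print(hourly_partition(datetime(2015, 7, 14, 1, 0, 0)))    # 01
# A packet covering 00:58-01:02 would put some 01:xx records into partition 00.
print(drifts(datetime(2015, 7, 14, 0, 58), datetime(2015, 7, 14, 1, 2)))  # True
```

If the 5-minute time threshold is applied as windows aligned to the clock (an assumption; the text does not specify alignment), no window crosses an hour boundary and `drifts` is never true for time-triggered packets.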

Of course, the above packaging modes can be used together. For example, with a size threshold of 512 KB and a time threshold of 5 minutes, packaging the text data of 13:00:00-13:04:59 produces three text data packets: A1 (size 512 KB; its first piece of text data was generated at 13:00:00), A2 (size 512 KB), and A3 (size 402 KB; its last piece of text data was generated at 13:04:59).
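A minimal sketch of the combined size/time packaging mode follows. It assumes records arrive in generation-time order, that the time threshold is applied as fixed 5-minute windows aligned to the clock (the text does not say how windows are aligned), and that a packet is closed before a record that would push it past the size threshold. `Record`, `Packet` and `pack` are hypothetical names, and the 1 KB records are synthetic data chosen to reproduce A1, A2 and A3.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

Record = Tuple[int, int]  # (generation time in seconds since midnight, size in bytes)

SIZE_THRESHOLD = 512 * 1024  # 512 KB, from the example
TIME_WINDOW = 5 * 60         # 5 minutes, from the example


@dataclass
class Packet:
    records: List[Record] = field(default_factory=list)
    size: int = 0


def pack(records: Iterable[Record],
         size_threshold: int = SIZE_THRESHOLD,
         time_window: int = TIME_WINDOW) -> List[Packet]:
    """Close a packet when the size threshold would be exceeded or the time window rolls over."""
    packets: List[Packet] = []
    current = Packet()
    window_start: Optional[int] = None
    for ts, size in records:
        window = ts - ts % time_window  # start of the aligned window containing ts
        if current.records and (window != window_start
                                or current.size + size > size_threshold):
            packets.append(current)
            current = Packet()
        if not current.records:
            window_start = window
        current.records.append((ts, size))
        current.size += size
    if current.records:
        packets.append(current)
    return packets


# 1426 records of 1 KB spread over 13:00:00-13:04:59 reproduce A1, A2, A3.
start = 13 * 3600
recs = [(start + i * 299 // 1425, 1024) for i in range(1426)]
print([p.size // 1024 for p in pack(recs)])  # [512, 512, 402]
```

A record generated at 13:05:00 would fall into the next window and therefore start a fourth packet, even though A3 is below the size threshold.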

In addition, the above packaging modes are only examples. When implementing the embodiments of the present application, other packaging modes may be set according to actual conditions or adopted by those skilled in the art according to actual needs, which is not limited by the embodiments of the present application.

It should be noted that text data packets can be compressed to save bandwidth during network transmission, which is not limited by the embodiments of the present application.

When the text data has been packaged successfully, attribute information can be configured for the text data packet, so that each text data packet carries attribute information.

In a specific implementation, in addition to the text data itself, the structure of a text data packet may define fields called attributes. Each attribute has a name and stores the corresponding attribute information, which may include the application device identifier, the generation time and the number of packet data lines.

The application device identifier (HostName) identifies the application device that generated the text data in the text data packet, i.e., information that uniquely determines the application device, such as an application device ID or host address.

The generation time (FileTime) is the time at which the text data in the text data packet was generated. In general, the generation time of the first piece of text data in the packet may be used as the generation time of the text data packet.

The number of packet data lines (LineCount) is the number of lines of all text data in the text data packet. If the data is in a database, the command select count(1) from table_name can be used to count it; outside a database, or in a distributed environment, MapReduce can be used to scan the text data and accumulate the number of packet data lines.
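The three attributes can be pictured as fields of a packet structure. In this sketch the attribute names HostName, FileTime and LineCount come from the text, while the Python field names, `build_packet`, the one-record-per-line assumption and the sample log lines are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class TextDataPacket:
    host_name: str           # HostName: application device identifier
    file_time: datetime      # FileTime: generation time of the first record
    line_count: int          # LineCount: number of text data lines in the packet
    lines: Tuple[str, ...]   # the packed text data, one record per line (assumption)


def build_packet(host_name: str, records: List[Tuple[datetime, str]]) -> TextDataPacket:
    """Pack time-ordered (generation time, line) records and fill in the attributes."""
    if not records:
        raise ValueError("a text data packet needs at least one record")
    lines = tuple(text for _, text in records)
    return TextDataPacket(
        host_name=host_name,
        file_time=records[0][0],  # first record's time is used as the packet's FileTime
        line_count=len(lines),
        lines=lines,
    )


pkt = build_packet("application_1", [
    (datetime(2015, 7, 14, 13, 0, 0), "GET /index.html"),
    (datetime(2015, 7, 14, 13, 0, 2), "GET /cart"),
])
print(pkt.host_name, pkt.file_time.time(), pkt.line_count)  # application_1 13:00:00 2
```

Computing LineCount once, at packing time, is what later allows the statistical tables to be updated without rescanning stored data.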

When the text packet is successfully packaged, the text packet can be sent to one or more transmission devices.

It should be noted that in a cloud platform (such as a distributed system), a mechanism such as ack is generally used to ensure that transmission succeeds without packet loss: if a text data packet fails to be transmitted, the device keeps resending it until transmission succeeds.
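The resend-until-ack behaviour can be sketched as a loop around a send function that reports whether an ack was received. `send_until_acked` and `FlakySender` are hypothetical stand-ins for a real transport; the fixed retry delay is an assumption, and a production sender would typically use backoff.

```python
import time
from typing import Callable, List


def send_until_acked(send: Callable[[bytes], bool], packet: bytes,
                     retry_delay: float = 0.0) -> int:
    """Resend `packet` until `send` returns True (an ack); return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        if send(packet):
            return attempts
        time.sleep(retry_delay)  # a real sender would likely back off here


class FlakySender:
    """Hypothetical transport that drops the first `failures` sends."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.delivered: List[bytes] = []

    def __call__(self, packet: bytes) -> bool:
        if self.failures > 0:
            self.failures -= 1
            return False
        self.delivered.append(packet)
        return True


sender = FlakySender(failures=2)
print(send_until_acked(sender, b"A1"), sender.delivered)  # 3 [b'A1']
```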

Step 102: The one or more transmission devices store the one or more text data packets in a preset one or more storage devices;

In a cloud platform (such as a distributed system), the transmission device may be a device that transmits data (such as a text packet) to a processing node (such as a storage device), and the storage device may be a device that stores data (such as a text packet).

In the embodiment of the present application, the cloud platform provides an API for storing data, which the transmission device calls to write text data packets.

In an actual application, the transmission device may allocate a storage device to a text data packet using any of a number of allocation policies, which is not limited in this embodiment of the present application.

For example, with hash allocation (hash(x)%N), the hash value of the text data packet x is calculated and the packet is assigned to the storage device corresponding to hash(x)%N, where N is the number of storage devices.

For another example, with random allocation, a random number is generated and the text data packet is assigned to the storage device corresponding to random()%N.
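Both allocation policies fit in a few lines. The text does not name a hash function, so this sketch substitutes CRC-32 from the standard library; that choice, the function names and the sample packet bytes are assumptions.

```python
import random
import zlib


def hash_allocate(packet: bytes, num_devices: int) -> int:
    """hash(x) % N, using CRC-32 as a stand-in for the unspecified hash function."""
    return zlib.crc32(packet) % num_devices


def random_allocate(num_devices: int, rng: random.Random) -> int:
    """random() % N; randrange draws uniformly from 0..N-1 without modulo bias."""
    return rng.randrange(num_devices)


N = 4
packet = b"hypothetical packet bytes"
print(hash_allocate(packet, N) == hash_allocate(packet, N))  # True: hashing is deterministic
print(0 <= random_allocate(N, random.Random(42)) < N)        # True
```

Python's built-in `hash()` is salted per process for `str` and `bytes`, so a stable hash such as CRC-32 keeps the packet-to-device mapping consistent across transmission devices and restarts.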

In a preferred embodiment of the present application, a storage device may include one or more storage partitions, each of which stores the text data packets of a certain time period, such as one hour or one day. The time period may be set by those skilled in the art according to actual conditions, which is not limited by the embodiments of the present application.

Therefore, during storage, the storage partition to which a text data packet belongs can be looked up, and the text data packet is stored in the storage partition corresponding to its generation time in the preset storage device.
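A storage device whose partitions each cover a fixed time span can be sketched as below. `PartitionedStorageDevice` is a hypothetical in-memory stand-in; it assumes naive timestamps and aligns partitions to the Unix epoch, so one-hour and one-day spans start on the hour and at midnight respectively.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List

EPOCH = datetime(1970, 1, 1)


class PartitionedStorageDevice:
    """Hypothetical in-memory storage device; each partition covers a fixed time span."""

    def __init__(self, partition_span: timedelta = timedelta(hours=1)) -> None:
        self.partition_span = partition_span
        self.partitions: Dict[datetime, List[str]] = defaultdict(list)

    def partition_for(self, generation_time: datetime) -> datetime:
        # Floor the generation time to the start of the partition containing it.
        periods = (generation_time - EPOCH) // self.partition_span
        return EPOCH + periods * self.partition_span

    def store(self, packet_id: str, generation_time: datetime) -> datetime:
        key = self.partition_for(generation_time)
        self.partitions[key].append(packet_id)
        return key


device = PartitionedStorageDevice()
print(device.store("A1", datetime(2015, 7, 14, 13, 0, 3)))  # 2015-07-14 13:00:00
```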

It should be noted that in a cloud platform (such as a distributed system), a mechanism such as ack is generally also used when the transmission device writes to the storage device, so that no packet is lost: if a text data packet fails to be written, the transmission device keeps resending it until it succeeds.

Step 103: When the storage is successful, record the attribute information in a preset statistical table;

If a text data packet is stored successfully, its attribute information may be recorded so that the corresponding storage verification can be performed later.

In a preferred embodiment of the present application, the statistical table may include a first statistical table, and the first statistical table may include a storage time. In the embodiment of the present application, the step 103 may include the following sub-steps:

Sub-step S11, searching for a first statistical table corresponding to the application device identifier;

Sub-step S12, it is determined whether there is a feature text data packet that has not been successfully stored; if not, sub-step S13 is performed;

The generation time of the feature text data packet is less than the generation time of the currently stored text data packet;

Sub-step S13, in the first statistical table, updating the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.

In the embodiment of the present application, when the cloud platform provides big data processing and storage capabilities to users, a user may rent application devices in the cloud platform, so that a user identifier (such as a user ID) is associated with application device identifiers. Text data packets generated by the application devices of the same user are therefore usually counted together, and different users have different first statistical tables.

In the first statistical table, there is generally exactly one storage time for each combination of transmission device identifier and application device identifier (i.e., a one-to-one relationship).

For example, suppose the first statistical table assigned to user A is Test1, the application devices include application_1 and application_2, and the transmission devices include transmission_1 and transmission_2. The first statistical table may then be as shown in Table 1:

Table 1

First statistical table	Transmission device identifier	Application device identifier	Storage time
Test1	transmission_1	application_1	13:00:00
Test1	transmission_1	application_2	15:00:00
Test1	transmission_2	application_1	14:00:00
Test1	transmission_2	application_2	14:00:00

The storage time is refreshed continually and represents the latest generation time among the stored text data packets that were generated by a given application device and transmitted by a given transmission device.

Suppose a text data packet generated by application device application_1, with a generation time of 13:00:03, is successfully stored to the storage device by transmission device transmission_1. The first statistical table is then modified as shown in Table 2:

Table 2

First statistical table	Transmission device identifier	Application device identifier	Storage time
Test1	transmission_1	application_1	13:00:03
Test1	transmission_1	application_2	15:00:00
Test1	transmission_2	application_1	14:00:00
Test1	transmission_2	application_2	14:00:00

In addition, to preserve the time ordering of text data packets, when a text data packet is stored successfully it may be checked whether any feature text data packet has not yet been stored successfully. If so, the generation time of the currently stored text data packet is not written as the storage time; the storage time is updated only when no such feature text data packet remains.

For example, the transmission device transmits text data packets A1, A2 and A3 to the storage device. If A1 and A2 have not yet been stored successfully while A3 has, the generation time of A3 is not written as the storage time in the first statistical table; the storage time in the first statistical table is updated only after A1 and A2 have been stored successfully.
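Sub-steps S12 and S13 can be sketched as an in-memory first statistical table. The sketch assumes the transmission device tracks the generation times of packets it has sent but not yet confirmed (`pending`), uses zero-padded "HH:MM:SS" strings that compare correctly within one day, and adds a safeguard, not stated in the text, that the storage time never moves backwards. Class and method names and the A1-A3 times are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# (transmission device identifier, application device identifier)
Key = Tuple[str, str]


class FirstStatisticalTable:
    """Sketch of one user's first statistical table (for example Test1)."""

    def __init__(self) -> None:
        self.storage_time: Dict[Key, str] = {}
        # Assumption: generation times of packets sent to storage but not yet confirmed.
        self.pending: Dict[Key, Set[str]] = defaultdict(set)

    def begin_store(self, trans_id: str, app_id: str, gen_time: str) -> None:
        self.pending[(trans_id, app_id)].add(gen_time)

    def on_stored(self, trans_id: str, app_id: str, gen_time: str) -> None:
        key = (trans_id, app_id)
        self.pending[key].discard(gen_time)
        # Sub-step S12: is an earlier ("feature") packet still not stored?
        if any(t < gen_time for t in self.pending[key]):
            return
        # Sub-step S13: write the generation time as the storage time,
        # never moving it backwards (an added safeguard).
        current = self.storage_time.get(key)
        if current is None or gen_time > current:
            self.storage_time[key] = gen_time


table = FirstStatisticalTable()
table.storage_time[("transmission_2", "application_2")] = "14:00:00"
for t in ("14:05:00", "14:05:10", "14:05:20"):  # hypothetical times for A1, A2, A3
    table.begin_store("transmission_2", "application_2", t)
table.on_stored("transmission_2", "application_2", "14:05:20")  # A3 first: no update
print(table.storage_time[("transmission_2", "application_2")])  # 14:00:00
table.on_stored("transmission_2", "application_2", "14:05:00")  # A1
table.on_stored("transmission_2", "application_2", "14:05:10")  # A2
print(table.storage_time[("transmission_2", "application_2")])  # 14:05:10
```

Read literally, the rule records only the generation time of the packet just stored, so after A1 and A2 land the storage time is A2's time rather than A3's. It remains a safe lower bound and catches up with the next successful store.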

In another preferred embodiment of the present application, the statistical table includes a second statistical table containing partition time periods and numbers of storage lines. In this embodiment, step 103 may include the following sub-steps:

Sub-step S21, in the second statistical table, searching for the partition time period that corresponds to the first statistical table and to which the generation time belongs;

Sub-step S23, accumulating the number of packet data lines into the number of storage lines corresponding to the partition time period.

In the embodiment of the present application, the text data packets generated by the application device of the same user may also be uniformly counted.

In the second statistical table, for a given first statistical table there is generally exactly one number of storage lines per partition time period (PartitionTime), i.e., a one-to-one relationship. The length of the partition time period can be set by those skilled in the art according to actual conditions, such as 1 hour or 15 minutes. The number of storage lines is an accumulated value representing the number of data lines of the text data packets that were generated by the user's application devices and stored within that partition time period.

For example, suppose the first statistical table assigned to user A is Test1 with a partition time period of 15 minutes, and the first statistical table assigned to user B is Test2 with a partition time period of 10 minutes. The second statistical table may then be as shown in Table 3:

Table 3

First statistical table	PartitionTime	lineCount
Test1	12:30:00	1500
Test1	12:45:00	1600
Test1	13:00:00	1300
...	...	...
Test2	13:00:00	1200
Test2	13:10:00	1100
Test2	13:20:00	1500

Assume that a text data packet of user A, generated at 13:00:03 and containing 50 data lines, is successfully stored to the storage device. Its generation time belongs to the partition time period 13:00:00 of Test1, so the number of storage lines of that partition increases from 1300 to 1350. The modified second statistical table may be as shown in Table 4:

Table 4

First statistical table	PartitionTime	lineCount
Test1	12:30:00	1500
Test1	12:45:00	1600
Test1	13:00:00	1350
...	...	...
Test2	13:00:00	1200
Test2	13:10:00	1100
Test2	13:20:00	1500
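Sub-steps S21 and S23 can be sketched against the Table 3 data, reproducing the update that yields Table 4. This is a minimal illustration, assuming the second statistical table is an in-memory mapping keyed by (first statistical table, PartitionTime) and that a partition time period starts at the generation time floored to the partition length; `second_table`, `partition_start`, and `record_stored_packet` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Second statistical table from Table 3: (first statistical table, PartitionTime) -> lineCount.
second_table: Dict[Tuple[str, str], int] = defaultdict(int, {
    ("Test1", "12:30:00"): 1500,
    ("Test1", "12:45:00"): 1600,
    ("Test1", "13:00:00"): 1300,
    ("Test2", "13:00:00"): 1200,
    ("Test2", "13:10:00"): 1100,
    ("Test2", "13:20:00"): 1500,
})
PARTITION_MINUTES = {"Test1": 15, "Test2": 10}  # user A: 15 min, user B: 10 min


def partition_start(gen_time: str, minutes: int) -> str:
    """Floor an "HH:MM:SS" generation time to the start of its partition time period."""
    h, m, s = map(int, gen_time.split(":"))
    seconds = h * 3600 + m * 60 + s
    start = seconds - seconds % (minutes * 60)
    return f"{start // 3600:02d}:{start % 3600 // 60:02d}:{start % 60:02d}"


def record_stored_packet(first_table: str, gen_time: str, packet_lines: int) -> str:
    partition = partition_start(gen_time, PARTITION_MINUTES[first_table])  # sub-step S21
    second_table[(first_table, partition)] += packet_lines                 # sub-step S23
    return partition


print(record_stored_packet("Test1", "13:00:03", 50))  # 13:00:00
print(second_table[("Test1", "13:00:00")])            # 1350
```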

Step 104: When receiving the storage verification request, the verification device performs storage verification according to the statistical table.

In a cloud platform (such as a distributed system), the verification device can be a back-end device that provides an API through which the user submits a storage verification request, in order to verify the storage status of the text data packets generated by the user's application devices.

In this embodiment, the application device packs the generated text data into text data packets, which the transmission device stores in the storage device. When storage succeeds, the attribute information is recorded, and the verification device verifies the storage status according to the recorded attribute information. Because the text data is packaged, the number of items that need to be counted is reduced many times over, which greatly reduces the statistical values, the processing load of verification, and the performance consumption of the system. This greatly improves practicability in big data processing and realizes overall storage verification for big data in the cloud.

In a preferred embodiment of the present application, step 104 may include the following sub-steps:

Sub-step S31: in the first statistical table corresponding to the application device identifier, search for the minimum storage time;

Sub-step S32: confirm that the text data packets whose generation time is less than that storage time have been stored.
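Sub-steps S31 and S32 reduce to taking a minimum and comparing against it. The sketch below assumes the first statistical table is a mapping from (transmission device identifier, application device identifier) to a storage time in "HH:MM:SS" form; the entries are hypothetical illustrative values, not the contents of Table 2, and the function names are assumptions.

```python
from typing import Dict, Tuple

# Hypothetical first statistical table of one user; values are illustrative only.
# (transmission device id, application device id) -> storage time, "HH:MM:SS".
first_table: Dict[Tuple[str, str], str] = {
    ("transmission_1", "application_1"): "13:00:03",
    ("transmission_1", "application_2"): "15:00:00",
    ("transmission_2", "application_1"): "14:00:00",
    ("transmission_2", "application_2"): "13:30:00",
}


def stored_up_to(table: Dict[Tuple[str, str], str]) -> str:
    """Sub-step S31: the minimum storage time across all entries."""
    return min(table.values())


def confirmed_stored(generation_time: str, table: Dict[Tuple[str, str], str]) -> bool:
    """Sub-step S32: a packet generated before the minimum storage time has been stored."""
    return generation_time < stored_up_to(table)


print(stored_up_to(first_table))                  # 13:00:03
print(confirmed_stored("12:59:59", first_table))  # True
print(confirmed_stored("13:10:00", first_table))  # False
```

The minimum is taken because each entry only vouches for its own (transmission device, application device) pair; the slowest pair bounds what is known to be stored for the user as a whole.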

In this embodiment, the storage verification request may be used to verify up to which time point the storage of text data packets has been completed (i.e., persisted).

The storage verification request may include parameters such as user information (such as a user ID), a first statistical table identifier, and a first check identifier.

The user information may be used to authenticate the storage verification request; when authentication passes, the storage verification is allowed.

The first statistical table identifier is information identifying the first statistical table, such as a name or an ID.

The first check identifier is information indicating that the request is to verify up to which time point the storage of text data packets has been completed.

In practical applications, for each transmission device (represented by its transmission device identifier) and each application device (represented by its application device identifier), all text data packets whose generation time is less than or equal to the minimum storage time have been stored successfully, whereas between the minimum storage time and the other storage times there may be text data packets that have not yet been stored. The minimum storage time can therefore represent the time point up to which text data packets have been stored.

For example, as shown in Table 2, the minimum storage time is 13:00:03. Transmission_1 has successfully stored the text data packets generated by application_1 and application_2 with generation times less than or equal to 13:00:03, while transmission_1 may still have text data packets generated between 13:00:03 and 15:00:00 that have not been stored successfully;

Transmission_2 has successfully stored the text data packets generated by application_1 and application_2 with generation times less than 13:00:03, while transmission_2 may still have text data packets generated between 13:00:03 and 14:00:00 that have not been stored successfully.

Consequently, all text data packets generated by user A's application devices before 13:00:03 have been stored.

Sub-step S41: sum the numbers of storage lines of the partition time periods falling within the specified proofreading period to obtain a first total number of lines;

In this embodiment, the storage verification request may be used to check the number of lines of text data packets stored within a given time period.

The storage verification request may include user information (such as a user ID), a first statistical table identifier, a second check identifier, and a proofreading period.

The second check identifier is information indicating that the request is to check the number of lines of text data packets stored within a time period;

The proofreading period is the time period for which the number of lines of stored text data packets is counted.
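The two kinds of storage verification request described above can be modeled as one request type with a check identifier. This is a hypothetical sketch: the field names, the enum values, and the validation rules are assumptions for illustration, not an API defined by the application; authentication of the user information is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CheckType(Enum):
    # Hypothetical encodings of the first and second check identifiers.
    STORED_UP_TO = "first_check"  # up to which time point storage has completed
    LINE_COUNT = "second_check"   # lines stored within a proofreading period


@dataclass
class StorageVerificationRequest:
    user_id: str                        # user information, used to authenticate the request
    first_table_id: str                 # name or ID of the first statistical table
    check: CheckType
    period_start: Optional[str] = None  # proofreading period ("HH:MM:SS"), second check only
    period_end: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the parameters do not fit the requested check."""
        if self.check is CheckType.LINE_COUNT:
            if self.period_start is None or self.period_end is None:
                raise ValueError("a line-count check needs a proofreading period")
            if self.period_start >= self.period_end:
                raise ValueError("the proofreading period must start before it ends")


request = StorageVerificationRequest("user_A", "Test1", CheckType.LINE_COUNT, "12:30:00", "13:00:00")
request.validate()  # passes: a proofreading period is supplied and ordered
```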

In practice, summing the numbers of storage lines of the partition time periods that fall within the specified proofreading period gives the total number of lines for the proofreading period (i.e., the first total number of lines).

The first total number of lines serves two purposes:

First, it gives the business an intuitive statistical report, providing a data basis for value-added services or business logic;

Second, it helps ensure data integrity, so that the expected data is provided.

After text data lands on the storage device of the cloud platform, whether or not it has been damaged, the first total number of lines can be used as a comparison to judge the quality of all of the service's text data on that storage device; if the quality does not meet the needs of the business, it needs to be corrected.

It should be noted that the proofreading period for line-count statistics should generally end before the time point up to which text data packets have been stored. Otherwise, the statistics would cover a period whose storage has not completed, which may make them meaningless.

For example, as shown in Table 4, if user A needs to count the text data packets between 12:30:00 and 13:00:00 (the proofreading period), the numbers of storage lines (lineCount) of the partition time periods (PartitionTime) 12:30:00 and 12:45:00 can be summed, giving a first total number of lines of 1500 + 1600 = 3100.

If user A needs to count the text data packets between 12:30:00 and 13:15:00, but the time point up to which text data packets have been stored is 13:00:03, some text data packets generated between 13:00:03 and 13:15:00 may not yet have been stored successfully, so the first total number of lines may not be the actual number of lines.
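Sub-step S41 and the caveat about incomplete storage can be combined in one small function, checked against the Test1 rows of Table 4. This is a sketch under stated assumptions: a partition counts toward the proofreading period only if it lies entirely inside it, and the returned flag is a hypothetical way of marking totals whose period ends after the time point up to which storage has completed.

```python
from typing import Dict, Tuple

# Test1 rows of Table 4 (PartitionTime -> lineCount); Test1 uses 15-minute partitions.
TEST1_ROWS: Dict[str, int] = {"12:30:00": 1500, "12:45:00": 1600, "13:00:00": 1350}
PARTITION_SECONDS = 15 * 60


def to_seconds(hms: str) -> int:
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s


def first_total_lines(rows: Dict[str, int], start: str, end: str,
                      stored_up_to: str) -> Tuple[int, bool]:
    """Sub-step S41: sum lineCount over partitions lying entirely in [start, end).

    The flag is False when the period ends after the time point up to which
    storage has completed, i.e. the total may not be the actual number of lines.
    """
    lo, hi = to_seconds(start), to_seconds(end)
    total = sum(count for partition, count in rows.items()
                if lo <= to_seconds(partition)
                and to_seconds(partition) + PARTITION_SECONDS <= hi)
    return total, hi <= to_seconds(stored_up_to)


print(first_total_lines(TEST1_ROWS, "12:30:00", "13:00:00", "13:00:03"))  # (3100, True)
print(first_total_lines(TEST1_ROWS, "12:30:00", "13:15:00", "13:00:03"))  # (4450, False)
```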

Sub-step S42: read, from the storage device, the second total number of lines of the text data packets stored in the proofreading period;

Sub-step S43: when the first total number of lines equals the second total number of lines, confirm that the text data packets corresponding to the proofreading period have not been lost;

Sub-step S44: when the first total number of lines does not equal the second total number of lines, confirm that the text data packets corresponding to the proofreading period have been at least partially lost.

In this embodiment, to further verify whether a loss occurred during storage, the second total number of lines counted by the storage device can be compared with the first total number of lines recorded by the transmission device. If the two are equal, no loss has occurred; if they are not equal, a loss has occurred.
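Sub-steps S42 to S44 can be sketched as a count followed by an equality test. The storage model here is an assumption made only for illustration: each stored data line is represented by the "HH:MM:SS" generation time of its text data packet, and the function names are hypothetical.

```python
from typing import Iterable


def second_total_lines(stored_line_times: Iterable[str], start: str, end: str) -> int:
    """Sub-step S42 (hypothetical storage model): count stored lines generated in [start, end)."""
    return sum(1 for t in stored_line_times if start <= t < end)


def loss_check(first_total: int, second_total: int) -> str:
    """Sub-steps S43/S44: equal totals mean no loss; any difference means at least partial loss."""
    if first_total == second_total:
        return "not lost"
    return "at least partially lost"


stored = ["12:31:10"] * 3 + ["12:47:00"] * 2 + ["13:05:00"]
second = second_total_lines(stored, "12:30:00", "13:00:00")
print(second)                 # 5
print(loss_check(5, second))  # not lost
print(loss_check(6, second))  # at least partially lost
```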

It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations, but those skilled in the art should understand that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application, certain steps may be performed in other orders or concurrently. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.

Referring to FIG. 2, a structural block diagram of an embodiment of a storage verification system for text data of the present application is shown. The system may include one or more application devices 210, one or more transmission devices 220, one or more storage devices 230, and a verification device 240;

The application device 210 may include the following modules:

a text data packaging module 211, configured to package the generated one or more pieces of text data into one or more text data packets, the text data packets having attribute information;

The transmission device 220 can include the following modules:

a text data packet storage module 221, configured to store the one or more text data packets into a preset one or more storage devices 230;

The attribute information recording module 222 is configured to record the attribute information in a preset statistical table when the storage is successful;

The verification device 240 can include the following modules:

The storage verification module 241 is configured to perform storage verification according to the statistical table when a storage verification request is received.
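As the claims describe, the text data packaging module 211 may close a packet when the buffered text data reaches a preset size threshold or when a preset time threshold is exceeded. The following is a minimal sketch of such a packer, assuming the time threshold is measured from when the current packet was opened and is only checked when a line is added (a real module would likely also flush on a timer); the class, field names, and the choice of the closing time as the generation time are hypothetical.

```python
import time
from typing import Callable, List, Optional


class TextPacker:
    """Hypothetical application-side packer for text data."""

    def __init__(self, app_id: str, size_threshold: int, time_threshold: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.app_id = app_id                  # application device identifier
        self.size_threshold = size_threshold  # bytes
        self.time_threshold = time_threshold  # seconds since the current packet was opened
        self.clock = clock
        self.lines: List[str] = []
        self.size = 0
        self.opened_at = clock()

    def add(self, line: str) -> Optional[dict]:
        """Buffer one line of text data; return a packet if a threshold is reached."""
        self.lines.append(line)
        self.size += len(line.encode("utf-8"))
        if self.size >= self.size_threshold or self.clock() - self.opened_at >= self.time_threshold:
            return self.flush()
        return None

    def flush(self) -> dict:
        """Close the current packet and attach its attribute information."""
        packet = {
            "app_id": self.app_id,
            "generation_time": self.clock(),  # assumed: the time the packet is closed
            "line_count": len(self.lines),    # number of packet data lines
            "lines": self.lines,
        }
        self.lines, self.size, self.opened_at = [], 0, self.clock()
        return packet


now = [0.0]  # fake clock for a deterministic example
packer = TextPacker("application_1", size_threshold=10, time_threshold=5.0, clock=lambda: now[0])
print(packer.add("abc"))                    # None: 3 bytes, 0 s elapsed
print(packer.add("defghij")["line_count"])  # 2: size threshold of 10 bytes reached
```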

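In a preferred embodiment of the present application, the text data packaging module 211 may include the following sub-modules:

a first packaging sub-module, configured to package the generated text data into a text data packet when the size of the generated text data matches a preset size threshold;

or,

a second packaging sub-module, configured to package the generated text data into a text data packet when the current time exceeds a preset time threshold.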

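In a preferred embodiment of the present application, the statistical table may include a first statistical table, and the first statistical table may include a storage time; the transmission device may have a transmission device identifier; and the attribute information may include an application device identifier and a generation time;

The attribute information recording module 222 may include the following sub-modules:

a table search sub-module, configured to search for the first statistical table corresponding to the application device identifier;

a feature text data packet judging sub-module, configured to determine whether there is a feature text data packet that has not been stored successfully, and if not, to invoke the time update sub-module; the generation time of the feature text data packet is earlier than the generation time of the currently stored text data packet;

a time update sub-module, configured to update, in the first statistical table, the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.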

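In another preferred embodiment of the present application, the statistical table may include a second statistical table, where the second statistical table may include partition time periods and numbers of storage lines; the attribute information may include a generation time and a number of packet data lines;

The attribute information recording module 222 may include the following sub-modules:

a partition time period search sub-module, configured to search, in the second statistical table, for the partition time period, under the first statistical table, to which the generation time belongs;

a storage line number accumulation sub-module, configured to accumulate the number of packet data lines into the number of storage lines corresponding to that partition time period.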

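In a preferred embodiment of the present application, the storage verification module 241 may include the following sub-modules:

a storage time search sub-module, configured to search for the minimum storage time in the first statistical table corresponding to the application device identifier;

a storage completion confirmation sub-module, configured to confirm that the text data packets whose generation time is less than that storage time have been stored.

In another preferred embodiment of the present application, the storage verification module 241 may include the following sub-module:

a storage line number statistics sub-module, configured to sum the numbers of storage lines of the partition time periods falling within a specified proofreading period to obtain a first total number of lines.

In another preferred embodiment of the present application, the storage verification module 241 may further include the following sub-modules:

a second total line number reading sub-module, configured to read, from the storage device 230, the second total number of lines of the text data packets stored in the proofreading period;

a no-loss confirmation sub-module, configured to confirm, when the first total number of lines equals the second total number of lines, that the text data packets corresponding to the proofreading period have not been lost;

a loss confirmation sub-module, configured to confirm, when the first total number of lines does not equal the second total number of lines, that the text data packets corresponding to the proofreading period have been at least partially lost.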

In another preferred embodiment of the present application, the attribute information may include a generation time, and the storage device 230 may include one or more storage partitions;

The text data packet storage module 221 may include the following sub-module:

a partition storage sub-module, configured to store the text data packet into the storage partition, in the preset storage device 230, that corresponds to the generation time.
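The partition storage sub-module can be illustrated with a hypothetical layout in which each storage partition covers one hour of generation time. The key format, the hourly granularity, and the in-memory `storage` dictionary standing in for the storage device 230 are all assumptions; the application only requires that the partition correspond to the generation time.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def storage_partition(table: str, generation_time: datetime) -> str:
    """Hypothetical partition key: one storage partition per hour of generation time."""
    return f"{table}/{generation_time:%Y%m%d}/{generation_time:%H}"


# Hypothetical in-memory stand-in for storage device 230: partition key -> packet IDs.
storage: Dict[str, List[str]] = defaultdict(list)


def store_packet(table: str, packet_id: str, generation_time: datetime) -> str:
    """Store a packet in the partition matching its generation time; return the key."""
    key = storage_partition(table, generation_time)
    storage[key].append(packet_id)
    return key


print(store_packet("Test1", "A3", datetime(2015, 7, 14, 13, 0, 3)))  # Test1/20150714/13
```

Under this assumed layout, packets from the same period land together, so the lines of a given proofreading period could be read from a bounded set of partitions.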

For the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.

The various embodiments in the present specification are described in a progressive manner, and each embodiment focuses on differences from other embodiments, and the same similar parts between the various embodiments can be referred to each other.

Those skilled in the art will appreciate that the embodiments of the present application can be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

In a typical configuration, a computer device includes one or more processors (CPUs), an input/output interface, a network interface, and memory. The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory. Memory is an example of a computer readable medium. Computer readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.

The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.

Finally, it should also be noted that in this context, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "include", "comprise", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that comprises the element.

The above describes a storage verification method for text data and a storage verification system for text data provided by the present application, and a detailed example is applied herein to explain the principle and implementation manner of the present application. The description of the embodiments is only for helping to understand the method of the present application and its core ideas; at the same time, for those of ordinary skill in the art, according to the idea of the present application, there will be changes in specific embodiments and application scopes. The above description should not be taken as limiting the present application.

Claims

A storage verification method for text data, comprising:

The one or more application devices package the generated one or more text data into one or more text data packets; the text data packets have attribute information therein;

The one or more transmission devices store the one or more text data packets in a preset one or more storage devices, and when the storage is successful, record the attribute information in a preset statistical table;

When the verification device receives the storage verification request, the verification device performs storage verification according to the statistical table.
The method according to claim 1, wherein the step of the one or more application devices packaging the generated one or more text data into one or more text data packets comprises:

When the size of the generated text data matches the preset size threshold, the generated text data is packaged into a text data packet;

or,

The generated text data is packaged into a text packet when the current time exceeds a preset time threshold.
The method according to claim 1 or 2, wherein the statistical table comprises a first statistical table, the first statistical table including a storage time; the transmission device has a transmission device identifier; and the attribute information includes an application device identifier and a generation time;

The step of recording the attribute information in a preset statistical table includes:

Finding a first statistical table corresponding to the application device identifier;

Determining whether there is a feature text data packet that has not been stored successfully, the generation time of the feature text data packet being less than the generation time of the currently stored text data packet;

If not, updating, in the first statistical table, the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.
The method according to claim 1 or 2, wherein the statistical table comprises a second statistical table, the second statistical table including a partitioning time period and a number of storage lines; the attribute information includes a generation time and a number of packet data lines;

The step of recording the attribute information in a preset statistical table includes:

In the second statistical table, searching for the partition time period, under the first statistical table, to which the generation time belongs;

The number of packet data lines is accumulated into the number of storage lines corresponding to the partition time period.
The method according to claim 3, wherein the step of the verification device performing storage verification according to the statistical table comprises:

In the first statistical table corresponding to the application device identifier, searching for a storage time with a minimum value;

It is confirmed that the text packet whose generation time is less than the storage time has been stored.
The method according to claim 4, wherein the step of the verification device performing storage verification according to the statistical table comprises:

Summing the numbers of storage lines of the partition time periods falling within a specified proofreading period to obtain a first total number of rows.
The method according to claim 6, wherein the step of the verification device performing storage verification according to the statistical table further comprises:

Reading, from the storage device, a second total number of lines of text packets stored in the proofreading period;

When the first total number of rows is equal to the second total number of rows, it is confirmed that the text data packet corresponding to the proofreading period is not lost;

When the first total line number is not equal to the second total line number, it is confirmed that the text data packet corresponding to the proofreading time period is at least partially lost.
The method according to claim 1 or 2 or 5 or 6 or 7, wherein the attribute information includes a generation time, and the storage device includes one or more storage partitions;

The step of the one or more transmission devices storing the one or more text data packets in a preset one or more storage devices includes:

The text data packet is stored in a storage partition corresponding to the generation time in a preset storage device.
A storage verification system for text data, characterized in that the system comprises one or more application devices, one or more transmission devices, one or more storage devices and a verification device;

The application device includes:

a text data packaging module, configured to package the generated one or more pieces of text data into one or more text data packets, the text data packets having attribute information;

The transmission device includes:

a text data packet storage module, configured to store the one or more text data packets into a preset one or more storage devices;

An attribute information recording module, configured to record the attribute information in a preset statistical table when the storage is successful;

The verification device includes:

The storage verification module is configured to perform storage verification according to the statistical table when receiving the storage verification request.
The system according to claim 9, wherein the text data packaging module comprises:

a first packaging submodule, configured to package the generated text data into a text data packet when the size of the generated text data matches a preset size threshold;

or,

The second packaging sub-module is configured to package the generated text data into a text data packet when the current time exceeds a preset time threshold.
The system according to claim 9, wherein said statistical table comprises a first statistical table, said first statistical table comprising a storage time; said transmission device having a transmission device identifier; said attribute information comprising an application device identifier and a generation time;

The attribute information recording module includes:

a table search submodule, configured to search for a first statistical table corresponding to the application device identifier;

a feature text data packet judging sub-module, configured to determine whether there is a feature text data packet that has not been stored successfully, and if not, to invoke the time update sub-module; the generation time of the feature text data packet is less than the generation time of the currently stored text data packet;

a time update sub-module, configured to update, in the first statistical table, the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.
The system according to claim 9, wherein said statistical table comprises a second statistical table, said second statistical table comprising a partitioning time period, a number of storage lines; said attribute information comprising a generation time, a number of packet data lines ;

The attribute information recording module includes:

a partition time period search sub-module, configured to search, in the second statistical table, for the partition time period, under the first statistical table, to which the generation time belongs;

The storage line number accumulation sub-module is configured to accumulate the number of the packet data lines to the number of storage lines corresponding to the partition time period.
The system of claim 11 wherein said storage verification module comprises:

a storage time search submodule, configured to search for a storage time with a minimum value in a first statistical table corresponding to the application device identifier;

The storage completion confirmation sub-module is configured to confirm that the text data packet whose generation time is less than the storage time has been stored.
The system of claim 12 wherein said storage verification module comprises:

The storage line number statistics sub-module is configured to sum the numbers of storage lines of the partition time periods falling within a specified proofreading period to obtain a first total number of rows.
The system according to claim 14, wherein the storage verification module further comprises:

a second total line number reading submodule, configured to read, from the storage device, a second total line number of the text data packet stored in the proofreading period;

The acknowledgment sub-module is configured to: when the first total number of rows is equal to the second total number of rows, confirm that the text data packet corresponding to the proofreading period is not lost;

And a loss confirmation submodule, configured to confirm that the text data packet corresponding to the proofreading period is at least partially lost when the first total number of rows is not equal to the second total number of rows.
The system according to claim 9 or 10 or 13 or 14 or 15, wherein the attribute information includes a generation time, and the storage device includes one or more storage partitions;

The text packet storage module includes:

And a partition storage submodule, configured to store the text data packet into a storage partition corresponding to the generation time in a preset storage device.