WO2017008658A1 - Storage checking method and system for text data - Google Patents

Storage checking method and system for text data

Publication number
WO2017008658A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage
text data
time
statistical table
verification
Prior art date
Application number
PCT/CN2016/088519
Other languages
French (fr)
Chinese (zh)
Inventor
李强
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2017008658A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08: Error detection or correction by redundancy in data representation, e.g. by using checking codes

Definitions

  • the present application relates to the field of computer processing technologies, and in particular, to a storage verification method for text data and a storage verification system for text data.
  • the data generated by many businesses under the cloud such as the pv (views) logs generated by the website, generally need to be stored in real time, verify the integrity of the data, and ensure the accuracy of data mining and other processing.
  • this detection mechanism is only a local detection mechanism and cannot be applied to the storage verification of big data in the cloud era.
  • embodiments of the present application have been made in order to provide a storage verification method for text data and a corresponding storage verification system for text data that overcome the above problems or at least partially solve the above problems.
  • the embodiment of the present application discloses a method for storing and verifying text data, including:
  • the one or more application devices package the generated one or more text data into one or more text data packets; the text data packets have attribute information therein;
  • the one or more transmission devices store the one or more text data packets in a preset one or more storage devices, and when the storage is successful, record the attribute information in a preset statistical table;
  • when the verification device receives the storage verification request, the verification device performs storage verification according to the statistical table.
  • the step of the one or more application devices packaging the generated one or more text data into one or more text data packets comprises:
  • the generated text data is packaged into a text data packet when the size of the generated text data matches a preset size threshold;
  • the generated text data is packaged into a text packet when the current time exceeds a preset time threshold.
  • the statistical table includes a first statistical table, the first statistical table includes a storage time; the transmission device has a transmission device identifier; the attribute information includes an application device identifier, and a generation time;
  • the step of recording the attribute information in a preset statistical table includes:
  • the generation time of the feature text data packet is smaller than a generation time of the currently stored text data packet
  • the generation time is updated to the storage time corresponding to the transmission device identifier and the application device identifier.
  • the statistical table includes a second statistical table, where the second statistical table includes a partitioning time period and a storage line number; the attribute information includes a generation time and a number of packet data lines;
  • the step of recording the attribute information in a preset statistical table includes:
  • the number of packet data lines is accumulated into the number of storage lines corresponding to the partition time period.
  • the step of performing storage verification by the verification device according to the statistical table comprises:
  • the step of performing storage verification by the verification device according to the statistical table comprises:
  • the number of storage lines of the partition time periods falling within the specified proofreading period is counted, and the first total number of lines is obtained.
  • the step of performing storage verification by the verification device according to the statistical table further includes:
  • the attribute information includes a generation time
  • the storage device includes one or more storage partitions
  • the step of the one or more transmission devices storing the one or more text data packets in a preset one or more storage devices includes:
  • the text data packet is stored in a storage partition corresponding to the generation time in a preset storage device.
  • the embodiment of the present application further discloses a storage verification system for text data, where the system includes one or more application devices, one or more transmission devices, one or more storage devices, and a verification device;
  • the application device includes:
  • a text data packaging module configured to package the generated one or more text data into one or more text data packets; the text data package has attribute information;
  • the transmission device includes:
  • a text data packet storage module configured to store the one or more text data packets into a preset one or more storage devices
  • An attribute information recording module configured to record the attribute information in a preset statistical table when the storage is successful
  • the verification device includes:
  • the storage verification module is configured to perform storage verification according to the statistical table when receiving the storage verification request.
  • the text data packaging module comprises:
  • a first packaging submodule configured to package the generated text data into a text data packet when the size of the generated text data matches a preset size threshold
  • the second packaging sub-module is configured to package the generated text data into a text data packet when the current time exceeds a preset time threshold.
  • the statistical table includes a first statistical table, the first statistical table includes a storage time; the transmission device has a transmission device identifier; the attribute information includes an application device identifier, and a generation time;
  • the attribute information recording module includes:
  • a table search submodule configured to search for a first statistical table corresponding to the application device identifier
  • a feature text data packet judging sub-module configured to determine whether there is a feature text data packet that has not been successfully stored; if not, a time update sub-module is invoked; the generation time of the feature text data packet is less than the generation time of the currently stored text data packet;
  • a time update submodule configured to update the generation time to the storage time corresponding to the transmission device identifier and the application device identifier in the first statistics table.
  • the statistical table includes a second statistical table, where the second statistical table includes a partitioning time period and a storage line number; the attribute information includes a generation time and a number of packet data lines;
  • the attribute information recording module includes:
  • a partitioning time period searching sub-module configured to search, in the second statistical table, for the partition time period, corresponding to the first statistical table, to which the generation time belongs;
  • the storage line number accumulation sub-module is configured to accumulate the number of the packet data lines to the number of storage lines corresponding to the partition time period.
  • the storage verification module comprises:
  • a storage time search submodule configured to search for a storage time with a minimum value in a first statistical table corresponding to the application device identifier
  • the storage completion confirmation sub-module is configured to confirm that the text data packet whose generation time is less than the storage time has been stored.
  • the storage verification module comprises:
  • the storage line number statistics sub-module is configured to count the number of storage lines of the partition time period falling within the specified proofing time period, and obtain the first total number of lines.
  • the storage verification module further includes:
  • a second total line number reading submodule configured to read, from the storage device, a second total line number of the text data packet stored in the proofreading period
  • the acknowledgment sub-module is configured to: when the first total number of rows is equal to the second total number of rows, confirm that the text data packet corresponding to the proofreading period is not lost;
  • a loss confirmation submodule configured to confirm that the text data packet corresponding to the proofreading period is at least partially lost when the first total number of rows is not equal to the second total number of rows.
  • the attribute information includes a generation time
  • the storage device includes one or more storage partitions
  • the text packet storage module includes:
  • a partition storage submodule configured to store the text data packet into a storage partition corresponding to the generation time in a preset storage device.
  • the application device of the embodiment of the present application packs the generated text data into a text data packet, which is stored in the storage device by the transmission device.
  • the attribute information is recorded, and the verification device performs the verification check according to the statistical attribute information.
  • the storage condition is verified.
  • the storage time is updated in the first statistical table, and the storage time with the minimum value is obtained by comparing the storage times, thereby realizing persistence verification of the storage of massive text data.
  • the embodiment of the present application accumulates the number of stored lines in the second statistical table, and realizes the storage quantity verification of the massive text data by accumulating the number of storage lines of the required partition time period.
  • the embodiment of the present application implements a storage loss verification of a large amount of text data by comparing the first total line number based on the statistics of the transmission device with the second total line number based on the statistics of the storage device.
  • FIG. 1 is a flow chart showing the steps of an embodiment of a method for storing and verifying text data according to the present application
  • FIG. 2 is a structural block diagram of an embodiment of a storage verification system for text data according to the present application.
  • the cloud platform can provide large data processing and storage capabilities to a large number of users. Many services in the cloud platform, such as a pv (views) log of a website, need to be written to the cloud platform network side device.
  • the ability to provide data without loss is end-to-end.
  • the Flume system provides the ack mechanism of the message
  • the ack mechanism is only a partial confirmation mechanism, and it is impossible to confirm how much the Flume system itself has successfully written.
  • the data cannot be confirmed whether the data in a certain period of time is lost during storage, and it is impossible to determine the total number of pieces of data on the network side device of the cloud platform.
  • Referring to FIG. 1, a flow chart of the steps of a method for storing and verifying text data of the present application is shown, which may specifically include the following steps:
  • Step 101 One or more application devices package the generated one or more text data into one or more text data packets
  • a cloud platform that is, a computer cluster, such as a distributed system.
  • the distributed system can be divided into the following parts:
  • Distributed system underlying services: provide the coordination, remote procedure call, security management, and resource management services required in a distributed environment. These underlying services support upper-layer modules such as the distributed file system and task scheduling.
  • Distributed file system: provides a massive, reliable, and scalable data storage service that aggregates the storage capabilities of each node in the cluster and automatically shields hardware and software failures to provide users with uninterrupted data access. It supports incremental expansion and automatic data balancing, provides a user-space file access API (Application Program Interface), and supports random read/write and append write operations.
  • Task scheduling: provides scheduling services for tasks in the cluster system, supporting both online services (Online Service), which emphasize response speed, and batch processing jobs (Batch Processing Job), which emphasize data throughput. It automatically detects faults and hotspots in the system and uses error retries, concurrent backup jobs for long-tail jobs, and similar measures to ensure that jobs complete stably and reliably.
  • Cluster monitoring and deployment: monitors the status of the cluster and the running status and performance indicators of upper-layer application services, and generates alarms and records for abnormal events. It provides operation and maintenance personnel with deployment and configuration management for the entire distributed system and upper-layer applications, and supports online expansion and reduction of the cluster and of application services.
  • an application device may be a device that can generate text data during application service operation, such as a server.
  • the text data is time-series data, and is generated in time sequence, such as pv log, access.log log, system running log, and the like.
  • the application device can package the text data in several ways:
  • when the size of the generated text data matches a preset size threshold, the generated text data is packaged into a text data packet;
  • the application device packs the generated text data according to the size threshold, so that the number of items that need to be counted can be reduced many times over, thereby greatly reducing the statistical workload.
  • the text data packet can also be set with attribute information for filtering, marking, and similar operations in real-time value-added processing.
  • for example, suppose the average size of each piece of text data is 1K and the size threshold of a text data packet is 512K, that is, the size of a text data packet is 512K. If 51.2 billion pieces of text data are generated, the prior art needs to count 51.2 billion times; after packaging, the 51.2 billion pieces of text data become 100 million text data packets, so the amount that needs to be counted is reduced by a factor of 512.
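  • The arithmetic of this example can be checked with a short calculation. This is only an illustrative sketch; the constant names are hypothetical and the values are the ones quoted in the example.

```python
# Worked arithmetic for the packaging example (values taken from the text).
AVG_RECORD_SIZE_K = 1            # average size of one piece of text data, in K
PACKET_THRESHOLD_K = 512         # size threshold of one text data packet, in K
TOTAL_RECORDS = 51_200_000_000   # 51.2 billion pieces of text data

# How many pieces of text data fit into one packet.
records_per_packet = PACKET_THRESHOLD_K // AVG_RECORD_SIZE_K

# How many packets have to be counted after packaging.
total_packets = TOTAL_RECORDS // records_per_packet

# How much smaller the counting workload becomes.
reduction_factor = TOTAL_RECORDS // total_packets

print(records_per_packet, total_packets, reduction_factor)
```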
  • the generated text data is packaged into a text data packet when the current time exceeds a preset time threshold.
  • since the text data is time-series data, it can also be partitioned by time in the cloud network side device.
  • for example, if the text data is partitioned by the hour, there are 24 partitions in the cloud network side device, named 00, 01, ..., 23, and the text data generated from 00:00:00 to 00:59:59 is stored in the 00 partition.
  • the text data from 01:00:00 to 01:59:59 is stored in the 01 partition, and the text data generated at other times is stored in a similar manner.
  • after being packaged, the text data packet can also be stored in the corresponding partition according to the generation time. In general, it is necessary to ensure that text data packets fall into the correct partition. For example, a text data packet that falls into the 00 partition generally should not contain text data generated from 01:00:00 to 01:59:59; otherwise that text data will be placed in the 00 partition, which results in drift of the text data (inaccurate partitioning) and is also a data quality failure.
  • the drift of text data usually does not need to be 100% avoided, but it cannot be too large. If the text data is packaged with a shorter time threshold, such as 5 minutes, drift can be effectively limited, and mis-partitioning can be controlled within an acceptable error range.
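  • A minimal sketch of hourly partitioning and of the drift condition described above. The function names `partition_name` and `drifts` are hypothetical, and only the hour of the generation time is considered, as in the 24-partition example.

```python
from datetime import datetime


def partition_name(generation_time: datetime) -> str:
    """Return the hourly partition name ("00" .. "23") for a generation time."""
    return f"{generation_time.hour:02d}"


def drifts(packet_generation_time: datetime, record_time: datetime) -> bool:
    """True if a record would be stored in a partition other than its own hour's.

    The packet is stored according to its own generation time, so a record
    generated in a different hour drifts into the wrong partition.
    """
    return partition_name(packet_generation_time) != partition_name(record_time)
```

  • With a 5-minute time threshold, a packet can span an hour boundary by at most a few minutes of data, which is why the drift stays small.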
  • the above packing mode can be used at the same time.
  • the size threshold is 512K and the time threshold is 5 minutes
  • the text data of 13:00:00-13:04:59 is packaged, and three text data packets are generated:
  • A1: the first text data is generated at 13:00:00, and the size is 512K;
  • A2: the size is 512K;
  • A3: the size is 402K, and the last text data is generated at 13:04:59.
  • packaging mode is only an example.
  • other packaging modes may be set according to actual conditions, which is not limited by the embodiment of the present application.
  • other packaging methods may be adopted by those skilled in the art according to actual needs, and the embodiment of the present application does not limit this.
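  • The combined use of the size threshold and the time threshold could look roughly like the following sketch. It is not the implementation of the embodiment: the function `pack` and the `Record` type are hypothetical, size is measured in UTF-8 bytes, and the time threshold is evaluated against the generation times of the records rather than a wall clock.

```python
from typing import List, Tuple

Record = Tuple[float, str]  # (generation time in seconds, text line)


def pack(records: List[Record], size_threshold: int,
         time_threshold: float) -> List[List[Record]]:
    """Split time-ordered records into packets.

    A packet is closed when its accumulated size reaches size_threshold bytes,
    or when the next record was generated time_threshold seconds or more after
    the first record of the packet.
    """
    packets: List[List[Record]] = []
    current: List[Record] = []
    current_size = 0
    for ts, line in records:
        # Time threshold: close the open packet before adding a late record.
        if current and ts - current[0][0] >= time_threshold:
            packets.append(current)
            current, current_size = [], 0
        current.append((ts, line))
        current_size += len(line.encode("utf-8"))
        # Size threshold: close the packet once it is large enough.
        if current_size >= size_threshold:
            packets.append(current)
            current, current_size = [], 0
    if current:
        packets.append(current)
    return packets
```

  • For instance, `pack(records, 512 * 1024, 300)` would correspond to the 512K and 5-minute thresholds used in the example.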
  • text packets can be compressed to save bandwidth during network transmission.
  • the embodiment of the present application does not limit this.
  • attribute information can be configured for the text data packet, that is, the text data package has attribute information.
  • the structure of the text data packet may also contain data called attributes; these attributes have corresponding names and store the corresponding attribute information, which may include the application device identifier, the generation time, and the number of packet data lines.
  • the application device identifier (HostName) is an identifier of an application device that generates text data in the text packet, that is, a uniquely determined information of the application device, such as an application device ID and a host address.
  • the generation time (FileTime) is the time at which the text data in the text data packet is generated. In general, the time when the first piece of text data in the text data packet is generated may be used as the generation time of the text data packet.
  • the number of packet data lines (LineCount) is the number of lines of all text data in the text data packet; in a database, the select count(1) from table_name command can be used to count the number of packet data lines in the text data packet; in a database or distributed environment, Map Reduce can also be used to scan the text data and accumulate the number of packet data lines.
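  • As an illustration, the three attributes could be carried alongside the packaged text as follows. The class `TextDataPacket` and the helper `build_packet` are hypothetical names; only HostName, FileTime, and LineCount come from the description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextDataPacket:
    host_name: str    # HostName: application device identifier
    file_time: float  # FileTime: generation time of the first piece of text data
    line_count: int   # LineCount: number of packet data lines
    payload: str      # the packaged text data, one piece per line


def build_packet(host_name: str, records: List[Tuple[float, str]]) -> TextDataPacket:
    """Package time-ordered (generation_time, line) records with their attributes."""
    if not records:
        raise ValueError("cannot package an empty batch")
    return TextDataPacket(
        host_name=host_name,
        file_time=records[0][0],
        line_count=len(records),
        payload="\n".join(line for _, line in records),
    )
```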
  • when the text data packet is successfully packaged, it can be sent to one or more transmission devices.
  • a mechanism such as ack is generally used to ensure that the transmission succeeds and that no packet is lost. If the text data packet fails to transmit, the device continues to resend it until the transmission succeeds.
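  • The resend-until-acknowledged behaviour can be sketched as a simple loop. The `send` callable is a stand-in for whatever transport is actually used and is assumed to return True when an ack is received; the optional `max_attempts` bound is an addition of this sketch, not something stated in the description.

```python
from typing import Callable


def send_until_acked(send: Callable[[bytes], bool], packet: bytes,
                     max_attempts: int = 0) -> int:
    """Resend packet until send() reports an ack; return the number of attempts.

    max_attempts == 0 means retry without limit, as described in the text;
    a positive value is a safety bound added by this sketch.
    """
    attempts = 0
    while True:
        attempts += 1
        if send(packet):
            return attempts
        if max_attempts and attempts >= max_attempts:
            raise RuntimeError("no ack received")
```

  • A real sender would normally also wait between attempts; that detail is omitted here.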
  • Step 102 The one or more transmission devices store the one or more text data packets in a preset one or more storage devices;
  • the transmission device may be a device that transmits data (such as a text packet) to a processing node (such as a storage device), and the storage device may be a device that stores data (such as a text packet).
  • the cloud platform provides an API (Application Program Interface) for storing data, and the API is called by the transmission device to write a text packet.
  • the transmission device may allocate a storage device for the text data packet by using a plurality of allocation policies, which is not limited in this embodiment of the present application.
  • the allocation policy is hash allocation (hash(C)%N), that is, the hash value of the text data packet is calculated, and the packet is assigned to the storage device corresponding to hash(C)%N.
  • the allocation strategy is random allocation, taking a random number, and then distributing the text packet to the storage device corresponding to random(C)%N.
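  • Both allocation policies reduce a number modulo N to pick a storage device. A minimal sketch follows, assuming N is the number of storage devices and using CRC32 as one possible hash function; the description does not name a specific hash, and the function names are hypothetical.

```python
import random
import zlib


def hash_allocate(packet: bytes, n_devices: int) -> int:
    """Hash allocation: index of the storage device, hash(C) % N."""
    return zlib.crc32(packet) % n_devices


def random_allocate(n_devices: int, rng: random.Random) -> int:
    """Random allocation: draw a random number and reduce it modulo N."""
    return rng.getrandbits(32) % n_devices
```

  • Hash allocation sends identical packets to the same device every time, while random allocation only balances load on average.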
  • the storage device may include one or more storage partitions, and each storage partition may store the text data packets of a certain period of time, such as one hour or one day. The period can be set by a person skilled in the art according to actual conditions, and the embodiment of the present application does not limit this.
  • the storage partition to which the text packet belongs can be searched, and the text packet is stored in the storage partition corresponding to the generation time in the preset storage device.
  • a mechanism such as ack is generally used to ensure that the transmission succeeds and that no packet is lost. If the text data packet fails to be transmitted, the device continues to resend it until the transmission succeeds.
  • Step 103 When the storage is successful, record the attribute information in a preset statistical table
  • the attribute information of the text data packet may be recorded so that the corresponding storage verification can be performed.
  • the statistical table may include a first statistical table, and the first statistical table may include a storage time.
  • the step 103 may include the following sub-steps:
  • Sub-step S11 searching for a first statistical table corresponding to the application device identifier
  • Sub-step S12 it is determined whether there is a feature text data packet that has not been successfully stored; if not, sub-step S13 is performed;
  • the generation time of the feature text data packet is less than the generation time of the currently stored text data packet
  • Sub-step S13 in the first statistical table, updating the generation time to the storage time corresponding to the transmission device identifier and the application device identifier.
  • the user may rent some application devices in the cloud platform, that is, the user identifier (such as the user ID) has an association relationship with the application device identifier. Therefore, text data packets generated by the same user's application devices are usually counted uniformly, and different users have different first statistical tables.
  • there is generally one storage time for each combination of transmission device identifier and application device identifier (i.e., a one-to-one relationship).
  • the first statistical table assigned by the user A is Test1, and the application device includes the application device application_1 and the application device application_2, and the transmission device includes the transmission device transmission_1 and the transmission device transmission_2.
  • the example of the first statistical table may be as shown in Table 1:
  • the storage time is constantly refreshed, and can represent the latest time of the stored text packet generated by an application device transmitted by a certain transmission device.
  • the transmission device transmits the text data packets A1, A2, and A3 to the storage device. If A1 and A2 have not been successfully stored, and A3 has been successfully stored, the generation time of A3 is not updated to the storage time in the first statistical table. When A1 and A2 are successfully stored, the storage time in the first statistical table is updated.
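  • Sub-steps S12 and S13 can be sketched as follows. The class `FirstStatTable` and its method names are hypothetical; the sketch assumes the transmission device registers each packet before storing it, so that earlier-generated packets still in flight are known, and it additionally keeps the storage time from moving backwards.

```python
from typing import Dict, Set, Tuple

# (transmission device identifier, application device identifier)
Key = Tuple[str, str]


class FirstStatTable:
    def __init__(self) -> None:
        self.storage_time: Dict[Key, float] = {}
        self._pending: Dict[Key, Set[float]] = {}

    def on_send(self, key: Key, generation_time: float) -> None:
        """Remember a packet handed to storage but not yet stored successfully."""
        self._pending.setdefault(key, set()).add(generation_time)

    def on_stored(self, key: Key, generation_time: float) -> bool:
        """Record a successful store; return True if storage_time was updated."""
        pending = self._pending.setdefault(key, set())
        pending.discard(generation_time)
        # Sub-step S12: is an earlier-generated packet still not stored?
        if any(t < generation_time for t in pending):
            return False
        # Sub-step S13: update the storage time for this identifier pair.
        previous = self.storage_time.get(key, float("-inf"))
        self.storage_time[key] = max(previous, generation_time)
        return True
```

  • Replaying the A1, A2, A3 example: if A3 is stored first, nothing is updated; once A1 and then A2 are stored, the storage time advances to their generation times.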
  • the statistical table includes a second statistical table, where the second statistical table includes a partitioning time period and a number of storage lines.
  • the step 103 may include the following sub-steps:
  • Sub-step S21 in the second statistical table, searching for a partition time period corresponding to the first statistical table to which the generation time belongs;
  • Sub-step S23 the number of the packet data lines is accumulated into the number of storage lines corresponding to the partition time period.
  • the text data packets generated by the application device of the same user may also be uniformly counted.
  • for a given first statistical table, each partition time period (PartitionTime) generally has one number of storage lines (i.e., a one-to-one relationship). The partition time period can be set by a person skilled in the art according to actual conditions, such as 1 hour or 15 minutes. The number of storage lines is an accumulated value, which can characterize the number of lines stored in the partition time period.
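  • The accumulation into the second statistical table can be sketched as a dictionary keyed by the start of each partition time period. The class `SecondStatTable` is a hypothetical name, and times are expressed as seconds since midnight purely for brevity.

```python
from collections import defaultdict
from typing import DefaultDict


class SecondStatTable:
    def __init__(self, partition_seconds: int) -> None:
        self.partition_seconds = partition_seconds
        # partition start (seconds since midnight) -> accumulated storage lines
        self.line_counts: DefaultDict[int, int] = defaultdict(int)

    def partition_start(self, generation_time: int) -> int:
        """Sub-step S21: start of the partition time period containing the time."""
        return generation_time - generation_time % self.partition_seconds

    def record(self, generation_time: int, line_count: int) -> None:
        """Accumulate LineCount into the partition period of the packet."""
        self.line_counts[self.partition_start(generation_time)] += line_count
```

  • With a 15-minute partition time period, `SecondStatTable(900)` places packets generated at 12:31 and 12:44 in the 12:30:00 period and a packet generated at 12:45 in the 12:45:00 period.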
  • the first statistical table assigned by user A is Test1
  • the partitioning time period is set to 15 minutes
  • the first statistical table assigned by user B is Test2
  • the partitioning time period is set to 10 minutes
  • the example of the second statistical table may be as shown in Table 3:
  • Step 104 When receiving the storage verification request, the verification device performs storage verification according to the statistical table.
  • the verification device can be a back-end device, which provides an API for the user to call
  • the storage verification request is used to verify the storage status of the text data packet generated by the user's application device.
  • the application device of the embodiment of the present application packs the generated text data into a text data packet, which is stored in the storage device by the transmission device.
  • the attribute information is recorded, and the verification device performs the verification check according to the statistical attribute information.
  • the storage condition is verified.
  • step 104 may include the following sub-steps:
  • Sub-step S31 in the first statistical table corresponding to the application device identifier, searching for a storage time with a minimum value
  • Sub-step S32 confirming that the text packet whose generation time is less than the storage time has been stored.
  • the storage verification request may be used to verify that the text packet storage before the time point is completed (ie, persisted).
  • the storage verification request may include parameters such as user information (such as a user ID), a first statistical table identifier, and a first verification identifier.
  • the user information may be used to authenticate the storage verification request, and when the authentication is passed, the storage verification is allowed.
  • the first statistical table identifier is information identifying the first statistical table, such as a name or an ID.
  • the first verification identifier is information indicating that the time point before which text data packet storage has been completed is to be verified.
  • every text data packet generated by each application device (represented by its application device identifier) whose generation time is less than or equal to the storage time with the smallest value has been stored successfully by each transmission device.
  • the time point up to which text data packet storage has been completed can therefore be characterized by the storage time with the smallest value.
  • the storage time of the smallest value is 13:00:03
  • the generation time of the text data packets generated by application_1 and application_2 that transmission_1 has stored successfully is less than or equal to 13:00:03, and transmission_1 may have text data packets whose generation time is between 13:00:03 and 15:00:00 that have not yet been stored successfully;
  • the generation time of the text data packets generated by application_1 and application_2 that transmission_2 has stored successfully is less than 13:00:03, and transmission_1 may have text data packets whose generation time is between 13:00:03 and 14:00:00 that have not yet been stored successfully.
  • the storage time is updated in the first statistical table, and the storage time with the minimum value is obtained by comparing the storage times, thereby realizing persistence verification of the storage of massive text data.
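  • Sub-steps S31 and S32 amount to taking a minimum over the first statistical table. In the sketch below the table is a plain dictionary, the function names are hypothetical, and times are zero-padded "HH:MM:SS" strings of the same day so that string comparison matches time order.

```python
from typing import Dict, Tuple

# (transmission device identifier, application device identifier)
Key = Tuple[str, str]


def persisted_before(first_table: Dict[Key, str]) -> str:
    """Sub-step S31: the storage time with the minimum value in the table."""
    return min(first_table.values())


def is_persisted(first_table: Dict[Key, str], generation_time: str) -> bool:
    """Sub-step S32: packets generated before the minimum storage time are stored."""
    return generation_time < persisted_before(first_table)
```

  • With hypothetical entries such as 15:00:00, 13:00:03, 14:00:00, and 13:30:00 for the four identifier pairs, `persisted_before` returns 13:00:03, so a packet generated at 12:59:59 is confirmed as stored while one generated at 13:10:00 is not.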
  • step 104 may include the following sub-steps:
  • Sub-step S41 counting the number of storage lines falling within the partition time period of the specified proofreading period, and obtaining the first total number of lines;
  • the storage verification request may be used to check the number of lines of the text data packets stored within a given time period.
  • the storage verification request may include parameters such as user information (such as a user ID), a first statistical table identifier, a second verification identifier, and a proofreading period.
  • the second verification identifier is information indicating that the number of lines of the text data packets stored within a time period is to be checked;
  • the proofreading period is the time period over which the number of lines of stored text data packets is counted.
  • the numbers of storage lines of the partition time periods falling within the specified proofreading period are summed to obtain the total number of lines for the proofreading period (i.e., the first total number of lines).
  • the business has an intuitive statistical report, providing an intuitive data basis for value-added services or logical processing of the business;
  • by counting the first total number of lines, the quality of all the text data of the service that has landed on the storage device of the cloud platform can be judged; if the quality does not meet the needs of the business, it needs to be corrected.
  • the proofreading period used for line-count statistics generally ends before the time point at which storage of the text data packets is completed; otherwise, the statistics cover a period whose storage has not been completed, which may make them meaningless.
  • the number of storage lines (lineCount) of the partition time periods (PartitionTime) at 30:00 and 12:45:00 is summed, and the first total number of lines is counted as 3100 (lines).
  • if the time point up to which text data packets have been stored is 13:00:03, then between 13:00:03 and 13:15:00 there may be text data packets that have not yet been successfully stored.
  • in that case, the first total number of lines counted may not be the actual number of lines.
  • the embodiment of the present application accumulates the number of stored lines in the second statistical table, and realizes the storage quantity verification of the massive text data by accumulating the number of storage lines of the required partition time period.
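  • Sub-step S41 is a filtered sum over the second statistical table. The sketch below assumes the table maps the start of each partition time period (as a zero-padded "HH:MM:SS" string) to its number of storage lines, and that the proofreading period is half-open; the function name and the per-partition values in the usage note are hypothetical.

```python
from typing import Dict


def first_total_lines(second_table: Dict[str, int], start: str, end: str) -> int:
    """Sub-step S41: sum the storage lines of partition periods in [start, end)."""
    return sum(count for partition_time, count in second_table.items()
               if start <= partition_time < end)
```

  • For a table such as `{"12:15:00": 900, "12:30:00": 1500, "12:45:00": 1600, "13:00:00": 700}`, the call `first_total_lines(table, "12:30:00", "13:00:00")` sums the two middle entries.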
  • Sub-step S42 reading, from the storage device, the second total number of lines of the text data packets stored in the proofreading period;
  • Sub-step S43 when the first total line number is equal to the second total line number, it is confirmed that the text data packet corresponding to the proofreading period is not lost;
  • Sub-step S44 when the first total line number is not equal to the second total line number, it is confirmed that the text data packet corresponding to the proofreading time period is at least partially lost.
  • the second total number of lines counted by the storage device may be compared with the first total number of lines counted by the transmission device; if the two are equal, it can indicate that no loss has occurred, and if the two are not equal, it can indicate that a loss has occurred.
  • the embodiment of the present application implements a storage loss verification of a large amount of text data by comparing the first total line number based on the statistics of the transmission device with the second total line number based on the statistics of the storage device.
  • Referring to FIG. 2, there is shown a structural block diagram of an embodiment of a storage verification system for text data of the present application; the system may include one or more application devices 210, one or more transmission devices 220, one or more storage devices 230, and a verification device 240;
  • the application device 210 may include the following modules:
  • a text data packaging module 211 configured to package the generated one or more text data into one or more text data packets; the text data package has attribute information;
  • the transmission device 220 can include the following modules:
  • a text data packet storage module 221, configured to store the one or more text data packets into a preset one or more storage devices 230;
  • the attribute information recording module 222 is configured to record the attribute information in a preset statistical table when the storage is successful;
  • the verification device 240 can include the following modules:
  • the storage verification module 241 is configured to perform storage verification according to the statistical table when receiving the storage verification request.
  • the text data packaging module 211 may include the following sub-modules:
  • a first packaging submodule configured to package the generated text data into a text data packet when the size of the generated text data matches a preset size threshold
  • the second packaging sub-module is configured to package the generated text data into a text data packet when the current time exceeds a preset time threshold.
  • the statistical table may include a first statistical table, and the first statistical table may include a storage time; the transmission device may have a transmission device identifier; and the attribute information may include an application device identifier and a generation time;
  • the attribute information recording module 222 can include the following sub-modules:
  • a table search submodule configured to search for a first statistical table corresponding to the application device identifier
  • a feature text data packet judging sub-module configured to determine whether there is a feature text data packet that has not been stored successfully and, if not, to invoke the time update sub-module; the generation time of the feature text data packet is earlier than the generation time of the text data packet that has just been stored successfully;
  • a time update submodule configured to update the generation time to the storage time corresponding to the transmission device identifier and the application device identifier in the first statistics table.
  • the statistical table may include a second statistical table, where the second statistical table may include a partition time period and a number of storage lines; the attribute information may include a generation time and a number of packet data lines;
  • the attribute information recording module 222 can include the following sub-modules:
  • a partition time period searching sub-module configured to search, in the second statistical table, for the partition time period that corresponds to the first statistical table and to which the generation time belongs;
  • the storage line number accumulation sub-module is configured to accumulate the number of the packet data lines to the number of storage lines corresponding to the partition time period.
  • the storage verification module 241 may include the following sub-modules:
  • a storage time search submodule configured to search for a storage time with a minimum value in a first statistical table corresponding to the application device identifier
  • the storage completion confirmation sub-module is configured to confirm that the text data packet whose generation time is less than the storage time has been stored.
  • the storage verification module 241 may include the following sub-modules:
  • the storage line number statistics sub-module is configured to count the number of storage lines of the partition time period falling within the specified proofing time period, and obtain the first total number of lines.
  • the storage verification module 241 may further include the following submodules:
  • a second total line number reading submodule configured to read, from the storage device, a second total line number of the text data packet stored in the proofreading period
  • a non-loss confirmation sub-module configured to confirm, when the first total number of lines is equal to the second total number of lines, that the text data packets corresponding to the proofreading period are not lost;
  • a loss confirmation submodule configured to confirm that the text data packet corresponding to the proofreading period is at least partially lost when the first total number of rows is not equal to the second total number of rows.
  • the attribute information may include a generation time
  • the storage device 230 may include one or more storage partitions
  • the text packet storage module 221 can include the following sub-modules:
  • a partition storage submodule configured to store the text data packet into a storage partition corresponding to the generation time in the preset storage device 230.
  • the embodiments of the present application can be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • as defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowchart illustrations and/or FIG.
  • These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • the instruction device implements the functions specified in one or more blocks of the flowchart or in a flow or block of the flowchart.

Abstract

Disclosed are a storage checking method and system for text data. The method comprises: one or more application devices packaging one or more pieces of generated text data into one or more text data packets (101), the text data packets comprising attribute information; one or more transmission devices storing one or more text data packets in one or more pre-set storage devices (102); when storage succeeds, recording the attribute information in a pre-set statistical table (103); and when a checking device receives a storage checking request, performing storage checking according to the statistical table (104). In the method, by packaging text data, the text data on which statistics need to be made are reduced by many times, so that a statistical quantity value is greatly reduced, thereby greatly reducing the checking processing capacity, reducing the performance consumption of a system, greatly improving the practicability during big data processing, and realizing the overall storage checking for big data in cloud.

Description

Storage checking method and system for text data
This application claims priority to Chinese Patent Application No. 201510412446.X, filed on July 14, 2015 and entitled "Storage checking method and system for text data", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer processing technologies, and in particular to a storage verification method for text data and a storage verification system for text data.
Background
With the advent of the cloud era, more and more platforms generate big data, that is, data in very large volumes, from sources such as social networks, e-commerce, and access records. For example, between 100T and 100P of data, or even more, may be generated in a single day, and the number of machines producing this data ranges from 10,000 to 1,000,000, or even more.
Data generated by many services in the cloud, such as the pv (page view) logs produced by a website, generally needs to be stored in real time, and its integrity needs to be verified in order to guarantee the accuracy of subsequent processing such as data mining.
At present, some systems provide a message verification mechanism that checks whether a storage operation succeeded. However, such a mechanism is only a local check and cannot be applied to the storage verification of big data in the cloud era.
Summary of the Invention
In view of the above problems, the embodiments of the present application are proposed to provide a storage verification method for text data, and a corresponding storage verification system for text data, that overcome the above problems or at least partially solve them.
To solve the above problems, an embodiment of the present application discloses a storage verification method for text data, including:
one or more application devices packaging one or more pieces of generated text data into one or more text data packets, the text data packets having attribute information;
one or more transmission devices storing the one or more text data packets into one or more preset storage devices and, when the storage succeeds, recording the attribute information in a preset statistical table; and
a verification device performing storage verification according to the statistical table upon receiving a storage verification request.
Preferably, the step of the one or more application devices packaging the one or more pieces of generated text data into one or more text data packets includes:
when the size of the generated text data matches a preset size threshold, packaging the generated text data into a text data packet;
or,
when the current time exceeds a preset time threshold, packaging the generated text data into a text data packet.
Preferably, the statistical table includes a first statistical table, and the first statistical table includes a storage time; the transmission device has a transmission device identifier; and the attribute information includes an application device identifier and a generation time;
the step of recording the attribute information in the preset statistical table includes:
searching for the first statistical table corresponding to the application device identifier;
determining whether there is a feature text data packet that has not been stored successfully, the generation time of the feature text data packet being earlier than the generation time of the text data packet that has just been stored successfully; and
if there is none, updating, in the first statistical table, the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.
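The update rule for the first statistical table can be illustrated with a minimal Python sketch. It assumes the table is a nested dictionary keyed by application device identifier and then transmission device identifier, that timestamps are plain numbers, and that the transmission device knows the generation times of packets it has not yet stored (`pending_gen_times`). None of these names or representations come from the application; they are hypothetical.

```python
def record_storage_time(first_tables, app_id, transmitter_id,
                        stored_gen_time, pending_gen_times):
    """Advance the storage time unless an earlier packet is still unstored.

    first_tables: {app_id: {transmitter_id: storage_time}} (assumed layout).
    pending_gen_times: generation times of packets not yet stored (assumed input).
    """
    # Find (or create) the first statistical table for this application device.
    table = first_tables.setdefault(app_id, {})
    # A "feature" packet is an unstored packet generated before the one just stored.
    has_feature_packet = any(t < stored_gen_time for t in pending_gen_times)
    if not has_feature_packet:
        table[transmitter_id] = stored_gen_time
    return table.get(transmitter_id)
```

With this rule, the recorded storage time never moves past a packet that is still waiting to be stored, so every packet generated before it is already in storage.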
Preferably, the statistical table includes a second statistical table, and the second statistical table includes a partition time period and a number of storage lines; the attribute information includes a generation time and a number of packet data lines;
the step of recording the attribute information in the preset statistical table includes:
searching, in the second statistical table, for the partition time period that corresponds to the first statistical table and to which the generation time belongs; and
adding the number of packet data lines to the number of storage lines corresponding to the partition time period.
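As an illustration of this accumulation, the following Python sketch keeps the second statistical table as a dictionary keyed by application device identifier and partition start time. The 15-minute partition length, the dictionary layout, and the function names are assumptions made for the example, not details taken from the application.

```python
from datetime import datetime

PARTITION_MINUTES = 15  # assumed partition length


def partition_start(gen_time):
    """Round a generation time down to the start of its partition time period."""
    minute = gen_time.minute - gen_time.minute % PARTITION_MINUTES
    return gen_time.replace(minute=minute, second=0, microsecond=0)


def accumulate_lines(second_table, app_id, gen_time, packet_lines):
    """Add a stored packet's line count to its partition's number of storage lines."""
    key = (app_id, partition_start(gen_time))
    second_table[key] = second_table.get(key, 0) + packet_lines
    return second_table[key]
```

Because only one counter per partition time period is updated for each stored packet, the cost of the bookkeeping grows with the number of packets rather than the number of lines.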
Preferably, the step of the verification device performing storage verification according to the statistical table includes:
searching, in the first statistical table corresponding to the application device identifier, for the storage time with the smallest value; and
confirming that the text data packets whose generation time is earlier than that storage time have been stored.
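A minimal Python sketch of this persistence check follows. It assumes the first statistical table for one application device is a dictionary mapping transmission device identifiers to storage times; the function names are hypothetical.

```python
def smallest_storage_time(first_table):
    """Return the smallest storage time in the table, or None if it is empty."""
    return min(first_table.values(), default=None)


def is_stored(first_table, gen_time):
    """True if a packet generated at gen_time is confirmed stored.

    A packet is confirmed only when its generation time is earlier than the
    smallest storage time recorded by any transmission device.
    """
    smallest = smallest_storage_time(first_table)
    return smallest is not None and gen_time < smallest
```

Taking the minimum across transmission devices is what makes the check conservative: the slowest device determines the point before which all data is known to be stored.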
Preferably, the step of the verification device performing storage verification according to the statistical table includes:
counting the numbers of storage lines of the partition time periods that fall within a specified proofreading period, to obtain a first total number of lines.
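The summation can be sketched in Python as below. The sketch assumes the second statistical table maps each partition's start time to its number of storage lines, and treats a partition as falling within the proofreading period when its start time lies in the half-open interval from the period's start to its end. That interval convention and the sample values in the usage note are assumptions for illustration.

```python
from datetime import datetime


def first_total_lines(second_table, check_start, check_end):
    """Sum the storage lines of partitions starting in [check_start, check_end)."""
    return sum(lines for start, lines in second_table.items()
               if check_start <= start < check_end)
```

For instance, with illustrative counts of 1500 lines for the partition starting at 12:30:00 and 1600 lines for the partition starting at 12:45:00, a proofreading period of 12:30:00 to 13:00:00 yields a first total of 3100 lines.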
Preferably, the step of the verification device performing storage verification according to the statistical table further includes:
reading, from the storage device, a second total number of lines of the text data packets stored in the proofreading period;
when the first total number of lines is equal to the second total number of lines, confirming that the text data packets corresponding to the proofreading period are not lost; and
when the first total number of lines is not equal to the second total number of lines, confirming that the text data packets corresponding to the proofreading period are at least partially lost.
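The comparison itself is simple; a minimal Python sketch is given here for concreteness. It models the packets read back from storage for the proofreading period as a list of line lists, which is an assumption of the example rather than the storage format described in the application.

```python
def second_total_lines(stored_packets):
    """Count the lines actually present in the packets read back from storage."""
    return sum(len(packet_lines) for packet_lines in stored_packets)


def packets_lost(first_total, stored_packets):
    """True if the two totals differ, meaning at least part of the data was lost."""
    return first_total != second_total_lines(stored_packets)
```

The first total comes from the transmission side and the second from the storage side, so a mismatch points to loss somewhere between a reported successful store and the data actually held.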
Preferably, the attribute information includes a generation time, and the storage device includes one or more storage partitions;
the step of the one or more transmission devices storing the one or more text data packets into the one or more preset storage devices includes:
storing the text data packet into the storage partition, in the preset storage device, that corresponds to the generation time.
An embodiment of the present application further discloses a storage verification system for text data, the system including one or more application devices, one or more transmission devices, one or more storage devices, and a verification device;
wherein the application device includes:
a text data packaging module, configured to package one or more pieces of generated text data into one or more text data packets, the text data packets having attribute information;
the transmission device includes:
a text data packet storage module, configured to store the one or more text data packets into one or more preset storage devices;
an attribute information recording module, configured to record the attribute information in a preset statistical table when the storage succeeds;
the verification device includes:
a storage verification module, configured to perform storage verification according to the statistical table upon receiving a storage verification request.
Preferably, the text data packaging module includes:
a first packaging sub-module, configured to package the generated text data into a text data packet when the size of the generated text data matches a preset size threshold;
or,
a second packaging sub-module, configured to package the generated text data into a text data packet when the current time exceeds a preset time threshold.
Preferably, the statistical table includes a first statistical table, and the first statistical table includes a storage time; the transmission device has a transmission device identifier; and the attribute information includes an application device identifier and a generation time;
the attribute information recording module includes:
a table search sub-module, configured to search for the first statistical table corresponding to the application device identifier;
a feature text data packet judging sub-module, configured to determine whether there is a feature text data packet that has not been stored successfully and, if there is none, to invoke the time update sub-module, the generation time of the feature text data packet being earlier than the generation time of the text data packet that has just been stored successfully;
a time update sub-module, configured to update, in the first statistical table, the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.
Preferably, the statistical table includes a second statistical table, and the second statistical table includes a partition time period and a number of storage lines; the attribute information includes a generation time and a number of packet data lines;
the attribute information recording module includes:
a partition time period search sub-module, configured to search, in the second statistical table, for the partition time period that corresponds to the first statistical table and to which the generation time belongs;
a storage line accumulation sub-module, configured to add the number of packet data lines to the number of storage lines corresponding to the partition time period.
Preferably, the storage verification module includes:
a storage time search sub-module, configured to search, in the first statistical table corresponding to the application device identifier, for the storage time with the smallest value;
a storage completion confirmation sub-module, configured to confirm that the text data packets whose generation time is earlier than that storage time have been stored.
Preferably, the storage verification module includes:
a storage line statistics sub-module, configured to count the numbers of storage lines of the partition time periods that fall within a specified proofreading period, to obtain a first total number of lines.
Preferably, the storage verification module further includes:
a second total line number reading sub-module, configured to read, from the storage device, a second total number of lines of the text data packets stored in the proofreading period;
a non-loss confirmation sub-module, configured to confirm, when the first total number of lines is equal to the second total number of lines, that the text data packets corresponding to the proofreading period are not lost;
a loss confirmation sub-module, configured to confirm, when the first total number of lines is not equal to the second total number of lines, that the text data packets corresponding to the proofreading period are at least partially lost.
Preferably, the attribute information includes a generation time, and the storage device includes one or more storage partitions;
the text data packet storage module includes:
a partition storage sub-module, configured to store the text data packet into the storage partition, in the preset storage device, that corresponds to the generation time.
The embodiments of the present application include the following advantages:
In the embodiments of the present application, the application device packages the generated text data into text data packets, which the transmission device stores into the storage device. When the storage succeeds, the attribute information is recorded, and the verification device verifies the storage status according to the recorded attribute information. Packaging the text data reduces the number of items that need to be counted many times over, which greatly reduces the volume of statistics and therefore the verification workload and the performance cost to the system. This greatly improves practicality in big data processing and enables overall storage verification of big data in the cloud.
The embodiments of the present application update the storage time in the first statistical table and, by comparing the storage times, obtain the storage time with the smallest value, thereby implementing storage persistence verification for massive text data.
The embodiments of the present application accumulate the number of storage lines in the second statistical table and, by summing the numbers of storage lines of the required partition time periods, implement storage quantity verification for massive text data.
The embodiments of the present application compare the first total number of lines, counted on the transmission device side, with the second total number of lines, counted on the storage device side, thereby implementing storage loss verification for massive text data.
Brief Description of the Drawings
FIG. 1 is a flowchart of the steps of an embodiment of a storage verification method for text data of the present application;
FIG. 2 is a structural block diagram of an embodiment of a storage verification system for text data of the present application.
Detailed Description
To make the above objects, features, and advantages of the present application more apparent and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
A cloud platform can provide big data processing and storage capabilities to a large number of users. Many of a user's services on the cloud platform, such as the pv (page view) logs of a website, need to be written to the network-side devices of the cloud platform.
Suppose the website has 1000 front-end devices, each generating an access.log log in real time, and the access.log logs of these 1000 machines are written in real time to the network-side devices of the cloud platform. Services now place increasingly high demands on data quality, and many services cannot tolerate the loss of any data, so a mechanism is needed to ensure data integrity.
During storage, it is generally necessary to ensure that not a single log entry is lost, and also to ensure that all data produced by every device before a certain time has been successfully written to the network-side devices of the cloud platform, or to state accurately the time before which the access.log of all current machines has been successfully written to the network-side devices of the cloud platform.
Current methods cannot provide an end-to-end guarantee that no data is lost. For example, although the Flume system provides an ack mechanism for messages, this ack mechanism is only a local confirmation mechanism: it cannot confirm how many records the Flume system itself has successfully written, cannot confirm whether data from a given time period was lost during storage, and cannot determine how many records in total finally reside on the network-side devices of the cloud platform.
Performing an operation such as count after the data has been written to cloud storage is clearly very inefficient.
Therefore, one of the core ideas of the embodiments of the present application is proposed: the data is packaged, and statistics are kept on the data packets, so that storage verification can be performed.
Referring to FIG. 1, a flowchart of the steps of an embodiment of a storage verification method for text data of the present application is shown, which may specifically include the following steps:
Step 101: one or more application devices package one or more pieces of generated text data into one or more text data packets;
It should be noted that the present application can be applied to a cloud platform, that is, a computer cluster, such as a distributed system.
Taking a certain distributed system as an example, the distributed system can be divided into the following parts:
Underlying distributed system services: provide the coordination, remote procedure call, security management, and resource management services required in a distributed environment. These underlying services support upper-layer modules such as the distributed file system and task scheduling.
Distributed file system: provides a massive, reliable, and scalable data storage service that aggregates the storage capacity of the nodes in the cluster and automatically masks software and hardware failures, giving users uninterrupted data access; supports incremental capacity expansion and automatic data balancing, and provides a user-space file access API (Application Program Interface) that supports random read/write and append-write operations.
Task scheduling: provides scheduling services for the tasks in the cluster system, supporting both online services (Online Service), which emphasize response speed, and offline tasks (Batch Processing Job), which emphasize data throughput; automatically detects failures and hotspots in the system and ensures that jobs complete stably and reliably through measures such as retrying on error and running concurrent backup jobs for long-tail jobs.
Cluster monitoring and deployment: monitors the state of the cluster and the running state and performance metrics of upper-layer application services, and raises alarms for and records abnormal events; provides operations staff with deployment and configuration management for the entire distributed system and its upper-layer applications, and supports online cluster expansion and shrinking as well as online upgrades of application services.
In a cloud platform (such as a distributed system), an application device may be a device, such as a server, that can generate text data while an application service is running.
It should be noted that the text data is time-series data generated in chronological order, for example log data such as pv logs, access.log logs, and system running logs.
在具体实现中,应用设备可以通过若干种方式对文本数据进行打包:In a specific implementation, the application device can package the text data in several ways:
在一种方式中,当产生的文本数据的大小与预设的大小阈值匹配时,将产生的文本数据打包成文本数据包;In one mode, when the size of the generated text data matches a preset size threshold, the generated text data is packaged into a text data packet;
在此方式中,可以基于数据量的维度进行打包。 In this way, you can package based on the dimension of the amount of data.
应用设备将产生的文本数据按照大小阈值进行打包,则可以将需要统计的文本数据缩小了很多倍,进而大大减少了统计的量值,同时,文本数据包中还可以设置属性信息进行过滤等其他实时增值处理的打标操作。The application device packs the generated text data according to the size threshold, so that the text data that needs to be counted can be reduced by many times, thereby greatly reducing the statistical value. At the same time, the text data packet can also be set with attribute information for filtering and the like. Marking operation for real-time value-added processing.
例如,假设平均每条文本数据的大小为1k,若一个文本数据包的阈值为512K,即一个文本数据包的大小为512K,若产生的文本数据为512亿条,应用在先的技术进行统计则需要统计512亿次,而经过打包后,512亿条文本数据变成1亿个文本数据包,则需要统计的文本数据的量缩小达到512倍。For example, suppose the average size of each text data is 1k. If the threshold of a text packet is 512K, that is, the size of a text packet is 512K, if the generated text data is 51.2 billion, the prior art is used for statistics. Then it needs to count 51.2 billion times, and after packaging, 51.2 billion pieces of text data becomes 100 million text data packets, and the amount of text data that needs to be statistically reduced is 512 times.
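The arithmetic of this example can be checked with a few lines of Python. This is only a restatement of the figures given above; the variable names are illustrative and do not come from the application.

```python
# Figures from the example above; the variable names are illustrative.
RECORD_SIZE_K = 1                # average size of one piece of text data, in K
PACKET_THRESHOLD_K = 512         # size threshold of one text data packet, in K
RECORD_COUNT = 51_200_000_000    # 51.2 billion pieces of text data

records_per_packet = PACKET_THRESHOLD_K // RECORD_SIZE_K
packet_count = RECORD_COUNT // records_per_packet
reduction_factor = RECORD_COUNT // packet_count

print(records_per_packet, packet_count, reduction_factor)
```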
In another way, when the current time exceeds a preset time threshold, the generated text data is packed into a text data packet.
In this way, packing is based on the time dimension.
Because the text data is time-series data, it can also be stored in time-based partitions on the cloud network-side device.
For example, if the text data is partitioned by hour, the cloud network-side device has 24 partitions named 00, 01, ..., 23. Text data generated from 00:00:00 to 00:59:59 is stored in partition 00, text data generated from 01:00:00 to 01:59:59 is stored in partition 01, and text data generated at other times is stored in the same manner.
After packing, text data packets can likewise be stored in the corresponding partition according to their generation time. In general, the packets must land in the correct partition. For example, a text data packet that lands in partition 00 should generally not contain text data generated between 01:00:00 and 01:59:59; otherwise that data would also be placed in partition 00, causing drift of the text data (inaccurate partitioning), which is itself a data quality fault.
Of course, drift of text data usually does not have to be avoided completely, but it must not be too large. Packing with a short time threshold, such as 5 minutes, effectively limits drift and keeps the amount of data written to the wrong partition within an acceptable margin of error.
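The hourly partitioning and the notion of drift described above can be sketched as follows. This is a minimal illustration, not the application's implementation; the function names hour_partition and drifted_records are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List


def hour_partition(generated_at: datetime) -> str:
    """Return the hourly partition name ("00" to "23") for a generation time."""
    return f"{generated_at.hour:02d}"


def drifted_records(partition: str, generation_times: Iterable[datetime]) -> List[datetime]:
    """Return the generation times in a packet that belong to another partition."""
    return [t for t in generation_times if hour_partition(t) != partition]
```

A packet stored in partition 00 that yields a non-empty result from drifted_records contains drifted data in the sense used above.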
Of course, the above packing ways can be used together. For example, assume a size threshold of 512K and a time threshold of 5 minutes. Packing the text data generated from 13:00:00 to 13:04:59 produces three text data packets: A1 (512K in size, whose first piece of text data was generated at 13:00:00), A2 (512K in size) and A3 (402K in size, whose last piece of text data was generated at 13:04:59).
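One way to combine the two thresholds is sketched below. It is a sketch under stated assumptions: the class name Packer, its method names, and the choice to measure size in UTF-8 bytes are all hypothetical, and the time window is assumed to start at the first record received after the previous window closed.

```python
from datetime import datetime, timedelta
from typing import List, Optional


class Packer:
    """Buffers text records and cuts a packet when either threshold is reached."""

    def __init__(self, size_threshold: int, time_threshold: timedelta) -> None:
        self.size_threshold = size_threshold      # bytes per packet
        self.time_threshold = time_threshold      # length of one packing window
        self._lines: List[str] = []
        self._size = 0
        self._window_start: Optional[datetime] = None

    def _cut(self) -> List[str]:
        # Hand back the buffered lines and start an empty buffer.
        packet, self._lines, self._size = self._lines, [], 0
        return packet

    def add(self, line: str, generated_at: datetime) -> List[List[str]]:
        """Add one record and return the packets completed by this call."""
        packets: List[List[str]] = []
        if self._window_start is None:
            self._window_start = generated_at
        elif generated_at - self._window_start >= self.time_threshold:
            # Time threshold exceeded: close the window even if the packet is small.
            if self._lines:
                packets.append(self._cut())
            self._window_start = generated_at
        self._lines.append(line)
        self._size += len(line.encode("utf-8"))
        if self._size >= self.size_threshold:
            # Size threshold reached: cut a full packet within the same window.
            packets.append(self._cut())
        return packets
```

With a 512K size threshold and a 5-minute window, this sketch would cut full packets such as A1 and A2 on size, and a smaller tail packet such as A3 when the first record of the next window arrives.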
In addition, the above packing ways are only examples. When implementing the embodiments of the present application, those skilled in the art may adopt other packing ways according to actual needs, and the embodiments of the present application do not limit this.
It should be noted that text data packets may be compressed to save bandwidth during network transmission, or may be left uncompressed; the embodiments of the present application do not limit this.
When the text data has been packed successfully, attribute information can be configured for the text data packet; that is, the text data packet carries attribute information.
In a specific implementation, besides the text data itself, the structure of a text data packet may hold data referred to as attributes. Each attribute has a name and stores the corresponding attribute information, which may include an application device identifier, a generation time and a packet data line count.
The application device identifier (HostName) identifies the application device that generated the text data in the text data packet, that is, information that uniquely determines an application device, such as an application device ID or a host address.
The generation time (FileTime) is the time at which the text data in the text data packet was generated. In general, the generation time of the first piece of text data in the packet can be used as the generation time of the packet.
The packet data line count (LineCount) is the number of lines of all text data in the text data packet. In a database, the statement select count(1) from table_name can be used to count the lines; in a distributed database or a distributed environment, MapReduce can be used to scan the text data and accumulate the count into the packet data line count.
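The three attributes described above could be carried alongside the text data as in the following sketch. The attribute meanings follow the text; the Python names (TextPacket, build_packet, host_name and so on) are hypothetical, and the line count is taken directly from the in-memory records rather than from a database query or a MapReduce job.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class TextPacket:
    host_name: str        # HostName: identifier of the producing application device
    file_time: datetime   # FileTime: generation time of the first record
    line_count: int       # LineCount: number of text lines in the packet
    lines: Tuple[str, ...]


def build_packet(host_name: str, records: List[Tuple[datetime, str]]) -> TextPacket:
    """Build a packet from (generation time, line) pairs given in time order."""
    if not records:
        raise ValueError("a packet needs at least one record")
    lines = tuple(line for _, line in records)
    return TextPacket(host_name, records[0][0], len(lines), lines)
```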
When a text data packet has been packed successfully, it can be sent to one or more transmission devices.
It should be noted that a cloud platform (such as a distributed system) generally has a mechanism such as ack to guarantee successful transmission without packet loss. If sending the text data packet to the transmission device fails, the packet is resent until it is sent successfully.
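The resend-until-success behaviour can be pictured with the sketch below. The text does not specify the ack mechanism, so the send callable, which is assumed to return True once the receiver acknowledges the packet, and the fixed retry delay are both hypothetical.

```python
import time
from typing import Callable


def send_until_acked(send: Callable[[bytes], bool], packet: bytes,
                     retry_delay_seconds: float = 1.0) -> int:
    """Resend the packet until send() reports an ack; return the attempt count."""
    attempts = 1
    while not send(packet):
        # No ack received: wait briefly, then send the same packet again.
        time.sleep(retry_delay_seconds)
        attempts += 1
    return attempts
```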
Step 102: one or more transmission devices store the one or more text data packets in one or more preset storage devices;
In a cloud platform (such as a distributed system), a transmission device may be a device that transmits data (such as text data packets) to a processing node (such as a storage device), and a storage device may be a device that stores data (such as text data packets).
In the embodiments of the present application, the cloud platform provides an API (Application Program Interface) for storing data, and the transmission device calls this API to write text data packets.
In practice, the transmission device may use any of several allocation strategies to assign a storage device to a text data packet; the embodiments of the present application do not limit this.
For example, the allocation strategy may be hash allocation (hash(C) % N): the hash value of the text data packet C is computed, and the packet is allocated to the storage device corresponding to hash(C) % N.
As another example, the allocation strategy may be random allocation: a random number is drawn, and the text data packet is allocated to the storage device corresponding to random(C) % N.
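The two allocation strategies can be sketched as follows, assuming that N is the number of storage devices and that devices are addressed by index 0 to N-1. The text does not name a hash function, so CRC-32 from the standard zlib module is used here purely as a stand-in; the function names are hypothetical.

```python
import random
import zlib


def allocate_by_hash(packet: bytes, device_count: int) -> int:
    """hash(C) % N, with CRC-32 standing in for the unspecified hash function."""
    return zlib.crc32(packet) % device_count


def allocate_randomly(device_count: int, rng: random.Random) -> int:
    """Draw a random number and reduce it modulo N."""
    return rng.getrandbits(32) % device_count
```

Hash allocation always sends the same packet content to the same device, while random allocation spreads packets without regard to their content.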
In a preferred embodiment of the present application, a storage device may include one or more storage partitions, each of which can store the text data packets of a certain time period. The time period can be set by those skilled in the art according to the actual situation, for example 1 hour or 1 day; the embodiments of the present application do not limit this.
Therefore, at storage time, the storage partition to which a text data packet belongs can be looked up, and the packet is stored in the storage partition of the preset storage device that corresponds to its generation time.
It should be noted that a cloud platform (such as a distributed system) generally has a mechanism such as ack to guarantee successful transmission without packet loss. If sending the text data packet to the storage device fails, the packet is resent until it is sent successfully.
Step 103: when the storage succeeds, the attribute information is recorded in a preset statistical table;
If a text data packet is stored successfully, its attribute information can be recorded so that the corresponding storage verification can be performed.
In a preferred embodiment of the present application, the statistical table may include a first statistical table, and the first statistical table may include a storage time. In this embodiment, step 103 may include the following sub-steps:
Sub-step S11: look up the first statistical table corresponding to the application device identifier;
Sub-step S12: judge whether there is a characteristic text data packet that has not been stored successfully; if there is none, perform sub-step S13;
Here, the generation time of a characteristic text data packet is earlier than the generation time of the text data packet that has just been stored successfully;
Sub-step S13: in the first statistical table, update the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.
In the embodiments of the present application, where a cloud platform offers big-data processing and storage capability to a large number of users, a user can rent some of the application devices in the cloud platform, so that a user identifier (such as a user ID) is associated with application device identifiers. Text data packets generated by the same user's application devices are therefore usually counted together, and different users have different first statistical tables.
In the first statistical table, each combination of transmission device identifier and application device identifier generally has exactly one storage time (a one-to-one relationship).
For example, suppose the first statistical table allocated to user A is Test1, the user's application devices are application_1 and application_2, and the transmission devices are transmission_1 and transmission_2. An example of the first statistical table is then as shown in Table 1:
Table 1

| First statistical table | Transmission device identifier | Application device identifier | Storage time |
| --- | --- | --- | --- |
| Test1 | transmission_1 | application_1 | 13:00:00 |
| Test1 | transmission_1 | application_2 | 15:00:00 |
| Test1 | transmission_2 | application_1 | 14:00:00 |
| Test1 | transmission_2 | application_2 | 14:00:00 |
The storage time is refreshed continuously. It represents the latest generation time among the stored text data packets that were generated by a given application device and transmitted by a given transmission device.
Suppose transmission device transmission_1 successfully stores in the storage device a text data packet generated by application device application_1, and the generation time of that packet is 13:00:03. The first statistical table is then modified as shown in Table 2:
Table 2

| First statistical table | Transmission device identifier | Application device identifier | Storage time |
| --- | --- | --- | --- |
| Test1 | transmission_1 | application_1 | 13:00:03 |
| Test1 | transmission_1 | application_2 | 15:00:00 |
| Test1 | transmission_2 | application_1 | 14:00:00 |
| Test1 | transmission_2 | application_2 | 14:00:00 |
In addition, to preserve the time order of the text data packets, when a text data packet is stored successfully it can be checked whether any characteristic text data packet has not yet been stored successfully. If there is one, the generation time of the packet just stored can be held back, and it is written to the storage time only once no unstored characteristic text data packet remains.
For example, the transmission device transmits the above text data packets A1, A2 and A3 to the storage device. If A3 has been stored successfully while A1 and A2 have not, the generation time of A3 is not written to the storage time in the first statistical table; it is written there once A1 and A2 have been stored successfully.
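Sub-steps S12 and S13, together with the holding-back behaviour just described, can be sketched as follows. This is one possible reading rather than the application's implementation: the class FirstStatTable, its methods, and the bookkeeping of packets that are sent but not yet stored are assumptions, and the generation times used for A2 and A3 in the usage note are invented for illustration.

```python
from collections import defaultdict
from datetime import time
from typing import Dict, Set, Tuple

Key = Tuple[str, str]  # (transmission device identifier, application device identifier)


class FirstStatTable:
    def __init__(self) -> None:
        self.storage_time: Dict[Key, time] = {}
        self._in_flight: Dict[Key, Set[time]] = defaultdict(set)  # sent, not yet stored
        self._held: Dict[Key, Set[time]] = defaultdict(set)       # stored, not yet published

    def mark_sent(self, key: Key, generated_at: time) -> None:
        self._in_flight[key].add(generated_at)

    def mark_stored(self, key: Key, generated_at: time) -> None:
        self._in_flight[key].discard(generated_at)
        self._held[key].add(generated_at)
        earliest_unstored = min(self._in_flight[key], default=None)
        # Sub-step S12: only times with no earlier unstored packet may be published.
        ready = {t for t in self._held[key]
                 if earliest_unstored is None or t < earliest_unstored}
        if ready:
            # Sub-step S13: refresh the storage time for this device pair.
            self.storage_time[key] = max(ready)
            self._held[key] -= ready
```

If A1, A2 and A3 are sent with generation times 13:00:00, 13:02:00 and 13:04:00 and A3 is stored first, the storage time stays unset; it moves to A1's time when A1 is stored and to A3's time only after A2 is stored as well.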
In another preferred embodiment of the present application, the statistical table includes a second statistical table, and the second statistical table includes a partition time period and a stored line count. In this embodiment, step 103 may include the following sub-steps:
Sub-step S21: in the second statistical table, look up the partition time period that corresponds to the first statistical table and to which the generation time belongs;
Sub-step S23: add the packet data line count to the stored line count corresponding to the partition time period.
In the embodiments of the present application, the text data packets generated by the same user's application devices may likewise be counted together.
In the second statistical table, each partition time period (PartitionTime) of a first statistical table generally has exactly one stored line count (a one-to-one relationship). The partition time period can be set by those skilled in the art according to the actual situation, for example 1 hour or 15 minutes. The stored line count is a cumulative value that represents the number of data lines of the text data packets, generated by a user's application devices, that have been stored for that partition time period.
For example, suppose the first statistical table allocated to user A is Test1 with a partition time period of 15 minutes, and the first statistical table allocated to user B is Test2 with a partition time period of 10 minutes. An example of the second statistical table is then as shown in Table 3:
Table 3

| First statistical table | PartitionTime | lineCount |
| --- | --- | --- |
| Test1 | 12:30:00 | 1500 |
| Test1 | 12:45:00 | 1600 |
| Test1 | 13:00:00 | 1300 |
| ... | ... | ... |
| Test2 | 13:00:00 | 1200 |
| Test2 | 13:10:00 | 1100 |
| Test2 | 13:20:00 | 1500 |
Suppose a text data packet of user A is now stored successfully in the storage device, its generation time is 13:00:03, which falls in the partition time period 13:00:00, and its line count is 50. The second statistical table is then modified as shown in Table 4:
Table 4

| First statistical table | PartitionTime | lineCount |
| --- | --- | --- |
| Test1 | 12:30:00 | 1500 |
| Test1 | 12:45:00 | 1600 |
| Test1 | 13:00:00 | 1350 |
| ... | ... | ... |
| Test2 | 13:00:00 | 1200 |
| Test2 | 13:10:00 | 1100 |
| Test2 | 13:20:00 | 1500 |
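The accumulation in sub-steps S21 and S23, and the change from 1300 to 1350 lines in the Test1 partition 13:00:00, can be sketched as follows. The dictionary layout, the function names and the calendar date used in the usage are assumptions made for illustration; partition time periods are assumed to be aligned to midnight.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple

# (first statistical table id, PartitionTime) -> lineCount
SecondStatTable = Dict[Tuple[str, datetime], int]


def partition_start(generated_at: datetime, period: timedelta) -> datetime:
    """Round a generation time down to the start of its partition time period."""
    midnight = generated_at.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((generated_at - midnight) // period) * period


def record_stored_packet(table: SecondStatTable, first_table_id: str,
                         generated_at: datetime, line_count: int,
                         period: timedelta) -> None:
    key = (first_table_id, partition_start(generated_at, period))  # sub-step S21
    table[key] = table.get(key, 0) + line_count                    # sub-step S23
```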
Step 104: upon receiving a storage verification request, the verification device performs storage verification according to the statistical table.
In a cloud platform (such as a distributed system), the verification device may be a back-end device that provides an API through which a user can issue a storage verification request to verify how the text data packets generated by that user's application devices have been stored.
In the embodiments of the present application, the application device packs the generated text data into text data packets, and the transmission device stores them in the storage device. When the storage succeeds, the attribute information is recorded, and upon a storage verification request the verification device verifies the storage status according to the recorded attribute information. Packing the text data shrinks the number of items to be counted many times over, which greatly reduces the statistical workload and the amount of verification processing, lowers the performance cost to the system, makes the approach far more practical for big-data processing, and achieves storage verification for the cloud as a whole.
In a preferred embodiment of the present application, step 104 may include the following sub-steps:
Sub-step S31: in the first statistical table corresponding to the application device identifier, find the storage time with the smallest value;
Sub-step S32: confirm that the text data packets whose generation time is earlier than that storage time have been stored.
In the embodiments of the present application, the storage verification request may be used to check the time point before which all text data packets have been stored (that is, persisted).
The storage verification request may include parameters such as user information (such as a user ID), a first statistical table identifier and a first verification identifier.
The user information may be used to authenticate the storage verification request; storage verification is allowed when authentication passes.
The first statistical table identifier is information that identifies the first statistical table, such as a name or an ID.
The first verification identifier is information indicating that the verification to perform is to determine the time point before which the text data packets have been stored.
In practice, for every transmission device (represented by its transmission device identifier) and every application device (represented by its application device identifier), the text data packets whose generation time is less than or equal to the smallest storage time have been stored successfully, whereas packets not yet stored may exist between the smallest storage time and the other storage times. The smallest storage time can therefore represent the time point up to which text data packets have been completely stored.
For example, as shown in Table 2, the smallest storage time is 13:00:03. The text data packets from application_1 and application_2 with generation times up to 13:00:03 have been stored through transmission_1, while transmission_1 may still have unstored text data packets generated between 13:00:03 and 15:00:00;
Likewise, the text data packets from application_1 and application_2 with generation times before 13:00:03 have been stored through transmission_2, while transmission_2 may still have unstored text data packets generated between 13:00:03 and 14:00:00;
Accordingly, all text data packets generated by user A's application devices before 13:00:03 have been stored.
In the embodiments of the present application, the storage time is updated in the first statistical table, and the smallest storage time is obtained by comparing the storage times, which achieves persistence verification for the storage of massive text data.
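Sub-steps S31 and S32 reduce to taking a minimum over the storage times, as the following sketch shows with the values of Table 2. The dictionary layout and the function names are illustrative assumptions.

```python
from datetime import time
from typing import Dict, Tuple

# (transmission device identifier, application device identifier) -> storage time
FirstStatTable = Dict[Tuple[str, str], time]


def stored_up_to(table: FirstStatTable) -> time:
    """Sub-step S31: the smallest storage time across all device pairs."""
    return min(table.values())


def is_stored(table: FirstStatTable, generated_at: time) -> bool:
    """Sub-step S32: packets generated before that time are treated as stored."""
    return generated_at < stored_up_to(table)


# Values of Table 2 for first statistical table Test1.
test1 = {
    ("transmission_1", "application_1"): time(13, 0, 3),
    ("transmission_1", "application_2"): time(15, 0, 0),
    ("transmission_2", "application_1"): time(14, 0, 0),
    ("transmission_2", "application_2"): time(14, 0, 0),
}
```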
In a preferred embodiment of the present application, step 104 may include the following sub-steps:
Sub-step S41: sum the stored line counts of the partition time periods that fall within a specified verification time period to obtain a first total line count;
In the embodiments of the present application, the storage verification request may be used to check the number of lines of the text data packets stored within a given time period.
The storage verification request may include parameters such as user information (such as a user ID), a first statistical table identifier, a second verification identifier and a verification time period.
The second verification identifier is information indicating that the verification to perform is to count the lines of the text data packets stored within a time period;
The verification time period is the period over which the lines of the stored text data packets are counted.
In practice, summing the stored line counts of the partition time periods that fall within the specified verification time period gives the total number of lines for that verification time period (that is, the first total line count).
The first total line count obtained in this way serves two purposes:
First, it gives the business an intuitive statistical report, providing a clear data basis for value-added services or business logic;
Second, it provides the expected figure against which the integrity of the data can be checked.
After the text data lands on the storage devices of the cloud platform, whether it has been damaged, and how good its quality is, can be determined by comparing against the first total line count. This establishes the quality of all of a business's text data on the storage devices of the cloud platform; if the quality does not meet the needs of the business, it must be corrected.
It should be noted that the verification time period used for line counting should generally end before the time point up to which text data packets are known to be stored; otherwise the count covers a period whose storage is not yet complete and may be meaningless.
For example, as shown in Table 4, if user A needs statistics on the text data packets between 12:30:00 and 13:00:00 (the verification time period), the stored line counts (lineCount) of the partition time periods (PartitionTime) 12:30:00 and 12:45:00 can be summed, giving a first total line count of 3100 lines.
If user A needs statistics on the text data packets between 12:30:00 and 13:15:00 (the verification time period), then, because the time point up to which storage is complete is 13:00:03, there may be text data packets between 13:00:03 and 13:15:00 that have not yet been stored, so the first total line count obtained may not be the true line count.
In the embodiments of the present application, the stored line counts are accumulated in the second statistical table, and summing the stored line counts of the required partition time periods achieves quantity verification for the storage of massive text data.
Sub-step S42: read, from the storage device, a second total line count of the text data packets stored within the verification time period;
Sub-step S43: when the first total line count equals the second total line count, confirm that the text data packets corresponding to the verification time period have not been lost;
Sub-step S44: when the first total line count does not equal the second total line count, confirm that the text data packets corresponding to the verification time period have been at least partially lost.
In the embodiments of the present application, to further verify whether any loss occurred during storage, the second total line count obtained from the storage device can be compared with the first total line count recorded on the transmission device side. If the two are equal, no loss has occurred; if they differ, a loss has occurred.
By comparing the first total line count, based on the transmission device's statistics, with the second total line count, based on the storage device's statistics, the embodiments of the present application achieve loss verification for the storage of massive text data.
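Sub-steps S41 to S44 can be sketched as follows with the Test1 and Test2 rows of Table 4. How the second total line count is read from the storage device (sub-step S42) is not specified in the text, so it is simply passed in as a number here; the function names and the half-open interval convention are assumptions.

```python
from datetime import time
from typing import Dict, Tuple

# (first statistical table id, PartitionTime) -> lineCount
SecondStatTable = Dict[Tuple[str, time], int]


def first_total_lines(table: SecondStatTable, first_table_id: str,
                      start: time, end: time) -> int:
    """Sub-step S41: sum lineCount over partitions with start <= PartitionTime < end."""
    return sum(count for (table_id, partition_time), count in table.items()
               if table_id == first_table_id and start <= partition_time < end)


def packets_intact(first_total: int, second_total: int) -> bool:
    """Sub-steps S43 and S44: equal totals mean no loss in the checked period."""
    return first_total == second_total


# Rows of Table 4 (the elided rows are left out).
table4 = {
    ("Test1", time(12, 30)): 1500,
    ("Test1", time(12, 45)): 1600,
    ("Test1", time(13, 0)): 1350,
    ("Test2", time(13, 0)): 1200,
    ("Test2", time(13, 10)): 1100,
    ("Test2", time(13, 20)): 1500,
}
```

For the verification time period 12:30:00 to 13:00:00, first_total_lines returns the 3100 lines of the example, which would then be compared with the count read back from storage.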
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in another order or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Referring to FIG. 2, a structural block diagram of an embodiment of a storage verification system for text data of the present application is shown. The system may include one or more application devices 210, one or more transmission devices 220, one or more storage devices 230 and a verification device 240;
The application device 210 may include the following module:
a text data packing module 211, configured to pack one or more pieces of generated text data into one or more text data packets, each text data packet carrying attribute information;
The transmission device 220 may include the following modules:
a text data packet storage module 221, configured to store the one or more text data packets in one or more preset storage devices 230;
an attribute information recording module 222, configured to record the attribute information in a preset statistical table when the storage succeeds;
The verification device 240 may include the following module:
a storage verification module 241, configured to perform storage verification according to the statistical table upon receiving a storage verification request.
在本申请的一种优选实施例中,所述文本数据打包模块211可以包括如下子模块:In a preferred embodiment of the present application, the text data packaging module 211 may include the following sub-modules:
第一打包子模块,用于在产生的文本数据的大小与预设的大小阈值匹配时,将产生的文本数据打包成文本数据包;a first packaging submodule, configured to package the generated text data into a text data packet when the size of the generated text data matches a preset size threshold;
或者,or,
第二打包子模块,用于在当前时间超过预设的时间阈值时,将产生的文本数据打包成文本数据包。The second packaging sub-module is configured to package the generated text data into a text data packet when the current time exceeds a preset time threshold.
In a preferred embodiment of the present application, the statistical table may include a first statistical table, and the first statistical table may include a storage time; the transmission device may have a transmission device identifier; and the attribute information may include an application device identifier and a generation time;
The attribute information recording module 222 may include the following submodules:
a table lookup submodule, configured to look up the first statistical table corresponding to the application device identifier;
a feature text data packet determination submodule, configured to determine whether there is a feature text data packet that has not been stored successfully and, if there is none, to invoke the time update submodule; a feature text data packet is one whose generation time is earlier than that of the text data packet that has just been stored successfully;
a time update submodule, configured to update, in the first statistical table, the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.
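By way of example only, the update rule for the first statistical table might be sketched as below. The function name `record_storage_time`, the nested-dictionary layout of the table, and the `pending` set (generation times of packets from the same application device that this transmission device has not yet stored successfully) are hypothetical; the sketch also assumes that generation times are unique per packet.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set

# first_table[app_id][transmitter_id] -> storage time
FirstTable = DefaultDict[str, Dict[str, float]]


def record_storage_time(
    table: FirstTable,
    app_id: str,
    transmitter_id: str,
    generated_at: float,
    pending: Set[float],
) -> bool:
    """Record a successful store; return True if the storage time was updated."""
    # The packet that was just stored is no longer pending.
    pending.discard(generated_at)
    # A "feature" packet is an unstored packet generated earlier than this one.
    if any(t < generated_at for t in pending):
        return False
    table[app_id][transmitter_id] = generated_at
    return True


first_table: FirstTable = defaultdict(dict)
```

Because the storage time is advanced only when no earlier packet is outstanding, each entry can be read as a conservative mark below which everything handled by that transmission device has been stored.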
In another preferred embodiment of the present application, the statistical table may include a second statistical table, and the second statistical table may include a partition time period and a number of stored lines; the attribute information may include a generation time and a number of data lines in the packet;
The attribute information recording module 222 may include the following submodules:
a partition time period lookup submodule, configured to look up, in the second statistical table, the partition time period that corresponds to the first statistical table and to which the generation time belongs;
a stored-line-count accumulation submodule, configured to add the number of data lines in the packet to the number of stored lines corresponding to the partition time period.
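The accumulation into the second statistical table might, for instance, look like the following sketch. The one-hour partition length `PARTITION_S`, the helper names, and the use of a `Counter` keyed by `(app_id, partition_start)` are assumptions made for illustration only.

```python
from collections import Counter
from typing import Tuple

PARTITION_S = 3600  # hypothetical partition length: one hour, in seconds

# second_table[(app_id, partition_start)] -> number of stored lines
SecondTable = Counter


def partition_start(generated_at: float) -> int:
    """Return the start of the partition time period containing generated_at."""
    return int(generated_at // PARTITION_S) * PARTITION_S


def record_line_count(
    second_table: Counter, app_id: str, generated_at: float, line_count: int
) -> Tuple[str, int]:
    """Add a stored packet's line count to its partition; return the key used."""
    key = (app_id, partition_start(generated_at))
    second_table[key] += line_count
    return key
```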
In a preferred embodiment of the present application, the storage verification module 241 may include the following submodules:
a storage time lookup submodule, configured to look up the smallest storage time in the first statistical table corresponding to the application device identifier;
a storage completion confirmation submodule, configured to confirm that text data packets whose generation time is earlier than that storage time have been completely stored.
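A minimal sketch of this check, assuming the first statistical table is held as a mapping from application device identifier to per-transmission-device storage times, is given below. The function names `stored_watermark` and `is_stored` are hypothetical.

```python
from typing import Dict, Optional


def stored_watermark(
    first_table: Dict[str, Dict[str, float]], app_id: str
) -> Optional[float]:
    """Return the smallest storage time recorded for app_id, or None if absent."""
    times = first_table.get(app_id, {})
    return min(times.values()) if times else None


def is_stored(
    first_table: Dict[str, Dict[str, float]], app_id: str, generated_at: float
) -> bool:
    """True if a packet generated at generated_at is confirmed as stored."""
    watermark = stored_watermark(first_table, app_id)
    return watermark is not None and generated_at < watermark
```

Taking the minimum across transmission devices means that the slowest device bounds what can be confirmed, so a packet is reported as stored only when every device has passed its generation time.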
In another preferred embodiment of the present application, the storage verification module 241 may include the following submodule:
a stored-line-count statistics submodule, configured to sum the numbers of stored lines of the partition time periods that fall within a specified proofreading time period, to obtain a first total number of lines.
In another preferred embodiment of the present application, the storage verification module 241 may further include the following submodules:
a second total line count reading submodule, configured to read, from the storage device, a second total number of lines of the text data packets stored in the proofreading time period;
a no-loss confirmation submodule, configured to confirm that the text data packets corresponding to the proofreading time period have not been lost when the first total number of lines equals the second total number of lines;
a loss confirmation submodule, configured to confirm that the text data packets corresponding to the proofreading time period have been at least partially lost when the first total number of lines does not equal the second total number of lines.
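The comparison of the two totals might be sketched as follows. The sketch assumes that the second statistical table is a mapping keyed by `(app_id, partition_start)`, that a partition "falls within" the proofreading time period when its start lies in the half-open interval `[start, end)`, and that the second total has already been read from the storage device; the function names are hypothetical.

```python
from typing import Dict, Tuple


def first_total_lines(
    second_table: Dict[Tuple[str, int], int], app_id: str, start: int, end: int
) -> int:
    """Sum the stored line counts of partitions whose start lies in [start, end)."""
    return sum(
        count
        for (table_app, part_start), count in second_table.items()
        if table_app == app_id and start <= part_start < end
    )


def verify_no_loss(
    second_table: Dict[Tuple[str, int], int],
    app_id: str,
    start: int,
    end: int,
    second_total: int,
) -> bool:
    """True if the recorded total equals the total read back from storage."""
    return first_total_lines(second_table, app_id, start, end) == second_total
```

A result of `False` corresponds to the case in which the text data packets of the proofreading time period are treated as at least partially lost.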
In another preferred embodiment of the present application, the attribute information may include a generation time, and the storage device 230 may include one or more storage partitions;
The text data packet storage module 221 may include the following submodule:
a partition storage submodule, configured to store the text data packet in the storage partition, within the preset storage device 230, that corresponds to the generation time.
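One possible way to map a generation time to a storage partition is sketched below. The hourly granularity, the `dt=.../hour=...` naming scheme, the use of UTC, and the in-memory dictionary standing in for the storage device are all assumptions for illustration and are not taken from the application.

```python
from datetime import datetime, timezone
from typing import Dict, List


def partition_for(generated_at: float) -> str:
    """Name the storage partition for a generation time (Unix seconds, UTC)."""
    dt = datetime.fromtimestamp(generated_at, tz=timezone.utc)
    return dt.strftime("dt=%Y%m%d/hour=%H")


def store_packet(
    storage: Dict[str, List[str]], lines: List[str], generated_at: float
) -> str:
    """Append a packet's lines to the partition matching its generation time."""
    partition = partition_for(generated_at)
    storage.setdefault(partition, []).extend(lines)
    return partition
```

Partitioning by generation time rather than by arrival time keeps a late-arriving packet in the partition to which its line count was added, so the per-partition totals remain comparable.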
Since the system embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Accordingly, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
In a typical configuration, the computer device includes one or more processors (CPUs), an input/output interface, a network interface, and memory. The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, the terminal device (system), and the computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, such that a series of operational steps is performed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or further includes elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The storage verification method for text data and the storage verification system for text data provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (16)

  1. A storage verification method for text data, characterized by comprising:
    packaging, by one or more application devices, one or more pieces of generated text data into one or more text data packets, each text data packet carrying attribute information;
    storing, by one or more transmission devices, the one or more text data packets in one or more preset storage devices and, when the storage succeeds, recording the attribute information in a preset statistical table;
    performing, by a verification device upon receiving a storage verification request, storage verification according to the statistical table.
  2. The method according to claim 1, characterized in that the step in which the one or more application devices package the one or more pieces of generated text data into one or more text data packets comprises:
    packaging the generated text data into a text data packet when the size of the generated text data matches a preset size threshold;
    or,
    packaging the generated text data into a text data packet when the current time exceeds a preset time threshold.
  3. The method according to claim 1 or 2, characterized in that the statistical table comprises a first statistical table, the first statistical table comprising a storage time; the transmission device has a transmission device identifier; and the attribute information comprises an application device identifier and a generation time;
    the step of recording the attribute information in a preset statistical table comprises:
    looking up the first statistical table corresponding to the application device identifier;
    determining whether there is a feature text data packet that has not been stored successfully, a feature text data packet being one whose generation time is earlier than that of the text data packet that has just been stored successfully;
    if there is none, updating, in the first statistical table, the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.
  4. The method according to claim 1 or 2, characterized in that the statistical table comprises a second statistical table, the second statistical table comprising a partition time period and a number of stored lines; and the attribute information comprises a generation time and a number of data lines in the packet;
    the step of recording the attribute information in a preset statistical table comprises:
    looking up, in the second statistical table, the partition time period that corresponds to the first statistical table and to which the generation time belongs;
    adding the number of data lines in the packet to the number of stored lines corresponding to the partition time period.
  5. The method according to claim 3, characterized in that the step in which the verification device performs storage verification according to the statistical table comprises:
    looking up the smallest storage time in the first statistical table corresponding to the application device identifier;
    confirming that text data packets whose generation time is earlier than that storage time have been completely stored.
  6. The method according to claim 4, characterized in that the step in which the verification device performs storage verification according to the statistical table comprises:
    summing the numbers of stored lines of the partition time periods that fall within a specified proofreading time period, to obtain a first total number of lines.
  7. The method according to claim 6, characterized in that the step in which the verification device performs storage verification according to the statistical table further comprises:
    reading, from the storage device, a second total number of lines of the text data packets stored in the proofreading time period;
    when the first total number of lines equals the second total number of lines, confirming that the text data packets corresponding to the proofreading time period have not been lost;
    when the first total number of lines does not equal the second total number of lines, confirming that the text data packets corresponding to the proofreading time period have been at least partially lost.
  8. The method according to claim 1, 2, 5, 6, or 7, characterized in that the attribute information comprises a generation time, and the storage device comprises one or more storage partitions;
    the step in which the one or more transmission devices store the one or more text data packets in one or more preset storage devices comprises:
    storing the text data packet in the storage partition, within the preset storage device, that corresponds to the generation time.
  9. A storage verification system for text data, characterized in that the system comprises one or more application devices, one or more transmission devices, one or more storage devices, and a verification device;
    wherein the application device comprises:
    a text data packaging module, configured to package one or more pieces of generated text data into one or more text data packets, each text data packet carrying attribute information;
    the transmission device comprises:
    a text data packet storage module, configured to store the one or more text data packets in one or more preset storage devices;
    an attribute information recording module, configured to record the attribute information in a preset statistical table when the storage succeeds;
    the verification device comprises:
    a storage verification module, configured to perform storage verification according to the statistical table when a storage verification request is received.
  10. The system according to claim 9, characterized in that the text data packaging module comprises:
    a first packaging submodule, configured to package the generated text data into a text data packet when the size of the generated text data matches a preset size threshold;
    or,
    a second packaging submodule, configured to package the generated text data into a text data packet when the current time exceeds a preset time threshold.
  11. The system according to claim 9, characterized in that the statistical table comprises a first statistical table, the first statistical table comprising a storage time; the transmission device has a transmission device identifier; and the attribute information comprises an application device identifier and a generation time;
    the attribute information recording module comprises:
    a table lookup submodule, configured to look up the first statistical table corresponding to the application device identifier;
    a feature text data packet determination submodule, configured to determine whether there is a feature text data packet that has not been stored successfully and, if there is none, to invoke the time update submodule; a feature text data packet is one whose generation time is earlier than that of the text data packet that has just been stored successfully;
    a time update submodule, configured to update, in the first statistical table, the storage time corresponding to the transmission device identifier and the application device identifier to the generation time.
  12. The system according to claim 9, characterized in that the statistical table comprises a second statistical table, the second statistical table comprising a partition time period and a number of stored lines; and the attribute information comprises a generation time and a number of data lines in the packet;
    the attribute information recording module comprises:
    a partition time period lookup submodule, configured to look up, in the second statistical table, the partition time period that corresponds to the first statistical table and to which the generation time belongs;
    a stored-line-count accumulation submodule, configured to add the number of data lines in the packet to the number of stored lines corresponding to the partition time period.
  13. The system according to claim 11, characterized in that the storage verification module comprises:
    a storage time lookup submodule, configured to look up the smallest storage time in the first statistical table corresponding to the application device identifier;
    a storage completion confirmation submodule, configured to confirm that text data packets whose generation time is earlier than that storage time have been completely stored.
  14. The system according to claim 12, characterized in that the storage verification module comprises:
    a stored-line-count statistics submodule, configured to sum the numbers of stored lines of the partition time periods that fall within a specified proofreading time period, to obtain a first total number of lines.
  15. The system according to claim 14, characterized in that the storage verification module further comprises:
    a second total line count reading submodule, configured to read, from the storage device, a second total number of lines of the text data packets stored in the proofreading time period;
    a no-loss confirmation submodule, configured to confirm that the text data packets corresponding to the proofreading time period have not been lost when the first total number of lines equals the second total number of lines;
    a loss confirmation submodule, configured to confirm that the text data packets corresponding to the proofreading time period have been at least partially lost when the first total number of lines does not equal the second total number of lines.
  16. The system according to claim 9, 10, 13, 14, or 15, characterized in that the attribute information comprises a generation time, and the storage device comprises one or more storage partitions;
    the text data packet storage module comprises:
    a partition storage submodule, configured to store the text data packet in the storage partition, within the preset storage device, that corresponds to the generation time.
PCT/CN2016/088519 2015-07-14 2016-07-05 Storage checking method and system for text data WO2017008658A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510412446.XA CN106708648B (en) 2015-07-14 2015-07-14 A kind of the storage method of calibration and system of text data
CN201510412446.X 2015-07-14

Publications (1)

Publication Number Publication Date
WO2017008658A1 true WO2017008658A1 (en) 2017-01-19

Family

ID=57756682

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/088519 WO2017008658A1 (en) 2015-07-14 2016-07-05 Storage checking method and system for text data

Country Status (2)

Country Link
CN (1) CN106708648B (en)
WO (1) WO2017008658A1 (en)

Also Published As

Publication number Publication date
CN106708648A (en) 2017-05-24
CN106708648B (en) 2019-11-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16823805

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16823805

Country of ref document: EP

Kind code of ref document: A1