CN111563017A - Data processing method and device - Google Patents


Info

Publication number
CN111563017A
Authority
CN
China
Prior art keywords
log
stream
identifier
storage space
log stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010348381.8A
Other languages
Chinese (zh)
Other versions
CN111563017B (en)
Inventor
王林强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202010348381.8A priority Critical patent/CN111563017B/en
Publication of CN111563017A publication Critical patent/CN111563017A/en
Application granted granted Critical
Publication of CN111563017B publication Critical patent/CN111563017B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The disclosure provides a data processing method and device. In the method, a logical storage space is set for each log stream within a single log file. After a log record of a log stream is obtained, it is stored in the logical storage space corresponding to that log stream; that is, the log records of all log streams managed by the log storage system are stored in the same log file on disk. Compared with the prior art, in which the log records of each log stream managed by the log storage system are stored in a separate log file, the method appends the log records of all streams to one file, which can effectively improve the log write efficiency and the throughput of the log storage system.

Description

Data processing method and device
Technical Field
The present disclosure relates to the field of log storage processing, and in particular, to a data processing method and apparatus.
Background
At present, distributed storage systems are developed vigorously, and in order to improve stability and reliability of the systems, a plurality of data backups, i.e., a plurality of copies, are provided in the distributed storage systems, so that data stored by each copy is guaranteed to be consistent in the distributed storage systems. Currently, a consistency algorithm is often used to ensure that data in each copy remains consistent. For example, with a common consistency protocol Raft, a Leader node is elected in the distributed storage system, and the Leader node is responsible for managing log replication to achieve consistency of multiple copies.
Implementations of such consistency algorithms rely on a write-ahead logging (WAL) system, which is a standard method for ensuring data integrity. To ensure the integrity of system data in a distributed storage system, operations on the system are usually written to the WAL first, so the throughput of the WAL directly determines the write performance of the entire distributed storage system. How to improve the throughput of the WAL is therefore a technical problem to be solved.
Disclosure of Invention
In view of the above, the present disclosure provides at least a data processing method and apparatus.
In a first aspect, the present disclosure provides a data processing method, including:
acquiring, for each log stream in a log storage system, a plurality of log records corresponding to that log stream;
determining a logical storage space corresponding to each log stream, wherein the logical storage spaces corresponding to all log streams belong to one log file;
and storing the plurality of log records corresponding to each log stream into the corresponding logical storage space.
In a possible implementation, the determining a logical storage space corresponding to each log stream includes:
setting, for each log stream, a first identifier for identifying the logical storage space of that log stream;
establishing a mapping relation between the first identifier corresponding to each log stream and the logical storage space of the log stream;
the respectively storing the plurality of log records corresponding to each log stream into corresponding logical storage spaces includes:
acquiring a first identifier corresponding to each log stream;
for each log stream, identifying each log record corresponding to the log stream by using the first identifier of the log stream.
In a possible implementation manner, the respectively storing the plurality of log records corresponding to each log stream into a corresponding logical storage space further includes:
for each log stream, performing aggregation processing on the plurality of log records corresponding to the log stream to obtain a log block corresponding to the log stream;
and storing the log blocks corresponding to the log streams into the log file.
In a possible implementation manner, the respectively storing the plurality of log records corresponding to each log stream into a corresponding logical storage space further includes:
performing aggregation processing on the log blocks corresponding to the log streams to obtain log packets;
and storing the log packet into the log file.
In a possible implementation, the data processing method further includes:
determining a second identifier for each log block and identifying each log block with the second identifier;
for each log record, determining a third identifier of the log record based on a generation order of the log record in a log stream to which the log record belongs;
identifying a corresponding log record using the third identifier.
In a possible implementation, the data processing method further includes:
receiving a new log block needing to be written;
determining a target log stream to which the new log block belongs;
determining a largest second identifier in the log blocks corresponding to the target log stream stored in the log storage system;
and if the second identifier of the new log block is smaller than the maximum second identifier, setting the log block of which the second identifier is greater than or equal to the second identifier of the new log block in the log blocks corresponding to the target log stream stored in the log storage system as a failure state.
In a possible implementation, the data processing method further includes:
receiving a new log block needing to be written;
determining a target log stream to which the new log block belongs;
determining a largest second identifier in the log blocks corresponding to the target log stream stored in the log storage system;
and if the second identifier of the new log block is larger than the maximum second identifier, setting the log block corresponding to the target log stream stored in the log storage system to be in a failure state.
In a second aspect, the present disclosure provides a data processing apparatus comprising:
a log acquisition module, configured to acquire, for each log stream in the log storage system, a plurality of log records corresponding to that log stream;
a storage determining module, configured to determine a logical storage space corresponding to each log stream, wherein the logical storage spaces corresponding to all log streams belong to one log file;
and a storage module, configured to store the plurality of log records corresponding to each log stream into the corresponding logical storage space.
In a possible implementation manner, the storage determination module, when determining the logical storage space corresponding to each log stream, is configured to:
setting, for each log stream, a first identifier for identifying the logical storage space of that log stream;
establishing a mapping relation between the first identifier corresponding to each log stream and the logical storage space of the log stream;
when the storage module respectively stores the plurality of log records corresponding to each log stream into the corresponding logical storage space, the storage module is configured to:
acquiring a first identifier corresponding to each log stream;
for each log stream, identifying each log record corresponding to the log stream by using the first identifier of the log stream.
In a third aspect, the present disclosure provides an electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the data processing method as described above.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the data processing method as described above.
The above-described apparatus, electronic device, and computer-readable storage medium of the present disclosure contain at least technical features that are substantially the same as or similar to the technical features of any aspect or any implementation of any aspect of the above-described method of the present disclosure.
The present disclosure provides a data processing method and apparatus. First, a plurality of log records corresponding to each log stream in a log storage system are acquired; then, a logical storage space corresponding to each log stream is determined, wherein the logical storage spaces corresponding to all log streams belong to one log file; finally, the plurality of log records corresponding to each log stream are stored into the corresponding logical storage space. In the method, a logical storage space is set for each log stream within a single log file. After a log record of a log stream is obtained, it is stored in the logical storage space corresponding to that log stream; that is, the log records of all log streams managed by the log storage system are stored in the same log file on disk. Compared with the prior art, in which the log records of each log stream managed by the log storage system are stored in a separate log file, the method appends the log records of all streams to one file, which can effectively improve the log write efficiency and the throughput of the log storage system.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present disclosure and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings may be obtained from the drawings without inventive effort.
FIG. 1 shows a flowchart of a data processing method provided by an embodiment of the present disclosure;
FIG. 2 shows a schematic structural diagram of a log storage system provided by an embodiment of the present disclosure;
FIG. 3 shows a flowchart of another data processing method provided by an embodiment of the present disclosure;
FIG. 4 shows a flowchart for writing log records into a log file provided by an embodiment of the present disclosure;
FIG. 5 shows a detailed flowchart for writing log records into a log file provided by an embodiment of the present disclosure;
FIG. 6 shows a flowchart for setting identifiers provided by an embodiment of the present disclosure;
FIG. 7 shows a schematic structural diagram of a data processing apparatus provided by an embodiment of the present disclosure;
FIG. 8 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 9 shows a schematic structural diagram of a log storage system in the prior art.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it should be understood that the drawings in the present disclosure are for illustrative and descriptive purposes only and are not used to limit the scope of the present disclosure. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this disclosure illustrate operations implemented according to some embodiments of the present disclosure. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. In addition, one skilled in the art, under the direction of the present disclosure, may add one or more other operations to the flowchart, and may remove one or more operations from the flowchart.
In addition, the described embodiments are only a few embodiments of the present disclosure, not all embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It is to be noted that the term "comprising" will be used in the disclosed embodiments to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
As shown in FIG. 9, in a prior-art log storage system, a large number of shards are managed on each physical machine, and the log records generated by each shard form a log stream. Because the shards are independent of one another, a WAL storage system is deployed for each shard, so that the log stream generated by each shard is appended to that shard's own log file. Because every log stream uses its own log file to store logs, the throughput of the log storage system is reduced. In view of this drawback, the present disclosure provides a data processing method and apparatus. A logical storage space is set for each log stream within a single log file. After a log record of a log stream is obtained, it is stored in the logical storage space corresponding to that log stream; that is, the log records of all log streams managed by the log storage system are stored in the same log file on disk. Compared with the prior art, in which the log records of each log stream are stored in a separate log file, the method and apparatus append the log records of all streams to one file, which can effectively improve the log write efficiency and the throughput of the log storage system.
The following describes the data processing method and apparatus of the present application in detail by using specific embodiments.
The embodiment of the present disclosure provides a data processing method, which is applied to a terminal device that performs log storage. Specifically, as shown in FIG. 1, the method may include the following steps:
S110, acquiring a plurality of log records corresponding to each log stream in the log storage system.
Here, each log stream corresponds to one instance of the distributed consistency protocol Raft (a raft instance); that is, the log records generated by each raft instance form one log stream. The log storage system may be a WAL system. Specifically, the log storage system is deployed on a disk. In this embodiment, one log file is set in the log storage system, and the log records corresponding to all log streams are stored in that same log file.
In this step, the log records corresponding to each log stream are acquired. In practical applications, one or more log records may be acquired for every log stream, or for only some of the log streams; for a log stream from which no log record is acquired, the acquired result is null.
S120, determining a logical storage space corresponding to each log stream, wherein the logical storage spaces corresponding to all log streams belong to one log file.
Here, in the log file of the log storage system, a logical storage space for storing log records in the corresponding log stream is set for each log stream.
In a specific implementation, one raft instance corresponds to one log stream, and then this step is to set a logical storage space in a log file of the log storage system for each raft instance, and log records generated by the raft instance are stored in the corresponding logical storage space. Log records generated by all the raft instances are stored in the same log file of the log storage system.
The logical storage space is a logically continuous storage space and is not necessarily a physically continuous storage area in the log file; that is, the present disclosure does not partition the log file in order to store the log records generated by each raft instance. In a specific implementation, an identifier is set for the logical storage space corresponding to each raft instance. The identifier corresponds to the log stream generated by that raft instance and is used to mark the log records in that log stream. For example, a first identifier is set for the logical storage space corresponding to a certain raft instance, the logical storage space is identified by the first identifier, and each log record in the log stream generated by the raft instance is marked with the first identifier. In this way, all log records generated by the raft instance carry the first identifier, and after they are stored in the log file, all log records carrying the first identifier belong to the logical storage space corresponding to the first identifier.
S130, storing the plurality of log records corresponding to each log stream into the corresponding logical storage space.
In step S120 above, a first identifier for identifying the logical storage space of each log stream is set for each log stream, and a mapping relation is established among each log stream, the first identifier corresponding to the log stream, and the logical storage space corresponding to the log stream. In this step, the following sub-steps may be used to store the log records into the corresponding logical storage space: acquiring the first identifier corresponding to each log stream; and, for each log stream, marking each log record corresponding to the log stream with the first identifier of the log stream.
In practical applications, for example in a Range-based or Hash-based distributed key-value storage system, a large number of shards (Range or Tablet) are managed on each physical machine, and all shards share the same log storage system WalStore. Sharing one WalStore among multiple shards essentially means appending to the same log file. As shown in FIG. 2, the log records in the log streams (WalStream) of all shards are stored in the corresponding logical storage spaces (logical WALs), and the log streams of all shards are stored in the same log file. When multiple shards append to one log file together, all writes to the disk are sequential, so the write performance of the disk can be fully used.
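The idea of several logical storage spaces inside one physical log file can be illustrated with a minimal sketch. The record framing (a 16-byte stream identifier followed by a 4-byte length), the class name `SharedLog`, and its methods are assumptions made for illustration only; the disclosure does not specify an on-disk format.

```python
import io
import struct
import uuid

# Hypothetical framing: 16-byte stream identifier, 4-byte payload length, payload.
HEADER = struct.Struct(">16sI")


class SharedLog:
    """All streams append to one file-like object; the stream id tags each record."""

    def __init__(self, fileobj):
        self.f = fileobj

    def append(self, stream_uuid: uuid.UUID, payload: bytes) -> int:
        # Always append at the end, so writes from every stream stay sequential.
        offset = self.f.seek(0, io.SEEK_END)
        self.f.write(HEADER.pack(stream_uuid.bytes, len(payload)) + payload)
        return offset

    def read_stream(self, stream_uuid: uuid.UUID):
        """Scan the file and yield the payloads of one logical storage space."""
        self.f.seek(0)
        while True:
            header = self.f.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            sid, length = HEADER.unpack(header)
            payload = self.f.read(length)
            if sid == stream_uuid.bytes:
                yield payload


shared = SharedLog(io.BytesIO())  # BytesIO stands in for the log file on disk
a, b = uuid.uuid4(), uuid.uuid4()
shared.append(a, b"a1")
shared.append(b, b"b1")
shared.append(a, b"a2")
stream_a = list(shared.read_stream(a))
stream_b = list(shared.read_stream(b))
```

Records of the two streams are interleaved in the file, yet each stream reads back only its own records in order, which is what makes the space "logically continuous" without partitioning the file.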
The data processing method of the above embodiment is explained below through a specific example. As shown in FIG. 3, the log storage system is a WAL system, i.e., one WAL instance, which includes a plurality of log streams (WalStream); each WalStream is used exclusively by one raft instance. When a plurality of WalStreams write data, each WalStream first writes the data into its own Buffer. A unified dispatcher then takes the data out of all the Buffers, merges it, and appends it to the same log file on the disk to which the log storage system belongs. An index is also established for the data in memory so as to speed up data writing.
Each Buffer can be identified by the identifier stream-uuid-N of its log stream. The log file being written is n.log, and its index is written into n.index. Because the capacity of each log file on disk is preset, a log file that has been filled can no longer be appended to; at that point, the logs of the log streams (WalStream) of all raft instances are written together into another log file. After a log file has been appended to, its index is written to disk. When a stream reads data, it locates the data using the file-level index and the block-level index.
When one physical disk stores the log streams generated by multiple raft instances, each raft instance writes to the disk sequentially, but the writes that finally reach the disk are not actually sequential. In this embodiment, the raft instances share the same WAL instance so that writes to the disk are truly sequential and the write performance of the disk is fully used. When the WAL instance is shared, it provides each raft instance with its own independent logical storage space, and each raft instance also writes sequentially within its own logical storage space, which further exploits the write performance of the disk.
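The buffer, dispatcher, index, and file-rollover flow described above might look roughly like the following sketch. The class `WalStore` here, its `max_file_bytes` parameter, and the index layout `(file number, offset, length)` are hypothetical; in-memory `BytesIO` objects stand in for `n.log` files, and persisting the index to `n.index` is omitted.

```python
import io
from collections import defaultdict


class WalStore:
    """Hypothetical shared store: per-stream buffers, one dispatcher, in-memory index."""

    def __init__(self, max_file_bytes: int):
        self.max_file_bytes = max_file_bytes
        self.buffers = defaultdict(list)  # stream id -> pending records
        self.files = [io.BytesIO()]       # stands in for 0.log, 1.log, ...
        self.index = defaultdict(list)    # stream id -> [(file_no, offset, length)]

    def write(self, stream_id: str, record: bytes) -> None:
        # Each stream only touches its own buffer.
        self.buffers[stream_id].append(record)

    def dispatch(self) -> None:
        """Drain every buffer and append the merged data to the current file."""
        for stream_id, records in self.buffers.items():
            for record in records:
                if self.files[-1].tell() + len(record) > self.max_file_bytes:
                    self.files.append(io.BytesIO())  # current file is full: roll over
                f = self.files[-1]
                self.index[stream_id].append((len(self.files) - 1, f.tell(), len(record)))
                f.write(record)
        self.buffers.clear()

    def read(self, stream_id: str):
        """Locate each record through the index instead of scanning the files."""
        out = []
        for file_no, offset, length in self.index[stream_id]:
            out.append(self.files[file_no].getvalue()[offset:offset + length])
        return out


store = WalStore(max_file_bytes=8)
store.write("stream-uuid-1", b"aaaa")
store.write("stream-uuid-2", b"bbbb")
store.write("stream-uuid-1", b"cccc")
store.dispatch()
```

With an 8-byte file limit, the two records of `stream-uuid-1` fill the first file and the record of `stream-uuid-2` rolls over into a second one, so a single stream's data can span files while the index still resolves every record.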
In some embodiments, as shown in fig. 4, the following steps may be utilized to perform the write operation of the log record, so as to fully utilize the performance of the disk, increase the valid data of each write, and increase the payload ratio:
S410, for each log stream, performing aggregation processing on the plurality of log records corresponding to the log stream to obtain a log block corresponding to the log stream.
Here, aggregating log records means combining the plurality of log records of one log stream and storing them in a single log block, which facilitates searching and analysis.
As shown in fig. 5, each write thread corresponds to one raft instance, the write thread first stores the log Record generated by the corresponding raft instance into the corresponding cache Buffer, and then performs aggregation processing on the multiple log records in each cache Buffer to obtain three log blocks Span.
And S420, storing the log block corresponding to each log stream into the log file.
In order to improve the storage efficiency of log records, the log blocks corresponding to the log streams can be aggregated to obtain log packets, and the log packets are stored in the log file.
Here, the aggregation processing of the log blocks is to aggregate the log blocks from different log streams and store the aggregated log blocks in a single log packet, so as to facilitate searching and analysis.
As shown in fig. 5, each log block may be used as a log batch RecordBatch, and multiple spans are aggregated to obtain a log packet, which may be used as a block batch SpanBatch.
The log batch RecordBatch contains several log records or messages; here messages and log records are equivalent, and a log batch is equivalent to a collection of log records or messages. The block batch SpanBatch described above contains several log blocks.
After obtaining the log packets, an index may be set for each log packet and then stored in the log file.
In the above embodiment, the log records are aggregated several times and then committed in batches, so a single I/O operation is enough to flush the data sequentially to the log file. This makes full use of the disk's IOPS (input/output operations per second) and lets the disk bandwidth be used to exploit its write performance. If, instead of such batch commits, every Record were written to disk individually, the limited IOPS of the disk would mean that very little data is written to the disk per unit time.
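The two-level aggregation (records into a log block, log blocks into a log packet) and the single batched write can be sketched as follows. The `Span` and `SpanBatch` names follow the description; their fields, the helper functions, and the `CountingDisk` stand-in that counts write calls are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    """Log block: the aggregated records of one stream (a RecordBatch)."""
    stream_id: str
    records: List[bytes]


@dataclass
class SpanBatch:
    """Log packet: log blocks from several streams, committed together."""
    spans: List[Span] = field(default_factory=list)


def build_span_batch(buffers: Dict[str, List[bytes]]) -> SpanBatch:
    batch = SpanBatch()
    for stream_id, records in buffers.items():
        if records:  # streams with nothing pending contribute no block
            batch.spans.append(Span(stream_id, list(records)))
    return batch


class CountingDisk:
    """Hypothetical disk stand-in that counts write calls."""

    def __init__(self):
        self.writes = 0
        self.data = bytearray()

    def write(self, payload: bytes) -> None:
        self.writes += 1
        self.data += payload


def flush(batch: SpanBatch, disk: CountingDisk) -> None:
    # One write call for the whole packet instead of one per record.
    payload = b"".join(r for span in batch.spans for r in span.records)
    disk.write(payload)


buffers = {"s1": [b"r1", b"r2"], "s2": [b"r3"], "s3": []}
batch = build_span_batch(buffers)
disk = CountingDisk()
flush(batch, disk)
```

Three records reach the disk in one write call here; writing each record separately would cost three calls for the same data, which is the IOPS saving the paragraph above describes.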
In order to implement the mark deletion, i.e. logical deletion, of the log record in the log storage system, for example, implementing the Truncate operation of Raft, as shown in fig. 6, the data processing method may further include the following steps:
S610, determining a second identifier for each log block, and marking each log block with the second identifier.
In the above embodiment, a first identifier has been set for each log stream. For example, if stream_uuid denotes the first identifier, log records with the same stream_uuid logically belong to the same space. Although log records marked with various stream_uuid values are stored interleaved on the disk, their logical spaces can be distinguished by stream_uuid. Here, a second identifier, index, is set for each log block in each log stream. The second identifier identifies the log block and every log record in it, so multiple log records in the logical storage space of one stream_uuid may have the same index.
In a specific implementation, the second identifier index is an id provided for the Raft protocol to use; with it, the log can be rolled back or advanced to the log block with a given index.
S620, for each log record, determining a third identifier of the log record based on the generation order of the log record in the log stream to which it belongs.
Multiple log records in the logical storage space of one stream_uuid may have the same index, but the logical storage space provided to a raft instance does not allow multiple log records with the same index to be read at the same time. Therefore, in this step, a third identifier seq_id is generated for each log record according to its generation order in the log stream to which it belongs. The seq_id is a sequence number that increases monotonically by 1. By recording which seq_ids have been deleted, data can be logically deleted within the sequential-write model.
S630, identifying the corresponding log record by using the third identifier.
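The three identifiers and the record of deleted sequence numbers can be illustrated with a small sketch. The `Record` and `Stream` classes, the `deleted` set, and the choice to start seq_id at 1 are assumptions; the description only states that seq_id increases monotonically by 1 and that deleted seq_ids are recorded.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Record:
    stream_uuid: str  # first identifier: logical storage space
    index: int        # second identifier: log block
    seq_id: int       # third identifier: generation order within the stream
    payload: bytes


class Stream:
    def __init__(self, stream_uuid: str):
        self.stream_uuid = stream_uuid
        self._seq = count(1)   # assumed starting value
        self.records = []
        self.deleted = set()   # seq_ids that were logically deleted

    def append_block(self, index: int, payloads) -> None:
        # Every record in the block shares the block's index but gets its own seq_id.
        for p in payloads:
            self.records.append(Record(self.stream_uuid, index, next(self._seq), p))

    def mark_deleted(self, seq_id: int) -> None:
        # Nothing is erased from the append-only log; the seq_id is only recorded.
        self.deleted.add(seq_id)

    def live(self):
        return [r for r in self.records if r.seq_id not in self.deleted]


s = Stream("stream-uuid-1")
s.append_block(1, [b"a", b"b"])
s.append_block(2, [b"c"])
s.mark_deleted(2)
live_seq_ids = [r.seq_id for r in s.live()]
block_indexes = [r.index for r in s.records]
```

The physical record list is never modified, so the log stays append-only while readers see only the records whose seq_id has not been marked as deleted.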
The above embodiment sets identifiers for each log block and for each log record in it, and these identifiers can be used to logically delete some logs. Specifically, if the second identifier of a new log block received by the log storage system is smaller than the largest second identifier among the log blocks already stored for the log stream to which the new log block belongs, the log blocks in the log storage system may be marked for deletion, i.e., logically deleted, using the following steps:
determining a target log stream to which the new log block belongs; determining a largest second identifier in the log blocks corresponding to the target log stream stored in the log storage system; and setting the log blocks which have been stored in the log storage system and have the second identifier larger than or equal to the second identifier of the new log block in the log blocks corresponding to the target log stream as a failure state.
In a specific application, a machine finds that it is not the master node and that the second identifier of a log block in the target log stream stored on the master node is smaller than the second identifier of a log block in the target log stream stored on the machine itself. In this case, the machine needs to set the log blocks in its stored target log stream whose second identifier is greater than or equal to that of the newly received log block to the failure state, and rewrite the log starting from the second identifier of the newly received log block.
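The case in which the new log block's second identifier is smaller than the stored maximum can be sketched as below. The `BlockLog` class and its dictionary layout are hypothetical, and the sketch covers only this case: any other block is simply appended, which is an assumption of the example rather than something taken from the description.

```python
class BlockLog:
    """Hypothetical per-stream view: log blocks keyed by the second identifier."""

    def __init__(self):
        self.blocks = []  # dicts in append order; nothing is physically removed

    def max_index(self) -> int:
        valid = [b["index"] for b in self.blocks if b["valid"]]
        return max(valid) if valid else 0

    def write_block(self, index: int, payload: bytes) -> None:
        if index < self.max_index():
            # Stored blocks at or after the new index go to the failure state.
            for b in self.blocks:
                if b["index"] >= index:
                    b["valid"] = False
        # The new block is appended; rewriting continues from this index.
        self.blocks.append({"index": index, "payload": payload, "valid": True})


block_log = BlockLog()
for i in (1, 2, 3, 4):
    block_log.write_block(i, f"old-{i}".encode())
block_log.write_block(3, b"new-3")
valid_blocks = [(b["index"], b["payload"]) for b in block_log.blocks if b["valid"]]
```

After the block with index 3 arrives again, the old blocks 3 and 4 stay in the log but are flagged invalid, and the stream's visible content becomes blocks 1, 2, and the new block 3.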
If the second identifier of a new log block received by the log storage system is greater than the largest second identifier among the log blocks already stored for the log stream to which the new log block belongs, the log blocks in the log storage system may be marked for deletion, i.e., logically deleted, using the following steps:
determining a target log stream to which the new log block belongs; and setting the log block corresponding to the target log stream stored in the log storage system to be in a failure state.
Corresponding to the data processing method, the embodiment of the present disclosure further provides a data processing apparatus, where the apparatus is applied to a terminal device that stores and processes a log, and the apparatus and each module thereof can perform the same method steps as the data processing method and can achieve the same or similar beneficial effects, and therefore, repeated parts are not described again.
As shown in fig. 7, the present disclosure provides a data processing apparatus including:
a log obtaining module 710, configured to obtain a plurality of log records corresponding to each log stream in the log storage system;
a storage determining module 720, configured to determine a logical storage space corresponding to each log stream, wherein the logical storage spaces corresponding to all log streams belong to one log file;
a storage module 730, configured to store the plurality of log records corresponding to each log stream into the corresponding logical storage space.
In some embodiments, the storage determination module 720, when determining the logical storage space corresponding to each log stream, is configured to:
setting, for each log stream, a first identifier for identifying the logical storage space of the log stream;
establishing a mapping relation among each log stream, the first identifier corresponding to the log stream, and the logical storage space corresponding to the log stream;
when the storage module 730 stores the plurality of log records corresponding to each log stream into the corresponding logical storage space, respectively, the storage module is configured to:
acquiring a first identifier corresponding to each log stream;
for each log stream, identifying each log record corresponding to the log stream by using the first identifier of the log stream.
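As an illustration only, the cooperation of the storage determining module and the storage module can be sketched in Python as follows. The class `SharedLogFile`, its method names, and the on-disk header layout (a 4-byte first identifier followed by a 4-byte record length) are assumptions made for this sketch and are not specified by the disclosure.

```python
import io
import struct
from itertools import count


class SharedLogFile:
    """Keeps the records of many log streams in a single log file."""

    HEADER = struct.Struct("<II")  # (first identifier, record length)

    def __init__(self, fileobj):
        self._file = fileobj
        self._ids = {}             # log stream name -> first identifier
        self._next_id = count(1)

    def space_id(self, stream: str) -> int:
        # Assign a first identifier on first use and remember the mapping.
        if stream not in self._ids:
            self._ids[stream] = next(self._next_id)
        return self._ids[stream]

    def append(self, stream: str, record: bytes) -> None:
        # Tag the record with the first identifier of its log stream.
        self._file.seek(0, io.SEEK_END)
        self._file.write(self.HEADER.pack(self.space_id(stream), len(record)))
        self._file.write(record)

    def read_stream(self, stream: str) -> list:
        # Scan the shared file and keep only records of the wanted stream.
        wanted = self._ids.get(stream)
        self._file.seek(0)
        records = []
        while True:
            header = self._file.read(self.HEADER.size)
            if len(header) < self.HEADER.size:
                break
            first_id, length = self.HEADER.unpack(header)
            data = self._file.read(length)
            if first_id == wanted:
                records.append(data)
        return records
```

For example, after `log = SharedLogFile(io.BytesIO())`, appending records for streams `"a"` and `"b"` interleaves them in one file, and `log.read_stream("a")` recovers only the records of stream `"a"`. Here the logical storage space of a stream is simply the set of records carrying its first identifier.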
An embodiment of the present disclosure discloses an electronic device, as shown in fig. 8, including: a processor 801, a memory 802, and a bus 803, the memory 802 storing machine readable instructions executable by the processor 801, the processor 801 communicating with the memory 802 via the bus 803 when the electronic device is in operation.
When executed by the processor 801, the machine readable instructions perform the following steps of the data processing method:
acquiring a plurality of log records corresponding to each log stream in a log storage system;
determining a logical storage space corresponding to each log stream, wherein the logical storage spaces corresponding to all of the log streams belong to one log file; and
storing the plurality of log records corresponding to each log stream into the corresponding logical storage space.
In addition, when the machine readable instructions are executed by the processor 801, the method contents in any embodiment described in the above method part may also be executed, and are not described herein again.
A computer program product corresponding to the method and the apparatus provided in the embodiments of the present disclosure includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to execute the method in the foregoing method embodiments, and specific implementation may refer to the method embodiments, which is not described herein again.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to one another, which are not repeated herein for brevity.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this disclosure. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above are only specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope of the present disclosure shall be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (11)

1. A data processing method, comprising:
acquiring a plurality of log records corresponding to each log stream in a log storage system;
determining a logical storage space corresponding to each log stream, wherein the logical storage spaces corresponding to all of the log streams belong to one log file; and
storing the plurality of log records corresponding to each log stream into the corresponding logical storage space.
2. The data processing method of claim 1, wherein the determining the logical storage space corresponding to each log stream comprises:
setting, for each log stream, a first identifier for identifying the logical storage space of the log stream;
establishing a mapping relation among each log stream, the first identifier corresponding to the log stream, and the logical storage space corresponding to the log stream;
the respectively storing the plurality of log records corresponding to each log stream into corresponding logical storage spaces includes:
acquiring a first identifier corresponding to each log stream;
for each log stream, identifying each log record corresponding to the log stream by using the first identifier of the log stream.
3. The data processing method according to claim 2, wherein the storing the plurality of log records corresponding to each log stream into the corresponding logical storage space respectively further comprises:
for each log stream, performing aggregation processing on the plurality of log records corresponding to the log stream to obtain a log block corresponding to the log stream;
and storing the log blocks corresponding to the log streams into the log file.
4. The data processing method according to claim 3, wherein the storing the plurality of log records corresponding to each log stream into the corresponding logical storage space respectively further comprises:
performing aggregation processing on the log blocks corresponding to the log streams to obtain log packets;
and storing the log packet into the log file.
5. The data processing method of claim 3, further comprising:
determining a second identifier for each log block and identifying each log block with the second identifier;
for each log record, determining a third identifier of the log record based on a generation order of the log record in a log stream to which the log record belongs;
identifying a corresponding log record using the third identifier.
6. The data processing method of claim 5, further comprising:
receiving a new log block needing to be written;
determining a target log stream to which the new log block belongs;
determining a largest second identifier in the log blocks corresponding to the target log stream stored in the log storage system;
and if the second identifier of the new log block is smaller than the maximum second identifier, setting the log block of which the second identifier is greater than or equal to the second identifier of the new log block in the log blocks corresponding to the target log stream stored in the log storage system as a failure state.
7. The data processing method of claim 5, further comprising:
receiving a new log block needing to be written;
determining a target log stream to which the new log block belongs;
determining a largest second identifier in the log blocks corresponding to the target log stream stored in the log storage system;
and if the second identifier of the new log block is larger than the maximum second identifier, setting the log block corresponding to the target log stream stored in the log storage system to be in a failure state.
8. A data processing apparatus, comprising:
a log acquisition module, configured to acquire a plurality of log records corresponding to each log stream in a log storage system;
a storage determining module, configured to determine a logical storage space corresponding to each log stream, wherein the logical storage spaces corresponding to all of the log streams belong to one log file; and
a storage module, configured to store the plurality of log records corresponding to each log stream into the corresponding logical storage space.
9. The data processing apparatus of claim 8, wherein the storage determination module, when determining the logical storage space corresponding to each log stream, is configured to:
setting, for each log stream, a first identifier for identifying the logical storage space of the log stream;
establishing a mapping relation among each log stream, the first identifier corresponding to the log stream, and the logical storage space corresponding to the log stream;
when the storage module respectively stores the plurality of log records corresponding to each log stream into the corresponding logical storage space, the storage module is configured to:
acquiring a first identifier corresponding to each log stream;
for each log stream, identifying each log record corresponding to the log stream by using the first identifier of the log stream.
10. An electronic device, comprising: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to execute the data processing method according to any one of claims 1 to 7.
11. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs a data processing method according to any one of claims 1 to 7.
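
By way of illustration only, the aggregation and numbering recited in claims 3 to 5 can be sketched in Python as follows. The names `Record`, `Block`, `number_records`, and `build_packet` are hypothetical, and assigning second identifiers from a per-stream counter starting at 1 is an assumption; the claims only require that each log block be given a second identifier and that each log record be given a third identifier based on its generation order within its log stream.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    stream: str
    record_id: int  # "third identifier": generation order within the stream
    data: str


@dataclass(frozen=True)
class Block:
    stream: str
    block_id: int   # "second identifier" of the log block
    records: tuple


def number_records(entries, counters):
    """Assign third identifiers to (stream, data) pairs given in generation order."""
    numbered = []
    for stream, data in entries:
        counters[stream] = counters.get(stream, 0) + 1
        numbered.append(Record(stream, counters[stream], data))
    return numbered


def build_packet(records, next_block_id):
    """Aggregate records into one block per stream, then blocks into a packet."""
    by_stream = defaultdict(list)
    for record in records:
        by_stream[record.stream].append(record)
    packet = []
    for stream, recs in by_stream.items():
        block_id = next_block_id.get(stream, 1)
        next_block_id[stream] = block_id + 1
        packet.append(Block(stream, block_id, tuple(recs)))
    return packet  # the packet would then be written to the log file
```

The `counters` and `next_block_id` dictionaries are kept by the caller so that identifiers continue to increase across successive packets.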
CN202010348381.8A 2020-04-28 2020-04-28 Data processing method and device Active CN111563017B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010348381.8A CN111563017B (en) 2020-04-28 2020-04-28 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010348381.8A CN111563017B (en) 2020-04-28 2020-04-28 Data processing method and device

Publications (2)

Publication Number Publication Date
CN111563017A (en) 2020-08-21
CN111563017B (en) 2023-05-16

Family

ID=72071728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010348381.8A Active CN111563017B (en) 2020-04-28 2020-04-28 Data processing method and device

Country Status (1)

Country Link
CN (1) CN111563017B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112486777A (en) * 2020-12-11 2021-03-12 深圳前瞻资讯股份有限公司 Big data service program log processing method and system
CN115185787A (en) * 2022-09-06 2022-10-14 北京奥星贝斯科技有限公司 Method and device for processing transaction log

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130332435A1 (en) * 2012-06-12 2013-12-12 Microsoft Corporation Partitioning optimistic concurrency control and logging
CN107408070A (en) * 2014-12-12 2017-11-28 微软技术许可有限责任公司 More transaction journals in distributed memory system
CN109347899A (en) * 2018-08-22 2019-02-15 北京百度网讯科技有限公司 The method of daily record data is written in distributed memory system
CN109446174A (en) * 2018-10-30 2019-03-08 东软集团股份有限公司 Logdata record method, apparatus and computer readable storage medium
CN109508246A (en) * 2018-06-25 2019-03-22 广州多益网络股份有限公司 Log recording method, system and computer readable storage medium
CN109508144A (en) * 2018-08-30 2019-03-22 郑州云海信息技术有限公司 A kind of log processing method and relevant apparatus
CN109542733A (en) * 2018-12-05 2019-03-29 焦点科技股份有限公司 A kind of highly reliable real-time logs collection and visual modeling technique method
CN110209642A (en) * 2018-02-05 2019-09-06 北京智明星通科技股份有限公司 Method, apparatus, server and the computer-readable medium of information processing
US20190370088A1 (en) * 2018-06-01 2019-12-05 Micron Technology, Inc. Event logging in a multi-core system
WO2020060621A1 (en) * 2018-09-21 2020-03-26 Microsoft Technology Licensing, Llc Log destaging from fixed-size log portion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
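Wu Zhili; Chen Xi; Yang Shideng: "Operation and maintenance security analysis based on distributed stream computing", Network Security Technology & Application *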

Also Published As

Publication number Publication date
CN111563017B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
US10706072B2 (en) Data replication method and storage system
CN107948334B (en) Data processing method based on distributed memory system
US7886120B1 (en) System and method for efficient backup using hashes
US8996830B1 (en) System and method for efficient backup using hashes
US9547706B2 (en) Using colocation hints to facilitate accessing a distributed data storage system
WO2017020576A1 (en) Method and apparatus for file compaction in key-value storage system
US10540240B2 (en) Method and apparatus for data backup in storage system
CN113626431A (en) LSM tree-based key value separation storage method and system for delaying garbage recovery
CN111563017A (en) Data processing method and device
CN111857603B (en) Data processing method and related device
CN104583966A (en) Backup and restore system for a deduplicated file system and corresponding server and method
CN112328697A (en) Data synchronization method based on big data
CN113885809B (en) Data management system and method
CN104965835A (en) Method and apparatus for reading and writing files of a distributed file system
CN114428764A (en) File writing method, system, electronic device and readable storage medium
CN116501264B (en) Data storage method, device, system, equipment and readable storage medium
CN115509808B (en) Data backup method, device, computer equipment and storage medium
CN113806803B (en) Data storage method, system, terminal equipment and storage medium
CN111966845B (en) Picture management method, device, storage node and storage medium
CN112988034B (en) Distributed system data writing method and device
EP4290391A1 (en) Data compression method and apparatus
WO2024183421A1 (en) Data disaster recovery method, apparatus and system, node device, and standby node device
CN115080239A (en) Data processing method, device, equipment and storage medium
CN115687170A (en) Data processing method, storage device and system
CN118585568A (en) Automatic driving platform data sharing method, system and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.