CN111782134A - Data processing method, device, system and computer readable storage medium - Google Patents


Info

Publication number
CN111782134A
CN111782134A (application CN201910515339.8A)
Authority
CN
China
Prior art keywords
data
memory
written
packet
group
Prior art date
Legal status
Granted
Application number
CN201910515339.8A
Other languages
Chinese (zh)
Other versions
CN111782134B (en)
Inventor
孙健
贺伟
刘海锋
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201910515339.8A priority Critical patent/CN111782134B/en
Publication of CN111782134A publication Critical patent/CN111782134A/en
Application granted granted Critical
Publication of CN111782134B publication Critical patent/CN111782134B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 - Improving I/O performance
    • G06F 3/0608 - Saving storage space on storage systems
    • G06F 3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 - Organizing or formatting or addressing of data
    • G06F 3/0644 - Management of space entities, e.g. partitions, extents, pools

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure relates to a data processing method, device, system and computer-readable storage medium in the field of data storage. The method of the present disclosure comprises: receiving a plurality of concurrent data write requests, each data write request comprising a group of data to be written; obtaining the topology information of the group whose current state is the write state, the topology information comprising the address information of each storage shard in the group, the groups being divided in chronological order; and distributing each group of data to be written to a different storage shard in the group according to the topology information, so that the storage engine corresponding to each allocated shard writes its group of data to be written into the corresponding shard.

Description

Data processing method, device, system and computer readable storage medium
Technical Field
The present disclosure relates to the field of data storage, and in particular, to a data processing method, apparatus, system, and computer-readable storage medium.
Background
With the development of internet technology, data volumes have grown explosively, and the storage and management of data have become increasingly important.
Time-series data is data recorded in chronological order, for example log data. Time-series data is characterized by sustained highly concurrent writes, a clear distinction between hot and cold data, and a high degree of repetition.
Disclosure of Invention
One technical problem to be solved by the present disclosure is to provide a method for storing time-series data that better copes with its characteristic of sustained highly concurrent writes.
According to some embodiments of the present disclosure, there is provided a data processing method comprising: receiving a plurality of concurrent data write requests, each data write request comprising a group of data to be written; obtaining the topology information of the group whose current state is the write state, the topology information comprising the address information of each storage shard in the group, the groups being divided in chronological order; and distributing each group of data to be written to a different storage shard in the group according to the topology information, so that the storage engine corresponding to each allocated shard writes its group of data to be written into the corresponding shard.
In some embodiments, distributing each group of data to be written to a different storage shard in the group comprises: for a group of data to be written, calculating a hash value from the identification information of that group; and determining the corresponding storage shard from the hash value.
In some embodiments, when a group is written full, its current state is modified to the read-only state and the group is configured with a corresponding time window, whose range is determined by the group's creation time and the time it became full; another group is then created and set to the write state.
In some embodiments, the data in a group is deleted when the retention period of the group exceeds a preset period, the retention period being timed from the moment the group's current state is modified to the read-only state.
In some embodiments, the data to be written is compressed by a string compression technique before being written to the corresponding storage shard, the technique including converting the data to be written into enumerations.
In some embodiments, the topology information further comprises version information of each storage shard, and the method further comprises: sending the version information of a storage shard to that shard so that the shard can verify it; and, in response to receiving a reply from a shard that the version information is wrong, obtaining the correct topology information of the group from the control center.
In some embodiments, the method further comprises: receiving a data query request sent by a client; obtaining the topology information of the corresponding groups according to the data query request; sending the data query request to each storage shard in the corresponding groups according to the topology information, so that the storage engine corresponding to each shard retrieves the data to be queried; and receiving the data to be queried returned by each shard, merging it, and returning it to the client.
In some embodiments, obtaining the topology information of the corresponding groups according to the data query request comprises: when the data query request includes a time range, obtaining the topology information of the groups corresponding to that time range; when it does not, obtaining the topology information of all groups.
According to further embodiments of the present disclosure, there is provided a data processing apparatus comprising: a request receiving module configured to receive a plurality of concurrent data write requests, each data write request comprising a group of data to be written; a group information obtaining module configured to obtain the topology information of the group whose current state is the write state, the topology information comprising the address information of each storage shard in the group, the groups being divided in chronological order; and a data distributing module configured to allocate each group of data to be written to a different storage shard in the group according to the topology information, so that the storage engine corresponding to each allocated shard writes its group of data to be written into the corresponding shard.
In some embodiments, the data distributing module is configured, for a group of data to be written, to calculate a hash value from the identification information of that group, and to determine the corresponding storage shard from the hash value.
In some embodiments, the topology information further comprises version information of each storage shard, and the apparatus further comprises an information updating module configured to send the version information of a storage shard to that shard so that the shard can verify it, and, in response to receiving a reply from a shard that the version information is wrong, to obtain the correct topology information of the group from the control center.
In some embodiments, the request receiving module is further configured to receive a data query request sent by a client, and the group information obtaining module is further configured to obtain the topology information of the corresponding groups according to the data query request; the apparatus further comprises: a request forwarding module configured to send the data query request to each storage shard in the corresponding groups according to the topology information, so that the storage engine corresponding to each shard retrieves the data to be queried; and a data merging module configured to receive the data to be queried returned by each shard, merge it, and return it to the client.
In some embodiments, the group information obtaining module is further configured to obtain, when the data query request includes a time range, the topology information of the groups corresponding to that time range, and, when it does not, the topology information of all groups.
According to still further embodiments of the present disclosure, there is provided a data processing system comprising the data processing apparatus of any of the preceding embodiments and a plurality of storage shards, the shards being divided into different groups; each shard uses its corresponding storage engine to write the data to be written allocated to it.
In some embodiments, the data to be written is compressed by a string compression technique before being written to the corresponding shard, the technique including converting the data to be written into enumerations.
In some embodiments, the system further comprises a control center configured to: when a group is written full, modify its current state to the read-only state, configure a corresponding time window for the group whose range is determined by the creation time and the time it became full, and create another group set to the write state; and delete the data in a group when its retention period exceeds a preset period, the retention period being timed from the moment the group's current state is modified to the read-only state.
According to still further embodiments of the present disclosure, there is provided a data processing system including: a memory; and a processor coupled to the memory, the processor being configured to perform the steps of the data processing method according to any of the embodiments described above, based on instructions stored in the memory.
According to further embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the data processing method of any of the preceding embodiments.
According to the present disclosure, the shards are divided into different groups in chronological order; when a plurality of concurrent data write requests is received, the topology information of the group currently in the write state is looked up, each group of data to be written is allocated to a storage shard according to that topology information, and the storage engine corresponding to each allocated shard writes the data into its shard. Writing in parallel through the storage engines of many shards better supports the sustained highly concurrent writes that characterize time-series data. Moreover, dividing the groups in chronological order and writing data into the corresponding group facilitates the management of time-series data and the subsequent lookup and use of the data.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
To illustrate the embodiments of the present disclosure more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present disclosure; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 shows a flow diagram of a data processing method of some embodiments of the present disclosure.
Fig. 2 illustrates a block diagram of a packet according to some embodiments of the present disclosure.
Fig. 3 shows a flow diagram of a data processing method of further embodiments of the present disclosure.
Fig. 4 illustrates a schematic diagram of a storage structure of some embodiments of the present disclosure.
Fig. 5 shows a schematic structural diagram of a data processing apparatus of some embodiments of the present disclosure.
Fig. 6 shows a schematic configuration of a data processing apparatus according to further embodiments of the present disclosure.
FIG. 7 shows a block diagram of a data processing system of some embodiments of the present disclosure.
FIG. 8 shows a block diagram of a data processing system of further embodiments of the present disclosure.
FIG. 9 shows a block diagram of a data processing system according to further embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure, and the description of at least one exemplary embodiment is merely illustrative and in no way limits the disclosure, its application, or its uses. All other embodiments obtained by a person skilled in the art from the disclosed embodiments without creative effort fall within the protection scope of the present disclosure.
To address the sustained highly concurrent writes that characterize time-series data, the present disclosure provides a method for storing such data, described below with reference to fig. 1.
FIG. 1 is a flow chart of some embodiments of the data processing method of the present disclosure. As shown in fig. 1, the method of this embodiment includes steps S102 to S108.
In step S102, a plurality of concurrent data write requests is received, each data write request including a group of data to be written.
The gateway may receive the various requests sent by clients and determine whether each is a data write request. The data to be written may be log data carrying a specific log timestamp. Each group of time-series data may be packaged into a file (Document).
In step S104, the topology information of the group whose current state is the write state is obtained.
In some embodiments, groups are divided in chronological order, with different groups corresponding to different time windows. A group comprises a plurality of storage shards, where a storage shard can be understood as a storage space with a predetermined capacity. Only one group is in the write state at any time, which simplifies the management of data writes. The number of shards in a group can be extended flexibly: if the group currently in the write state cannot meet the performance requirements of the data to be written, new shards can be added to it in real time. Because shards can be added flexibly, a single group suffices to absorb the write load at any moment. Adding further write-state groups is also possible, but having several concurrent groups increases the workload of subsequent data queries (the query method is described in later embodiments), so keeping only one group in the write state at a time is preferred. No new shards can be added to a read-only group.
As shown in fig. 2, the storage space is preferably divided into a plurality of storage shards, the shards are logically divided into different groups, and the groups correspond to different time windows; the current state of one group is the Write & Read state, and the other groups are in the Read Only state. For example, Group 1 includes Shards 1 to 6, corresponds to the time window T1 to T2, and is currently read-only. In fig. 2 every group contains the same number of shards; in practice the numbers may be the same or different and can be set as required. The shards of a group may reside on one or more storage nodes; spreading them over different nodes better exploits the advantages of distributed storage and improves write and read efficiency. Because the CPU, memory, disk I/O and network I/O of a storage node are limited, placing too many shards on one node degrades read/write capacity, so a threshold on the number of shards per storage node can be set to keep data writes and reads efficient.
Further, the creation of groups, their state transitions and similar management tasks may be carried out by a control center (Master). When the group currently in the write state becomes full, its state is modified to read-only, and the control center creates a new group and sets its state to the write state. A group in the write state is also readable. A group's creation corresponds to a start time, and the moment it becomes full and is modified to read-only corresponds to an end time; the range from start time to end time serves as the group's time window <startTime, endTime>. All time-series data falling in that range is written into the group, which facilitates subsequent data queries and management.
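The group lifecycle described above, with a single write-state group that is frozen to read-only and replaced when full, can be sketched as follows; the class and field names are illustrative, not taken from the patent.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class GroupState(Enum):
    WRITE = "write"          # writable, and also readable
    READ_ONLY = "read_only"  # frozen: no writes, no new shards


@dataclass
class Group:
    shards: list                          # address info of each storage shard
    state: GroupState = GroupState.WRITE
    start_time: float = field(default_factory=time.time)
    end_time: float = None                # set when the group is written full


class ControlCenter:
    """Minimal sketch of the Master's group lifecycle management."""

    def __init__(self):
        self.groups = []
        self.active = None  # the single group currently in the write state

    def create_group(self, shards):
        self.active = Group(shards=list(shards))
        self.groups.append(self.active)
        return self.active

    def freeze_active(self, shards_for_next):
        """Freeze the full write-state group; its time window becomes
        <start_time, end_time>. Then open a fresh write-state group."""
        frozen = self.active
        frozen.state = GroupState.READ_ONLY
        frozen.end_time = time.time()
        self.create_group(shards_for_next)
        return frozen
```

A caller would create one group, write until full, then call `freeze_active` to rotate, so at most one group ever accepts writes.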
In some embodiments, the data in a group is deleted when the group's retention period exceeds a preset period, the retention period being timed from the moment the group's state is modified to read-only, i.e. the time elapsed from the group's end time to the present. Deleting the data of history groups retained too long frees storage space and saves storage resources. Dividing groups in chronological order also makes age-based data management easy: managing the retention period of a group manages the age of all its data at once. After its data is deleted, the group itself can be deleted and its shards released for building a new group; alternatively, if a new group is needed, the old group's time window can simply be modified and the group reused.
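The retention rule can be sketched as below, assuming each group is represented by a dict with "state" and "end_time" keys (a shape chosen here purely for illustration):

```python
def purge_expired(groups, retention_limit, now):
    """Keep every group except read-only groups whose time since
    freezing (now - end_time) exceeds the retention limit; the data
    of dropped groups is deleted and their shards can be reused."""
    kept = []
    for g in groups:
        if g["state"] == "read_only" and now - g["end_time"] > retention_limit:
            continue  # expired: drop this group's data
        kept.append(g)
    return kept
```

Note that a group still in the write state is never purged, since its retention clock only starts when it is frozen.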
The topology information includes the address information of each storage shard in the group, for example IP addresses, and may also include the version information of each shard. The gateway may cache the topology information of each group and update it periodically, or the control center may notify the gateway to update whenever a group's topology changes. The gateway may also fetch the topology information of each group from the control center, which is responsible for updating and managing the topology information of all groups.
In step S106, each group of data to be written is allocated to a different storage shard in the group according to the topology information.
In some embodiments, the groups of data to be written may be assigned to different shards by polling: the 1st group of data is assigned to shard 1, the 2nd to shard 2, and so on, cycling through the shards in turn.
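The polling scheme amounts to round-robin assignment; a minimal sketch (names are illustrative):

```python
from itertools import cycle


def round_robin_assign(batches, shards):
    """Assign each incoming group of data (identified by batch id)
    to the shards in turn, wrapping around when the list is exhausted."""
    shard_cycle = cycle(shards)
    return {batch_id: next(shard_cycle) for batch_id in batches}
```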
In other embodiments, for a group of data to be written, a hash value is calculated from the identification information of that group, and the corresponding storage shard is determined from the hash value. For example, each group of time-series data may be packaged into a file (Document) with an identifier (Document ID); the hash of the Document ID is computed, reduced modulo the number of shards, and the result selects the shard.
Other ways of allocating shards to data to be written are also possible. Both schemes above spread the groups of data across the shards as evenly as possible, which helps balance the load on the shards and improves storage efficiency.
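The hash-based assignment described above can be sketched as follows; the patent does not name a hash function, so md5 here is an assumption:

```python
import hashlib


def shard_for(document_id: str, num_shards: int) -> int:
    """Hash the Document ID and reduce it modulo the shard count,
    so equal IDs always map to the same shard."""
    digest = hashlib.md5(document_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Any stable hash works; what matters for load balancing is that the values spread uniformly over the shard indices.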
In some embodiments, the topology information further comprises the version information of each storage shard. When sending data to be written to a shard, the gateway may also send the shard's version information so that the shard can verify it. If a shard replies that the version information is wrong, the gateway obtains the correct topology information of the group from the control center. The version information reflects whether the shard's current attributes have changed; for example, a change to the shard's name, storage range, or address changes its version.
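The version handshake can be sketched like this; the argument shapes (dicts keyed by shard address) are assumptions for illustration:

```python
def verify_and_route(cached_topology, shard_live_versions, fetch_from_master):
    """cached_topology maps shard address -> version the gateway cached;
    shard_live_versions maps shard address -> version the shard reports.
    On any mismatch (e.g. a renamed, moved, or resized shard), refresh
    the whole topology from the control center."""
    for shard, cached_version in cached_topology.items():
        if shard_live_versions.get(shard) != cached_version:
            return fetch_from_master()  # stale cache detected
    return cached_topology
```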
In step S108, the storage engine corresponding to each allocated storage shard writes its group of data to be written into the corresponding shard.
Each shard is backed by its own storage engine, so the shards can write data independently and in parallel. The storage engine is, for example, a Lucene retrieval storage engine, though it is not limited to this example. Lucene offers high-performance full-text retrieval and aggregation analysis, but its throughput is lower than that of HBase or other retrieval storage engines built on an LSM (log-structured merge tree) architecture. With the scheme of the present disclosure, however, the shards are grouped and each shard has its own engine, so different data to be written is dispersed across different Lucene engines; this raises write throughput considerably and copes better with sustained highly concurrent write requests.
In some embodiments, the data to be written is compressed by a string compression technique before being written to the corresponding shard. The technique includes converting the data to be written into enumerations. For highly repetitive time-series data such as log data, converting strings into enumerations and applying other string compression greatly reduces the volume of stored data and saves disk space.
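The enumeration conversion amounts to dictionary encoding. A minimal sketch (the class name and API are illustrative; such a dictionary could live in the metadata store):

```python
class StringDictionary:
    """Dictionary-encodes repetitive strings (log levels, service
    names, ...) into small integer enums before storage."""

    def __init__(self):
        self._to_enum = {}    # string -> enum code
        self._to_string = []  # enum code -> string

    def encode(self, s: str) -> int:
        if s not in self._to_enum:
            self._to_enum[s] = len(self._to_string)
            self._to_string.append(s)
        return self._to_enum[s]

    def decode(self, code: int) -> str:
        return self._to_string[code]
```

A log field such as a level string then costs one small integer per occurrence instead of the full string, which is where the space saving comes from.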
In the method of the above embodiment, the shards are divided into groups in chronological order; when a plurality of concurrent data write requests arrives, the topology information of the group currently in the write state is looked up, each group of data to be written is allocated to a storage shard according to that topology information, and the storage engine corresponding to each allocated shard writes the data into its shard. Writing in parallel through the storage engines of many shards supports the sustained highly concurrent writes that characterize time-series data. Dividing the groups in chronological order and writing data into the corresponding group also facilitates the management of time-series data and its subsequent lookup and use.
For data stored in the manner of the foregoing embodiments, the present disclosure also provides a data query method, described below with reference to fig. 3.
FIG. 3 is a flow chart of further embodiments of the data processing method of the present disclosure. As shown in fig. 3, the method of this embodiment includes steps S302 to S310.
In step S302, a data query request sent by a client is received.
The gateway may receive the various requests sent by clients and determine whether each is a data query request. A data query request may include filter conditions, such as time information and key fields.
In step S304, the topology information of the corresponding groups is obtained according to the data query request.
In some embodiments, the gateway parses the data query request. When the request includes a time range, the gateway obtains the topology information of the groups whose time windows correspond to that range; when it does not, the gateway obtains the topology information of all groups.
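Selecting groups by time window can be sketched as follows, assuming each group carries its <startTime, endTime> window (the tuple shape is an illustrative choice):

```python
def groups_for_query(groups, time_range=None):
    """With no time range, every group must be consulted; otherwise
    only groups whose time window overlaps the queried range.
    Each group is a (start, end, topology) tuple here, an assumed shape."""
    if time_range is None:
        return list(groups)
    q_start, q_end = time_range
    # two intervals overlap iff each starts before the other ends
    return [g for g in groups if g[0] <= q_end and q_start <= g[1]]
```

This is why a time-ranged query is cheaper: only the overlapping groups, not the whole cluster, receive the request.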
In step S306, the data query request is sent to each storage shard in the corresponding groups according to the topology information.
The gateway may forward the data query request to the shards of the corresponding groups, where it is processed by each shard's storage engine.
In step S308, the storage engine corresponding to each shard retrieves the data to be queried.
Each shard corresponds to one storage engine, for example Lucene. The engine retrieves the data to be queried according to the filter conditions in the data query request.
In step S310, the data to be queried returned by each shard is received, merged, and returned to the client.
Because the data is dispersed across the storage shards, the gateway must merge the partial results obtained from each shard before returning them to the client.
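If each shard returns its matches already sorted by timestamp, the gateway's merge step can be a k-way merge; a sketch, with the record shape (a dict with a "ts" field) assumed for illustration:

```python
import heapq


def merge_shard_results(per_shard_results):
    """per_shard_results: one timestamp-sorted list of records per
    shard; returns a single list sorted by the 'ts' field. heapq.merge
    streams the inputs, so it never materializes more than one record
    per shard at a time."""
    return list(heapq.merge(*per_shard_results, key=lambda r: r["ts"]))
```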
The method of this embodiment disperses a user's data query across the storage shards for processing, which improves query efficiency. Because the time-series data is divided into groups in chronological order and stored in groups keyed by time information, queries are easier to route and query efficiency improves further.
As shown in fig. 4, in the present disclosure, the gateway may be responsible for receiving a request of a client, and forwarding the request to each memory slice in a corresponding packet for processing. The creation and management of the packets may be under the responsibility of a control center (Master). The control center can manage the life cycle of the group, the control center starts a group, the group corresponds to a creation time, and when the memory fragments in the group are full, the control center can freeze the group, namely, the state of the group is modified to be read only, and a new group responsible for write operation is created. The control center can also delete a packet if the retention time of the packet exceeds the preset time. The control center may also be responsible for managing and updating topology information for individual packets.
The control center may also be responsible for health management (or fault management) of the storage slices or nodes. For example, the storage slices or nodes may periodically report heartbeat information to the control center, and the control center learns the health state of each slice or node from this heartbeat information. The control center itself may adopt a multi-copy mode to ensure high availability: when the master copy fails, a slave copy can quickly be promoted to master and continue providing service.
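The heartbeat-based health management described above can be illustrated with a minimal sketch, assuming a fixed heartbeat timeout (the class and parameter names are hypothetical):

```python
import time

class HealthMonitor:
    # Sketch: slices/nodes report heartbeats periodically; the control
    # center considers a node unhealthy once no heartbeat has been
    # received within `timeout` seconds.
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, node_id, now=None):
        self.last_seen[node_id] = now if now is not None else time.time()

    def healthy(self, node_id, now=None):
        now = now if now is not None else time.time()
        seen = self.last_seen.get(node_id)
        return seen is not None and now - seen <= self.timeout
```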
The storage cluster includes a plurality of storage nodes, and the storage space of a node may be logically divided into a metadata storage space (Meta Store) and a data storage space (e.g., a Time Series Data (TSD) Store). A string dictionary may be stored in the metadata storage space and used to convert strings to enumerations when data is stored. The data storage space stores the time-series data and comprises a plurality of groups, each group comprising a plurality of storage slices.
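The string dictionary held in the metadata storage space can be sketched as follows. This is an illustrative example only: it maps each distinct string (a metric name, tag value, etc.) to a small integer enumeration before the data is written, so that repeated strings shrink to fixed-size codes:

```python
class StringDictionary:
    # Sketch of the Meta Store's string dictionary: each distinct
    # string is assigned the next integer id; encoding is idempotent
    # and decoding inverts it.
    def __init__(self):
        self.str_to_id = {}
        self.id_to_str = []

    def encode(self, s):
        if s not in self.str_to_id:
            self.str_to_id[s] = len(self.id_to_str)
            self.id_to_str.append(s)
        return self.str_to_id[s]

    def decode(self, code):
        return self.id_to_str[code]
```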
The present disclosure also provides a data processing apparatus, which may be disposed in a gateway, as described below with reference to fig. 5.
FIG. 5 is a block diagram of some embodiments of a data processing apparatus of the present disclosure. As shown in fig. 5, the apparatus 50 of this embodiment includes: a request receiving module 502, a grouping information obtaining module 504 and a data distributing module 506.
The request receiving module 502 is configured to receive multiple concurrent data write requests, where each data write request includes a set of data to be written.
A packet information obtaining module 504, configured to obtain topology information of a packet in which a current state is a write state; the topology information includes address information of each memory slice in the packet, and different packets are divided according to a time sequence.
A data allocating module 506, configured to allocate, according to the topology information, each group of data to be written to different memory fragments in the group, so that each storage engine corresponding to each allocated memory fragment writes each group of data to be written to a corresponding memory fragment.
In some embodiments, the data allocating module 506 is configured to calculate, for a set of data to be written, a hash value of the set of data to be written according to the identification information of the set of data to be written; and determining the corresponding storage fragment according to the hash value of the group of data to be written.
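The hash-based allocation performed by the data allocating module can be sketched as below. This is an illustrative example, not the claimed implementation; MD5 is used here only as one possible stable hash of the identification information:

```python
import hashlib

def assign_slice(identification_info, slices):
    # Compute a stable hash of a group-of-data's identification info
    # and map it onto one of the group's storage slices, so that the
    # same identification info always lands on the same slice.
    digest = hashlib.md5(identification_info.encode("utf-8")).hexdigest()
    return slices[int(digest, 16) % len(slices)]
```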
Further embodiments of the data processing apparatus of the present disclosure are described below in conjunction with fig. 6.
FIG. 6 is a block diagram of some embodiments of a data processing apparatus of the present disclosure. As shown in fig. 6, the apparatus 60 of this embodiment includes: the request receiving module 602, the grouping information obtaining module 604 and the data distributing module 606 respectively have the same or similar functions as the request receiving module 502, the grouping information obtaining module 504 and the data distributing module 506.
In some embodiments, the topology information further comprises version information of each storage slice. The apparatus 60 further comprises an information updating module 608, configured to send the version information of a storage slice to the corresponding storage slice so that the slice can verify it, and, upon receiving a version-mismatch response from the slice, to acquire the correct topology information of the group from the control center.
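The version verification and refresh path can be illustrated with the following sketch, where the gateway record, the slice callable, and the `"VERSION_MISMATCH"` response string are all hypothetical names introduced for the example:

```python
def send_with_version_check(gateway, slice_server, payload):
    # Sketch: the gateway tags each request with its cached topology
    # version; a slice rejects stale versions, after which the gateway
    # refreshes the topology from the control center and retries once.
    resp = slice_server(payload, gateway["version"])
    if resp == "VERSION_MISMATCH":
        gateway["version"] = gateway["control_center"]()  # fetch latest
        resp = slice_server(payload, gateway["version"])
    return resp
```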
In some embodiments, the request receiving module 602 is further configured to receive a data query request sent by a client. The grouping information obtaining module 604 is further configured to obtain topology information of a corresponding grouping according to the data query request. The apparatus 60 further comprises: the request forwarding module 610 is configured to send a data query request to each memory slice in the corresponding packet according to the topology information, so that each storage engine corresponding to each memory slice obtains data to be queried, respectively. And the data merging module 612 is configured to receive data to be queried, which is returned by each storage segment, merge the data to be queried, and return the merged data to the client.
In some embodiments, the grouping information obtaining module 604 is further configured to, in a case that the data query request includes a time range, obtain topology information of a grouping corresponding to the time range; in the case where the data query request does not include a time range, topology information of all packets is acquired.
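The time-range routing performed by the grouping information obtaining module can be sketched as follows, under the assumption that each read-only group carries a half-open time window `[start, end)` (field names hypothetical):

```python
def groups_for_query(groups, time_range=None):
    # A query carrying a time range only needs the groups whose time
    # window overlaps it; a query without a time range must consult
    # every group.
    if time_range is None:
        return list(groups)
    q_start, q_end = time_range
    return [g for g in groups if g["start"] < q_end and q_start < g["end"]]
```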
The present disclosure also provides a data processing system, described below in conjunction with fig. 7.
FIG. 7 is a block diagram of some embodiments of a data processing system of the present disclosure. As shown in fig. 7, the system 7 of this embodiment includes: the data processing device 50/60 of any of the preceding embodiments, and the plurality of memory slices 72; the plurality of memory slices 72 are divided into different groups;
the memory slice 72 is used to write the allocated data to be written into the slice using the corresponding storage engine. The memory slice 72 is also used to retrieve the data to be queried using the corresponding storage engine and return it to the data processing device 50/60.
In some embodiments, the data to be written is written to the corresponding memory slice after being compressed according to a string compression technique that includes converting the data to be written to an enumeration.
In some embodiments, the system 7 further comprises a control center 74, configured to modify the current state of a group to read-only when the group is written full, and to configure a corresponding time window for the group, the range of the time window being determined by the creation time and the written-full time; and to create another group and set it to the write state; or to delete the data in a group when the group's retention time exceeds a preset time, the retention time being counted from the moment the group's state is modified to read-only.
The apparatus in the data processing system in the embodiments of the present disclosure may each be implemented by various computing devices or computer systems, which are described below in conjunction with fig. 8 and 9.
FIG. 8 is a block diagram of some embodiments of a data processing system of the present disclosure. As shown in fig. 8, the data processing system 80 of this embodiment includes: a memory 810 and a processor 820 coupled to the memory 810, the processor 820 being configured to perform a data processing method in any of the embodiments of the present disclosure based on instructions stored in the memory 810.
Memory 810 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader (Boot Loader), a database, and other programs.
FIG. 9 is a block diagram of further embodiments of a data processing system according to the present disclosure. As shown in fig. 9, the data processing system 90 of this embodiment includes a memory 910 and a processor 920, which are similar to the memory 810 and the processor 820, respectively. It may also include an input/output interface 930, a network interface 940, a storage interface 950, and the like. These interfaces 930, 940, 950, the memory 910, and the processor 920 may be connected, for example, via a bus. The input/output interface 930 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 940 provides a connection interface for various networking devices, such as a database server or a cloud storage server. The storage interface 950 provides a connection interface for external storage devices such as an SD card or a USB flash drive.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description covers only exemplary embodiments of the present disclosure and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall be included in its scope of protection.

Claims (18)

1. A method of data processing, comprising:
receiving a plurality of concurrent data writing requests, wherein each data writing request comprises a group of data to be written;
acquiring the topological information of a packet with the current state being a writing state; wherein the topology information includes address information of each memory slice in the packet, and different packets are divided according to a time sequence;
and distributing each group of data to be written to different storage fragments in the group according to the topology information, so that each storage engine corresponding to each allocated storage fragment writes each group of data to be written into the corresponding storage fragment.
2. The data processing method according to claim 1,
allocating each group of data to be written to different memory slices in the group comprises:
aiming at a group of data to be written, calculating the hash value of the group of data to be written according to the identification information of the group of data to be written;
and determining the corresponding storage fragment according to the hash value of the group of data to be written.
3. The data processing method according to claim 1,
under the condition that the packet is fully written, the current state of the packet is modified into a read-only state, the packet is configured with a corresponding time window, and the range of the time window is determined according to the creation time and the full writing time; and another packet is created and set to the write state.
4. The data processing method of claim 3,
and deleting the data in the packet under the condition that the retention time of the packet exceeds a preset time, wherein the retention time is counted from the moment that the current state of the packet is modified into a read-only state.
5. The data processing method according to claim 1,
the data to be written is compressed according to a character string compression technology and then written into the corresponding storage fragment, wherein the character string compression technology comprises the step of converting the data to be written into enumeration.
6. The data processing method according to claim 1,
the topology information further includes: version information of each memory slice;
the method further comprises the following steps:
sending the version information of the memory fragments to corresponding memory fragments so that the memory fragments can verify the version information;
and acquiring correct topology information of the group from a control center in response to receiving the information that the version information returned by the memory fragments is wrong.
7. The data processing method of claim 1, further comprising:
receiving a data query request sent by a client;
acquiring the topology information of the corresponding group according to the data query request;
sending the data query request to each memory fragment in the corresponding group according to the topology information so that each memory engine corresponding to each memory fragment respectively acquires data to be queried;
and receiving the data to be queried returned by each memory fragment, merging the data to be queried, and returning the merged data to the client.
8. The data processing method of claim 7,
the acquiring the topology information of the corresponding packet according to the data query request includes:
under the condition that the data query request comprises a time range, acquiring the topological information of a group corresponding to the time range;
and acquiring the topology information of all the groups under the condition that the data query request does not comprise a time range.
9. A data processing apparatus comprising:
a request receiving module, configured to receive a plurality of concurrent data write requests, wherein each data write request comprises a group of data to be written;
the packet information acquisition module is used for acquiring the topology information of the packet with the current state being the writing state; wherein the topology information includes address information of each memory slice in the packet, and different packets are divided according to a time sequence;
and the data distribution module is used for distributing each group of data to be written to different memory fragments in the group according to the topology information, so that each storage engine corresponding to each allocated memory fragment writes each group of data to be written into the corresponding memory fragment.
10. The data processing apparatus of claim 9,
the data distribution module is used for calculating the hash value of a group of data to be written according to the identification information of the group of data to be written aiming at the group of data to be written; and determining the corresponding storage fragment according to the hash value of the group of data to be written.
11. The data processing apparatus of claim 9,
the topology information further includes: version information of each memory slice;
the device further comprises:
the information updating module is used for sending the version information of the storage fragment to the corresponding storage fragment so that the storage fragment can verify the version information; and acquiring correct topology information of the group from a control center in response to receiving the information that the version information returned by the memory fragments is wrong.
12. The data processing apparatus of claim 9,
the request receiving module is also used for receiving a data query request sent by a client;
the grouping information acquisition module is also used for acquiring the topology information of the corresponding grouping according to the data query request;
the device further comprises:
a request forwarding module, configured to send the data query request to each memory slice in the corresponding packet according to the topology information, so that each storage engine corresponding to each memory slice respectively obtains data to be queried;
and the data merging module is used for receiving the data to be queried returned by each memory fragment, merging the data to be queried and returning the merged data to the client.
13. The data processing apparatus of claim 12,
the grouping information acquisition module is further used for acquiring the grouping topology information corresponding to the time range under the condition that the data query request comprises the time range; and acquiring the topology information of all the groups under the condition that the data query request does not comprise a time range.
14. A data processing system comprising: the data processing apparatus of any of claims 9-13, and a plurality of memory slices; the plurality of memory slices are divided into different groups;
the memory fragments are used for writing the distributed data to be written into the memory fragments by utilizing the corresponding memory engines.
15. The data processing system of claim 14,
the data to be written is compressed according to a character string compression technology and then written into the corresponding storage fragment, wherein the character string compression technology comprises the step of converting the data to be written into enumeration.
16. The data processing system of claim 14, further comprising:
the control center is used for modifying the current state of the packet into a read-only state under the condition that the packet is fully written, configuring a corresponding time window for the packet, and determining the range of the time window according to the creation time and the full writing time; and creating another packet and setting the packet to a write state;
or deleting the data in the packet when the retention time of the packet exceeds a preset time, wherein the retention time is timed from the time when the current state of the packet is modified into a read-only state.
17. A data processing system comprising:
a memory; and
a processor coupled to the memory, the processor being configured to perform the steps of the data processing method of any of claims 1-8 based on instructions stored in the memory.
18. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN201910515339.8A 2019-06-14 2019-06-14 Data processing method, device, system and computer readable storage medium Active CN111782134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910515339.8A CN111782134B (en) 2019-06-14 2019-06-14 Data processing method, device, system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111782134A true CN111782134A (en) 2020-10-16
CN111782134B CN111782134B (en) 2024-06-14

Family

ID=72755698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910515339.8A Active CN111782134B (en) 2019-06-14 2019-06-14 Data processing method, device, system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111782134B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101692655A (en) * 2009-10-23 2010-04-07 烽火通信科技股份有限公司 Data frame storage management device
CN108509652A (en) * 2018-04-17 2018-09-07 山东大众益康网络科技有限公司 Data processing system and method
CN108616556A (en) * 2016-12-13 2018-10-02 阿里巴巴集团控股有限公司 Data processing method, device and system
US20180341410A1 (en) * 2017-03-24 2018-11-29 Western Digital Technologies, Inc. System and method for adaptive early completion posting using controller memory buffer

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231501A (en) * 2020-10-20 2021-01-15 浙江大华技术股份有限公司 Portrait library data storage and retrieval method and device and storage medium
CN112364019A (en) * 2020-11-04 2021-02-12 中盈优创资讯科技有限公司 Method and device for realizing fast data writing into ClickHouse by self-defined Spark data source
CN112364019B (en) * 2020-11-04 2022-10-04 中盈优创资讯科技有限公司 Method and device for realizing fast data writing into ClickHouse by self-defined Spark data source
WO2022110196A1 (en) * 2020-11-30 2022-06-02 华为技术有限公司 Data processing method, apparatus, and system
CN112463333A (en) * 2020-12-03 2021-03-09 北京浪潮数据技术有限公司 Data access method, device and medium based on multithreading concurrency
CN112684983A (en) * 2020-12-28 2021-04-20 北京三快在线科技有限公司 Data storage method and device, electronic equipment and readable storage medium
CN114443703A (en) * 2021-12-15 2022-05-06 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111782134B (en) 2024-06-14

Similar Documents

Publication Publication Date Title
CN111782134B (en) Data processing method, device, system and computer readable storage medium
US10795905B2 (en) Data stream ingestion and persistence techniques
US10691716B2 (en) Dynamic partitioning techniques for data streams
US9276959B2 (en) Client-configurable security options for data streams
US9794135B2 (en) Managed service for acquisition, storage and consumption of large-scale data streams
US10635644B2 (en) Partition-based data stream processing framework
US20170277556A1 (en) Distribution system, computer, and arrangement method for virtual machine
US9342529B2 (en) Directory-level referral method for parallel NFS with multiple metadata servers
CN110147411A (en) Method of data synchronization, device, computer equipment and storage medium
US20130218934A1 (en) Method for directory entries split and merge in distributed file system
US20170031948A1 (en) File synchronization method, server, and terminal
CN110347651A (en) Method of data synchronization, device, equipment and storage medium based on cloud storage
CN107368260A (en) Memory space method for sorting, apparatus and system based on distributed system
JP2014232483A (en) Database system, retrieval method and program
CN111694505B (en) Data storage management method, device and computer readable storage medium
CN115292280A (en) Cross-region data scheduling method, device, equipment and storage medium
CN116304390B (en) Time sequence data processing method and device, storage medium and electronic equipment
JP2023531751A (en) Vehicle data storage method and system
CN113905252B (en) Data storage method and device for live broadcasting room, electronic equipment and storage medium
JP2012190377A (en) Content decentralization and storage system
Lu et al. Research on Cassandra data compaction strategies for time-series data
CN115686343A (en) Data updating method and device
CN113326335A (en) Data storage system, method, device, electronic equipment and computer storage medium
JP6568232B2 (en) Computer system and device management method
JP6193491B2 (en) Computer system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant