CN111651424A - Data processing method and device, data node and storage medium - Google Patents

Data processing method and device, data node and storage medium

Info

Publication number
CN111651424A
Authority
CN
China
Prior art keywords
data
node
target
local
external
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010525534.1A
Other languages
Chinese (zh)
Other versions
CN111651424B (en)
Inventor
刘智
伍浩文
王洋
须成忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN202010525534.1A
Publication of CN111651424A
Application granted
Publication of CN111651424B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The application relates to the technical field of data processing and provides a data processing method, a data processing device, a data node and a storage medium. The method comprises the following steps: determining a target partition address of target data according to an external table; sending a data acquisition request, according to the target partition address, to a target data node storing the target data; and receiving the target data sent by the target data node. The scheme reduces query dependence on the control node, reduces the data query load on the control node, improves the data query response speed, and improves the data processing performance of the distributed database.

Description

Data processing method and device, data node and storage medium
Technical Field
The present application belongs to the field of data processing technologies, and in particular, to a data processing method and apparatus, a data node, and a storage medium.
Background
A distributed database is an important kind of data storage system. It usually uses smaller computer systems to form different data nodes. Each data node may hold a complete or partial copy of a data management file and has its own local database. Many data nodes located at different places are interconnected through a network and together form a complete, global large database that is logically centralized and physically distributed.
In a distributed database, data can be stored across different data nodes. When data in different data nodes is queried or read, the data is located and accessed on the corresponding data node by means of a data distribution index file stored in the control node. When the distributed database includes a large number of data nodes, the data query load on the control node is heavy and the data query response speed drops, which affects the data processing performance of the database.
Disclosure of Invention
Embodiments of the present application provide a data processing method and apparatus, a data node, and a storage medium, so as to solve the problem in the prior art that, when a distributed database includes a large number of data nodes, the data query load on the control node is heavy and the data query response speed drops, which affects the data processing performance of the database.
A first aspect of an embodiment of the present application provides a data processing method applied to a data node. The data node stores local data and an external table corresponding to external data, and the external table includes the partition addresses of the other data nodes in a distributed system, other than the local data node, that store the external data. The data processing method comprises the following steps:
determining a target partition address of target data according to the external table;
sending a data acquisition request to a target data node storing the target data according to the target partition address;
and receiving the target data sent by the target data node.
A second aspect of the embodiments of the present application provides a data processing apparatus. The data processing apparatus stores local data and an external table corresponding to external data, and the external table includes the partition addresses of the other data nodes in a distributed system, other than the local data node, that store the external data. The data processing apparatus further includes:
the determining module is used for determining the target partition address of the target data according to the external table;
the sending module is used for sending a data acquisition request to a target data node storing the target data according to the target partition address;
and the receiving module is used for receiving the target data sent by the target data node.
A third aspect of embodiments of the present application provides a data node, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, performs the steps of the method according to the first aspect.
A fifth aspect of the present application provides a computer program product, which, when run on a data node, causes the data node to perform the steps of the method of the first aspect described above.
As can be seen from the above, in the embodiments of the present application a data node stores local data and an external table corresponding to external data, where the external table includes the partition addresses of the other data nodes in the distributed system, other than the local data node, that store the external data. A target partition address of target data is determined according to the external table, a data acquisition request is sent to the target data node storing the target data according to the target partition address, and the target data sent by the target data node is received. While providing distributed data storage, this allows the local data node to act as the initiating node of a data query and to query locally stored data and data stored in external data nodes together, which reduces query dependence on the control node, reduces the data query load on the control node, improves the data query response speed, and improves the data processing performance of the distributed database.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a first flowchart of a data processing method provided in an embodiment of the present application;
Fig. 2 is a block diagram of a distributed data storage system according to an embodiment of the present application;
Fig. 3 is a second flowchart of a data processing method according to an embodiment of the present application;
Fig. 4 is a block diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 5 is a structural diagram of a data node according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
It should be understood that, the sequence numbers of the steps in this embodiment do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the inherent logic of the process, and should not constitute any limitation to the implementation process of the embodiment of the present application.
The embodiments of the present application provide a data processing method and apparatus, a data node and a storage medium, in which a target partition address of target data is determined according to an external table, a data acquisition request is sent to the target data node storing the target data according to the target partition address, and the target data sent by the target data node is received. While providing distributed data storage, this allows the local data node to act as the initiating node of a data query and to query locally stored data and data stored in external data nodes together, which reduces query dependence on the control node, reduces the data query load on the control node, improves the data query response speed, and improves the data processing performance of the distributed database.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Referring to fig. 1, fig. 1 is a first flowchart of a data processing method provided in an embodiment of the present application. The data processing method is applied to a data node. The data node stores local data and an external table corresponding to external data, and the external table includes the partition addresses of the other data nodes in the distributed system, other than the local data node, that store the external data.
Some or all of the local data and the external data belong to the same data table to be stored.
The distributed system is specifically a distributed data storage system, i.e. a distributed database. The distributed system comprises a plurality of data nodes and implements distributed storage of the data to be stored. The local data node and the other data nodes are all data nodes in the distributed data storage system.
The local data is data stored in the local data node, and the external data is data stored in the data nodes of the distributed system other than the local data node. Each data node corresponds to a partition address, which indicates the access address of that data node. The local data node is the node that executes the method.
As shown in fig. 1, a data processing method includes the steps of:
Step 101, determining a target partition address of the target data according to the external table.
When a user queries data in the distributed data storage system through a certain data node, that data node uses its locally stored external table to determine the partition address of the data node that stores the data not held locally (namely, the external data).
In this way, the local data node knows locally how the data stored in the distributed system is spread across the data nodes.
Specifically, as an optional implementation, the generation process of the local data and the generation process of the external table may be:
the data processing method further comprises the following steps:
acquiring a data table sent by a control node and partition information corresponding to the data table; wherein the partition information is used for indicating a plurality of data segments of the data table and a partition address of each data segment;
if the data segment with the partition address as the local address exists in the plurality of data segments, storing the data segment with the partition address as the local address;
and if the data segment with the partition address being the non-local address exists in the plurality of data segments, generating an external table according to the partition address of the data segment with the partition address being the non-local address.
Specifically, a single external table may cover all data segments whose partition address is a non-local address, or one external table may be generated for each such data segment.
Generating the external table may mean creating a new external table or updating an existing one.
The multiple data segments of the data table indicated in the partition information may be obtained by fragmenting the data to be stored, specifically according to the number of fragments set by the user. The partition information is the information produced when the data table is divided and stored in partitions.
The partition information indicates a partition address of each data segment, i.e., indicates an access address of a data node to which each data segment is to be distributed. Specifically, the data in the partition information may include: the starting content and the ending content of each data segment, the size of each data segment, the data node to which each data segment is to be distributed, and the like.
The control node is connected with a plurality of data nodes and performs global control over them. Specifically, the control node may provide management functions for a user, such as adding and removing data nodes, fragmenting a data table to be stored, migrating the resulting data segments among the data nodes, and configuring high availability of data. The global states of the different data nodes in the distributed data storage system may be stored in a metadata table of a Catalog in the control node. Here, the fragmentation algorithm may adopt the hash algorithm SHA-1.
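The text states only that the fragmentation algorithm may adopt SHA-1; it does not say how a hash value is turned into a data segment. The following is a minimal sketch of one possible approach, assuming each row carries a key and that the digest is reduced modulo the number of fragments set by the user. The names `segment_index` and `shard_rows` and the modulo mapping are illustrative assumptions, not part of the application.

```python
import hashlib


def segment_index(key: str, num_segments: int) -> int:
    """Map a row key to a segment index in [0, num_segments).

    Assumption: the SHA-1 digest is reduced modulo the segment count.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_segments


def shard_rows(rows, key_field, num_segments):
    """Split rows into num_segments lists according to the hashed key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        index = segment_index(str(row[key_field]), num_segments)
        segments[index].append(row)
    return segments


rows = [{"id": i, "value": f"v{i}"} for i in range(20)]
segments = shard_rows(rows, "id", 3)  # e.g. the user asked for 3 fragments
```

Because the mapping depends only on the key and the segment count, every node that applies it to the same key arrives at the same segment.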
Specifically, the "plurality" referred to in the embodiments of the present specification is specifically at least two.
Here, the above-described data table fragmentation and data storage process will be described by way of example with reference to fig. 2.
The control node divides the data table to be stored into data segments according to the number of fragments input by the user, for example into 3 data segments P1, P2, and P3, and assigns the data node to which each data segment is to be distributed. Specifically, in fig. 2, the control node distributes P1 to data node 1, P2 to data node 2, and P3 to data node 3, so as to implement distributed storage. Accordingly, the control node generates the corresponding partition information and sends the data table and the partition information to the corresponding data nodes. After a data node acquires the data table and the partition information, it stores the data segment that the local data node needs to store, and generates and stores an external table based on the partition addresses of the data segments that are not stored locally. For example, data node 1 stores P1 and generates and locally stores an external table P2' for data segment P2 and an external table P3' for data segment P3; data node 2 stores P2 and generates and locally stores an external table P1' for data segment P1 and an external table P3' for data segment P3; data node 3 stores P3 and generates and locally stores an external table P1' for data segment P1 and an external table P2' for data segment P2.
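The fig. 2 example can be sketched as follows. This is an illustration under simplifying assumptions: partition information is modelled as a plain mapping from segment name to partition address, and node addresses are short strings. The `DataNode` class and its `load` method are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    address: str
    local_data: dict = field(default_factory=dict)      # segment name -> rows
    external_table: dict = field(default_factory=dict)  # segment name -> partition address

    def load(self, data_table, partition_info):
        """Store local segments; record partition addresses of the rest."""
        for segment, partition_address in partition_info.items():
            if partition_address == self.address:
                # The partition address is the local address: keep the segment.
                self.local_data[segment] = data_table[segment]
            else:
                # Non-local address: only remember where the segment lives.
                self.external_table[segment] = partition_address


data_table = {"P1": ["r1", "r2"], "P2": ["r3", "r4"], "P3": ["r5"]}
partition_info = {"P1": "node1", "P2": "node2", "P3": "node3"}

node1 = DataNode("node1")
node1.load(data_table, partition_info)
print(node1.local_data)      # {'P1': ['r1', 'r2']}
print(node1.external_table)  # {'P2': 'node2', 'P3': 'node3'}
```

Here a single external table covers both P2 and P3; keeping one table per non-local segment, as the text also allows, would be an equally valid layout.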
Specifically, once the data node has stored the data segments whose partition address is the local address, or has generated an external table according to the partition addresses of the data segments whose partition address is a non-local address, the obtained data table may be deleted to save storage space.
In this process, the data nodes acquire the data table and the corresponding partition information from the control node, so that each data node knows locally how the data table is distributed across the data nodes.
Step 102, sending a data acquisition request to a target data node storing the target data according to the target partition address.
In this step, after the data node determines the target partition address of the target data according to the locally stored external table, the data node may access the target data node storing the target data according to the target partition address, and send a data acquisition request to the target data node.
The data acquisition request is used for obtaining the target data stored in the target data node.
Step 103, receiving the target data sent by the target data node.
Through the above steps, any data node can act as the initiating node of a data query and can query locally stored data and data stored in external data nodes together, which reduces query dependence on the control node, reduces the data query load on the control node, improves the data query response speed, and improves the data processing performance of the distributed database.
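Steps 101 to 103 can be sketched as below. The transport is an assumption: a shared dictionary from partition address to node object stands in for a real network connection, and the `Node` class, its method names and the addresses are hypothetical.

```python
class Node:
    """A data node holding local segments plus an external table."""

    def __init__(self, address, local_data, external_table, network):
        self.address = address
        self.local_data = local_data          # segment name -> rows
        self.external_table = external_table  # segment name -> partition address
        self.network = network                # partition address -> Node (stand-in transport)
        network[address] = self

    def handle_request(self, segment):
        """Serve a data acquisition request from another data node."""
        return self.local_data[segment]

    def fetch_external(self, segment):
        # Step 101: determine the target partition address from the external table.
        target_address = self.external_table[segment]
        # Step 102: address the data acquisition request to the target data node.
        target_node = self.network[target_address]
        # Step 103: receive the target data returned by the target data node.
        return target_node.handle_request(segment)


network = {}
node1 = Node("node1:5432", {"P1": [1, 2]}, {"P2": "node2:5432"}, network)
node2 = Node("node2:5432", {"P2": [3, 4]}, {"P1": "node1:5432"}, network)
print(node1.fetch_external("P2"))  # [3, 4]
```

Note that the control node takes no part in the lookup: the initiating node resolves the address from its own external table.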
Further, as an optional implementation manner, the data processing method further includes:
if the target data is not successfully received from the target data node, determining a backup data node corresponding to the target data node according to a preset backup node mapping relation; sending the data acquisition request to the backup data node; and receiving the target data sent by the backup data node.
In this embodiment, backup nodes corresponding to the data nodes are provided. Each data node has a mapping relationship with its backup node, and the data in a data node and in its mapped backup node are kept consistent.
Specifically, the distributed system includes a plurality of nodes, which may include data nodes and control nodes. One part of the data nodes serves as backup nodes of the other part; the two parts are associated through the mapping relationship and keep their data consistent.
Here, description will be made with reference to fig. 2. In fig. 2, the architecture supports high availability and remote disaster recovery in units of zones. Node-level data synchronization can be realized between a first zone (Zone1) and a second zone (Zone2) by means of streaming replication, so that data is backed up across machine rooms and across sites, and high reliability of the data is ensured.
As shown in fig. 2, Zone2 contains data node 4, which has a mapping relationship with data node 1 in Zone1; data node 5, which has a mapping relationship with data node 2 in Zone1; and data node 6, which has a mapping relationship with data node 3 in Zone1. Two data nodes that have a mapping relationship store the same data. When a user triggers a search for data P2 from data node 1, if data node 1 does not successfully receive the target data P2 from data node 2, data node 1 determines the backup data node corresponding to data node 2, namely data node 5, according to the preset backup node mapping relationship, sends the data acquisition request to data node 5, and receives the target data P2 sent by data node 5.
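The failover in this example can be sketched as follows. How a failed reception is detected is not specified in the text; this sketch assumes the transport raises an exception. `NodeUnavailable`, `fetch_with_failover` and `send_request` are hypothetical names, and the `stores` dictionary simulates which nodes are reachable.

```python
class NodeUnavailable(Exception):
    """Raised when a data node cannot return the requested data."""


def fetch_with_failover(segment, target, backup_map, send_request):
    """Try the target node first, then its mapped backup node."""
    try:
        return send_request(target, segment)
    except NodeUnavailable:
        backup = backup_map.get(target)
        if backup is None:
            raise  # no backup node is mapped to this target
        return send_request(backup, segment)


# Simulated cluster: data node 2 is down, its backup data node 5 is up.
stores = {"node5": {"P2": ["row-a", "row-b"]}}


def send_request(node, segment):
    if node not in stores:
        raise NodeUnavailable(node)
    return stores[node][segment]


# Preset backup node mapping relationship from fig. 2.
backup_map = {"node1": "node4", "node2": "node5", "node3": "node6"}
result = fetch_with_failover("P2", "node2", backup_map, send_request)
print(result)  # ['row-a', 'row-b']
```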
In this process, a user can connect a client to any data node and transparently query the data of all nodes without middleware or a control node. The familiar characteristics of a traditional relational database are thus retained on top of a distributed database: a data query interface with complete semantics is implemented in the data node, and global data is queried end to end, which improves data processing performance.
In the embodiments of the present application, a data node stores local data and an external table corresponding to external data, where the external table includes the partition addresses of the other data nodes in the distributed system, other than the local data node, that store the external data. A target partition address of target data is determined according to the external table, a data acquisition request is sent to the target data node storing the target data according to the target partition address, and the target data sent by the target data node is received. While providing distributed data storage, this allows the local data node to act as the initiating node of a data query and to query locally stored data and data stored in external data nodes together, which reduces query dependence on the control node, reduces the data query load on the control node, improves the data query response speed, and improves the data processing performance of the distributed database.
As an optional implementation manner, the data processing method further includes:
acquiring a newly added node interconnection instruction sent by a control node;
and establishing a data transmission channel between the local data node and the newly added node according to the newly added node interconnection instruction.
This process occurs after a new data node is established by the control node. After the current data node receives the interconnection instruction of the new data node sent by the control node, because the operations of data access, transmission and the like need to be realized among all the data nodes, a data transmission channel between the new data node and the original data node needs to be established.
Specifically, the newly added node is the node indicated in the newly added node interconnection instruction. The newly added node and the local data node may be two nodes having a data correspondence relationship, where the data correspondence relationship includes:
the newly added node indicated by the newly added node interconnection instruction stores a first external table, the first external table is an external table of the local data stored in the local data node, and the first external table includes the partition address of the local data node; or the newly added node indicated by the newly added node interconnection instruction stores first local data, and the external data corresponding to the external table stored in the local data node includes the first local data in the newly added node.
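A minimal sketch of how a data node might react to a newly added node interconnection instruction is given below. The instruction format (a dictionary with a `new_node_address` field), the `open_channel` callback and the method name are assumptions made for illustration; the text does not define them.

```python
class DataNode:
    def __init__(self, address):
        self.address = address
        self.channels = {}  # peer partition address -> channel object

    def on_interconnect_instruction(self, instruction, open_channel):
        """Establish a data transmission channel to the newly added node.

        The call is idempotent: a repeated instruction reuses the channel.
        """
        new_address = instruction["new_node_address"]  # hypothetical field name
        if new_address not in self.channels:
            self.channels[new_address] = open_channel(self.address, new_address)
        return self.channels[new_address]


def open_channel(local_address, peer_address):
    # Placeholder for a real connection, e.g. a socket or a database link.
    return (local_address, peer_address)


node1 = DataNode("node1")
channel = node1.on_interconnect_instruction(
    {"new_node_address": "node7"}, open_channel
)
```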
As another optional implementation, the data processing method further includes:
acquiring a data synchronization instruction sent by a control node, wherein the data synchronization instruction comprises storage migration information of the external data; and updating the partition address in the external table according to the storage migration information.
This process occurs when data in a data node is migrated or changed. When data changes in a data node other than the local data node, the control node obtains the data change information and sends a data synchronization instruction to the local data node to indicate the storage migration information of the external data. The local data node updates the partition address in its locally stored external table according to the storage migration information, so that it can still correctly obtain the access address of the data node where the external data is located.
The partition addresses in the external table may be updated partially or in full, depending on the actual situation; this is not specifically limited here.
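The partial and full update can be sketched as follows, assuming the storage migration information is a mapping from segment name to its new partition address. The function name `apply_sync_instruction` and the `full` flag are illustrative assumptions.

```python
def apply_sync_instruction(external_table, migration_info, full=False):
    """Return an updated copy of the external table.

    migration_info maps segment name -> new partition address (assumed format).
    With full=True the table is rebuilt from migration_info alone; otherwise
    only the listed segments are changed.
    """
    if full:
        return dict(migration_info)
    updated = dict(external_table)
    updated.update(migration_info)
    return updated


table = {"P2": "node2", "P3": "node3"}

# Partial update: P2 has migrated to node7, P3 is untouched.
partial = apply_sync_instruction(table, {"P2": "node7"})

# Full update: every entry is replaced by the migration information.
full = apply_sync_instruction(table, {"P2": "node7", "P3": "node8"}, full=True)
```

Returning a copy rather than mutating in place lets a caller swap the table atomically; that choice is a design preference of this sketch, not something the text requires.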
Through the implementations described above, the distributed data storage system supports dynamic addition of data nodes, data migration among the data nodes, and data synchronization. A client can connect to any node and access all global data from that node, and cross-node distributed transactions are supported. This enhances the overall compatibility, universality and flexibility of the system and makes it possible to build a highly available database system suited to a wider range of application scenarios.
The embodiment of the application also provides different implementation modes of the data processing method.
Referring to fig. 3, fig. 3 is a second flowchart of a data processing method according to an embodiment of the present application. The data processing method is applied to a data node. The data node stores local data and an external table corresponding to external data, and the external table includes the partition addresses of the other data nodes in the distributed system, other than the local data node, that store the external data.
Some or all of the local data and the external data belong to the same data table to be stored. The distributed system is specifically a distributed data storage system. The distributed system comprises a plurality of data nodes and implements distributed storage of the data to be stored. The local data node is the node that executes the method. The local data node and the other data nodes are all data nodes in the distributed data storage system. Each data node corresponds to a partition address, which indicates the access address of that data node.
As shown in fig. 3, a data processing method includes the steps of:
Step 301, receiving a query instruction.
The query instruction indicates the target data to be queried.
Step 302, if the target data is not found in the local data, determining a target partition address of the target data according to the external table.
After the query instruction is obtained, it can be parsed: the query string corresponding to the query instruction is parsed to obtain the keywords it contains and the data query relationship among the keywords, and the target data corresponding to the keywords is determined according to that relationship. If the target data is not found in the local data, it is external data stored in an external data node, and the access address of the data node where the target data is located, namely the target partition address, is looked up in the locally stored external table so that the target data can be acquired.
Here, "if the target data is not found in the local data" includes: not finding all target data in the local data or not finding part of the target data in the local data.
This process implements cross-node data access: a user connected to a single data node can access the data without any middleware or control node.
Correspondingly, if the target data is found in the local data, the target data is extracted from the local data.
Step 303, sending a data acquisition request to a target data node storing the target data according to the target partition address.
The implementation process of this step is the same as that of step 102 in the foregoing embodiment, and is not described here again.
Step 304, receiving the target data sent by the target data node.
The implementation process of this step is the same as that of step 103 in the foregoing embodiment, and is not described here again.
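Steps 301 to 304 can be sketched as below, including the case where only part of the target data is found locally. Query parsing is left out: the sketch assumes the query has already been resolved to a list of segment names. The function `query` and the `fetch_remote` callback are hypothetical names.

```python
def query(segments_wanted, local_data, external_table, fetch_remote):
    """Answer a query from local data first, then from external data nodes."""
    result = {}
    for segment in segments_wanted:
        if segment in local_data:
            # Target data found locally: extract it from the local data.
            result[segment] = local_data[segment]
        else:
            # Step 302: determine the target partition address.
            address = external_table[segment]
            # Steps 303 and 304: request and receive the target data.
            result[segment] = fetch_remote(address, segment)
    return result


remote_stores = {"node2": {"P2": ["r3", "r4"]}}


def fetch_remote(address, segment):
    # Stand-in for a real data acquisition request over the network.
    return remote_stores[address][segment]


local_data = {"P1": ["r1", "r2"]}
external_table = {"P2": "node2"}
rows = query(["P1", "P2"], local_data, external_table, fetch_remote)
print(rows)  # {'P1': ['r1', 'r2'], 'P2': ['r3', 'r4']}
```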
In the embodiment of the application, local data and an external table corresponding to the external data are stored in a data node, the external table comprises partition addresses of other data nodes, which store the external data, in a distributed system except the local data node, a target partition address of target data is determined according to the external table, a data acquisition request is sent to the target data node storing the target data according to the target partition address, and the target data sent by the target data node is received. The distributed data storage function is realized, meanwhile, the local data node can be used as an initiating node of data query, the common query function of data stored in the local data and the external data node is realized, the query dependence on the control node is reduced, the data query load capacity of the control node is reduced, the data query response speed is improved, and the data processing performance of the distributed database is improved.
Referring to fig. 4, fig. 4 is a structural diagram of a data processing apparatus according to an embodiment of the present application, and only a part related to the embodiment of the present application is shown for convenience of description.
The data processing device stores local data and an external table corresponding to external data, where the external table includes the partition addresses of the other data nodes in the distributed system, other than the local data node, that store the external data.
The data processing apparatus 400 further comprises:
a determining module 401, configured to determine a target partition address of target data according to the external table;
a sending module 402, configured to send a data obtaining request to a target data node storing the target data according to the target partition address;
a receiving module 403, configured to receive the target data sent by the target data node.
The determining module 401 is specifically configured to: receive a query instruction, where the query instruction is used for indicating the target data to be queried; and, if the target data is not found in the local data, determine a target partition address of the target data according to the external table.
Wherein the data processing apparatus 400 further comprises:
the first acquisition module is used for acquiring a data table sent by a control node and partition information corresponding to the data table; wherein the partition information is used for indicating a plurality of data segments of the data table and a partition address of each data segment;
the storage module is used for storing the data segment with the partition address as the local address if the data segment with the partition address as the local address exists in the plurality of data segments;
and the generating module is used for generating an external table according to the partition address of the data segment with the partition address being the non-local address if the data segment with the partition address being the non-local address exists in the plurality of data segments.
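The behaviour of the storage module and the generating module can be sketched as a single pass over the partition information. This is only an illustration under assumed data shapes: `segments` maps a segment identifier to its rows, `partition_info` maps a segment identifier to its partition address, and the function name `load_partitions` is hypothetical.

```python
from typing import Dict, List, Tuple


def load_partitions(
    local_address: str,
    segments: Dict[str, List[dict]],
    partition_info: Dict[str, str],
) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Store local segments and build the external table for the rest."""
    local_data: Dict[str, List[dict]] = {}
    external_table: Dict[str, str] = {}
    for segment_id, address in partition_info.items():
        if address == local_address:
            # Partition address is the local address: store the segment.
            local_data[segment_id] = segments[segment_id]
        else:
            # Non-local address: record only where the segment lives.
            external_table[segment_id] = address
    return local_data, external_table
```

In this sketch the external table holds addresses only, so a node never keeps a copy of segments assigned to other nodes.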
Wherein the data processing apparatus 400 further comprises:
the node determining module is used for determining a backup data node corresponding to the target data node according to a preset backup node mapping relation if the target data is not successfully received from the target data node;
the sending module 402 is further configured to send the data obtaining request to the backup data node;
the receiving module 403 is further configured to receive the target data sent by the backup data node.
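The fallback to a backup data node can be sketched as a retry against the address given by the backup node mapping. The sketch assumes that a failed fetch surfaces as a `ConnectionError` and that the preset mapping is a plain dictionary; `fetch_with_backup`, `transport`, and `backup_map` are hypothetical names.

```python
from typing import Callable, Dict, Hashable, List

# Hypothetical transport: (partition_address, keys) -> {key: value}.
Transport = Callable[[str, List[Hashable]], Dict[Hashable, object]]


def fetch_with_backup(
    address: str,
    keys: List[Hashable],
    transport: Transport,
    backup_map: Dict[str, str],
) -> Dict[Hashable, object]:
    """Fetch from the target node, falling back to its backup on failure."""
    try:
        return transport(address, keys)
    except ConnectionError:
        # Target data not successfully received: look up the backup node.
        backup_address = backup_map.get(address)
        if backup_address is None:
            raise  # No backup configured; surface the original failure.
        # Send the same data acquisition request to the backup data node.
        return transport(backup_address, keys)
```

Only one fallback attempt is made here; retry counts and timeouts are left out because the description does not specify them.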
The data processing apparatus 400 further comprises:
the second acquisition module is used for acquiring a newly-added node interconnection instruction sent by the control node;
and the channel establishing module is used for establishing a data transmission channel with the newly added node according to the newly-added node interconnection instruction.
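One possible way to handle the interconnection instruction is to keep a registry of open channels keyed by node address. The instruction format (a dictionary with an `"address"` field), the class `ChannelRegistry`, and the injected `open_channel` factory are assumptions for this sketch; the description does not define them.

```python
from typing import Callable, Dict


class ChannelRegistry:
    """Tracks data transmission channels to other data nodes."""

    def __init__(self, open_channel: Callable[[str], object]) -> None:
        # open_channel is a hypothetical factory, e.g. one that opens a socket.
        self._open_channel = open_channel
        self._channels: Dict[str, object] = {}

    def on_interconnect(self, instruction: Dict[str, str]) -> object:
        """Establish a channel to the newly added node named in the instruction."""
        address = instruction["address"]
        if address not in self._channels:
            self._channels[address] = self._open_channel(address)
        return self._channels[address]
```

Reusing an existing channel when the same instruction arrives twice keeps the handler idempotent, which is a choice of this sketch.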
The data processing apparatus 400 further comprises:
a third obtaining module, configured to obtain a data synchronization instruction sent by a control node, where the data synchronization instruction includes storage migration information of the external data;
and the updating module is used for updating the partition address in the external table according to the storage migration information.
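Updating the external table from storage migration information can be sketched as a dictionary rewrite. The shape of the migration information (segment identifier mapped to its new partition address) and the function name `apply_migration` are assumptions; the sketch also assumes that only segments already listed in the external table are affected.

```python
from typing import Dict


def apply_migration(
    external_table: Dict[str, str],
    migration: Dict[str, str],
) -> Dict[str, str]:
    """Return a copy of the external table with migrated addresses applied."""
    updated = dict(external_table)
    for segment_id, new_address in migration.items():
        if segment_id in updated:
            # Point the entry at the node that now stores the external data.
            updated[segment_id] = new_address
    return updated
```

Returning a new table instead of mutating in place lets a node swap tables in one step once the whole data synchronization instruction has been processed.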
The data processing device provided in the embodiment of the present application can implement each process of the above-mentioned embodiment of the data processing method, and can achieve the same technical effect, and for avoiding repetition, details are not repeated here.
Fig. 5 is a structural diagram of a data node according to an embodiment of the present application. As shown in the figure, the data node 5 of this embodiment includes: at least one processor 50 (only one shown in fig. 5), a memory 51, and a computer program 52 stored in the memory 51 and executable on the at least one processor 50, the steps of any of the various method embodiments described above being implemented when the computer program 52 is executed by the processor 50.
The data node 5 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The data node 5 may include, but is not limited to, a processor 50 and a memory 51. Those skilled in the art will appreciate that fig. 5 is merely an example of the data node 5 and does not constitute a limitation on it; the data node 5 may include more or fewer components than shown, combine certain components, or use different components. For example, the data node may also include input/output devices, network access devices, buses, and the like.
The processor 50 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 51 may be an internal storage unit of the data node 5, such as a hard disk or a memory of the data node 5. The memory 51 may also be an external storage device of the data node 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the data node 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the data node 5. The memory 51 is used for storing the computer programs and other programs and data required by the data node. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/data node and method may be implemented in other ways. For example, the above-described apparatus/data node embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and which, when executed by a processor, realizes the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
When the computer program product runs on a data node, the data node implements the steps in the above method embodiments.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A data processing method applied to a data node, characterized in that the data node stores local data and an external table corresponding to external data, the external table comprising partition addresses of the other data nodes in a distributed system, other than the local data node, that store the external data; the data processing method comprises the following steps:
determining a target partition address of target data according to the external table;
sending a data acquisition request to a target data node storing the target data according to the target partition address;
and receiving the target data sent by the target data node.
2. The data processing method of claim 1, wherein determining the target partition address of the target data according to the external table comprises:
receiving a query instruction, wherein the query instruction is used for indicating query target data;
and if the target data is not found in the local data, determining a target partition address of the target data according to the external table.
3. The data processing method of claim 1, further comprising:
acquiring a data table sent by a control node and partition information corresponding to the data table; wherein the partition information is used for indicating a plurality of data segments of the data table and a partition address of each data segment;
if the data segment with the partition address as the local address exists in the plurality of data segments, storing the data segment with the partition address as the local address;
and if the data segment with the partition address being the non-local address exists in the plurality of data segments, generating an external table according to the partition address of the data segment with the partition address being the non-local address.
4. The data processing method of claim 1, further comprising:
if the target data is not successfully received from the target data node, determining a backup data node corresponding to the target data node according to a preset backup node mapping relation;
sending the data acquisition request to the backup data node;
and receiving the target data sent by the backup data node.
5. The data processing method of claim 1, further comprising:
acquiring a newly added node interconnection instruction sent by a control node;
and establishing a data transmission channel with the newly added node according to the newly added node interconnection instruction.
6. The data processing method of claim 1, further comprising:
acquiring a data synchronization instruction sent by a control node, wherein the data synchronization instruction comprises storage migration information of the external data;
and updating the partition address in the external table according to the storage migration information.
7. A data processing device, characterized in that the data processing device stores local data and an external table corresponding to external data, the external table comprising partition addresses of the other data nodes in a distributed system, other than the local data node, that store the external data; the data processing device further comprises:
the determining module is used for determining the target partition address of the target data according to the external table;
the sending module is used for sending a data acquisition request to a target data node storing the target data according to the target partition address;
and the receiving module is used for receiving the target data sent by the target data node.
8. The data processing apparatus of claim 7, wherein the determining module is specifically configured to: receive a query instruction, wherein the query instruction is used for indicating the target data to be queried; and, if the target data is not found in the local data, determine a target partition address of the target data according to the external table.
9. A data node comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 6 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202010525534.1A 2020-06-10 Data processing method, device, data node and storage medium Active CN111651424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010525534.1A CN111651424B (en) 2020-06-10 Data processing method, device, data node and storage medium

Publications (2)

Publication Number Publication Date
CN111651424A true CN111651424A (en) 2020-09-11
CN111651424B CN111651424B (en) 2024-05-03

Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577440A (en) * 2012-07-27 2014-02-12 阿里巴巴集团控股有限公司 Data processing method and device in non-relational database
CN106708968A (en) * 2016-12-01 2017-05-24 成都华为技术有限公司 Distributed database system and data processing method in distributed database system
WO2017096977A1 (en) * 2015-12-08 2017-06-15 华为技术有限公司 Data backup method, apparatus and system
WO2017167171A1 (en) * 2016-03-31 2017-10-05 华为技术有限公司 Data operation method, server, and storage system
CN107707628A (en) * 2017-09-06 2018-02-16 华为技术有限公司 Method and apparatus for transmitting data processing request
WO2018040722A1 (en) * 2016-08-31 2018-03-08 华为技术有限公司 Table data query method and device
CN108287894A (en) * 2018-01-19 2018-07-17 腾讯科技(深圳)有限公司 Data processing method, device, computing device and storage medium
US20180225353A1 (en) * 2015-11-26 2018-08-09 Huawei Technologies Co., Ltd. Distributed Database Processing Method and Device
CN109240943A (en) * 2018-09-26 2019-01-18 郑州云海信息技术有限公司 Address mapping relation feedback method, device, equipment and readable storage medium storing program for executing
CN109299194A (en) * 2018-09-25 2019-02-01 平安科技(深圳)有限公司 Multi-edition data memory management method and device, electronic equipment, storage medium
CN109344094A (en) * 2018-09-26 2019-02-15 郑州云海信息技术有限公司 Address mapping relation feedback method, device, equipment and readable storage medium storing program for executing
CN109783522A (en) * 2019-01-08 2019-05-21 郑州云海信息技术有限公司 A kind of data distribution formula caching method, system, equipment and computer storage medium
CN109800179A (en) * 2019-01-31 2019-05-24 维沃移动通信有限公司 It obtains the method for data, send method, host and the embedded memory of data
CN109902114A (en) * 2019-01-24 2019-06-18 中国平安人寿保险股份有限公司 ES company-data multiplexing method, system, computer installation and storage medium
CN109922156A (en) * 2019-03-20 2019-06-21 深圳市网心科技有限公司 A kind of data communications method and its relevant device
CN109951890A (en) * 2017-12-21 2019-06-28 中国科学院深圳先进技术研究院 A kind of data communications method, relay node, terminal node and communication system
CN110008257A (en) * 2019-04-10 2019-07-12 深圳市腾讯计算机系统有限公司 Data processing method, device, system, computer equipment and storage medium
CN110060162A (en) * 2019-03-29 2019-07-26 阿里巴巴集团控股有限公司 Data grant, querying method and device based on block chain
WO2019148722A1 (en) * 2018-02-01 2019-08-08 平安科技(深圳)有限公司 Electronic device, data migrating and calling method and storage medium
CN110246017A (en) * 2019-05-21 2019-09-17 平安普惠企业管理有限公司 Data capture method, terminal device and computer storage medium based on alliance's chain
CN110245185A (en) * 2019-05-21 2019-09-17 平安普惠企业管理有限公司 Data processing method, terminal device and computer storage medium based on alliance's chain
CN110502507A (en) * 2019-08-29 2019-11-26 上海达梦数据库有限公司 A kind of management system of distributed data base, method, equipment and storage medium
US20200133875A1 (en) * 2018-10-31 2020-04-30 EMC IP Holding Company LLC Method, apparatus and computer program product for managing data access
CN111125447A (en) * 2019-12-22 2020-05-08 北京浪潮数据技术有限公司 Metadata access method, device and equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
秦东明; 喻剑; 张波; 赵勤: "Massive-data parallel query platform based on a distributed shared-nothing architecture", 计算机科学 (Computer Science), no. 04, 15 April 2019 (2019-04-15) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231405A (en) * 2020-10-19 2021-01-15 浙江大华技术股份有限公司 Data storage device
CN112612793A (en) * 2020-12-25 2021-04-06 恒生电子股份有限公司 Resource query method, device, node equipment and storage medium
CN112612793B (en) * 2020-12-25 2022-11-15 恒生电子股份有限公司 Resource query method, device, node equipment and storage medium
CN116226137A (en) * 2023-05-06 2023-06-06 山东浪潮科学研究院有限公司 Data storage method, device, equipment and storage medium
CN116226137B (en) * 2023-05-06 2023-07-21 山东浪潮科学研究院有限公司 Data storage method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination