Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The data storage method provided by the present application can be applied to the application environment shown in fig. 1. The data storage method is applicable to a data storage system that includes a first terminal 102, a second terminal 104, a service agent 106, and servers 108. The first terminal 102 and the second terminal 104 each communicate with the service agent 106 over a network, and the service agent 106 communicates with each server 108 over a network. The first terminal 102 and the second terminal 104 may each be, but are not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The service agent 106 may be a proxy server and may be implemented as a stand-alone server or as a server cluster composed of multiple servers. Each server 108 is a server cluster composed of a plurality of servers. The servers 108 provide distributed storage services for business clients and organize the data structures of files on the disks or partitions of the first terminal 102. Each server 108 may support one storage type, such as NAS, TFS (Taobao File System), GlusterFS, Ceph, or HDFS. Multiple servers 108 may be deployed in the same machine room.
When data needs to be stored, a user may send an operation request for target data to the service agent 106 through the first terminal 102. The service agent 106 determines the attribute features corresponding to the operation request through a preset generic data operation interface. A developer configures in the service agent 106, in advance and through the second terminal 104, the attribute features corresponding to a plurality of operation requests and the storage route corresponding to each attribute feature. The service agent 106 queries the storage route corresponding to the attribute features, invokes a processing component adapted to the storage route to perform data conversion on the operation request, and sends the converted operation request to the server 108 to which the storage route points. The server 108 performs a storage operation on the target data according to the storage route and returns the operation result for the target data to the first terminal 102 through the service agent 106. In this process, a unified generic data operation interface is provided and the access logic of each distributed storage service is encapsulated in a processing component, so that every business client can access different distributed storage services in a uniform way without handling the access details of any specific service. This reduces the modification cost of the business clients and improves data storage efficiency.
In one embodiment, as shown in fig. 2, a data storage method is provided. Taking the application of the method to the service agent in fig. 1 as an example, the method includes the following steps:
Step 202: Intercept an operation request for target data sent by the first terminal.
Referring to fig. 3, fig. 3 is a block diagram of the architecture of a data storage system according to an embodiment. As shown in fig. 3, at least one business client runs on the first terminal 102, and service functions of various service types can be implemented based on the business client. The business clients running on different first terminals 102 may be the same or different. For example, business client 1 may be a loan Web application implementing service types such as loan application, loan review, and contract signing, and business client 2 may be an insurance APP (application) implementing service types such as insurance application, underwriting, billing, and claims.
When a certain type of service needs to be performed, the user may initiate an operation request for target data through the business client on the first terminal 102 to implement the corresponding service function. The target data may be recorded in a file. The business client integrates a component that intercepts operation requests and sends them to the service agent 106; this component may be a Jar package implemented in the Java language.
Step 204: Determine the attribute features corresponding to the operation request through the preset generic data operation interface.
The service agent 106 integrates a generic data operation interface, which receives the operation requests sent by the first terminal 102. The data operation interface may be a script file written with file-operation semantics such as read, write, ls, rm, and isExist. Distributed business clients may send operation requests of different operation types according to different business logic. The generic data operation interface encapsulates the generic file operation logic for each operation type, so that operation requests can be processed in a standardized way and the attribute features of each operation request can be obtained. The attribute features include at least one of the service type, operation request type, target data identifier, or data operation type corresponding to the operation request.
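As an illustration only, the generic data operation interface and its standardization step might look roughly like the following Java sketch. The names GenericDataOperations, AttributeFeatures, and extract, the parameter types, and the mapping of method names to data operation types (including the DELETE type) are hypothetical; only the method names read, write, ls, rm, and isExist come from the description above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class GenericDataInterfaceSketch {

    /** File-style operations that every business client calls in the same way (hypothetical signatures). */
    public interface GenericDataOperations {
        byte[] read(String serviceType, String dataId);
        String write(String serviceType, String dataId, byte[] content);
        List<String> ls(String serviceType, String directory);
        boolean rm(String serviceType, String dataId);
        boolean isExist(String serviceType, String dataId);
    }

    /** Attribute features derived from one operation request. */
    public static final class AttributeFeatures {
        public final String serviceType;
        public final String operationType;
        public final String dataId;

        public AttributeFeatures(String serviceType, String operationType, String dataId) {
            this.serviceType = Objects.requireNonNull(serviceType);
            this.operationType = Objects.requireNonNull(operationType);
            this.dataId = dataId; // may be null, e.g. for ls on a directory
        }
    }

    // Assumed mapping from interface method names to data operation types.
    private static final Map<String, String> OPERATION_TYPES = new HashMap<>();
    static {
        OPERATION_TYPES.put("read", "READ");
        OPERATION_TYPES.put("ls", "READ");
        OPERATION_TYPES.put("isExist", "READ");
        OPERATION_TYPES.put("write", "WRITE");
        OPERATION_TYPES.put("rm", "DELETE");
    }

    /** Standardizes a request into attribute features; unknown methods are rejected. */
    public static AttributeFeatures extract(String serviceType, String method, String dataId) {
        String operationType = OPERATION_TYPES.get(method);
        if (operationType == null) {
            throw new IllegalArgumentException("unsupported operation: " + method);
        }
        return new AttributeFeatures(serviceType, operationType, dataId);
    }
}
```

Keeping the method-to-type mapping in one table means a new storage backend never changes what the business client sends, which is the point of the unified interface.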
By calling the standard generic data operation interface, a business client only needs to integrate the Jar package to invoke various types of distributed storage services through remote service calls, which reduces the modification cost when the storage type used by the business client is upgraded or replaced.
Step 206: Query the storage route of the target data according to the attribute features.
In one embodiment, before the storage route corresponding to the attribute features is queried, the method further includes: acquiring service deployment information of a plurality of servers based on a route configuration request sent by the second terminal; sending a route configuration page generated from the service deployment information to the second terminal, the route configuration page displaying a plurality of server identifiers and a plurality of service types; and storing, as the storage route, the association between each service type and the server identifiers returned by the second terminal through the route configuration page.
The developer configures service-type isolation for each server in advance through the second terminal. Specifically, attribute information of each server is predefined, such as the server identifier, the machine room to which the server belongs, and running state information. The server identifier may be any information, such as an IP address, that uniquely identifies a server. Each server 108 is a server cluster composed of a plurality of servers, including location nodes and a plurality of storage nodes. The running state information includes at least the remaining storage space. The server identifier of each server is associated in advance with the corresponding service types; one service type may be handled by multiple servers, and each server may also handle multiple service types. By distinguishing service types and assigning dedicated servers to each service type, storage isolation is achieved for services of different levels and types, which improves overall security and scalability.
In one embodiment, the storage route includes a base route and a dynamic route. The base route records a plurality of service types and the server identifiers associated with each service type. The dynamic route records different attribute feature combinations for each service type and the server identifiers associated with each attribute feature combination. An attribute feature combination includes at least one of a target data identifier or a data operation type.
The developer of a business client may, through the second terminal 104, preconfigure the attribute features of multiple operation requests and the storage route corresponding to each attribute feature, and store the correspondence between attribute features and storage routes in the service agent 106. The service agent stores this configuration in a routing table, which records the correspondence between the various attribute feature combinations and the storage routes and serves as the basis on which the service agent distributes and forwards operation requests.
Storage routes include base routes and dynamic routes. A base route records the correspondence between a service type and server identifiers. A dynamic route records the correspondence between an attribute feature combination and server identifiers. The attribute feature combination may include a target data identifier or a data operation type, and may further include a machine room identifier, which is not limited herein. The target data identifier uniquely identifies the data the user wants to operate on; for example, it may be the identifier of the file in which the target data is recorded. Data operation types include read operations, write operations, save operations, and the like. The base route determines the servers to which the operation request may be sent (denoted target storage servers), and the dynamic route further determines precisely which target storage server receives the operation request.
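A minimal Java sketch of a routing table holding base routes and dynamic routes is given below, assuming that the most specific matching dynamic route wins and that the base route is the fallback. The class name, the string key format, and this precedence order are assumptions for illustration, not the claimed method.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RoutingTableSketch {

    // Base route: service type -> server identifiers.
    private final Map<String, List<String>> baseRoutes = new HashMap<>();
    // Dynamic route: "serviceType|dataId|operationType" -> server identifiers ("*" = any).
    private final Map<String, List<String>> dynamicRoutes = new HashMap<>();

    private static String key(String serviceType, String dataId, String operationType) {
        return serviceType + "|" + (dataId == null ? "*" : dataId)
                + "|" + (operationType == null ? "*" : operationType);
    }

    public void addBaseRoute(String serviceType, List<String> serverIds) {
        baseRoutes.put(serviceType, serverIds);
    }

    /** dataId or operationType may be null, but the combination needs at least one of them. */
    public void addDynamicRoute(String serviceType, String dataId, String operationType,
                                List<String> serverIds) {
        dynamicRoutes.put(key(serviceType, dataId, operationType), serverIds);
    }

    /** Tries the most specific dynamic route first, then coarser ones, then the base route. */
    public List<String> lookup(String serviceType, String dataId, String operationType) {
        String[] candidates = {
            key(serviceType, dataId, operationType),
            key(serviceType, dataId, null),
            key(serviceType, null, operationType),
        };
        for (String candidate : candidates) {
            List<String> servers = dynamicRoutes.get(candidate);
            if (servers != null) {
                return servers;
            }
        }
        return baseRoutes.getOrDefault(serviceType, Collections.emptyList());
    }
}
```

With a dynamic route for READ pointing at a replica, reads for that service type go to the replica while other operations fall back to the base route, which is one way read/write separation could be expressed.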
In one embodiment, the storage route may further include a storage path composed of multiple levels of file directories, such as "root directory \ first-level subdirectory \ second-level subdirectory", which is not limited herein.
The storage route is customized through a scripting language, so that the storage path, machine room, storage type, and the like can be flexibly defined, and configuration changes take effect without a restart (hot reload). The dynamic routing function enables read/write separation and sharded routing of data, improving overall performance, scalability, and reliability.
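One plausible way to make route changes take effect without a restart is to publish each new routing configuration as an immutable snapshot behind an atomic reference, as in the hypothetical sketch below. The description does not specify the hot-reload mechanism; the class and method names are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class HotRouteReloadSketch {

    // Current immutable snapshot: service type -> server identifiers.
    private final AtomicReference<Map<String, List<String>>> current =
            new AtomicReference<>(Collections.<String, List<String>>emptyMap());

    /** Publishes a defensive copy atomically; lookups already in progress keep the old snapshot. */
    public void reload(Map<String, List<String>> newRoutes) {
        current.set(Collections.unmodifiableMap(new HashMap<>(newRoutes)));
    }

    /** Reads whichever snapshot is current at call time. */
    public List<String> lookup(String serviceType) {
        return current.get().getOrDefault(serviceType, Collections.emptyList());
    }
}
```

Swapping a whole snapshot avoids locking on the request path: a lookup sees either the old configuration or the new one, never a half-applied mix.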
Step 208: Invoke the processing component adapted to the storage route to send the operation request to the server to which the storage route points, so that the server stores the target data according to the storage route.
The developer defines the storage type of each server in advance, creates a processing component (Processor) for each storage type through the factory pattern, and deploys the processing components together in the service agent. Different processing components encapsulate the implementation logic for servers of different storage types and thus provide different distributed storage services to the corresponding business clients. A processing component performs data conversion on the operation request, that is, it converts the request type of the operation request into a request type that the server to which the storage route points can respond to. For example, a conventional business client A can only send operation requests to servers of storage type A, whereas the service agent in this embodiment, based on the generic data operation interface and the preset routing table, enables business client A to send operation requests to servers of multiple storage types. When the storage route points to a server of storage type B, the processing component corresponding to the storage route converts the operation request sent by business client A into an operation request for the server of storage type B.
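The factory pattern mentioned above could be sketched in Java as follows. The storage types, class names, and the strings each processor produces are placeholders; a real processing component would call the client library of the corresponding storage system instead of building a string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ProcessorFactorySketch {

    /** Generic request as received through the data operation interface. */
    public static final class GenericRequest {
        public final String operationType;
        public final String dataId;

        public GenericRequest(String operationType, String dataId) {
            this.operationType = operationType;
            this.dataId = dataId;
        }
    }

    /** A processing component converts a generic request for one storage type. */
    public interface Processor {
        String convert(GenericRequest request);
    }

    // Placeholder conversions standing in for storage-specific request building.
    static final class HdfsProcessor implements Processor {
        @Override
        public String convert(GenericRequest r) {
            return "HDFS " + r.operationType + " /data/" + r.dataId;
        }
    }

    static final class CephProcessor implements Processor {
        @Override
        public String convert(GenericRequest r) {
            return "CEPH " + r.operationType + " bucket/" + r.dataId;
        }
    }

    // Registry consulted by the factory method; one entry per storage type.
    private static final Map<String, Supplier<Processor>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("HDFS", HdfsProcessor::new);
        REGISTRY.put("CEPH", CephProcessor::new);
    }

    /** Factory method: returns the processing component for a storage type. */
    public static Processor create(String storageType) {
        Supplier<Processor> supplier = REGISTRY.get(storageType);
        if (supplier == null) {
            throw new IllegalArgumentException("no processor for storage type " + storageType);
        }
        return supplier.get();
    }
}
```

Supporting a new storage type then means adding one Processor class and one registry entry, with no change to business clients.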
The service agent also records, in the routing table, the storage type of each server and the identifier of the processing component applicable to each storage type. Because the service agent deploys a different processing component for each storage type, the servers can respond to operation requests from different business clients without any modification. The specific physical implementation of data operations is hidden from the business clients, so the storage behind them can be upgraded, replaced, and migrated transparently, which reduces the cost of modifying existing distributed storage services.
The service agent sends the converted operation request to the server to which the storage route points. If the operation request is a create operation, the server stores the target data and returns the data identifier to the first terminal through the service agent. If it is a read operation, the server returns the target data corresponding to the data identifier to the first terminal through the service agent. If it is a write operation, the server stores the written target data and returns a write-success notification to the first terminal through the service agent.
Step 210: Send the storage address of the target data returned by the server to the first terminal.
The generic data operation interface returns the target data or the data identifier to the corresponding business client on the first terminal using a common communication protocol such as HTTP. To improve transmission efficiency, the generic data operation interface may also compress the target data in gzip format before returning it to the business client. In another embodiment, because the target data has to be forwarded by the service agent, a plurality of service agents may be deployed in a distributed manner to improve data security and transmission efficiency.
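Gzip compression of a response body can be done with the JDK's java.util.zip classes, as in the sketch below. The class name is hypothetical; in an HTTP response the compressed body would normally be accompanied by the header Content-Encoding: gzip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipResponseSketch {

    /** Compresses a response body with gzip before it is returned to the business client. */
    public static byte[] compress(byte[] body) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(body);
        } // closing the stream flushes the deflater and writes the gzip trailer
        return buffer.toByteArray();
    }

    /** Client side: restores the original bytes. */
    public static byte[] decompress(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Reading the buffer only after the try-with-resources block closes is what makes the output a complete gzip stream.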
With the above data storage method, the attribute features corresponding to the operation request for target data sent by the first terminal are determined through the preset generic data operation interface; the storage route of the target data is queried according to the attribute features; data conversion is performed on the operation request by the processing component adapted to the storage route; the converted operation request is sent to the server to which the storage route points, so that the server performs the storage operation on the target data according to the storage route; and the operation result returned by the server is sent to the first terminal, completing data storage efficiently. Because a unified generic data operation interface and unified processing components are provided, each business client accesses different distributed storage services in a uniform way without handling the access details of any specific service, which reduces the modification cost of the business clients and improves data storage efficiency.
In one embodiment, the data storage method further includes: when an operation request of a preset type is intercepted, querying the storage route corresponding to the operation request; and asynchronously copying, according to the storage route, the target data to which the operation request points from the current server to other servers of the same storage type.
The data storage system further includes a message queue deployed among the different servers according to data volume and other factors. The message queue monitors the operation requests sent by the service agent to the servers. When it detects an operation request of the preset type, the message queue generates a corresponding data replication task and sends a route query request to the corresponding server. The server returns the storage route found for the operation request to the message queue, and the message queue asynchronously copies the target data to which the operation request points from the current server to other servers of the same storage type according to the storage route.
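The replication flow can be approximated in a single process, as in the following simplified sketch. It rests on stated assumptions: a java.util.concurrent.BlockingQueue stands in for the message queue, in-memory maps stand in for the servers, and a server-to-storage-type map stands in for the route query. All names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncReplicationSketch {

    /** Replication task generated when a monitored operation request is seen. */
    private static final class ReplicationTask {
        final String sourceServer;
        final String dataId;

        ReplicationTask(String sourceServer, String dataId) {
            this.sourceServer = sourceServer;
            this.dataId = dataId;
        }
    }

    // In-memory stand-ins: server id -> (data id -> content).
    public final Map<String, Map<String, String>> servers = new ConcurrentHashMap<>();
    // Stand-in for the route query: server id -> storage type.
    public final Map<String, String> storageTypes = new ConcurrentHashMap<>();
    private final BlockingQueue<ReplicationTask> queue = new LinkedBlockingQueue<>();

    public void addServer(String serverId, String storageType) {
        servers.put(serverId, new ConcurrentHashMap<>());
        storageTypes.put(serverId, storageType);
    }

    /** The write completes on the current server; replication is only enqueued. */
    public void write(String serverId, String dataId, String content) {
        servers.get(serverId).put(dataId, content);
        queue.add(new ReplicationTask(serverId, dataId));
    }

    /** Background consumer copying each item to every other server of the same storage type. */
    public Thread startReplicator() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    ReplicationTask task = queue.take(); // blocks until a task arrives
                    String type = storageTypes.get(task.sourceServer);
                    String content = servers.get(task.sourceServer).get(task.dataId);
                    for (Map.Entry<String, String> e : storageTypes.entrySet()) {
                        boolean sameType = e.getValue().equals(type);
                        if (sameType && !e.getKey().equals(task.sourceServer)) {
                            servers.get(e.getKey()).put(task.dataId, content);
                        }
                    }
                }
            } catch (InterruptedException stop) {
                Thread.currentThread().interrupt(); // exit quietly on shutdown
            }
        }, "replicator");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

Because write returns before the copy is made, a replica becomes consistent only after the consumer processes the task; a real message queue would additionally persist tasks so they survive restarts.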
In this embodiment, asynchronous data replication between servers is implemented with the message queue. Replication is timely, achieving near-real-time copying within seconds, and reliable, with eventual consistency across multiple backups guaranteed by the persistence and reliability of the message queue. The scheme also directly solves cross-machine-room backup, since the same replication and backup strategy can be applied to storage in remote machine rooms.
In one embodiment, the target data before replication is denoted source data, and the method further includes: during data replication, extracting the data identifiers in the source data and the data identifiers in the target data copied to other servers; performing a hash operation on the data identifiers to obtain hash results; grouping the hash results according to a preset number of target threads; determining the target thread corresponding to each group; and invoking a plurality of target threads to concurrently compare the source data and target data corresponding to the hash results in their respective groups.
Data storage is required in many scenarios, such as product sales. A product sales business client includes a plurality of functional modules, such as a product module, an order module, a billing module, and a user management module, each of which can also be regarded as an independent business client. In a microservice-based product sales client, each functional module has its own distributed storage server. For example, the product module stores product data for a plurality of products in server A (the product server), and the order module stores order data for a plurality of users in server B (the order server).
Data comparison is also needed in many scenarios. In the above example, when a user purchases a product, a corresponding data record is added to the product server. The data record may include the order number, together with the product data and user data for that order. The product sales application synchronizes the newly added record from the product server to the order server so that corresponding order data is added there. The data in the product database and the order database then needs to be compared.
When data comparison is needed, the user can send a task configuration request to the service agent through the second terminal, and the service agent returns a task configuration page to the second terminal. The task configuration page contains configuration items for each of a plurality of task steps, including sliding window (Slide Window) generation, source data capture, target data capture, data comparison, and comparison result storage. The configuration items for the sliding window generation step include the sliding window size and the number of target threads. Comparison tasks run automatically at the granularity of the sliding window size, which ensures the timeliness and continuity of data comparison. The number of target threads is the number of threads expected to be allocated to the comparison task and determines its concurrency. The configuration items for the source data capture step specify the source data capture logic, such as the source database connection identifier, the source data table name, and data structure conversion rules. Likewise, the configuration items for the target data capture step specify the target data capture logic, and the configuration items for the data comparison step include the fields to be compared.
Within the current sliding window, the service agent sends a connection request to the source database according to the source database connection identifier recorded in the task configuration information and, over the established connection, pulls the source data generated within a preset time period. Likewise, the service agent sends a connection request to the target database according to the target database connection identifier recorded in the task configuration information and pulls the target data generated within the preset time period. The preset time period is the time slice corresponding to the current sliding window, and its length equals the sliding window size.
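Splitting time into sliding windows whose length equals the configured window size might be sketched as follows. Returning only windows that have fully elapsed is an assumption made here so that each comparison task sees a complete time slice; the class name is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowSketch {

    /** Half-open time slice [start, end) whose length equals the window size. */
    public static final class Window {
        public final Instant start;
        public final Instant end;

        Window(Instant start, Instant end) {
            this.start = start;
            this.end = end;
        }

        /** True if a record created at time t belongs to this window. */
        public boolean contains(Instant t) {
            return !t.isBefore(start) && t.isBefore(end);
        }
    }

    /** Splits [from, to) into consecutive full windows; a trailing partial window is left for a later run. */
    public static List<Window> windows(Instant from, Instant to, Duration size) {
        if (size.isZero() || size.isNegative()) {
            throw new IllegalArgumentException("window size must be positive");
        }
        List<Window> result = new ArrayList<>();
        for (Instant s = from; !s.plus(size).isAfter(to); s = s.plus(size)) {
            result.add(new Window(s, s.plus(size)));
        }
        return result;
    }
}
```

Half-open windows ensure a record stamped exactly on a boundary is pulled by one comparison task only.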
The service agent extracts the data identifiers from the source data and the target data. A data identifier uniquely identifies one piece of source data or target data. In the above example, the data identifier may be an order number or a transaction serial number, or identifying information obtained by concatenating the contents of other fields with the order number or transaction serial number.
The server computes a hash of each data identifier with a hash algorithm to obtain the corresponding hash result. A data identifier such as a transaction serial number may contain letters and therefore cannot serve directly as a sequential number. The hash result of a data identifier can instead be regarded as a data number, such as source data 1 or target data 2. The hash result can be understood as a hash value that uniquely distinguishes each data identifier, so source data and target data that share an identifier share the same hash result.
The server stores each hash result and its associated source data and target data as a key-value pair, in which the key is the hash result of the data identifier and the value is the source data and target data corresponding to that identifier. The value may be stored in the current database as linked lists (List), comprising a source data list and a target data list.
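The key-value layout described above can be sketched in Java as follows. Using String.hashCode (masked to a non-negative value) as the hash algorithm and representing records as "dataId,payload" strings are assumptions for illustration; because a 32-bit hash can collide, a production system might prefer a wider hash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashBucketSketch {

    /** Value side of the key-value pair: source and target records for one hash result. */
    public static final class Bucket {
        public final List<String> sourceList = new ArrayList<>();
        public final List<String> targetList = new ArrayList<>();
    }

    /** Non-negative hash result of a data identifier (String.hashCode is an assumed choice). */
    public static int hashOf(String dataId) {
        return dataId.hashCode() & 0x7fffffff;
    }

    /** Builds key = hash result, value = source list and target list. */
    public static Map<Integer, Bucket> buildBuckets(List<String> source, List<String> target) {
        Map<Integer, Bucket> buckets = new HashMap<>();
        for (String record : source) {
            buckets.computeIfAbsent(hashOf(idOf(record)), k -> new Bucket()).sourceList.add(record);
        }
        for (String record : target) {
            buckets.computeIfAbsent(hashOf(idOf(record)), k -> new Bucket()).targetList.add(record);
        }
        return buckets;
    }

    /** Extracts the data identifier from a "dataId,payload" record. */
    private static String idOf(String record) {
        int comma = record.indexOf(',');
        return comma < 0 ? record : record.substring(0, comma);
    }
}
```

A source record and its copy land under the same key, so a later comparison only has to look inside one bucket.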
The service agent groups the pulled source data and target data according to the number of target threads recorded in the task configuration information, that is, divides them into as many groups as there are target threads. For example, if the number of target threads is 5, the source data and target data are divided into 5 groups, and the server assigns one target thread to each group.
In one embodiment, grouping the hash results according to the preset number of target threads includes: determining the thread number of each target thread; dividing each hash result by the number of target threads to obtain the remainder for that hash result; and grouping the source data and target data that have the same remainder. Determining the target thread corresponding to each group includes: allocating each group of source data and target data to be compared to the target thread whose thread number is the corresponding remainder.
The service agent invokes the number of target threads recorded in the task configuration information and numbers each target thread according to a preset rule. A thread number may be a preset initial value or the result of incrementing that initial value. For example, if the preset initial value is 1, the thread numbers of the 5 target threads in the above example may be 1, 2, 3, 4, and 5. Because the remainder of division by N ranges from 0 to N-1, an initial value of 0 makes the thread numbers coincide exactly with the possible remainders; with an initial value of 1, the remainder 0 has to be mapped to a thread explicitly.
The service agent divides each hash result by the number of target threads to obtain its remainder, and the server places the source data and target data whose hash results have the same remainder into one group, producing a plurality of groups. The remainder may serve as the group number. This scheme guarantees that all source data and target data sharing a data identifier fall into the same group, because they have the same hash result and therefore the same remainder. For example, if the number of target threads is 10 and, for simplicity, the numeric part of each identifier is taken as its hash result, group 1 contains data ID1, data ID11, data ID21, and so on, whose remainder modulo 10 is 1; group 2 contains data ID2, data ID12, data ID22, and so on, whose remainder modulo 10 is 2; and so on, so that data ID1 to ID10000 are divided into 10 groups.
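The remainder-based grouping can be sketched as below, assuming thread numbers start at 0 so that group number, remainder, and thread number coincide. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RemainderGroupingSketch {

    /** Remainder of hashResult / threadCount; doubles as group number and thread number. */
    public static int groupOf(long hashResult, int threadCount) {
        return (int) Math.floorMod(hashResult, (long) threadCount);
    }

    /** Places each hash result into the group whose number equals its remainder. */
    public static List<List<Long>> group(List<Long> hashResults, int threadCount) {
        if (threadCount <= 0) {
            throw new IllegalArgumentException("threadCount must be positive");
        }
        List<List<Long>> groups = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            groups.add(new ArrayList<>());
        }
        for (long h : hashResults) {
            groups.get(groupOf(h, threadCount)).add(h);
        }
        return groups;
    }
}
```

Feeding the hash results 1 to 10000 with 10 threads reproduces the example above: group 1 begins with 1, 11, 21, every group receives 1000 entries, and identifiers such as 10 and 20 fall into group 0. Math.floorMod keeps the remainder non-negative even if a hash result is negative.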
The service agent allocates the source data and target data to be compared in each group to the target thread whose thread number is the corresponding remainder, and stores the thread number of the allocated target thread together with the corresponding hash result as a key in a Redis database. In the above example, the target thread numbered "1" may be invoked to compare the source data and target data in group 1. Target threads are isolated from one another during comparison, and because grouping is based on the data identifier, all source data and target data belonging to the same business record are guaranteed to be processed by the same target thread, which ensures comparison accuracy.
The service agent then invokes the plurality of target threads to concurrently compare the source data and target data corresponding to the hash results in their respective groups. In other words, after a large volume of source data and target data has been divided into groups and a target thread has been assigned to each group, the target threads compare their groups in parallel.
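A self-contained sketch of the concurrent comparison follows. It assumes records are keyed by data identifier, uses String.hashCode with Math.floorMod for the remainder, and starts one thread per group so that the thread named compare-i handles exactly the group with remainder i. All names are hypothetical, and a real implementation would compare the configured fields rather than whole values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentCompareSketch {

    /** Returns identifiers whose source and target values differ or exist on one side only. */
    public static Set<String> compare(Map<String, String> source, Map<String, String> target,
                                      int threadCount) throws InterruptedException {
        // Same remainder rule on both sides, so one identifier never spans two threads.
        List<Map<String, String>> srcGroups = split(source, threadCount);
        List<Map<String, String>> tgtGroups = split(target, threadCount);
        Set<String> mismatches = ConcurrentHashMap.newKeySet();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            final Map<String, String> src = srcGroups.get(i);
            final Map<String, String> tgt = tgtGroups.get(i);
            Thread t = new Thread(() -> {
                Set<String> ids = new HashSet<>(src.keySet());
                ids.addAll(tgt.keySet());
                for (String id : ids) {
                    if (!Objects.equals(src.get(id), tgt.get(id))) {
                        mismatches.add(id);
                    }
                }
            }, "compare-" + i); // thread number equals the group's remainder
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join(); // wait for every group before reporting
        }
        return new TreeSet<>(mismatches);
    }

    /** Splits records into n groups by the remainder of the identifier's hash. */
    private static List<Map<String, String>> split(Map<String, String> data, int n) {
        List<Map<String, String>> groups = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            groups.add(new HashMap<>());
        }
        for (Map.Entry<String, String> e : data.entrySet()) {
            int remainder = Math.floorMod(e.getKey().hashCode(), n);
            groups.get(remainder).put(e.getKey(), e.getValue());
        }
        return groups;
    }
}
```

Since a record and its copy always reach the same thread, a missing or changed record is reported once and never as a false mismatch caused by the split.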
In this embodiment, grouping by data identifier ensures that all source data and target data for the same business record are assigned to the same target thread. This prevents source data and target data that share a data identifier from being misjudged as inconsistent because they were assigned to different, isolated target threads, and thus ensures comparison accuracy. Grouping tasks by data identifier and comparing the groups concurrently improves comparison efficiency without sacrificing the accuracy of the comparison results.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided a data storage device comprising: an operation request module 402, a request processing module 404, and a data storage module 406, wherein:
an operation request module 402, configured to intercept an operation request for target data sent by a first terminal;
a request processing module 404, configured to determine, through a preset generic data operation interface, the attribute features corresponding to the operation request, and to query the storage route of the target data according to the attribute features; and
a data storage module 406, configured to invoke a processing component adapted to the storage route to send the operation request to the server to which the storage route points, so that the server stores the target data according to the storage route, and to send the storage address of the target data returned by the server to the first terminal.
In one embodiment, the data storage device further includes a route configuration module 408, configured to: acquire service deployment information of the plurality of servers based on a route configuration request sent by the second terminal; send a route configuration page generated from the service deployment information to the second terminal, the route configuration page displaying a plurality of server identifiers and a plurality of service types; and store, as the storage route, the association between each service type and the server identifiers returned by the second terminal through the route configuration page.
In one embodiment, the storage route includes a base route and a dynamic route. The base route records a plurality of service types and the server identifiers associated with each service type. The dynamic route records different attribute feature combinations for each service type and the server identifiers associated with each attribute feature combination. An attribute feature combination includes at least one of a target data identifier or a data operation type.
In one embodiment, the data storage device further includes a data replication module 410, configured to: when an operation request of a preset type is intercepted, query the storage route corresponding to the operation request; and asynchronously copy, according to the storage route, the target data to which the operation request points from the current server to other servers of the same storage type.
In one embodiment, the target data before replication is denoted source data, and the data storage device further includes a data comparison module 412, configured to: during data replication, extract the data identifiers in the source data and the data identifiers in the target data copied to other servers; perform a hash operation on the data identifiers to obtain hash results; group the hash results according to a preset number of target threads; determine the target thread corresponding to each group; and invoke a plurality of target threads to concurrently compare the source data and target data corresponding to the hash results in their respective groups.
In one embodiment, the data comparison module 412 is further configured to: determine the thread number of each target thread; divide each hash result by the number of target threads to obtain the remainder for that hash result; group the source data and target data that have the same remainder; and allocate each group of source data and target data to be compared to the target thread whose thread number is the corresponding remainder.
For specific limitations of the data storage apparatus, reference may be made to the limitations of the data storage method above, which are not repeated here. Each module in the data storage apparatus described above may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores a routing table. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a data storage method.
Those skilled in the art will appreciate that the structure shown in fig. 5 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the data storage method provided in any embodiment of the present application.
Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features has been described; however, any such combination should be considered within the scope of this specification as long as it involves no contradiction.
The above embodiments express only several implementations of the present application; although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be defined by the appended claims.