CN115599790B - Data storage system, data processing method, electronic equipment and storage medium - Google Patents

Info

Publication number
CN115599790B
Authority
CN
China
Prior art keywords
request
data processing
data
storage module
column
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211407850.4A
Other languages
Chinese (zh)
Other versions
CN115599790A (en
Inventor
刘汪根
谢玉波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transwarp Technology Shanghai Co Ltd
Original Assignee
Transwarp Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Transwarp Technology Shanghai Co Ltd
Priority to CN202211407850.4A
Publication of CN115599790A
Application granted
Publication of CN115599790B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data storage system, a data processing method, an electronic device and a storage medium. The data storage system includes a request scheduling service module, a row storage module and a column storage module. The request scheduling service module acquires a data processing request from a client and distributes it to at least one of the row storage module and the column storage module according to attribute information of the data processing request. The row storage module comprises a row data processing engine, the column storage module comprises a column data processing engine, and the row storage module and the column storage module execute the data processing corresponding to the data processing request distributed by the request scheduling service module. In the embodiments of the invention, the request scheduling service module distributes data processing requests to the row storage module and the column storage module according to their attribute information, so that different data processing requests are handled by different processing engines, which improves the processing efficiency for different types of data processing requests and enhances the user experience.

Description

Data storage system, data processing method, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a data storage system, a data processing method, an electronic device, and a storage medium.
Background
In database systems, data storage and access efficiency are important factors that affect performance. A Hybrid Transaction and Analytical Processing (HTAP) database is a distributed database that supports both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) workloads. An HTAP system can analyze the data generated by transaction processing in real time, and so combines the advantages of OLTP and OLAP systems.
At present, OLTP and OLAP systems are connected mainly by Extract-Transform-Load (ETL) technology: an OLTP system imports data into an OLAP system through an ETL process. Because data moves between the OLTP system and the OLAP system in a fully asynchronous manner, processing takes a long time and consistency cannot be guaranteed. A data storage system that can process data efficiently is therefore an urgent problem to be solved.
Disclosure of Invention
The invention provides a data storage system, a data processing method, electronic equipment and a storage medium, so that the database can efficiently process different types of data, and the use experience of a user is improved.
According to an aspect of the present invention, there is provided a data storage system, wherein the data storage system comprises:
a request scheduling service module, a row storage module and a column storage module;
the request scheduling service module acquires a data processing request of the client and distributes the data processing request to at least one of the row storage module and the column storage module according to attribute information of the data processing request;
the row storage module comprises a row data processing engine, the column storage module comprises a column data processing engine, and the row storage module and the column storage module execute data processing corresponding to the data processing request distributed by the request scheduling service module.
According to another aspect of the present invention, there is provided a data processing method, applied to a data storage system, the method comprising:
acquiring a data processing request of a client;
distributing, by the request scheduling service module, the data processing request to at least one of a row storage module and a column storage module according to the attribute information of the data processing request;
and executing data processing corresponding to the data processing request distributed by the request scheduling service module based on at least one of the row storage module and the column storage module.
According to another aspect of the present invention, there is provided a data processing apparatus, for use in a data storage system, the apparatus comprising:
the request acquisition module is used for acquiring a data processing request of the client;
the request distribution module is used for distributing, by the request scheduling service module, the data processing request to at least one of the row storage module and the column storage module according to the attribute information of the data processing request;
and the data processing module is used for executing data processing corresponding to the data processing request distributed by the request scheduling service module based on at least one of the row storage module and the column storage module.
According to another aspect of the present invention, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform a data processing method of any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute the data processing method of any one of the embodiments of the present invention.
According to the technical solution of the invention, the request scheduling service module acquires the data processing request of the client and distributes it to the row storage module or the column storage module according to the attribute information of the data processing request, and the row storage module or the column storage module executes the corresponding data processing. The data processing request is thereby processed efficiently, which improves database processing performance and enhances the user experience.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data storage system according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a data storage system according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of a data storage system according to a second embodiment of the present invention;
FIG. 4 is a flow chart of a data processing method according to a third embodiment of the present invention;
FIG. 5 is a flow chart of a data processing method according to a fourth embodiment of the present invention;
FIG. 6 is a flow chart of a data processing method according to a fifth embodiment of the present invention;
FIG. 7 is a flow chart of a data processing method according to a fifth embodiment of the present invention;
FIG. 8 is a flow chart of a data processing method according to a fifth embodiment of the present invention;
FIG. 9 is a schematic diagram of a data processing apparatus according to a sixth embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an electronic device implementing a data processing method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a schematic structural diagram of a data storage system according to a first embodiment of the present invention. This embodiment is applicable to storing data in a database. As shown in fig. 1, the system includes: a request scheduling service module 10, a row storage module 20 and a column storage module 30.
The request scheduling service module 10 acquires a data processing request of a client and distributes the data processing request to at least one of the row storage module 20 and the column storage module 30 according to attribute information of the data processing request.
The row storage module 20 includes a row data processing engine, and the column storage module 30 includes a column data processing engine, and the row storage module 20 and the column storage module 30 perform data processing corresponding to the data processing request allocated by the request scheduling service module 10.
In the embodiment of the present invention, the request scheduling service module 10 may allocate a data processing request according to attribute information of the data processing request. The attribute information may include a service type, a request type, and the like; the service type may include a transaction service and an analysis service, and the request type may include a data modification request, a data analysis request, and the like. The request scheduling service module 10 may allocate the data processing request to the row storage module 20 or the column storage module 30 according to this attribute information. The row storage module 20 may receive the data processing request allocated by the request scheduling service module 10. The row storage module 20 includes a row data processing engine, which may be an engine that processes data on a row basis and may parse and execute the data processing corresponding to the allocated data processing request. The column storage module 30 includes a column data processing engine, which may be an engine that processes data on a column basis and may likewise parse and execute the data processing corresponding to the data processing request allocated by the request scheduling service module 10.
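The relationship between the three modules can be pictured with a minimal Python sketch. Everything here is illustrative rather than taken from the patent: the class names, the `service_type` and `request_type` field values, and the simple dispatch rule (modifications to the row module, everything else to the column module) are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class DataProcessingRequest:
    # Attribute information carried by the request (field names are assumed).
    service_type: str   # e.g. "transaction" or "analysis"
    request_type: str   # e.g. "modify" or "analyze"
    payload: str = ""


@dataclass
class StorageModule:
    name: str
    handled: list = field(default_factory=list)

    def execute(self, request: DataProcessingRequest) -> str:
        # Stand-in for the module's row or column data processing engine.
        self.handled.append(request)
        return f"{self.name} engine processed {request.request_type}"


class RequestSchedulingService:
    def __init__(self, row_module: StorageModule, column_module: StorageModule):
        self.row_module = row_module
        self.column_module = column_module

    def dispatch(self, request: DataProcessingRequest) -> str:
        # Assumed rule: modifications go to row storage, the rest to column storage.
        if request.request_type == "modify":
            target = self.row_module
        else:
            target = self.column_module
        return target.execute(request)


row = StorageModule("row")
column = StorageModule("column")
scheduler = RequestSchedulingService(row, column)
result_modify = scheduler.dispatch(DataProcessingRequest("transaction", "modify"))
result_analyze = scheduler.dispatch(DataProcessingRequest("analysis", "analyze"))
```

The client only ever talks to the scheduler; which engine runs the request is decided from the request's own attributes.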
In one embodiment, there is at least one row storage module 20, and each row storage module 20 corresponds to a service type and a request type within the attribute information of the data processing request.
In an embodiment, there may be a plurality of row storage modules 20, and different row storage modules 20 may handle different data processing requests; each row storage module 20 may correspond to a service type and a request type in the attribute information of the data processing request. The attribute information may be stored in the data processing request and extracted by parsing the request, and may include a service type and a request type. In an embodiment, each row storage module 20 may be configured with a correspondence to a service type and a request type, for example by configuring the same identifier for the row storage module 20 and the attribute information.
In one embodiment, the row storage modules 20 back each other up.
In an embodiment, since there may be a plurality of row storage modules 20, the row storage modules 20 may serve as backups for one another. Logs may be synchronized to the other row storage modules 20 to achieve this mutual backup.
FIG. 2 is a schematic diagram of a data storage system according to a first embodiment of the present invention, and as shown in FIG. 2, the system further includes a consistency management module 40.
The consistency management module 40 is used to synchronize the stored data of the row storage modules 20 and the column storage modules 30.
In an embodiment of the present invention, after a row storage module 20 receives a data processing request including data modification information, it may perform the data processing corresponding to the request. The other row storage modules 20 and the column storage module 30 may then synchronize the stored data. The synchronization may include synchronizing log information via a consistency protocol, so that the stored data is synchronized to the other row storage modules 20 or the column storage module 30 based on the log information.
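A rough sketch of log-based synchronization follows. It is not the patent's protocol: the consistency protocol itself (agreement on log order across nodes) is not modeled, and a plain in-process loop stands in for it. The class names and the `(operation, key, row)` log entry shape are hypothetical.

```python
class RowStore:
    """Row-oriented replica: maps a primary key to a whole row."""

    def __init__(self):
        self.rows = {}

    def apply(self, entry):
        op, key, row = entry
        if op == "put":
            self.rows[key] = dict(row)
        elif op == "delete":
            self.rows.pop(key, None)


class ColumnStore:
    """Column-oriented replica: maps a column name to {key: value}."""

    def __init__(self):
        self.columns = {}

    def apply(self, entry):
        op, key, row = entry
        if op == "put":
            for column, value in row.items():
                self.columns.setdefault(column, {})[key] = value
        elif op == "delete":
            for values in self.columns.values():
                values.pop(key, None)


class ConsistencyManager:
    """Ships each log entry to every replica in log order (consensus omitted)."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []

    def append(self, entry):
        self.log.append(entry)
        for replica in self.replicas:
            replica.apply(entry)


backup_row = RowStore()
column_store = ColumnStore()
manager = ConsistencyManager([backup_row, column_store])
manager.append(("put", 1, {"name": "alice", "balance": 10}))
manager.append(("put", 2, {"name": "bob", "balance": 20}))
manager.append(("delete", 1, None))
```

Because every replica replays the same entries in the same order, the row-oriented backup and the column-oriented copy end up describing the same data even though their layouts differ.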
In the embodiment of the invention, the request scheduling service module acquires the data processing request of the client and distributes it to the row storage module or the column storage module according to the attribute information of the data processing request, and the row storage module or the column storage module executes the corresponding data processing. Because the request scheduling service module distributes different types of data processing requests to the row storage module and the column storage module respectively, processing performance is improved and the user experience is enhanced.
Example 2
Fig. 3 is a schematic diagram of a data storage system according to a second embodiment of the present invention. This embodiment further describes the data storage system, taking a database cluster query scheduling service module 11 as the request scheduling service module 10 and taking OLTP requests and OLAP requests as the data processing requests. As shown in fig. 3, the system includes: the database cluster query scheduling service module 11, the row storage module 20, the column storage module 30, and the consistency management module 40.
The database cluster query scheduling service module 11 may obtain an OLTP request and an OLAP request of a client, and allocate the OLTP request to the row storage module 20 and the OLAP request to the column storage module 30.
The row storage module 20 includes a row optimizer, a row executor, and a row memory, and the column storage module 30 includes a column optimizer, a column executor, and a column memory, and the row storage module 20 and the column storage module 30 execute data processing corresponding to the OLTP request and the OLAP request allocated by the database cluster query scheduling service module 11.
The consistency management module 40 is used for synchronizing the storage data stored in the row storage module and the storage data stored in the column storage module.
Example 3
Fig. 4 is a flowchart of a data processing method according to a third embodiment of the present invention, where the present embodiment is applicable to a case of processing data in a database, and the method may be performed by a data processing apparatus, and the data processing apparatus may be implemented in a form of hardware and/or software. As shown in fig. 4, the method includes:
S110, acquiring a data processing request of the client.
The client can send a data processing request to the request scheduling service module for processing, so as to modify or query data. The data processing request sent by the client is a request awaiting processing and may be of various types, for example a data modification request or a data analysis request.
In the embodiment of the invention, the client sends the data processing request to the request scheduling service module, which receives it. Different data processing requests may be sent according to the needs of the client. For example, when the client needs to modify data in the database, it may send a data modification request to the request scheduling service module; when the client needs to query data in the database, it may send a data analysis request.
S120, distributing, by the request scheduling service module, the data processing request to at least one of the row storage module and the column storage module according to the attribute information of the data processing request.
The attribute information of the data processing request may be information carried by the data processing request, such as a service type and a request type.
In the embodiment of the invention, the request scheduling service module can distribute the data processing request to the row storage module and/or the column storage module for execution according to the attribute information of the data processing request. The attribute information may include a service type, a request type, and the like. For example, the data processing request may be allocated first by service type and then by request type; or first by request type and then by service type; or by service type and request type together. In an embodiment, when data processing requests are allocated by request type: if the request type carried by the data processing request is a data modification request, the request scheduling service module may send the data processing request to the row storage module; if the request type is a data analysis request, the request scheduling service module may send the data processing request to the row storage module or the column storage module. In an embodiment, when the request type is a data analysis request, the data processing request may be allocated to the row storage module or the column storage module according to the service type: when the service type is a transactional processing service, the request may be allocated to the row storage module; when the service type is an analytical processing service, the request may be allocated to the column storage module.
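The request-type-first variant described above can be written as a small decision function. This is only a sketch: the string values `"modify"`, `"analyze"`, `"transaction"` and `"analysis"` are placeholder encodings of the request and service types, not identifiers from the patent.

```python
def choose_storage_module(service_type: str, request_type: str) -> str:
    """Return "row" or "column" for a request, checking request type first."""
    if request_type == "modify":
        # Data modification requests always go to the row storage module.
        return "row"
    if request_type == "analyze":
        # Analysis requests are split by service type.
        return "row" if service_type == "transaction" else "column"
    raise ValueError(f"unknown request type: {request_type!r}")


decisions = {
    (service, request): choose_storage_module(service, request)
    for service in ("transaction", "analysis")
    for request in ("modify", "analyze")
}
```

Only the combination of an analytical service with an analysis request reaches the column storage module; the other three combinations stay on the row storage module.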
S130, executing data processing corresponding to the data processing request distributed by the request scheduling service module based on at least one of the row storage module and the column storage module.
In the embodiment of the invention, after the request scheduling service module distributes the data processing request to the row storage module or the column storage module, the row storage module or the column storage module can execute the data processing corresponding to the data processing request. The row storage module or the column storage module can parse the data processing request, read its content, and process the data accordingly. In one embodiment, the row storage module may perform both data modification processing and data analysis processing, and the column storage module may perform data analysis processing.
According to the embodiment of the invention, the request scheduling service module acquires the data processing request of the client and distributes it to the row storage module or the column storage module according to the attribute information of the data processing request, and the row storage module or the column storage module executes the corresponding data processing. Because the request scheduling service module distributes requests to the row storage module and the column storage module according to their attribute information, different query processing engines handle different service demands, which improves data processing performance and enhances the user experience.
Example 4
Fig. 5 is a flowchart of a data processing method according to a fourth embodiment of the present invention, and the present embodiment is further refinement of the data processing method based on the above embodiment. As shown in fig. 5, the method includes:
S2010, acquiring a data processing request of the client.
S2020, searching the route information stored in association with the service type and the request type of the data processing request in the request scheduling service module.
Service types may be classified according to the kind of work to be processed and may include transactional processing services and analytical processing services. The request type is the type of the data processing request and may include a data modification request, a data analysis request, and the like. The routing information may refer to the route information used to send an information packet to its destination address.
In the embodiment of the invention, the request scheduling service module can store a plurality of pieces of routing information, and the routing information stored in association with the service type and the request type of the data processing request can be looked up in the request scheduling service module. In practice, the request scheduling service module may store the routing information in the form of a table or an information base. The lookup may first extract the routing information matching the service type field of the data processing request and then, within it, the routing information matching the request type field; alternatively, the routing information matching both the service type field and the request type field may be extracted in a single lookup.
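Both lookup styles can be sketched with ordinary dictionaries. The destination strings and the type names below are hypothetical; the patent does not specify addresses or a table schema.

```python
from typing import Optional

# Hypothetical destinations keyed by (service type, request type).
ROUTES_BY_PAIR = {
    ("transaction", "modify"): "row-module-20",
    ("transaction", "simple_analysis"): "row-module-20",
    ("analysis", "complex_analysis"): "column-module-30",
}

# The same information nested by service type, then by request type.
ROUTES_NESTED = {}
for (service_type, request_type), destination in ROUTES_BY_PAIR.items():
    ROUTES_NESTED.setdefault(service_type, {})[request_type] = destination


def find_route_combined(service_type: str, request_type: str) -> Optional[str]:
    # Single lookup on both fields at once.
    return ROUTES_BY_PAIR.get((service_type, request_type))


def find_route_stepwise(service_type: str, request_type: str) -> Optional[str]:
    # Narrow by service type first, then by request type.
    return ROUTES_NESTED.get(service_type, {}).get(request_type)
```

The two functions return the same destination for any pair; they differ only in how the table is organized, which mirrors the "one field after the other" and "both fields at once" alternatives in the text.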
S2030, in the case where the request type is a data modification request, sending, by the request scheduling service module, the data processing request to the row storage module according to the routing information.
The row storage module may refer to a module that stores data in a row format: the row is the basic logical storage unit, and the data of one row is stored contiguously in the storage medium.
In the embodiment of the invention, the routing information corresponding to different request types in the scheduling service module is different, and when the request types are data modification requests, the destination address in the routing information can be a row storage module. When the request type is a data modification request, the routing information stored in the request scheduling service module can be queried, and the data processing request can be sent to the row storage module according to the routing information.
S2040, in the case that the request type is a simple data analysis request, sending, by the request scheduling service module, the data processing request to the row storage module according to the routing information.
The simple data analysis request may be a request for data analysis based on a simple query statement. The row storage module may be a module based on a row-oriented database, in which the row is the basic logical storage unit and the data of one row is stored contiguously in the storage medium.
In the embodiment of the invention, the request scheduling service module can determine the type of a data analysis request, which may be a simple data analysis request or a complex data analysis request. When the request type is a simple data analysis request, the request scheduling service module can query the corresponding routing information it stores according to the request type and send the data processing request to the row storage module according to that routing information.
S2050, in the case that the request type is a complex data analysis request, sending, by the request scheduling service module, the data processing request to the column storage module according to the routing information.
The complex data analysis request may be a request for data analysis based on a compound query statement. The column storage module may be a module based on a column-oriented database, in which data is stored with the column as the basic logical storage unit.
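The difference between the two layouts can be illustrated with a toy table. This is a simplified in-memory picture, not the storage format of the patent; the table contents and helper names are made up for the example.

```python
rows = [
    (1, "alice", 10),
    (2, "bob", 20),
    (3, "carol", 30),
]
COLUMN_NAMES = ("id", "name", "balance")

# Row layout: all fields of one row sit next to each other.
row_layout = [value for row in rows for value in row]

# Column layout: all values of one column sit next to each other.
column_layout = {
    name: [row[i] for row in rows] for i, name in enumerate(COLUMN_NAMES)
}


def read_row(index: int) -> tuple:
    # A point read touches one contiguous slice in the row layout.
    width = len(COLUMN_NAMES)
    return tuple(row_layout[index * width:(index + 1) * width])


def sum_column(name: str) -> int:
    # An aggregate touches only the one column it needs in the column layout.
    return sum(column_layout[name])
```

Reading or rewriting a whole record is a single contiguous access in the row layout, while an aggregate over one attribute scans a single list in the column layout, which is the usual reason transactional work favors rows and analytical work favors columns.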
In the embodiment of the invention, when the request type is a complex data analysis request, the destination address corresponding to the route information of the complex data analysis request stored in the scheduling service module can be queried, and the destination address can be a column storage module. In the case that the request type is a complex data analysis request, the data processing request is sent to the column storage module through the routing information stored by the request scheduling service module.
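Steps S2030 to S2050 can be sketched end to end as follows. The patent does not say how a simple query statement is told apart from a compound one, so the keyword heuristic below (joins, grouping, unions, subqueries) is purely an assumption for illustration, as are the function names and the SQL-like request text.

```python
import re

# Assumed markers of a compound query; the boundary is not defined in the source.
COMPLEX_MARKERS = re.compile(
    r"\b(join|group\s+by|union|having)\b|\(\s*select\b", re.IGNORECASE
)
MODIFY_PREFIXES = ("insert", "update", "delete")

DESTINATIONS = {
    "modify": "row",             # S2030
    "simple_analysis": "row",    # S2040
    "complex_analysis": "column",  # S2050
}


def classify_request(sql: str) -> str:
    statement = sql.strip().lower()
    if statement.startswith(MODIFY_PREFIXES):
        return "modify"
    if COMPLEX_MARKERS.search(statement):
        return "complex_analysis"
    return "simple_analysis"


def route(sql: str) -> str:
    return DESTINATIONS[classify_request(sql)]
```

For instance, `route("SELECT name FROM accounts WHERE id = 1")` selects the row storage module, while a query containing `GROUP BY` or a subquery selects the column storage module. A production system would more likely classify from a parsed query plan than from keywords.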
S2060, parsing, by the row storage module, the data modification command of the data processing request.
The data modification command may refer to a statement for modifying data in the database. It may be of various types, including but not limited to a data write command, a data modification command, and a data delete command.
In an embodiment of the present invention, when the data processing request includes a data modification request, the row storage module may receive the data processing request from the request scheduling service module, read it, and parse the data modification command it contains. The data modification request may include one or more data modification commands, and the data modification commands may include at least one of a data write command, a data modification command, and a data delete command.
S2070, executing a data modification command according to a data processing engine of the line storage module, wherein the data modification command at least comprises a data writing command, a data modification command and a data deleting command.
In an embodiment of the present invention, the data processing engine may execute a data modification command parsed according to the data processing request. The data modification command may include various data, may include a location of the modified data, modified content, a modification format, and the like, and the data processing engine may perform data modification according to the content stored in the data modification command. In one embodiment, the data modification command at least includes a data write command, a data modification command, and a data delete command. When the data modification command is a data writing command, writing the data into the corresponding position according to the corresponding format and content according to the data writing command; when the data modification command is a data modification command, the corresponding data can be rewritten on the corresponding modified data according to the data modification command so as to realize data modification; when the data modification command is a data deletion command, blank data can be overlaid on the corresponding deleted data according to the data deletion command to realize data deletion.
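As an illustration of how the three command types could be executed, the sketch below models a row store as an in-memory dictionary. The class `RowStore`, the command fields `kind`, `key` and `row`, and the use of `None` as the blank data are all assumptions made for this example; the embodiment does not prescribe a command format.

```python
class RowStore:
    """Toy row store: maps a row key to a row (a dict of column values)."""

    def __init__(self):
        self.rows = {}

    def execute(self, command: dict) -> str:
        """Execute one write, modify, or delete command and report success."""
        kind, key = command["kind"], command["key"]
        if kind == "write":
            # Write the row content at the given location.
            self.rows[key] = dict(command["row"])
        elif kind == "modify":
            if self.rows.get(key) is None:
                raise KeyError(key)
            # Overwrite the affected fields with the new values.
            self.rows[key].update(command["row"])
        elif kind == "delete":
            if self.rows.get(key) is None:
                raise KeyError(key)
            # Overwrite the row with blank data (None stands in for "blank").
            self.rows[key] = None
        else:
            raise ValueError(f"unknown command kind: {kind}")
        return f"{kind} succeeded"
```

A caller would pass, for example, `{"kind": "write", "key": 1, "row": {"name": "a"}}` and receive a short status string that can be fed back to the client.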
S2080, according to the consistency management module, synchronizing the log information of the data modification command to other row storage modules and/or column storage modules.
The log information may refer to event records generated by the row storage module during operation, and each log line may record a description of an operation, such as its date, time, user, and action. In an embodiment, the log may be of multiple types, including a redo log, a write-ahead log (WAL), a binary log (binlog), and the like.
In the embodiment of the invention, after the row storage module executes the data modification command, log information of the data modification command can be generated. The consistency management module can synchronize this log information to other row storage modules and/or column storage modules, which can then store the data synchronously according to the log information. The other row storage modules can copy and replay the log content using the row-storage-format data in the log, while the column storage modules can convert the row-storage-format data in the log into column-storage-format data and store the converted data.
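The difference between replaying a log on a row replica and converting it for a column replica can be sketched as follows. The log entry layout (`op`, `key`, `row`) and both function names are hypothetical, and removing a deleted row from the replica is a simplification chosen for this example rather than something the embodiment specifies.

```python
def replay_on_row_replica(replica_rows: dict, log: list) -> None:
    """Replay row-format log entries on another row storage module."""
    for entry in log:
        if entry["op"] == "delete":
            replica_rows.pop(entry["key"], None)
        else:  # "write" or "modify"
            replica_rows.setdefault(entry["key"], {}).update(entry["row"])


def apply_to_column_replica(columns: dict, log: list) -> None:
    """Convert row-format log entries to a column layout and store them.

    `columns` maps a column name to a dict of {row key: value}.
    """
    for entry in log:
        if entry["op"] == "delete":
            for values in columns.values():
                values.pop(entry["key"], None)
        else:  # "write" or "modify"
            for name, value in entry["row"].items():
                columns.setdefault(name, {})[entry["key"]] = value
```

Both replicas consume the same log, so they converge to the same logical content while keeping different physical layouts.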
S2090, feeding back the execution result of the data processing request to the client.
In the embodiment of the invention, after the other row storage modules and/or the column storage modules have stored the data synchronously according to the log information, the execution result of the data processing request can be fed back to the client. When the data processing request is a data write command, a result indicating a successful write can be fed back to the client; when it is a data modification command, a result indicating a successful modification can be fed back; when it is a data delete command, a result indicating a successful deletion can be fed back. The execution result can be returned to the client along the original path, that is, through the path information in the routing information.
S2100, analyzing a data analysis command of a data processing request by a row storage module or a column storage module.
Wherein the data analysis command may include a statement to query the data in the database.
In an embodiment of the present invention, when the data processing request includes a data analysis request, the row storage module or the column storage module may receive the data analysis request from the request scheduling service module and may parse the data analysis command of the data processing request. Parsing may include extracting the data locations, data information, and the like carried in the data analysis command.
S2110, accessing row storage data corresponding to the data analysis command by a row data processing engine of the row storage module or accessing column storage data corresponding to the data analysis command by a column data processing engine of the column storage module to generate a data analysis result.
In the embodiment of the invention, the row data processing engine can be an engine that processes data on a row basis, and the column data processing engine can be an engine that processes data on a column basis. Both engines can parse and execute the data processing corresponding to the data processing request distributed by the request scheduling service module, and can run according to the execution plan of the data analysis command to generate a data analysis result. Using the data locations, data information, and the like obtained by parsing the data analysis command, the row data processing engine or the column data processing engine can look up the corresponding row storage data or column storage data and then generate the data analysis result.
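To make the contrast between the two engines concrete, the sketch below runs a point lookup against a row layout and a grouped sum against a column layout of the same sample data. The sample data, the function names, and the choice of these two query shapes as stand-ins for simple and complex analysis are assumptions for illustration only.

```python
# Sample data in row layout: one dict per row.
ROW_DATA = [
    {"id": 1, "region": "east", "amount": 10},
    {"id": 2, "region": "west", "amount": 5},
    {"id": 3, "region": "east", "amount": 7},
]

# The same data in column layout: one list per column.
COLUMN_DATA = {name: [row[name] for row in ROW_DATA] for name in ROW_DATA[0]}


def row_engine_lookup(rows, row_id):
    """Simple analysis on the row engine: fetch one whole row by id."""
    return next((row for row in rows if row["id"] == row_id), None)


def column_engine_sum_by(columns, group_col, value_col):
    """Complex analysis on the column engine: grouped sum over two columns."""
    totals = {}
    for group, value in zip(columns[group_col], columns[value_col]):
        totals[group] = totals.get(group, 0) + value
    return totals
```

The row engine returns a complete record in one access, whereas the column engine reads only the two columns that the aggregation needs.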
S2120, feeding back a data analysis result of the data processing request to the client.
In the embodiment of the invention, after the row storage module or the column storage module generates the data analysis result, the execution result of the data analysis request can be fed back to the client. The data analysis result may be sent to the client through path information in the routing information, or alternatively, the path information may be re-established to send the data analysis result to the client.
According to the embodiment of the invention, the data processing request of the client is acquired, the data modification request and the simple data analysis request are sent to the row storage module, and the complex data analysis request is sent to the column storage module; the row storage module parses and executes the data modification command and the simple data analysis command, the column storage module parses and executes the complex data analysis command, and the execution results are fed back to the client respectively, so that the data processing efficiency is improved. Data is stored in the storage layer in both row storage and column storage form through a consistency protocol, so as to ensure the consistency of the underlying data.
Example five
Fig. 6 is a flowchart of a data processing method according to a fifth embodiment of the present invention. On the basis of the foregoing embodiments, it further refines the data processing method, taking a cluster query scheduling service as the request scheduling service module, an OLTP engine as the row data processing engine, an OLAP engine as the column data processing engine, and a data modification request as the request type. As shown in fig. 6, the method includes:
Step 11, the client sends a data modification request to the cluster query scheduling service.
Step 12, the cluster query scheduling service routes the request, according to the service type, to the Leader OLTP engine of the consistency group for execution.
Step 13, the OLTP engine parses and executes the data processing corresponding to the data modification request.
Step 14, the data processing corresponding to the data modification request is written into the row storage engine.
Step 15, the data processing is synchronized to the other row storage engines and the column storage engine through the consistency protocol.
Step 16, after the other OLTP engines and the OLAP engine complete the data processing synchronization, they feed back a successful-write result to the OLTP engine.
Step 17, after the OLTP engine receives the feedback from the other OLTP engines and the OLAP engine, it feeds back the successful-write result to the cluster query scheduling service.
Step 18, after the cluster query scheduling service receives the feedback from the OLTP engine, it feeds back the successful-write result to the client.
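Steps 11 to 18 can be sketched as a chain of calls in which the Leader reports success only after every other engine has acknowledged the synchronized entry. The classes `Follower` and `LeaderOltpEngine`, the function `scheduling_service`, and the in-memory lists standing in for storage engines are hypothetical; the consistency protocol itself is not modelled here.

```python
class Follower:
    """Stands in for another OLTP engine or the OLAP engine."""

    def __init__(self, name: str):
        self.name = name
        self.log = []

    def sync(self, entry: dict) -> bool:
        self.log.append(entry)  # step 15: receive the synchronized entry
        return True             # step 16: acknowledge a successful write


class LeaderOltpEngine:
    def __init__(self, followers: list):
        self.followers = followers
        self.row_store = []

    def handle_modification(self, entry: dict) -> str:
        self.row_store.append(entry)  # steps 13-14: execute and write locally
        acks = [follower.sync(entry) for follower in self.followers]
        # Step 17: report success only once all followers have acknowledged.
        return "write succeeded" if all(acks) else "write failed"


def scheduling_service(leader: LeaderOltpEngine, request: dict) -> str:
    """Steps 12 and 18: route to the Leader, then relay its result to the client."""
    return leader.handle_modification(request)
```

In this sketch the result string returned by `scheduling_service` is what the client would receive in step 18.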
Fig. 7 is a flowchart of a data processing method according to a fifth embodiment of the present invention. On the basis of the foregoing embodiments, it further refines the data processing method, taking a cluster query scheduling service as the request scheduling service module, an OLTP engine as the row data processing engine, and a simple data analysis request as the request type. As shown in fig. 7, the method includes:
Step 21, the client sends a simple data analysis request to the cluster query scheduling service.
Step 22, the cluster query scheduling service routes the request to the OLTP engine for execution according to the service type.
Step 23, the OLTP engine parses and executes the data processing corresponding to the data analysis request.
Step 24, the OLTP engine accesses the row storage engine data of the current instance and queries the corresponding data.
Step 25, after the OLTP engine has queried the corresponding data, it feeds back the query result to the cluster query scheduling service.
Step 26, after the cluster query scheduling service receives the feedback from the OLTP engine, it feeds back the query result to the client.
Fig. 8 is a flowchart of a data processing method according to a fifth embodiment of the present invention. On the basis of the foregoing embodiments, it further refines the data processing method, taking a cluster query scheduling service as the request scheduling service module, an OLAP engine as the column data processing engine, and a complex data analysis request as the request type. As shown in fig. 8, the method includes:
Step 31, the client sends a complex data analysis request to the cluster query scheduling service.
Step 32, the cluster query scheduling service routes the request to the OLAP engine for execution according to the service type.
Step 33, the OLAP engine parses and executes the data processing corresponding to the data analysis request.
Step 34, the OLAP engine accesses the column storage engine data of the current instance and queries the corresponding data.
Step 35, after the OLAP engine has queried the corresponding data, it feeds back the query result to the cluster query scheduling service.
Step 36, after the cluster query scheduling service receives the feedback from the OLAP engine, it feeds back the query result to the client.
According to the embodiment of the invention, row storage and column storage data are synchronized among the nodes of the database cluster through the consistency protocol: the data log of the Leader node (such as a redo log, WAL, or binlog) is written to the other nodes through the consistency protocol, and the data is replayed on those nodes from the copied log. This solves the problems of the prior-art Change Data Capture (CDC) mechanism, namely that the data link is long and the timeliness of log data writing cannot be guaranteed.
Example six
Fig. 9 is a schematic structural diagram of a data processing apparatus according to a sixth embodiment of the present invention. As shown in fig. 9, the apparatus includes: a request acquisition module 51, a request allocation module 52 and a data processing module 53.
The request acquisition module 51 is configured to acquire a data processing request of a client.
The request allocation module 52 is configured to allocate, through the request scheduling service module, the data processing request to at least one of the row storage module and the column storage module according to the attribute information of the data processing request.
The data processing module 53 is configured to perform data processing corresponding to the data processing request allocated by the request scheduling service module based on at least one of the row storage module and the column storage module.
According to the embodiment of the invention, the request acquisition module acquires the data processing request of the client, the request allocation module distributes the data processing request to the row storage module or the column storage module through the request scheduling service module according to the attribute information of the data processing request, and the data processing module executes the data processing corresponding to the data processing request distributed by the request scheduling service module based on the row storage module or the column storage module. Different service demands are processed through different query processing engines, so that the data processing performance is improved, and the use experience of a user is enhanced.
In one embodiment, the request allocation module 52 includes:
An information searching unit, configured to search the request scheduling service module for the routing information stored in association with the service type and the request type of the data processing request.
A first information sending unit, configured to send, based on the request scheduling service module, the data processing request to the row storage module according to the routing information in the case that the request type is a data modification request.
A second information sending unit, configured to send, based on the request scheduling service module, the data processing request to the column storage module according to the routing information in the case that the request type is a data analysis request.
In one embodiment, the data processing module 53 includes:
A first command processing unit, configured to parse, at the row storage module, the data modification command of the data processing request.
A command execution unit, configured to execute the data modification command by the data processing engine of the row storage module, wherein the data modification command at least comprises a data write command, a data modification command and a data delete command.
A command synchronizing unit, configured to synchronize the log information of the data modification command to other row storage modules and/or column storage modules by means of the consistency management module.
A first result feedback unit, configured to feed back the execution result of the data processing request to the client.
In one embodiment, the data processing module 53 includes:
A second command processing unit, configured to parse, at the row storage module or the column storage module, the data analysis command of the data processing request.
A result generating unit, configured to access the row storage data corresponding to the data analysis command by the row data processing engine of the row storage module, or to access the column storage data corresponding to the data analysis command by the column data processing engine of the column storage module, so as to generate a data analysis result.
A second result feedback unit, configured to feed back the data analysis result of the data processing request to the client.
The data processing device provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example seven
Fig. 10 is a schematic structural diagram of an electronic device 10 implementing a data processing method according to an embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 10, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data processing method.
In some embodiments, a data processing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. One or more steps of a data processing method described above may be performed when a computer program is loaded into RAM 13 and executed by processor 11. Alternatively, in other embodiments, the processor 11 may be configured to perform a data processing method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. A data storage system, the data storage system comprising:
a request scheduling service module, a row storage module and a column storage module;
the request scheduling service module acquires a data processing request of a client and distributes the data processing request to at least one of the row storage module and the column storage module according to attribute information of the data processing request; wherein the attribute information includes a service type and a request type, the service type includes a transaction service and an analysis service, and the request type includes a data modification request and a data analysis request;
the row storage module comprises a row data processing engine, the column storage module comprises a column data processing engine, and the row storage module and the column storage module execute data processing corresponding to the data processing request distributed by the request scheduling service module; the row data processing engine comprises a row optimizer, a row executor and a row memory, the column data processing engine comprises a column optimizer, a column executor and a column memory, and the request scheduling service module is a database cluster query scheduling service module;
there is at least one row storage module, and each row storage module corresponds to a service type and a request type in the attribute information of the data processing request;
wherein distributing the data processing request to at least one of the row storage module and the column storage module according to the attribute information of the data processing request comprises:
searching route information stored in association with the service type and the request type of the data processing request in the request scheduling service module;
transmitting the data processing request to the row storage module according to the routing information based on the request scheduling service module under the condition that the request type is a data modification request;
in the case that the request type is a simple data analysis request, sending the data processing request to the row storage module according to the routing information based on the request scheduling service module;
and in the case that the request type is a complex data analysis request, sending the data processing request to the column storage module according to the routing information based on the request scheduling service module.
2. The system of claim 1, wherein the row storage modules are backups of one another.
3. The system of claim 1, further comprising a consistency management module for synchronizing storage data of the row storage modules and the column storage modules.
4. A data processing method for use in a data storage system, the method comprising:
acquiring a data processing request of a client;
distributing the data processing request to at least one of a row storage module and a column storage module according to the attribute information of the data processing request by a request scheduling service module; wherein the attribute information includes a service type and a request type, the service type includes a transaction service and an analysis service, and the request type includes a data modification request and a data analysis request;
executing data processing corresponding to the data processing request distributed by the request scheduling service module based on at least one of the row storage module and the column storage module; the row storage module comprises a row data processing engine, the column storage module comprises a column data processing engine, the row data processing engine comprises a row optimizer, a row executor and a row memory, the column data processing engine comprises a column optimizer, a column executor and a column memory, and the request scheduling service module is a database cluster query scheduling service module;
there is at least one row storage module, and each row storage module corresponds to a service type and a request type in the attribute information of the data processing request;
wherein distributing, by the request scheduling service module, the data processing request to at least one of the row storage module and the column storage module according to the attribute information of the data processing request comprises:
searching route information stored in association with the service type and the request type of the data processing request in the request scheduling service module;
transmitting the data processing request to the row storage module according to the routing information based on the request scheduling service module under the condition that the request type is a data modification request;
in the case that the request type is a simple data analysis request, sending the data processing request to the row storage module according to the routing information based on the request scheduling service module;
and in the case that the request type is a complex data analysis request, sending the data processing request to the column storage module according to the routing information based on the request scheduling service module.
5. The method of claim 4, wherein the data processing request comprises a data modification request, and wherein the performing, based on at least one of the row storage module and the column storage module, the data processing corresponding to the data processing request allocated by the request scheduling service module, respectively, comprises:
parsing a data modification command of the data processing request at the row storage module;
executing the data modification command by a data processing engine of the row storage module, wherein the data modification command at least comprises a data write command, a data modification command and a data delete command;
synchronizing log information of the data modification command to other row storage modules and/or column storage modules by a consistency management module;
and feeding back an execution result of the data processing request to the client.
6. The method of claim 4, wherein the data processing request comprises a data analysis request, and wherein the performing, based on at least one of the row storage module and the column storage module, the data processing corresponding to the data processing request allocated by the request scheduling service module, respectively, comprises:
parsing a data analysis command of the data processing request at the row storage module or the column storage module;
accessing row storage data corresponding to the data analysis command by a row data processing engine of the row storage module or accessing column storage data corresponding to the data analysis command by a column data processing engine of the column storage module to generate a data analysis result;
and feeding back a data analysis result of the data processing request to the client.
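To illustrate why an analysis request might be served by either engine, the sketch below holds the same small data set in a row layout and a column layout. The function names, the sample fields (`id`, `region`, `amount`) and the choice of a point lookup versus a filtered sum are assumptions for illustration, not taken from the patent.

```python
def to_columns(rows):
    """Convert a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def row_engine_lookup(rows, row_id):
    """Simple analysis: fetch one whole record from row storage."""
    for row in rows:
        if row["id"] == row_id:
            return row
    return None


def column_engine_sum(columns, value_col, filter_col, filter_value):
    """Complex analysis: aggregate one column, touching only two columns."""
    return sum(
        value
        for value, key in zip(columns[value_col], columns[filter_col])
        if key == filter_value
    )


ROWS = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 20.0},
    {"id": 3, "region": "east", "amount": 25.0},
]
COLUMNS = to_columns(ROWS)
```

The lookup returns a complete record in one pass over the rows, whereas the aggregate reads only the `amount` and `region` columns, which is the access pattern a column data processing engine is suited to.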
7. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 4-6.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when executed, cause a processor to implement the data processing method of any one of claims 4-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211407850.4A CN115599790B (en) 2022-11-10 2022-11-10 Data storage system, data processing method, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115599790A (en) 2023-01-13
CN115599790B (en) 2024-03-15

Family

ID=84852889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211407850.4A Active CN115599790B (en) 2022-11-10 2022-11-10 Data storage system, data processing method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115599790B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140098529A (en) * 2013-01-31 2014-08-08 한국전자통신연구원 Apparatus and method for effective simultaneous supporting both olap and oltp which have different data access patterns
CN104715039A (en) * 2015-03-23 2015-06-17 星环信息科技(上海)有限公司 Column-based storage and research method and equipment based on hard disk and internal storage
CN106777027A (en) * 2016-12-08 2017-05-31 北京国电通网络技术有限公司 MPP ranks blended data storage device and storage, querying method
CN108616581A (en) * 2018-04-11 2018-10-02 深圳纳实大数据技术有限公司 Data-storage system and method based on OLAP/OLTP mixing applications
CN110019251A (en) * 2019-03-22 2019-07-16 深圳市腾讯计算机系统有限公司 A kind of data processing system, method and apparatus
CN110222072A (en) * 2019-06-06 2019-09-10 江苏满运软件科技有限公司 Data Query Platform, method, equipment and storage medium
WO2020160265A1 (en) * 2019-02-02 2020-08-06 Alibaba Group Holding Limited Data storage apparatus, translation apparatus, and database access method
CN111858759A (en) * 2020-07-08 2020-10-30 平凯星辰(北京)科技有限公司 HTAP database based on consensus algorithm
CN114356971A (en) * 2021-12-02 2022-04-15 阿里巴巴(中国)有限公司 Data processing method, device and system
WO2022126839A1 (en) * 2020-12-15 2022-06-23 跬云(上海)信息科技有限公司 Cloud computing-based adaptive storage hierarchy system and method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8782100B2 (en) * 2011-12-22 2014-07-15 Sap Ag Hybrid database table stored as both row and column store
US10095733B2 (en) * 2014-10-07 2018-10-09 Sap Se Heterogeneous database processing archetypes for hybrid system
US11423001B2 (en) * 2019-09-13 2022-08-23 Oracle International Corporation Technique of efficiently, comprehensively and autonomously support native JSON datatype in RDBMS for both OLTP and OLAP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant