WO2021218144A1 - Data processing method and apparatus, computer device, and storage medium - Google Patents
- Publication number
- WO2021218144A1 (application PCT/CN2020/132378)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- query
- query task
- keywords
- current optimal
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0674—Disk device
Definitions
- the embodiments of the present application relate to the field of data processing, and in particular, to a data processing method, device, computer equipment, and storage medium.
- Distributed storage often uses a distributed system structure, using multiple storage servers to share the storage load, and location servers to locate storage information. It not only improves the reliability, availability, and access efficiency of the system, it is also easy to expand, and minimizes the instability factors introduced by general-purpose hardware.
- the hardware deployed in a distributed storage system determines the read and write speed of the distributed storage. That is, the use of high-performance hard disks can greatly improve the efficiency of data reading and writing, but the cost of high-performance hard disks is usually high.
- the embodiments of the present application provide a data processing method, device, computer equipment, and storage medium, which can improve data writing efficiency.
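- An embodiment of the present application provides a data processing method, including: when there is a data writing requirement, determining the current optimal processing data volume according to system writing parameters, where the system writing parameters include hard disk attribute information and/or historical statistical information; and obtaining and storing target data matching the current optimal processing data volume.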
- an embodiment of the present application provides a data processing device, including:
- the optimal processing data amount determining module is used to determine the current optimal processing data amount according to the system writing parameters when there is a data writing requirement, wherein the system writing parameters include: hard disk attribute information and/or historical statistics information;
- the data writing module is used to obtain and store target data matching the current optimal processing data volume.
- The embodiments of the present application also provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
- When the processor executes the program, the data processing method described in any one of the embodiments of the present application is implemented.
- an embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the data processing method as described in any one of the embodiments of the present application is implemented.
- When data needs to be written, the embodiments of the present application determine the current optimal processing data volume according to the system write parameters, which characterize the system's data writing capability, and obtain data of the current optimal processing data volume for writing.
- In this way, the storage system invokes resources for write operations reasonably: it avoids a data volume so large that the storage system cannot perform other services normally, and avoids a data volume so small that storage system resources sit idle and write efficiency drops.
- This solves the problem in the related technologies that improving write efficiency with high-performance hard disks leads to high costs.
- The resources of the storage system can be configured reasonably and the amount of data written can be adjusted adaptively, so that the storage system can use limited resources for data writing and improve data writing efficiency.
- FIG. 1 is a flowchart of a data processing method in Embodiment 1 of the present application.
- FIG. 2 is a flowchart of a data processing method in Embodiment 2 of the present application.
- FIG. 3 is a flowchart of a data processing method in Embodiment 3 of the present application.
- FIG. 4 is a flowchart of a data processing method in Embodiment 4 of the present application.
- FIG. 5 is a schematic structural diagram of a data processing device in Embodiment 5 of the present application.
- FIG. 6 is a schematic structural diagram of a computer device in Embodiment 6 of the present application.
- FIG. 1 is a flowchart of a data processing method in Embodiment 1 of this application. This embodiment is applicable to storing data when data needs to be written. The method can be executed by the data processing device provided in the embodiments of this application; the device can be implemented in software and/or hardware and can generally be integrated into a storage system. As shown in FIG. 1, the method of this embodiment specifically includes:
- the presence of a data write requirement indicates that the storage system needs to perform a write operation. For example, it may be that a data write request sent by the client is received.
- the storage system is used to read and write data, and in addition, can respond to other business requests and other functions, which can be specifically set in actual conditions. This embodiment of the present application does not specifically limit this.
- the storage system can be a server or a server cluster, including multiple server nodes.
- the system write parameters may include multiple parameters, which are used to determine the data write capability of the storage system, and specifically are used to determine the amount of data written by the storage system.
- The current optimal processing data volume may refer to the amount of data to be written in the current round. The data that needs to be written is usually very large, so it can be split and stored round by round: the current optimal processing data volume is used as the amount of data stored in the current round, and data of that amount is obtained and stored; in the next round, data matching that round's data volume is obtained and stored.
- a storage system can refer to a server or server cluster that provides services to clients.
- the storage system may include multiple server nodes.
- the hard disk attribute information includes at least one of the following: hard disk model, manufacturer, addressing time, and data transmission time
- the historical statistical information includes: network bandwidth time distribution statistics and/or memory occupancy time distribution statistics information.
- the hard disk model can refer to the name and type of the hard disk.
- the manufacturer can refer to the manufacturer of the hard disk, and the hard disk model and manufacturer are used to determine the data processing speed of the hard disk.
- the addressing time can refer to the total time that the magnetic head has experienced from the starting position to the required read/write position, which is used to determine the ability of the hard disk to read and write data.
- The data transmission time may be characterized by the rate at which data is read from the internal cache of the hard disk, and is used to determine the ability of the hard disk to read and write data.
- the network bandwidth time distribution statistics information is used to describe the statistics information of the network bandwidth changes over time, where the network bandwidth may refer to the amount of data transmitted through the network in a unit time.
- the memory usage time distribution statistics information is used to describe the statistics information of the memory usage space changes over time.
- system write parameters may also include traffic statistics information and/or emergency event information.
- Traffic statistics information describes how traffic changes over time, where traffic may refer to the number of visits to a certain address per unit time. Emergency event information describes events that have no historical precedent but occupy storage system resources when they are handled.
- other system write parameters can also be configured according to actual conditions, and this embodiment of the present application does not make specific restrictions.
- the determining the current optimal processing data volume according to the system write parameters includes: using a machine learning algorithm to calculate the current optimal processing data volume according to the system write parameters.
- Machine learning algorithms are used to determine the amount of data to be written.
- a machine learning algorithm can refer to an algorithm in which a machine learns by analyzing a large amount of data and makes predictions.
- the machine learning algorithm can be a decision tree.
- The machine learning algorithm may also be a linear regression algorithm, a support vector machine algorithm, a nearest neighbor/k-nearest neighbor algorithm, a logistic regression algorithm, a k-means algorithm, a random forest algorithm, a naive Bayes algorithm, a dimensionality reduction algorithm, a gradient boosting algorithm, and so on. The algorithm can be selected according to actual conditions, and the embodiments of the present application do not specifically limit this.
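- As a minimal sketch of how a machine learning algorithm could map system write parameters to a data volume, the following uses k-nearest neighbor regression, one of the algorithms listed above. The feature set, the `HISTORY` samples, and the function names are hypothetical illustrations and are not taken from the application.

```python
import math

# Hypothetical history (assumption): each sample pairs system write parameters
# (addressing time in ms, transfer rate in MB/s, free bandwidth in MB/s,
#  memory occupancy ratio) with the batch size that worked best at the time.
HISTORY = [
    ((4.0, 550.0, 900.0, 0.30), 50_000),
    ((4.2, 520.0, 400.0, 0.55), 20_000),
    ((9.0, 150.0, 800.0, 0.35), 12_000),
    ((9.5, 140.0, 300.0, 0.80), 3_000),
]


def _scales(feature_rows):
    """Return (min, span) per feature so that all features compare on a 0..1 scale."""
    columns = list(zip(*feature_rows))
    return [(min(col), (max(col) - min(col)) or 1.0) for col in columns]


def predict_batch_size(params, history=HISTORY, k=2):
    """Average the batch sizes of the k historical samples closest to `params`."""
    scales = _scales([features for features, _ in history])

    def normalize(vector):
        return [(x - low) / span for x, (low, span) in zip(vector, scales)]

    target = normalize(params)
    ranked = sorted(history, key=lambda item: math.dist(normalize(item[0]), target))
    nearest = ranked[:k]
    return round(sum(size for _, size in nearest) / len(nearest))
```

- Calling `predict_batch_size((9.4, 145.0, 320.0, 0.78))` averages the two closest historical samples. A decision tree or any other listed algorithm could be swapped in behind the same function signature.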
- the data volume of the target data is equal to the current optimal processing data volume.
- the target data storage can be local storage in the storage system or distributed storage.
- The target data can be stored in a cache or in memory. Optionally, the data is stored in memory, which, compared with storing it in the cache, can improve the speed of reading and writing data.
- the target data includes batch data.
- Batch data refers to data that is processed in batches, which can be understood as multiple pieces of data being processed at the same time.
- The current optimal processing data volume may refer to the number of pieces of data.
- By configuring the target data as batch data, a large amount of data can be written at the same time, which improves the efficiency of data writing.
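- The round-by-round writing described above can be sketched as follows. This is an illustration under assumptions: `optimal_amount` stands in for whatever computes the current optimal processing data volume (counted in pieces of data), and `store` stands in for the storage system's write call. Both names are hypothetical.

```python
import itertools
from typing import Callable, Iterable, List


def write_in_rounds(records: Iterable[dict],
                    optimal_amount: Callable[[], int],
                    store: Callable[[List[dict]], None]) -> int:
    """Write `records` in rounds; each round takes the currently optimal number of pieces."""
    iterator = iter(records)
    rounds = 0
    while True:
        # Re-evaluate the amount every round so the batch size can adapt.
        size = max(1, optimal_amount())
        batch = list(itertools.islice(iterator, size))
        if not batch:
            break  # nothing left to write
        store(batch)  # one batch write per round
        rounds += 1
    return rounds
```

- Because the amount is queried again at the start of every round, the batch size can follow changes in bandwidth or memory occupancy while a large write is still in progress.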
- When data needs to be written, the embodiments of the present application determine the current optimal processing data volume according to the system write parameters, which characterize the system's data writing capability, and obtain data of the current optimal processing data volume for writing.
- In this way, the storage system invokes resources for write operations reasonably: it avoids a data volume so large that the storage system cannot perform other services normally, and avoids a data volume so small that storage system resources sit idle and write efficiency drops.
- This solves the problem in the related technologies that improving write efficiency with high-performance hard disks leads to high costs.
- The resources of the storage system can be configured reasonably and the amount of data written can be adjusted adaptively, so that the storage system can use limited resources for data writing and improve data writing efficiency.
- FIG. 2 is a flowchart of a data processing method in the second embodiment of the application. This embodiment is optimized based on the above-mentioned embodiment.
- the storage system is a KUDU system, and the KUDU system includes multiple server nodes.
- The acquiring and storing of target data matching the current optimal processing data volume includes: acquiring the target data matching the current optimal processing data volume; and storing the target data in a distributed manner on the server nodes.
- the method of this embodiment specifically includes:
- S210 When there is a data writing requirement, determine the current optimal processing data amount according to the system writing parameter, where the system writing parameter includes: hard disk attribute information and/or historical statistical information;
- the hard disk attribute information includes at least one of the following: hard disk model, manufacturer, addressing time, and data transmission time.
- the historical statistical information includes: network bandwidth time distribution statistical information and/or memory occupancy time distribution statistical information.
- the determining the current optimal processing data volume according to the system write parameters includes: using a machine learning algorithm to calculate the current optimal processing data volume according to the system write parameters.
- S220 Acquire target data matching the current optimal processing data volume.
- the target data includes batch data.
- S230 Store the target data in a distributed manner on the server nodes.
- the server node is a node in the KUDU system.
- the KUDU system is a distributed storage system
- the storage method is columnar storage. Exemplarily, each column of data in the table to be stored is stored together, and different columns are stored separately.
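- To illustrate the columnar layout in general terms, the sketch below regroups row records so that the values of each column sit together. It is a simplified in-memory illustration, not the KUDU on-disk format, and it assumes every row carries the same fields. The function name is hypothetical.

```python
def to_columns(rows):
    """Regroup a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            # Values of the same column are stored together; columns stay separate.
            columns.setdefault(name, []).append(value)
    return columns


rows = [
    {"c_name": "Ann", "c_age": 18},
    {"c_name": "Bob", "c_age": 25},
]
columns = to_columns(rows)

# A filter on one column only has to scan that column's values.
young_positions = [i for i, age in enumerate(columns["c_age"]) if age < 20]
```

- Here `columns` holds two separate lists, one per column, and the age filter touches only the `c_age` list.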
- The KUDU system usually consists of a master server node and multiple slave server nodes. Data can be stored on each slave server node, and each slave server node is connected to the master server node, so that the master server node can exchange information with the outside and manage the data stored on each slave server node.
- The KUDU system can usually manage read and write caches efficiently and supports automatic tiered storage; high-speed storage and low-speed storage can be deployed separately or mixed at any ratio to reduce latency.
- In addition, multiple replica backups are used, which effectively reduces data loss, improves the fault tolerance of the system, and reduces storage costs.
- the embodiment of the application adopts the KUDU system to store data in a distributed manner, which can share the storage load, improve the efficiency of writing data, and at the same time, improve the fault tolerance, stability, and reliability of the system.
- FIG. 3 is a flowchart of a data processing method in Embodiment 3 of the application. This embodiment is optimized based on the above-mentioned embodiment.
- The data processing method further includes: when there is a data reading requirement, obtaining a data query task, where the data query task is used to query pre-written data; splitting the data query task into at least one data query subtask and distributing the subtasks to at least one matching server node in the storage system, so that each server node performs a query according to its query subtask; and receiving the query sub-data fed back by each server node and merging it to obtain query data that matches the data query task.
- the method of this embodiment specifically includes:
- S310 When there is a data writing requirement, determine the current optimal processing data amount according to the system writing parameter, where the system writing parameter includes: hard disk attribute information and/or historical statistical information;
- the hard disk attribute information includes at least one of the following: hard disk model, manufacturer, addressing time, and data transmission time.
- the historical statistical information includes: network bandwidth time distribution statistical information and/or memory occupancy time distribution statistical information.
- the determining the current optimal processing data volume according to the system write parameters includes: using a machine learning algorithm to calculate the current optimal processing data volume according to the system write parameters.
- S320 Acquire and store target data matching the current optimal processing data volume.
- the target data includes batch data.
- the storage system is a KUDU system
- the KUDU system includes a plurality of server nodes
- The acquiring and storing of target data matching the current optimal processing data volume includes: acquiring the target data matching the current optimal processing data volume; and storing the target data in a distributed manner on the server nodes.
- the presence of a data read requirement indicates that the storage system needs to perform a read operation, that is, a query operation, for example, it may be a data acquisition request sent by a client.
- the data query task is used to query data from the data stored in the storage system.
- the pre-written data may refer to the data stored in the storage system.
- S340 Split the data query task into at least one data query subtask, and distribute it to at least one matching server node in the storage system, so that each server node executes a query according to the query subtask.
- the storage system is a server cluster, including multiple server nodes.
- the storage system can use a distributed storage structure to store data, that is, each server node stores part of the data.
- The data corresponding to the data query task may be distributed across different server nodes. Therefore, the data query task can be split into different data query subtasks, each of which is sent to the server node where its data is located.
- the split method may be to split the data query task into at least one data query subtask according to the distribution of the query data matched by the data query task in the storage system.
- For example, the query data matched by the data query task is distributed across server node A, server node B, and server node C, where server node A stores data a, server node B stores data b, and server node C stores data c.
- The data query task can then be split, according to the query operations on data a, data b, and data c, into a first data query subtask, a second data query subtask, and a third data query subtask, respectively.
- the data query subtask is used for server node execution to obtain locally stored data. It should be noted that the data stored by each server node can be completely different, can be partially the same, or can be completely the same. In this regard, it can be set according to the actual situation, and the embodiment of the present application does not make specific restrictions.
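- A minimal sketch of splitting a query task according to where the data resides, following the A/B/C example above. The `PLACEMENT` mapping and the node names are hypothetical; a real system would obtain the data distribution from its own metadata.

```python
# Hypothetical placement map (assumption): which server node holds which data item.
PLACEMENT = {"a": "node_A", "b": "node_B", "c": "node_C"}


def split_query_task(wanted_items, placement=PLACEMENT):
    """Group the requested items by the node that stores them; one subtask per node."""
    subtasks = {}
    for item in wanted_items:
        node = placement[item]
        subtasks.setdefault(node, []).append(item)
    return subtasks


subtasks = split_query_task(["a", "b", "c"])
```

- Each entry of `subtasks` is one data query subtask addressed to the server node that stores the requested data locally.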
- S350 Receive query sub-data fed back by each of the server nodes, merge them, and obtain query data that matches the data query task.
- the query data may refer to the data that needs to be acquired by the data query task.
- the query sub-data is used to form the query data. Since the data stored by the server node may be different, the data obtained from the query is only a part of the complete query data.
- the query sub-data fed back by each server node can be merged, and if there is duplicate data, it can be deleted to obtain complete and non-duplicated query data.
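- The distribute, receive, and merge steps can be sketched as a simple scatter-gather. This is an illustration under assumptions: `execute` stands in for the call that runs a subtask on a server node, threads stand in for remote nodes, and rows are hashable tuples so that duplicates can be detected. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subtasks(subtasks, execute):
    """Run every (node, subtask) pair concurrently and collect the query sub-data."""
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        futures = [pool.submit(execute, node, task) for node, task in subtasks.items()]
        return [future.result() for future in futures]


def merge_sub_data(sub_results):
    """Concatenate the sub-data, dropping rows that were already returned by another node."""
    seen = set()
    merged = []
    for rows in sub_results:
        for row in rows:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged
```

- Deduplication matters because, as noted above, the data stored on different server nodes may overlap partially or completely.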
- When there is a data reading requirement, the embodiments of the application split the data query task into multiple data query subtasks and send them to the server nodes for processing, so that multiple server nodes perform query operations at the same time, which improves data query efficiency.
- The data query task includes a structured query language statement; the splitting of the data query task into at least one data query subtask includes: analyzing the data query task according to a preset keyword list to obtain at least one keyword corresponding to the data query task and the variable value corresponding to each keyword; constructing a syntax parse tree according to the keywords and their corresponding variable values; and determining at least one data processing instruction according to the syntax parse tree, each serving as a data query subtask.
- the method of this embodiment specifically includes:
- S410 When there is a data writing requirement, determine the current optimal processing data amount according to the system writing parameter, where the system writing parameter includes: hard disk attribute information and/or historical statistical information;
- the hard disk attribute information includes at least one of the following: hard disk model, manufacturer, addressing time, and data transmission time.
- the historical statistical information includes: network bandwidth time distribution statistical information and/or memory occupancy time distribution statistical information.
- the determining the current optimal processing data volume according to the system write parameters includes: using a machine learning algorithm to calculate the current optimal processing data volume according to the system write parameters.
- the target data includes batch data.
- the storage system is a KUDU system
- the KUDU system includes a plurality of server nodes
- The obtaining and storing of target data matching the current optimal processing data volume includes: obtaining the target data matching the current optimal processing data volume; and storing the target data in a distributed manner on the server nodes.
- SQL stands for Structured Query Language.
- SQL is a standard computer language used for relational database management and data manipulation. It is used for querying, inserting, updating, and modifying data. Databases on different platforms support SQL, so SQL can be used to realize cross-platform data processing.
- For example, the data query task is: SELECT c_name FROM table_c WHERE c_age < 20.
- This data query task queries, in the data table table_c, the names (c_name) of persons whose age is less than 20 (c_age < 20).
- S440 Analyze the data query task according to a preset keyword list to obtain at least one keyword corresponding to the data query task and a variable value corresponding to each keyword.
- Keywords can refer to parameters included in the data query task and, specifically, can include at least one of the following: query types, table names, and field names.
- the variable value corresponding to the keyword refers to the assignment of the keyword, which is generally a character, such as at least one of numbers, letters, and symbols.
- the keyword list is used to determine the keywords included in the data query task. Specifically, the standard keywords included in the keyword list are pre-designated keywords.
- In the previous example, the keywords are SELECT (query content), FROM (query data table name), and WHERE (data filter condition), and their variable values are c_name, table_c, and c_age < 20, respectively.
- Analyzing the data query task according to a preset keyword list to obtain at least one keyword corresponding to the data query task and the variable value corresponding to each keyword includes: searching the data query task according to at least one standard keyword stored in the keyword list, to obtain at least one keyword included in the data query task and the variable value corresponding to each keyword.
- Standard keywords are used as parameters that can be recognized by the storage system, so that the storage system can recognize the content of the data query task and perform data query.
- the data query task can be accurately parsed to determine the data query subtask for data query, thereby improving the accuracy of data query.
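- A minimal sketch of keyword extraction against a preset keyword list, assuming a statement of the form `SELECT c_name FROM table_c WHERE c_age < 20`. The three-entry `KEYWORD_LIST` and the function name are illustrative assumptions. The sketch does not handle keywords that appear inside string literals or nested queries.

```python
import re

# Preset standard keywords (assumption: a small illustrative subset).
KEYWORD_LIST = ["SELECT", "FROM", "WHERE"]


def extract_keywords(sql, keyword_list=KEYWORD_LIST):
    """Return {keyword: variable value} for each standard keyword found in `sql`."""
    alternatives = "|".join(re.escape(keyword) for keyword in keyword_list)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    # Splitting on a capturing group keeps the keywords:
    # [text before first keyword, kw1, value1, kw2, value2, ...]
    parts = pattern.split(sql.strip().rstrip(";"))
    pairs = {}
    for keyword, value in zip(parts[1::2], parts[2::2]):
        pairs[keyword.upper()] = value.strip()
    return pairs


pairs = extract_keywords("SELECT c_name FROM table_c WHERE c_age < 20")
```

- The resulting `pairs` maps SELECT to `c_name`, FROM to `table_c`, and WHERE to `c_age < 20`, which are the keyword and variable-value pairs the later steps work from.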
- S450 Construct a grammar parse tree according to each of the keywords and the variable values corresponding to each of the keywords.
- the parse tree is formed by a series of nodes connected in series, which is used to describe the data structure of the data query task.
- the parse tree is used to convert the data structure of the data query task into a target data structure, and the target data structure can be recognized by the storage system.
- the data query task is data in text form, which cannot be recognized by the storage system.
- Constructing a syntax parse tree according to the keywords and their corresponding variable values includes: performing legality verification on the data query task according to the keywords and their corresponding variable values; and, when the data query task passes legality verification, constructing the syntax parse tree according to the keywords and their corresponding variable values.
- Legality verification checks whether the SQL statement has a syntax error and can be performed using preset syntax verification rules. If the syntax of the SQL statement is determined to be illegal, an error message is generated and fed back to the sender of the data query task, so that the sender can input a correct SQL statement according to the error message.
- Performing legality verification before the parse tree is constructed prevents an incorrect parse tree from being generated from an illegal SQL statement and incorrect data from being queried, thereby improving data query accuracy.
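- The verify-then-build order can be sketched as follows. The verification rules here (SELECT and FROM are required, no keyword may have an empty value) are assumptions chosen for illustration, as are the `Node` structure and the function names. The application's own preset rules are not specified.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    value: str = ""
    children: list = field(default_factory=list)


REQUIRED = ("SELECT", "FROM")  # assumed verification rule


def verify(pairs):
    """Return a list of error messages; an empty list means the task is legal."""
    errors = [f"missing {kw} clause" for kw in REQUIRED if kw not in pairs]
    errors += [f"empty value for {kw}" for kw, value in pairs.items() if not value]
    return errors


def build_parse_tree(pairs):
    """Build the tree only after verification passes; otherwise report the errors."""
    errors = verify(pairs)
    if errors:
        # The message would be fed back to the sender of the data query task.
        raise ValueError("; ".join(errors))
    root = Node("query")
    for keyword, value in pairs.items():
        root.children.append(Node(keyword, value))
    return root
```

- Raising before any node is created means an illegal statement never produces a partial tree that a later step could mistake for a valid query.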
- S460 Determine at least one data processing instruction according to the syntax parse tree, each serving as a data query subtask, and distribute the subtasks to at least one matching server node in the storage system, so that each server node executes a query according to its query subtask.
- the storage system can quickly identify the data query task and determine at least one data processing instruction.
- the storage system may determine to generate one data processing instruction or at least two data processing instructions according to the query data volume of the data query task.
- a data processing instruction can be used as a query subtask.
- Before determining at least one data processing instruction according to the syntax parse tree, the method further includes: acquiring the data objects of the syntax parse tree and performing data conversion on them so that the storage system can recognize them; correspondingly, the at least one data processing instruction is determined according to the converted syntax parse tree.
- the data object can refer to a keyword in the syntax parse tree, or a variable value corresponding to the keyword.
- Data conversion is used to convert data objects into target data objects that can be recognized by the storage system.
- Data conversion can be performed through preset conversion rules.
- the conversion rules specify the correspondence between keywords in the SQL statement and the parameters that can be recognized by the storage system, as well as the variable values corresponding to the keywords in the SQL statement. Correspondence of identifiable parameter values in the storage system.
- For example, the syntax parse tree includes a node representing the query type, and the variable value stored under the node is drop.
- In the storage system, the query type is represented by the parameter X.
- The SQL statement uses the variable value drop to characterize "delete".
- The storage system uses the parameter value n to characterize "delete".
- Without conversion, the storage system cannot recognize the query type node and its variable value in the syntax parse tree.
- After data conversion, for example, the node "query type" in the syntax parse tree is replaced with the node "X", and the parameter value n is stored under node X, so as to ensure that the data structure of the syntax parse tree and the various parameters and parameter values in the syntax parse tree can be recognized by the storage system.
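The conversion step can be sketched as a pair of lookup tables applied to the tree. The example assumes the syntax parse tree is held as a nested dictionary. The rule tables contain only the "query type" to "X" and "drop" to "n" correspondences mentioned in the text; the `table` node is a hypothetical extra used to show that unmapped entries pass through unchanged.

```python
# Hypothetical preset conversion rules, limited to the example in the text.
KEYWORD_RULES = {"query type": "X"}             # SQL keyword -> storage parameter
VALUE_RULES = {"query type": {"drop": "n"}}     # SQL variable value -> storage parameter value


def convert_tree(tree: dict) -> dict:
    """Return a copy of the parse tree with keywords and values converted."""
    converted = {}
    for keyword, value in tree.items():
        target_keyword = KEYWORD_RULES.get(keyword, keyword)
        if isinstance(value, dict):
            # Child nodes are converted recursively.
            target_value = convert_tree(value)
        else:
            target_value = VALUE_RULES.get(keyword, {}).get(value, value)
        converted[target_keyword] = target_value
    return converted


# "table" is an illustrative node that has no conversion rule.
print(convert_tree({"query type": "drop", "table": "orders"}))
# {'X': 'n', 'table': 'orders'}
```

Keeping the rules in data rather than in code means a new keyword correspondence only requires a new table entry.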
- S470 Receive query sub-data fed back by each of the server nodes, merge them, and obtain query data that matches the data query task.
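As an illustration of the merge in S470, the sketch below concatenates the query sub-data returned by each server node and optionally sorts the result. The row format (a list of dictionaries per node) and the `order_by` parameter are assumptions made for the example, not details taken from the application.

```python
from typing import Optional


def merge_sub_data(sub_data_by_node: dict[str, list[dict]],
                   order_by: Optional[str] = None) -> list[dict]:
    """Merge the query sub-data fed back by each server node into one result."""
    merged: list[dict] = []
    # Iterating over node names in sorted order keeps the output deterministic.
    for node in sorted(sub_data_by_node):
        merged.extend(sub_data_by_node[node])
    if order_by is not None:
        merged.sort(key=lambda row: row[order_by])
    return merged


feedback = {"node-b": [{"id": 3}], "node-a": [{"id": 5}, {"id": 1}]}
print(merge_sub_data(feedback, order_by="id"))  # [{'id': 1}, {'id': 3}, {'id': 5}]
```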
- In this embodiment of the application, when the data query task includes an SQL statement, the keywords in the data query task and the variable values corresponding to the keywords are extracted through the keyword list, and a syntax parse tree is constructed so that the storage system can recognize it; the storage system can then quickly parse and process the data query task, improving the efficiency of data reading.
- FIG. 5 is a schematic diagram of a data processing device in Embodiment 5 of this application.
- This embodiment provides a device corresponding to the data processing method provided in the foregoing embodiments of the present application.
- the device can be implemented in software and/or hardware, and can generally be integrated into a storage system.
- the device of this embodiment may include:
- the optimal processing data amount determining module 510 is configured to determine the current optimal processing data amount according to the system writing parameters when there is a data writing requirement, wherein the system writing parameters include: hard disk attribute information and/or historical statistical information;
- the data writing module 520 is configured to obtain and store target data matching the current optimal processing data volume.
- In this embodiment of the application, when data writing is required, the current optimal processing data amount is determined according to the system writing parameters, which determine the data writing capacity of the system, and data matching the current optimal processing data amount is obtained for writing. In this way, the storage system can reasonably allocate resources for write operations. This avoids a data amount so large that the storage system cannot perform other services normally, and avoids a data amount so small that storage system resources are left idle and write efficiency is reduced.
- This solves the problem in the related art that improving writing efficiency by using high-performance hard disks leads to high costs. The resources of the storage system can be reasonably configured and the amount of data written can be adjusted adaptively, so that the storage system can use limited resources to write data and improve the efficiency of data writing.
- the hard disk attribute information includes at least one of the following: hard disk model, manufacturer, addressing time, and data transmission time.
- the historical statistical information includes: network bandwidth time distribution statistical information and/or memory occupation time distribution statistical information.
- the optimal processing data amount determining module 510 includes: a machine learning algorithm calculation module, configured to use a machine learning algorithm to calculate the current optimal processing data amount according to the system write parameters.
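The application does not name a specific machine learning algorithm. As one possible stand-in, the sketch below uses k-nearest-neighbour regression over historical records: it predicts the current optimal processing data amount as the mean of the amounts that worked best under the most similar past conditions. The feature choice (available network bandwidth, memory occupancy ratio) and all numbers are assumptions for illustration.

```python
from math import dist


def predict_optimal_amount(history: list[tuple[tuple[float, ...], int]],
                           current: tuple[float, ...],
                           k: int = 3) -> int:
    """Predict the optimal processing data amount with k-nearest-neighbour regression.

    history: (feature vector, best observed data amount) pairs (hypothetical format).
    current: feature vector describing the present system writing parameters.
    """
    if not history:
        raise ValueError("history must not be empty")
    # Pick the k historical records closest to the current conditions.
    nearest = sorted(history, key=lambda record: dist(record[0], current))[:k]
    return round(sum(amount for _, amount in nearest) / len(nearest))


# Features: (available bandwidth in MB/s, memory occupancy ratio); values are made up.
history = [((100.0, 0.2), 1000), ((100.0, 0.3), 1200), ((10.0, 0.9), 100)]
print(predict_optimal_amount(history, (95.0, 0.25), k=2))  # 1100
```

In practice the features would need to be scaled to comparable ranges before distances are computed, otherwise the bandwidth term dominates.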
- the target data includes batch data.
- The storage system is a KUDU system, and the KUDU system includes multiple server nodes. The data writing module 520 includes: a distributed storage unit, configured to obtain target data that matches the current optimal processing data amount, and to store the target data in a distributed manner in each of the server nodes.
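The two steps of the distributed storage unit can be sketched as follows: take a batch no larger than the current optimal processing data amount, then assign each record to a server node. Hashing the record key is used here only as an illustrative placement rule; this is not the KUDU client API, and the node names and the `id` key are hypothetical.

```python
from zlib import crc32


def take_batch(pending: list[dict], optimal_amount: int) -> list[dict]:
    """Remove and return up to optimal_amount records from the pending queue."""
    batch = pending[:optimal_amount]
    del pending[:optimal_amount]
    return batch


def distribute(batch: list[dict], nodes: list[str], key: str = "id") -> dict[str, list[dict]]:
    """Assign each record to a server node by hashing its key (illustrative rule)."""
    placement: dict[str, list[dict]] = {node: [] for node in nodes}
    for row in batch:
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        index = crc32(str(row[key]).encode("utf-8")) % len(nodes)
        placement[nodes[index]].append(row)
    return placement


pending = [{"id": i} for i in range(10)]
batch = take_batch(pending, 4)
placement = distribute(batch, ["node-a", "node-b", "node-c"])
print(len(batch), len(pending), sum(len(rows) for rows in placement.values()))  # 4 6 4
```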
- The data processing device further includes: a data reading module, configured to: obtain a data query task when there is a data reading requirement, where the data query task is used to query data written in advance; split the data query task into at least one data query subtask and distribute the subtasks to at least one matching server node in the storage system, so that each server node executes the query according to its query subtask; and merge the query sub-data fed back by the server nodes to obtain query data matching the data query task.
- The data query task includes a structured query language (SQL) statement. The data reading module includes: a data query task splitting unit, configured to: parse the data query task according to a preset keyword list to obtain at least one keyword corresponding to the data query task and the variable value corresponding to each keyword; construct a syntax parse tree according to each keyword and the variable value corresponding to each keyword; and determine, according to the syntax parse tree, at least one data processing instruction, each serving as a data query subtask.
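A minimal sketch of the keyword extraction and tree construction follows. It assumes a three-entry keyword list and a flat tree with one node per keyword; both are simplifications chosen for the example rather than the structure used in the application.

```python
import re

# Hypothetical preset keyword list.
KEYWORD_LIST = ["select", "from", "where"]


def extract_keywords(sql: str) -> list[tuple[str, str]]:
    """Return (keyword, variable value) pairs in the order they appear in the statement."""
    pattern = re.compile(r"\b(" + "|".join(KEYWORD_LIST) + r")\b", re.IGNORECASE)
    matches = list(pattern.finditer(sql))
    pairs = []
    for i, match in enumerate(matches):
        # The variable value runs from this keyword to the next one (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(sql)
        value = sql[match.end():end].strip().rstrip(";").strip()
        pairs.append((match.group(1).lower(), value))
    return pairs


def build_parse_tree(pairs: list[tuple[str, str]]) -> dict:
    """Build a simple tree: a root with one child node per keyword."""
    return {"root": [{"keyword": k, "value": v} for k, v in pairs]}


pairs = extract_keywords("SELECT id, name FROM users WHERE age > 30;")
print(pairs)  # [('select', 'id, name'), ('from', 'users'), ('where', 'age > 30')]
print(build_parse_tree(pairs)["root"][1])  # {'keyword': 'from', 'value': 'users'}
```

This regular-expression approach would also match keywords that appear inside string literals, so a real parser would need proper tokenization.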
- The data query task splitting unit includes: a syntax parse tree construction subunit, configured to verify the validity of the data query task according to each of the keywords and the variable value corresponding to each of the keywords; and, when the data query task passes the validity verification, construct a syntax parse tree according to each of the keywords and the variable value corresponding to each of the keywords.
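The application does not spell out the validity rules. One plausible reading, sketched below, is that certain keywords must be present and every keyword must carry a non-empty variable value. The required-keyword set is an assumption for illustration.

```python
# Hypothetical rule: a query statement needs at least these keywords.
REQUIRED_KEYWORDS = {"select", "from"}


def validate(pairs: list[tuple[str, str]]) -> list[str]:
    """Return a list of validity errors; an empty list means the task is valid."""
    errors = []
    present = {keyword for keyword, _ in pairs}
    for keyword in sorted(REQUIRED_KEYWORDS - present):
        errors.append(f"missing keyword: {keyword}")
    for keyword, value in pairs:
        if not value:
            errors.append(f"keyword {keyword} has no variable value")
    return errors


print(validate([("select", "id"), ("from", "users")]))  # []
print(validate([("select", "")]))
# ['missing keyword: from', 'keyword select has no variable value']
```

The syntax parse tree would be constructed only when the returned list is empty.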
- The data query task splitting unit includes: a keyword query subunit, configured to search the data query task according to at least one standard keyword stored in the keyword list, to obtain at least one keyword included in the data query task and the variable value corresponding to each keyword.
- The data reading module further includes: a data conversion unit, configured to, before at least one data processing instruction is determined according to the syntax parse tree, obtain a data object of the syntax parse tree and perform data conversion so that the storage system can recognize it; and to determine at least one data processing instruction according to the converted syntax parse tree.
- the above-mentioned data processing device can execute the data processing method provided in the embodiments of the present application, and has the corresponding functional modules and beneficial effects of the executed data processing method.
- FIG. 6 is a schematic structural diagram of a computer device in Embodiment 6 of this application.
- Fig. 6 shows a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present application.
- the computer device 12 shown in FIG. 6 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present application.
- the computer device 12 is represented in the form of a general-purpose computing device.
- the components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting different system components (including the system memory 28 and the processing unit 16).
- the computer device 12 may be a device connected to a bus.
- The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures.
- For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
- the computer device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by the computer device 12, including volatile and nonvolatile media, removable and non-removable media.
- the system memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
- the computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
- the storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 6 and generally referred to as a "hard drive").
- A disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a compact disk read-only memory), can be provided.
- The system memory 28 may include at least one program product having a set of (for example, at least one) program modules, and these program modules are configured to perform the functions of the embodiments of the present application.
- A program/utility tool 40 having a set of (at least one) program modules 42 can be stored in, for example, the system memory 28.
- Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
- The program modules 42 generally perform the functions and/or methods in the embodiments described in this application.
- The computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 22.
- The computer device 12 may also communicate with one or more networks (for example, a Local Area Network (LAN) or a Wide Area Network (WAN)) through the network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18.
- the processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, such as implementing a data processing method provided by any embodiment of the present application.
- Embodiment 7 of the present application provides a computer-readable storage medium on which a computer program is stored.
- When the program is executed by a processor, the data processing method provided in any embodiment of this application is implemented; that is, when there is a data writing requirement, the current optimal processing data amount is determined according to the system writing parameters, where the system writing parameters include hard disk attribute information and/or historical statistical information; and target data matching the current optimal processing data amount is obtained and stored.
- the computer storage medium of the embodiment of the present application may adopt any combination of one or more computer-readable media.
- the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
- The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media (a non-exhaustive list) include: an electrical connection with one or more wires, a portable computer disk, a hard disk, RAM, Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM), flash memory, optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
- the computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
- the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and computer-readable program code is carried therein. This propagated data signal can take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and such a medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
- the program code contained on the computer-readable medium can be transmitted by any suitable medium, including, but not limited to, wireless, wire, optical cable, radio frequency (Radio Frequency, RF), etc., or any suitable combination of the above.
- the computer program code used to perform the operations of this application can be written in one or more programming languages or a combination thereof.
- The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- the remote computer may be connected to the user computer through any kind of network including LAN or WAN, or may be connected to an external computer (for example, using an Internet service provider to connect through the Internet).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010362373.9 | 2020-04-30 | ||
CN202010362373.9A CN111562885A (zh) | 2020-04-30 | 2020-04-30 | Data processing method, apparatus, computer device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021218144A1 (fr) | 2021-11-04 |
Family
ID=72070779
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/132378 WO2021218144A1 (fr) | 2020-04-30 | 2020-11-27 | Data processing method and apparatus, computer device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111562885A (fr) |
WO (1) | WO2021218144A1 (fr) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
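CN115361343A (zh) * | 2022-08-03 | 2022-11-18 | 重庆川仪自动化股份有限公司 | Information sending method, system, medium, and electronic device for industrial equipment |
CN115563167A (zh) * | 2022-12-02 | 2023-01-03 | 浙江大华技术股份有限公司 | Data query method, electronic device, and computer-readable storage medium |
CN116312676A (zh) * | 2023-05-17 | 2023-06-23 | 上海芯存天下电子科技有限公司 | NOR flash writing method, apparatus, programming circuit, and device |
CN117055821A (zh) * | 2023-10-11 | 2023-11-14 | 创云融达信息技术(天津)股份有限公司 | Dimension-based distributed storage method, apparatus, device, and medium |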
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
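CN111562885A (zh) * | 2020-04-30 | 2020-08-21 | 苏州亿歌网络科技有限公司 | Data processing method, apparatus, computer device, and storage medium |
CN112068776A (zh) * | 2020-09-02 | 2020-12-11 | 深圳市硅格半导体有限公司 | Adaptive adjustment method, system, device, and medium for a memory management algorithm |
CN112214487A (zh) * | 2020-09-28 | 2021-01-12 | 京东数字科技控股股份有限公司 | Data writing method and apparatus, computer-readable storage medium, and electronic device |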
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150193173A1 (en) * | 2014-01-07 | 2015-07-09 | International Business Machines Corporation | Virtual data storage cartridge memory (cm) |
CN106569893A (zh) * | 2015-10-09 | 2017-04-19 | 阿里巴巴集团控股有限公司 | Flow control method and device |
CN110399393A (zh) * | 2018-04-16 | 2019-11-01 | 北京三快在线科技有限公司 | Data processing method, apparatus, medium, and electronic device |
CN111562885A (zh) * | 2020-04-30 | 2020-08-21 | 苏州亿歌网络科技有限公司 | Data processing method, apparatus, computer device, and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6735678B2 (en) * | 2000-05-24 | 2004-05-11 | Seagate Technology Llc | Method and apparatus for disc drive defragmentation |
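CN101577671A (zh) * | 2008-05-07 | 2009-11-11 | 北京启明星辰信息技术股份有限公司 | Automatic flow control method and system for peer-to-peer networking services |
CN103399713B (zh) * | 2013-08-02 | 2016-01-20 | 浙江大学 | Data buffering method balancing multi-level storage performance and solid-state drive lifespan |
CN104834478B (zh) * | 2015-03-25 | 2018-05-22 | 中国科学院计算技术研究所 | Data writing and reading method based on heterogeneous hybrid storage devices |
CN107273504A (zh) * | 2017-06-19 | 2017-10-20 | 浪潮软件集团有限公司 | Kudu-based data query method and apparatus |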
2020
- 2020-04-30 CN CN202010362373.9A patent/CN111562885A/zh active Pending
- 2020-11-27 WO PCT/CN2020/132378 patent/WO2021218144A1/fr active Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
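CN115361343A (zh) * | 2022-08-03 | 2022-11-18 | 重庆川仪自动化股份有限公司 | Information sending method, system, medium, and electronic device for industrial equipment |
CN115563167A (zh) * | 2022-12-02 | 2023-01-03 | 浙江大华技术股份有限公司 | Data query method, electronic device, and computer-readable storage medium |
CN115563167B (zh) * | 2022-12-02 | 2023-03-31 | 浙江大华技术股份有限公司 | Data query method, electronic device, and computer-readable storage medium |
CN116312676A (zh) * | 2023-05-17 | 2023-06-23 | 上海芯存天下电子科技有限公司 | NOR flash writing method, apparatus, programming circuit, and device |
CN116312676B (zh) * | 2023-05-17 | 2023-08-25 | 上海芯存天下电子科技有限公司 | NOR flash writing method, apparatus, programming circuit, and device |
CN117055821A (zh) * | 2023-10-11 | 2023-11-14 | 创云融达信息技术(天津)股份有限公司 | Dimension-based distributed storage method, apparatus, device, and medium |
CN117055821B (zh) * | 2023-10-11 | 2024-02-02 | 创云融达信息技术(天津)股份有限公司 | Dimension-based distributed storage method, apparatus, device, and medium |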
Also Published As
Publication number | Publication date |
---|---|
CN111562885A (zh) | 2020-08-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20933420 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20933420 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15/06/2023) |