CN112486979A - Data processing method, device and system, electronic equipment and computer readable storage medium - Google Patents

Data processing method, device and system, electronic equipment and computer readable storage medium

Info

Publication number
CN112486979A
Authority
CN
China
Prior art keywords
data
index
query
proportion
queried
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910866399.4A
Other languages
Chinese (zh)
Other versions
CN112486979B (en)
Inventor
王烨
周祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910866399.4A
Publication of CN112486979A
Application granted
Publication of CN112486979B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a data processing method, a data processing device, a data processing system, electronic equipment and a computer readable storage medium. The method comprises the following steps: acquiring a data query instruction of a user, wherein the data query instruction at least comprises a field to be queried and a query condition aiming at the field to be queried; acquiring the index proportion of index data corresponding to the field to be queried; and generating a data query scheme meeting the query condition according to the index proportion. According to the embodiment of the invention, indexes are partially constructed for the data by introducing the index proportion, so that a data query scheme meeting query conditions is generated according to the index proportion of index data in source data, and data query operation can be respectively executed based on the index data and the source data according to the intention of a user, so that partial calculation acceleration is realized, and the requirement of the user on balance between the calculation cost and the query performance is met.

Description

Data processing method, device and system, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of cloud computing technologies, and in particular, to a data processing method, an apparatus and a system, an electronic device, and a computer-readable storage medium.
Background
With the development of cloud technology, a cloud service network architecture with completely separated storage and computation has appeared, so that the advantage of hardware concentration can be fully exerted by using high-speed and high-throughput network channels. In such a cloud service, how to efficiently and conveniently store and use a large amount of data in a cloud database becomes very important.
In the prior art, an Object Storage Service (OSS) technology is used to facilitate the storage and use of various data. It offers a simple and friendly read-write interface, a low price, and support for storing various heterogeneous data, so the OSS technology is widely applied in cloud service network architectures with separated storage and computation and has become the most important data source in such architectures.
Because data in the OSS technology is heterogeneous, that is, various file formats and various storage systems coexist, a cloud service user cannot well control the relationship between query performance and computation cost when processing (e.g., analyzing) data. For example, a special format of the data the user needs to acquire, or complex scenario-specific requirements on that data, may increase the user's computation cost or reduce the user's data acquisition performance.
However, the current data storage schemes do not provide a convenient and intuitive data processing implementation scheme for users.
Disclosure of Invention
Embodiments of the present invention provide a data processing method, apparatus and system, an electronic device, and a computer-readable storage medium, so as to solve a defect that a relationship between query performance and computation cost cannot be well controlled in the prior art.
To achieve the above object, an embodiment of the present invention provides a data processing method, including:
acquiring a data query instruction of a user, wherein the data query instruction at least comprises a field to be queried and a query condition aiming at the field to be queried;
acquiring an index proportion of index data corresponding to the field to be queried, wherein the index proportion is the proportion of the index data in source data;
and generating a data query scheme meeting the query condition according to the index proportion, wherein the data query scheme is used for indicating that data query operation is executed based on the index data and data query operation is executed based on the source data.
An embodiment of the present invention further provides a data processing apparatus, including:
an instruction acquisition module, configured to acquire a data query instruction of a user, where the data query instruction at least includes a field to be queried and a query condition for the field to be queried;
an index proportion obtaining module, configured to obtain an index proportion of index data corresponding to the field to be queried, where the index proportion is a proportion of the index data in source data;
and the scheme generation module is used for generating a data query scheme meeting the query condition according to the index proportion, wherein the data query scheme is used for indicating that data query operation is executed based on the index data and data query operation is executed based on the source data.
An embodiment of the present invention further provides a data processing system, including: a front node, a computing node, and a storage node, wherein,
the front node is used for acquiring a data query instruction of a user, wherein the data query instruction at least comprises a field to be queried and a query condition aiming at the field to be queried; acquiring an index proportion of index data corresponding to the field to be queried, wherein the index proportion is the proportion of the index data in source data; generating a data query scheme meeting the query condition according to the index proportion, wherein the data query scheme is used for indicating that data query operation is executed based on the index data and data query operation is executed based on the source data;
the computing node is used for receiving the data query scheme generated by the front node and executing computing tasks in the data query scheme;
the storage node is used for storing the source data.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing a program;
a processor for executing the program stored in the memory for:
acquiring a data query instruction of a user, wherein the data query instruction at least comprises a field to be queried and a query condition aiming at the field to be queried;
acquiring an index proportion of index data corresponding to the field to be queried, wherein the index proportion is the proportion of the index data in source data;
and generating a data query scheme meeting the query condition according to the index proportion, wherein the data query scheme is used for indicating that data query operation is executed based on the index data and data query operation is executed based on the source data.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements:
acquiring a data query instruction of a user, wherein the data query instruction at least comprises a field to be queried and a query condition aiming at the field to be queried;
acquiring an index proportion of index data corresponding to the field to be queried, wherein the index proportion is the proportion of the index data in source data;
and generating a data query scheme meeting the query condition according to the index proportion, wherein the data query scheme is used for indicating that data query operation is executed based on the index data and data query operation is executed based on the source data.
According to the data processing method, the data processing device and the data processing system, the electronic equipment and the computer readable storage medium, indexes are partially built for data by introducing the index proportion, so that a data query scheme meeting query conditions is generated according to the index proportion of index data in source data, data query operation can be respectively executed according to user intention based on the index data and the source data, partial computation acceleration is realized, and the requirement of a user on balance between computation cost and query performance is met.
The foregoing is only an overview of the technical solutions of the present invention. Specific embodiments of the present invention are described below so that the technical means of the present invention can be understood more clearly and so that the above and other objects, features, and advantages of the present invention become more apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a system block diagram of a data processing system provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an application scenario of a data processing system according to an embodiment of the present invention;
FIG. 3 is a flow chart of an embodiment of a data processing method provided by the present invention;
FIG. 4 is a flow chart of data query according to a data query scheme according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the prior art, because data in the OSS technology is heterogeneous, that is, various file formats and various storage systems coexist, a cloud service user cannot well control the relationship between query performance and computation cost when processing data. Therefore, the present application proposes a data processing scheme whose main principle is as follows. When data is stored, an index is built for only part of the source data according to a preset index proportion, which the user can set according to his or her own needs. When data is queried, a data query scheme that meets the user's needs is generated according to the index proportion of the data, and, according to that scheme, data query operations are performed on the index data and on the source data respectively. Partial computation acceleration can thus be achieved according to the user's intention, satisfying the user's need for a balance between computation cost and query performance.
The foregoing describes the technical principle of the embodiments of the present invention. Specific technical solutions of the embodiments of the present invention are described in further detail below through a plurality of embodiments.
Example one
Fig. 1 is a system block diagram of a data processing system according to an embodiment of the present invention, and the structure shown in fig. 1 is only one example of a data processing system to which the technical solution of the present invention can be applied. The data processing system may be used to perform the process flows shown in fig. 3 and 4, described below. As shown in fig. 1, the data processing system includes: the system comprises a front node, a computing node and a storage node. The front node is used for acquiring a data query instruction of a user, wherein the data query instruction at least comprises a field to be queried and a query condition aiming at the field to be queried; the index proportion of the index data corresponding to the field to be queried is obtained, wherein the index proportion is the proportion of the index data in the source data; and generating a data query scheme meeting the query condition according to the index proportion, wherein the data query scheme is used for indicating that the data query operation is executed based on the index data and the data query operation is executed based on the source data. The computing node is used for receiving the data query scheme generated by the front node and executing the computing task in the data query scheme. The storage nodes are used for storing the source data.
In the data processing system provided by the embodiment of the invention, a plurality of front nodes can be arranged; the front nodes are peers of one another and together form a front node pool. After load balancing by a load balancing device (SLB), a user's data query instruction is sent to one of the front nodes in the front node pool for parsing and processing. After receiving the data query instruction, the front node obtains the field to be queried from the instruction and then obtains the index proportion of the index data corresponding to that field; for example, the front node may read the related metadata information from a metadata module (Meta module). After the index proportion of the corresponding index data is obtained, the front node generates, according to the index proportion, a data query scheme meeting the query condition in the data query instruction, and sends the data query scheme to the computing nodes for execution. In the embodiment of the invention, a plurality of computing nodes can be arranged, and the computing nodes form a computing node pool. The data query scheme generated by the front node may be executed in a distributed manner by a plurality of computing nodes in the computing node pool. The data processing system provided by the embodiment of the invention can adopt an architecture in which storage and computation are separated, with the computing nodes reading source data from different storage nodes. In the embodiment of the present invention, the source data may be distributed across various storage nodes such as OSS, table storage (TableStore), MongoDB, Network Attached Storage (NAS), and Relational Database Service (RDS). The computing nodes may also read the related metadata information from the metadata module when performing computing operations.
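The routing and metadata lookup described above might be sketched in Python as follows. This is only an illustrative sketch: the class names `MetaModule`, `FrontNode`, and `LoadBalancer`, the round-robin policy, and the shape of the returned dictionary are assumptions made for the example, since the embodiment does not specify how the SLB balances load or how metadata is represented.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, Iterable


@dataclass(frozen=True)
class IndexMeta:
    """Hypothetical metadata record: one indexed field and its index proportion."""
    field: str
    index_percent: int  # share of the source data covered by index data, 0..100


class MetaModule:
    """Hypothetical stand-in for the metadata (Meta) module."""

    def __init__(self, entries: Iterable[IndexMeta]) -> None:
        self._entries: Dict[str, IndexMeta] = {m.field: m for m in entries}

    def index_percent(self, field: str) -> int:
        # Assumption: a field without any index metadata is treated as 0% indexed.
        meta = self._entries.get(field)
        return meta.index_percent if meta else 0


class FrontNode:
    """One peer in the front node pool."""

    def __init__(self, name: str, meta: MetaModule) -> None:
        self.name = name
        self.meta = meta

    def handle(self, field: str, condition: str) -> dict:
        # Look up the index proportion for the queried field before planning.
        return {
            "front_node": self.name,
            "field": field,
            "condition": condition,
            "index_percent": self.meta.index_percent(field),
        }


class LoadBalancer:
    """Round-robin dispatcher (assumed policy) over the front node pool."""

    def __init__(self, nodes: Iterable[FrontNode]) -> None:
        self._nodes = cycle(list(nodes))

    def route(self, field: str, condition: str) -> dict:
        return next(self._nodes).handle(field, condition)


meta = MetaModule([IndexMeta("price", 50)])
slb = LoadBalancer([FrontNode("fn-0", meta), FrontNode("fn-1", meta)])
first = slb.route("price", "> 10")
second = slb.route("name", "= 'a'")
```

Because the front nodes are peers, any of them can serve a given instruction; the sketch reflects this by giving every node the same metadata source.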
Fig. 2 is a schematic view of an application scenario of the data processing system according to the embodiment of the present invention. As shown in fig. 2, in this application scenario, on a data analysis platform built on a cloud-native technology architecture, a user may define a table structure pointing to an OSS file to create a data table, and define related index information to create index data mapped to the data table. When the user writes source data to the OSS file, corresponding data change messages are synchronized to the data analysis platform; an index building module in the data analysis platform monitors the data change messages, pulls the latest data, builds an incremental index, and writes it into the index data. When the user queries data, a data query instruction, for example an SQL instruction, is sent to the data analysis platform. A computing engine in the data analysis platform queries the metadata information of the data table and the index information corresponding to the data table, generates a suitable data query scheme that pushes computation down according to the state of the elastic index actually built so far, and returns the result.
The data processing system provided by the embodiment of the invention builds indexes for only part of the data by introducing the index proportion, and generates a data query scheme meeting the query condition according to the proportion of the index data in the source data. Data query operations can thus be executed on the index data and on the source data respectively according to the user's intention, which achieves partial computation acceleration and meets the user's need for a balance between computation cost and query performance. In addition, the generation of incremental index data is driven by monitoring data change messages, so that the index data can be used efficiently in real time, without waiting gaps and without blocking the user's queries.
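A message-driven incremental index build of the kind described in this scenario could look roughly like the Python sketch below. It is a simplification under stated assumptions: `ChangeMessage` and `IndexBuilder` are hypothetical names, each message is assumed to carry the changed value directly (rather than triggering a pull of the latest data), the index is a plain in-memory value-to-row mapping, and the index proportion is ignored for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class ChangeMessage:
    """Hypothetical data change message: value is None when the row was deleted."""
    row_id: int
    value: Optional[str]


class IndexBuilder:
    """Keeps an inverted index (value -> row ids) in step with change messages."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[int]] = defaultdict(set)
        self._rows: Dict[int, str] = {}  # last known value per row

    def on_change(self, msg: ChangeMessage) -> None:
        # Remove the stale entry, if any, before applying the new state.
        old = self._rows.pop(msg.row_id, None)
        if old is not None:
            self.index[old].discard(msg.row_id)
            if not self.index[old]:
                del self.index[old]
        # A write (insert or update) adds the row under its new value.
        if msg.value is not None:
            self._rows[msg.row_id] = msg.value
            self.index[msg.value].add(msg.row_id)

    def lookup(self, value: str) -> List[int]:
        # .get() avoids creating empty entries in the defaultdict.
        return sorted(self.index.get(value, ()))


builder = IndexBuilder()
builder.on_change(ChangeMessage(1, "a"))
builder.on_change(ChangeMessage(2, "a"))
builder.on_change(ChangeMessage(1, "b"))   # row 1 rewritten
rows_a = builder.lookup("a")
rows_b = builder.lookup("b")
builder.on_change(ChangeMessage(2, None))  # row 2 deleted
rows_a_after_delete = builder.lookup("a")
```

Only the changed rows are touched on each message, which is what lets the index stay current without a full rebuild.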
Example two
Fig. 3 is a flowchart of an embodiment of a data processing method provided by the present invention. The method may be executed by the front node in the data processing system described above, by any server with data processing capability, or by a device or chip integrated in such equipment. As shown in fig. 3, the data processing method includes the following steps:
S301, acquiring a data query instruction of a user.
In the embodiment of the invention, the data query instruction input by the user at least comprises a field to be queried and a query condition aiming at the field to be queried.
S302, obtaining the index proportion of the index data corresponding to the field to be queried.
In the embodiment of the invention, after a data query instruction input by a user is obtained, a field to be queried is obtained from the data query instruction, and then the index proportion of index data corresponding to the field to be queried is obtained, wherein the index proportion is the proportion of the index data in source data. The user may preset the value of the index ratio according to his own needs, for example, 50%, 70%, etc.
In addition, in the embodiment of the present invention, when the user writes data to or deletes data from the source data, a corresponding data change message is generated. The data change message identifies a data change in the source data, that is, new data being written into the source data or data being deleted from it. After the data change message is acquired, the index data may be updated for the changed data in the source data according to the preset index proportion.
Specifically, according to the offset of the data, an amount of data that satisfies the index proportion may be selected from the changed data in the source data, and an index construction operation may be performed on it to form new index data. For example, suppose the index proportion is 50% and the total file length of the source data is 1000 characters; index data can then be constructed for the source data up to an offset of 500. The user can set the index proportion arbitrarily, and elastic scaling (increase or decrease) of the index proportion is supported. After the user increases the index proportion, the data after the current offset can be read automatically according to the increased index proportion, an index construction operation is performed, and the result is written to the index data. After the user reduces the index proportion, the corresponding index data can be deleted, for data before the current offset, once the reduced index proportion takes effect.
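The offset arithmetic in the example above can be sketched in Python as follows. The function names `index_boundary` and `resize_action` are hypothetical, the proportion is expressed as an integer percentage to avoid floating-point rounding, and the reading that a reduced proportion deletes index data between the new and the old offset is an interpretation of the text rather than something it states explicitly.

```python
from typing import Tuple


def index_boundary(total_length: int, index_percent: int) -> int:
    """Offset up to which the source data is covered by index data."""
    if not 0 <= index_percent <= 100:
        raise ValueError("index_percent must be between 0 and 100")
    return total_length * index_percent // 100


def resize_action(total_length: int, old_percent: int, new_percent: int) -> Tuple[str, int, int]:
    """Describe the work needed when the user changes the index proportion.

    Returns (action, start_offset, end_offset):
      - ("build", old, new): index the data after the current offset
      - ("delete", new, old): drop index data before the current offset
      - ("noop", old, old): nothing to do
    """
    old = index_boundary(total_length, old_percent)
    new = index_boundary(total_length, new_percent)
    if new > old:
        return ("build", old, new)
    if new < old:
        return ("delete", new, old)
    return ("noop", old, old)


boundary = index_boundary(1000, 50)     # the 50% / 1000-character example
grow = resize_action(1000, 50, 70)      # user raises the proportion
shrink = resize_action(1000, 50, 30)    # user lowers the proportion
```

With a 1000-character file, raising the proportion from 50% to 70% means indexing only the additional range from offset 500 to 700, which is why the adjustment can be elastic rather than a full rebuild.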
S303, generating a data query scheme meeting the query condition according to the index proportion.
In the embodiment of the present invention, after the data query plan is generated, the data query plan may be used to instruct the compute node to perform a data query operation based on the index data and to perform a data query operation based on the source data.
In addition, if the user updates the index proportion, the operation of generating the data query scheme is re-executed according to the updated index proportion.
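One way to picture step S303 is as a function that splits the source data into a range served by the index data and a range served by scanning the source data. The Python sketch below assumes this offset-based split; `QueryPlan` and `generate_plan` are hypothetical names, and the embodiment does not prescribe this particular representation of a data query scheme.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QueryPlan:
    """Hypothetical data query scheme: which offsets each access path covers."""
    field: str
    condition: str
    index_range: Tuple[int, int]  # [start, end) answered from index data
    scan_range: Tuple[int, int]   # [start, end) answered by scanning source data


def generate_plan(field: str, condition: str, total_length: int, index_percent: int) -> QueryPlan:
    """Build a plan from the index proportion (given as an integer percentage)."""
    if not 0 <= index_percent <= 100:
        raise ValueError("index_percent must be between 0 and 100")
    boundary = total_length * index_percent // 100
    return QueryPlan(
        field=field,
        condition=condition,
        index_range=(0, boundary),
        scan_range=(boundary, total_length),
    )


plan = generate_plan("price", "> 10", total_length=1000, index_percent=50)
# If the user later updates the proportion, the plan is simply generated again.
updated_plan = generate_plan("price", "> 10", total_length=1000, index_percent=70)
```

Regenerating the plan after a proportion change keeps the two ranges disjoint and exhaustive, so no data is queried twice or skipped.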
Fig. 4 is a flowchart of data query according to a data query scheme according to an embodiment of the present invention. As shown in fig. 4, the process includes the following steps:
S401, performing an index query operation based on the index data.
S402, performing an aggregation operation on the result of the index query.
S403, performing a table scan operation based on the source data.
S404, performing filtering processing on the result of the table scan operation.
S405, performing an aggregation operation on the result of the filtering processing.
S406, merging the aggregation result of the index query with the aggregation result of the source data filtering processing.
S407, outputting the merged result.
In the embodiment of the invention, a hybrid query scheme can be realized in which part of the data is served from the index data and the rest is pulled from the source data, so that part of the computation is accelerated and the time requirements of the business are ultimately met.
The data processing method provided by the embodiment of the invention builds indexes for only part of the data by introducing the index proportion, and generates a data query scheme meeting the query condition according to the proportion of the index data in the source data. Data query operations can thus be executed on the index data and on the source data respectively according to the user's intention, which achieves partial computation acceleration and meets the user's need for a balance between computation cost and query performance. In addition, the generation of incremental index data is driven by monitoring data change messages, so that the index data can be used efficiently in real time, without waiting gaps and without blocking the user's queries.
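Steps S401 to S407 can be illustrated with a small Python sketch. It assumes a single numeric field, a "greater than" query condition, count and sum as the aggregation, and a sorted list standing in for the index data; all function names are hypothetical, and in a real system the index would already exist rather than being built inside the query.

```python
import bisect
from typing import Dict, List, Sequence


def index_query_gt(index: Sequence[int], threshold: int) -> List[int]:
    """S401: use the sorted index to find values greater than the threshold."""
    return list(index[bisect.bisect_right(index, threshold):])


def aggregate(values: Sequence[int]) -> Dict[str, int]:
    """S402 / S405: aggregate a partial result (count and sum, as an example)."""
    return {"count": len(values), "sum": sum(values)}


def merge(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """S406: combine the two partial aggregates."""
    return {"count": a["count"] + b["count"], "sum": a["sum"] + b["sum"]}


def hybrid_query(rows: Sequence[int], boundary: int, threshold: int) -> Dict[str, int]:
    # Stand-in for pre-built index data covering rows[:boundary].
    index = sorted(rows[:boundary])
    index_agg = aggregate(index_query_gt(index, threshold))   # S401, S402
    scanned = rows[boundary:]                                  # S403: table scan
    filtered = [v for v in scanned if v > threshold]           # S404: filter
    scan_agg = aggregate(filtered)                             # S405
    return merge(index_agg, scan_agg)                          # S406; returned as S407


rows = [5, 12, 7, 20, 3, 15, 9, 30]
result = hybrid_query(rows, boundary=4, threshold=8)
full_scan = aggregate([v for v in rows if v > 8])
```

The merged result matches what a full scan would return; the difference is that the indexed half is answered by a binary search instead of a row-by-row filter.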
Example three
Fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention, which can be used to execute the method steps shown in fig. 3. As shown in fig. 5, the data processing apparatus may include: an instruction obtaining module 51, an index proportion obtaining module 52 and a scheme generating module 53.
The instruction obtaining module 51 is configured to obtain a data query instruction of a user, where the data query instruction at least includes a field to be queried and a query condition for the field to be queried; the index proportion obtaining module 52 is configured to obtain an index proportion of index data corresponding to the field to be queried, where the index proportion is a proportion of the index data in the source data; the scheme generating module 53 is configured to generate a data query scheme meeting the query condition according to the index proportion, where the data query scheme is used to instruct to perform a data query operation based on the index data and to perform the data query operation based on the source data.
In the embodiment of the present invention, after the instruction obtaining module 51 obtains the data query instruction input by the user, the index ratio obtaining module 52 obtains the field to be queried from the data query instruction, and further obtains the index ratio of the index data corresponding to the field to be queried. The user may preset the value of the index ratio according to his own needs, for example, 50%, 70%, etc. A data query plan is then generated by the plan generation module 53 to instruct the compute nodes to perform data query operations based on the index data and to perform data query operations based on the source data.
In addition, the data processing apparatus provided in the embodiment of the present invention may further include: an index update module 54. The index updating module 54 may be configured to update index data for changed data in the source data according to a preset index proportion after obtaining a data change message, where the data change message is used to identify data change in the source data.
Specifically, the index updating module 54 may include: a message acquisition unit 541 and an index update unit 542.
The message acquiring unit 541 may be configured to acquire a data change message; the index updating unit 542 may be configured to select, according to the offset of the data, data of a number that satisfies the index proportion from the changed data in the source data, and perform an index building operation to form new index data.
Further, in the embodiment of the present invention, the index updating unit 542 may be further configured to, after the user increases the index proportion, read the data after the current offset according to the increased index proportion, perform an index building operation, and update the index data accordingly. The index updating unit 542 may further be configured to, after the user decreases the index proportion, delete the corresponding index data for the data before the current offset according to the decreased index proportion.
Furthermore, in this embodiment of the present invention, the scheme generating module 53 may be further configured to, after the user updates the index ratio, re-execute the operation of generating the data query scheme according to the updated index ratio.
The functions of the modules in the embodiments of the present invention are described in detail in the above method embodiments, and are not described herein again.
The data processing device provided by the embodiment of the invention builds indexes for only part of the data by introducing the index proportion, and generates a data query scheme meeting the query condition according to the proportion of the index data in the source data. Data query operations can thus be executed on the index data and on the source data respectively according to the user's intention, which achieves partial computation acceleration and meets the user's need for a balance between computation cost and query performance. In addition, the generation of incremental index data is driven by monitoring data change messages, so that the index data can be used efficiently in real time, without waiting gaps and without blocking the user's queries.
Example four
The internal functions and structure of the data processing apparatus, which can be implemented as an electronic device, are described above. Fig. 6 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention. As shown in fig. 6, the electronic device includes a memory 61 and a processor 62.
The memory 61 is used for storing programs. In addition to the above-described programs, the memory 61 may also be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth.
The memory 61 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The processor 62 is not limited to a Central Processing Unit (CPU); it may also be a processing chip such as a Graphics Processing Unit (GPU), a Field Programmable Gate Array (FPGA), an embedded Neural Network Processor (NPU), or an Artificial Intelligence (AI) chip. The processor 62 is coupled to the memory 61 and executes the program stored in the memory 61 for:
acquiring a data query instruction of a user, wherein the data query instruction at least comprises a field to be queried and a query condition aiming at the field to be queried;
acquiring an index proportion of index data corresponding to a field to be queried, wherein the index proportion is the proportion of the index data in source data;
and generating a data query scheme meeting the query condition according to the index proportion, wherein the data query scheme is used for indicating that the data query operation is executed based on the index data and the data query operation is executed based on the source data.
Further, as shown in fig. 6, the electronic device may further include a communication component 63, a power component 64, an audio component 65, a display 66, and other components. Fig. 6 shows only some of the components schematically, which does not mean that the electronic device includes only the components shown in fig. 6.
The communication component 63 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 63 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 63 further comprises a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power component 64 provides power to the various components of the electronic device. The power component 64 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device.
The audio component 65 is configured to output and/or input an audio signal. For example, the audio component 65 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 61 or transmitted via the communication component 63. In some embodiments, the audio component 65 also includes a speaker for outputting audio signals.
The display 66 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
Those of ordinary skill in the art will understand that all or a portion of the steps of the above-described method embodiments may be implemented by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (17)

1. A data processing method, comprising:
acquiring a data query instruction of a user, wherein the data query instruction comprises at least a field to be queried and a query condition for the field to be queried;
acquiring an index proportion of index data corresponding to the field to be queried, wherein the index proportion is the proportion of the index data in source data; and
generating, according to the index proportion, a data query scheme meeting the query condition, wherein the data query scheme indicates that a data query operation is executed based on the index data and a data query operation is executed based on the source data.
2. The data processing method of claim 1, further comprising:
after a data change message is acquired, updating the index data for the changed data in the source data according to a preset index proportion, wherein the data change message is used to identify a data change in the source data.
3. The data processing method of claim 2, wherein the data change message is used to identify writing of new data into the source data.
4. The data processing method of claim 2, wherein the data change message is used to identify deletion of data in the source data.
5. The data processing method according to claim 2, wherein the updating of the index data for the changed data in the source data according to a preset index proportion comprises:
selecting, from the changed data in the source data and according to the offset of the data, an amount of data that meets the index proportion, and performing an index construction operation on the selected data to form new index data.
6. The data processing method of claim 5, further comprising:
after the user increases the index proportion, reading the data after the current offset according to the increased index proportion, performing an index construction operation, and updating the index data.
7. The data processing method of claim 5, further comprising:
after the user reduces the index proportion, deleting, according to the reduced index proportion, the corresponding index data for the data before the current offset.
8. The data processing method of any one of claims 1 to 7, further comprising:
after the user updates the index proportion, re-executing the operation of generating the data query scheme according to the updated index proportion.
9. A data processing apparatus, comprising:
an instruction acquisition module, configured to acquire a data query instruction of a user, wherein the data query instruction comprises at least a field to be queried and a query condition for the field to be queried;
an index proportion obtaining module, configured to obtain an index proportion of index data corresponding to the field to be queried, wherein the index proportion is the proportion of the index data in source data; and
a scheme generation module, configured to generate, according to the index proportion, a data query scheme meeting the query condition, wherein the data query scheme indicates that a data query operation is executed based on the index data and a data query operation is executed based on the source data.
10. The data processing apparatus of claim 9, further comprising:
an index updating module, configured to update the index data for the changed data in the source data according to a preset index proportion after a data change message is acquired, wherein the data change message is used to identify a data change in the source data.
11. The data processing apparatus of claim 10, wherein the index updating module comprises:
a message acquiring unit, configured to acquire the data change message; and
an index updating unit, configured to select, from the changed data in the source data and according to the offset of the data, an amount of data that meets the index proportion, and to perform an index construction operation on the selected data to form new index data.
12. The data processing apparatus according to claim 11, wherein the index updating unit is further configured to, after the user increases the index proportion, read data after the current offset according to the increased index proportion, perform an index building operation, and update the index data.
13. The data processing apparatus according to claim 11, wherein the index updating unit is further configured to delete the corresponding index data for the data before the current offset according to the reduced index proportion after the index proportion is reduced by the user.
14. The data processing apparatus according to any of claims 9 to 13, wherein the scheme generation module is further configured to, after the user updates the index proportion, re-execute the operation of generating the data query scheme according to the updated index proportion.
15. A data processing system, comprising: a front-end node, a compute node, and a storage node, wherein,
the front-end node is used for acquiring a data query instruction of a user, wherein the data query instruction comprises at least a field to be queried and a query condition for the field to be queried; acquiring an index proportion of index data corresponding to the field to be queried, wherein the index proportion is the proportion of the index data in source data; and generating, according to the index proportion, a data query scheme meeting the query condition, wherein the data query scheme indicates that a data query operation is executed based on the index data and a data query operation is executed based on the source data;
the compute node is used for receiving the data query scheme generated by the front-end node and executing the computing tasks in the data query scheme; and
the storage node is used for storing the source data.
16. An electronic device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory for:
acquiring a data query instruction of a user, wherein the data query instruction comprises at least a field to be queried and a query condition for the field to be queried;
acquiring an index proportion of index data corresponding to the field to be queried, wherein the index proportion is the proportion of the index data in source data; and
generating, according to the index proportion, a data query scheme meeting the query condition, wherein the data query scheme indicates that a data query operation is executed based on the index data and a data query operation is executed based on the source data.
17. A computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements:
acquiring a data query instruction of a user, wherein the data query instruction comprises at least a field to be queried and a query condition for the field to be queried;
acquiring an index proportion of index data corresponding to the field to be queried, wherein the index proportion is the proportion of the index data in source data; and
generating, according to the index proportion, a data query scheme meeting the query condition, wherein the data query scheme indicates that a data query operation is executed based on the index data and a data query operation is executed based on the source data.
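The offset-based index maintenance recited in claims 2 and 5 to 7 can be sketched as follows. This is only one possible reading, under stated assumptions: the index always covers offsets from 0 up to a current offset, an increased index proportion extends that range by reading the data after the current offset, and a reduced index proportion trims index entries for the data just before the current offset. Deletion of source data (claim 4) is not handled. The class `PartialIndex` and its methods are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class PartialIndex:
    """Hypothetical index over the first `current_offset` rows of the source data."""

    def __init__(self, field: str, proportion: float) -> None:
        self.field = field
        self.proportion = proportion
        self.current_offset = 0  # rows [0, current_offset) are indexed
        self.entries: Dict[Any, List[int]] = {}

    def sync(self, source: List[Row]) -> None:
        """Bring the indexed range in line with the index proportion.

        Called after a data change message (for example, new rows appended).
        """
        target = int(len(source) * self.proportion)
        # Extend: read data after the current offset and build index entries.
        for offset in range(self.current_offset, target):
            self.entries.setdefault(source[offset][self.field], []).append(offset)
        # Shrink: delete index entries for data before the current offset.
        for offset in range(target, self.current_offset):
            value = source[offset][self.field]
            self.entries[value].remove(offset)
            if not self.entries[value]:
                del self.entries[value]
        self.current_offset = target

    def set_proportion(self, proportion: float, source: List[Row]) -> None:
        """Apply a user-updated index proportion and resynchronize the index."""
        if not 0.0 <= proportion <= 1.0:
            raise ValueError("proportion must be in [0, 1]")
        self.proportion = proportion
        self.sync(source)


# Usage: index half of the data, append rows, then raise the proportion.
rows = [{"k": v} for v in "abab"]
idx = PartialIndex("k", 0.5)
idx.sync(rows)                          # indexes offsets 0 and 1
rows.extend({"k": v} for v in "cdcd")   # simulated write of new data
idx.sync(rows)                          # indexes offsets 2 and 3 as well
idx.set_proportion(1.0, rows)           # indexes the remaining offsets
```

Only one of the two loops in `sync` does any work on a given call, since the target offset is either ahead of or behind the current offset; after the proportion changes, the data query scheme would be regenerated against the new indexed range.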
CN201910866399.4A 2019-09-12 2019-09-12 Data processing method, device and system, electronic equipment and computer readable storage medium Active CN112486979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910866399.4A CN112486979B (en) 2019-09-12 2019-09-12 Data processing method, device and system, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112486979A true CN112486979A (en) 2021-03-12
CN112486979B CN112486979B (en) 2023-12-22

Family

ID=74920740

Country Status (1)

Country Link
CN (1) CN112486979B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant