CN111400427A - Data storage method, data query method, data storage device, data query device and computing equipment - Google Patents

Info

Publication number
CN111400427A
Authority
CN
China
Prior art keywords
text
block
query
line
text object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910002235.7A
Other languages
Chinese (zh)
Inventor
吴金虎
徐奇
邵楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910002235.7A
Publication of CN111400427A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data storage method, which comprises the following steps: acquiring a text object to be processed, wherein the text object comprises a plurality of text lines; dividing the text object into a plurality of text blocks, wherein each text block comprises at least one text line; and storing the number of text blocks included in the text object and the block information of each text block as index information of the text object in association with the text object. The invention also discloses a corresponding data storage device, a data query method and device and computing equipment.

Description

Data storage method, data query method, data storage device, data query device and computing equipment
Technical Field
The invention relates to the technical field of data storage, in particular to a data storage method, a data query method, a data storage device, a data query device and computing equipment.
Background
Object storage is a data storage method without a hierarchical structure, and an Object Storage Service (OSS) is a cloud storage service based on object storage technology. In a cloud storage system, each object (Object) belongs to a storage space (Bucket). The interior of a storage space is flat: all objects in it belong directly to the storage space, and no hierarchical directory relationship exists.
In the prior art, after a user uploads an object to cloud storage, a user who wants to query the object must download all of the object data locally through the GetObject interface and then parse and filter it. This query method is time-consuming and inefficient, and it wastes bandwidth and client resources. The efficiency problem is more pronounced when the object file to be queried is large or the object is queried frequently.
Disclosure of Invention
To this end, the present invention provides a data storage, query method, apparatus and computing device in an attempt to solve, or at least alleviate, the problems identified above.
According to an aspect of the present invention, there is provided a data storage method, including: acquiring a text object to be processed, wherein the text object comprises a plurality of text lines; dividing the text object into a plurality of text blocks, wherein each text block comprises at least one text line; and taking the number of text blocks included in the text object and the block information of each text block as index information of the text object, and storing the index information in association with the text object.
According to an aspect of the present invention, there is provided a data query method, including: receiving a query request, the query request including a target object; acquiring index information of a target object, wherein the index information comprises the number of text blocks included in the target object and block information of each text block; dividing the target object into a plurality of tiles, the tiles including at least one text block; the plurality of tiles are queried in parallel.
According to an aspect of the present invention, there is provided a data storage device comprising: an acquisition module adapted to acquire a text object to be processed, the text object comprising a plurality of text lines; a block module adapted to divide the text object into a plurality of text blocks, each text block including at least one text line; and an index generation module adapted to store the number of text blocks included in the text object and the block information of each text block, as the index information of the text object, in association with the text object.
According to an aspect of the present invention, there is provided a data query apparatus including: a receiving module adapted to receive a query request, the query request including a target object; an index acquisition module adapted to acquire index information of the target object, wherein the index information comprises the number of text blocks included in the target object and block information of each text block; a partitioning module adapted to partition the target object into a plurality of tiles, each tile comprising at least one text block; and a parallel query module adapted to query the plurality of tiles in parallel.
According to one aspect of the present invention, there is provided a data storage system comprising a client and a server, the client being adapted to send an index generation request to the server, the index generation request comprising a text object to be processed, the text object comprising a plurality of text lines; the server divides the text object into a plurality of text blocks based on the index generation request, each text block comprising at least one text line, and stores the number of text blocks included in the text object and the block information of each text block, as the index information of the text object, in association with the text object.
According to one aspect of the invention, a data query system is provided, which comprises a client and a server, wherein the client is suitable for sending a data query request to the server, and the query request comprises a target object; the server acquires index information of the target object based on the query request, wherein the index information comprises the number of text blocks included in the target object and block information of each text block; dividing the target object into a plurality of tiles, the tiles including at least one text block; the plurality of tiles are queried in parallel.
According to an aspect of the invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, the program instructions comprising instructions for performing the data storage method and/or the data query method as described above.
According to still another aspect of the present invention, there is provided a readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to perform the data storage method and/or the data query method as described above.
According to the technical scheme of the invention, index information is established for the text object, and the index information comprises the number of text blocks included in the text object and block information of each text block. When a user initiates a query request, the target object can be queried in segments according to the index information of the target object to be queried, and the query processes of all segments are performed in parallel, so that the query time is greatly shortened, and the query efficiency is improved.
In addition, according to the technical scheme of the invention, the fragment query process for the target object is executed on the server side. The server merges the fragment query results and then returns the merged result to the client. Compared with a scheme in which the client downloads all of the object data locally and then queries it, this saves bandwidth and client resources.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a data storage system 100 according to one embodiment of the present invention;
FIG. 2 illustrates a flow diagram of a data storage method 200 according to one embodiment of the invention;
FIG. 3 shows a schematic diagram of a data query method 300, according to one embodiment of the invention;
FIGS. 4-6 are schematic diagrams illustrating interaction processes of data storage and query according to three embodiments of the present invention;
FIG. 7 shows a schematic diagram of a computing device 700, according to one embodiment of the invention;
FIG. 8 shows a schematic diagram of a data storage device 800 according to one embodiment of the invention;
FIG. 9 shows a schematic diagram of a data query apparatus 900 according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 shows a schematic diagram of a data storage system 100 according to one embodiment of the invention. As shown in FIG. 1, the data storage system 100 includes a client 110 and a server 120.
The client 110 is a device on the user side, and may be a personal computer such as a desktop computer and a notebook computer, or a mobile device such as a mobile phone, a tablet computer, a multimedia device, and a smart wearable device, but is not limited thereto. The server 120 is a server for providing an Object Storage Service (OSS), and may be implemented as a single server or a distributed service cluster composed of a plurality of servers.
The server 120 may be used to provide functions such as, but not limited to, file upload, download, query, delete, etc. to clients. The file uploaded to the server 120 by the client 110 is stored in a storage space (Bucket) in the form of an Object (Object) (the storage space is created by the user before uploading the file). The format of the file may be, for example, CSV (Comma-Separated Values), JSON (JavaScript Object Notation), etc., but is not limited thereto. The Object is composed of an Object identifier (Key), user Data (Data) and Meta information (Object Meta), wherein the Object identifier is used for uniquely identifying a certain Object in the storage space; the meta information is a group of key value pairs and is used for representing some attributes of the object, such as the last modification time, the file size and other information, and the user can also store some customized information in the meta information.
The user can query an object that has been uploaded to the server 120 through the client 110; that is, the client 110 sends a query request to the server 120, and the server 120 returns the query result to the client 110. In order to improve the efficiency of data query, the present invention provides a data storage method and a data query method, both executed in the server 120. The data storage method generates index information for stored text objects, and the data query method performs a fragmented parallel query on the target object to be queried according to the index information. It should be noted that, although only one server 120 is shown in FIG. 1, those skilled in the art will understand that the server that generates the index information of the text object and the server that performs the fragmented parallel query on the target object may be the same server or different servers. To simplify the description, the technical solution of the present invention is described below taking the case in which the same server (server 120) performs both roles as an example.
FIG. 2 shows a flow diagram of a data storage method 200 according to one embodiment of the invention. Method 200 is performed on server 120 and is used for generating index information for text objects. As shown in FIG. 2, the method 200 begins at step S210.
In step S210, a text object to be processed is obtained, where the text object includes a plurality of text lines.
In one embodiment, the server 120 may autonomously perform the method 200 without being triggered by the client 110, and accordingly, in step S210, the server 120 autonomously acquires a text object to be processed and generates index information of the text object.
In another embodiment, the server 120 executes the method 200 based on the index generation request sent by the client 110, and accordingly, in step S210, the server 120 receives the index generation request sent by the client 110, and the index generation request includes the text object to be processed; subsequently, the server 120 obtains the text object to be processed specified by the client 110.
The text object includes a plurality of text lines, each text line including at least one character. The text object may be in a Comma Separated Value (CSV) file format, or in a file format using other self-defined separators, such as a Tab-separated value (TSV) file format, and the data storage method 200 of the present invention is applicable to a text object stored in a file format having separators, but the present invention is not limited to the specific file format of the text object.
For example, the following is an example of a text object in CSV file format:
aaa,bbb,"ccc" CRLF
ddd,eee,fff CRLF
ggg,hhh,iii CRLF
jjj,kkk,lll CRLF
mmm,nnn,ooo CRLF
ppp,qqq,rrr CRLF
sss,ttt,uuu CRLF
vvv,www,xxx CRLF
yyy,zzz,000 CRLF
The CSV file comprises 9 text lines. The line separator is CRLF (Carriage-Return Line-Feed, i.e., \r\n). The file comprises 3 columns, and the column separator is the half-width comma (,); that is, each text line comprises 3 fields. In the first text line, for example, aaa, bbb and ccc are each one field, and the fields are separated by half-width commas.
When the server 120 performs the method 200 based on the index generation request sent by the client 110, the user may specify the row separator, column separator, and quotation mark character of the text object; that is, these are included in the index generation request sent by the client 110. According to an embodiment, the index generation request may further include an overwrite identifier whose value is yes (true) or no (false) and which indicates whether newly generated index information overwrites existing index information.
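As a rough illustration of what such a request might carry, the following sketch models it as a Python dataclass. The class name and all field names are hypothetical and chosen only for readability; the text specifies only that the separators, the quotation mark character and the overwrite identifier may be included.

```python
from dataclasses import dataclass


@dataclass
class IndexGenerationRequest:
    """Hypothetical shape of an index generation request (names are illustrative)."""
    object_key: str                # identifier of the text object to be indexed
    line_separator: str = "\r\n"   # row separator, CRLF in the running example
    column_separator: str = ","    # column separator
    quote_character: str = '"'     # quotation mark character
    overwrite: bool = False        # whether new index information replaces existing


# A request for the example object, keeping any existing index information.
request = IndexGenerationRequest(object_key="ossobject")
```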
Subsequently, in step S220, the text object is divided into a plurality of text blocks, each text block including at least one text line.
According to an embodiment, step S220 may further be implemented according to the following steps S222 to S226.
Step S222: an upper limit on the number of text blocks is set. The upper limit of the number of text blocks can be set by a person skilled in the art, and the specific value of the upper limit of the number is not limited by the invention.
Step S224: determine the number of text lines included in each text block according to the set upper limit of the number and the total number of text lines included in the text object.
The total number of lines of a text object may be determined from its line separators. For example, according to the RFC 4180 specification, for a text object that is a CSV file without header information: if the file ends with a line separator, the total number of lines equals the number of line separators; if the file does not end with a line separator, the total number of lines is the number of line separators plus one. Based on this rule, the example CSV text object given above includes 9 line separators and ends with a line separator, so it includes 9 text lines.
It should be noted that the number of text lines included in each text block is determined according to the upper limit of the number of text blocks and the total number of lines of the text object, but the invention does not limit the specific algorithm relating the number of text lines included in each text block to the upper limit of the number of text blocks and the total number of lines. In addition, the number of text lines included in each text block may be the same or different.
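The counting rule above can be sketched in a few lines of Python. The function name is hypothetical, the treatment of an empty object as zero lines is an assumption, and the sketch deliberately ignores quoted fields that contain a line separator.

```python
def count_text_lines(data: str, line_separator: str = "\r\n") -> int:
    """Count text lines from line separators, following the rule described above.

    Assumption: an empty object has zero lines. Quoted fields containing a
    line separator are not handled in this sketch.
    """
    if not data:
        return 0
    separators = data.count(line_separator)
    # A trailing separator terminates the last line; otherwise the last
    # line has no separator of its own and must be counted separately.
    return separators if data.endswith(line_separator) else separators + 1


rows = ["aaa,bbb,ccc", "ddd,eee,fff", "ggg,hhh,iii",
        "jjj,kkk,lll", "mmm,nnn,ooo", "ppp,qqq,rrr",
        "sss,ttt,uuu", "vvv,www,xxx", "yyy,zzz,000"]
example = "\r\n".join(rows) + "\r\n"   # ends with a line separator
total_lines = count_text_lines(example)
```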
In one embodiment, the number of text lines included in each text block is the same and is determined as follows: if the total number of lines is less than the upper limit of the number, each text block comprises one text line; if the total number of lines is greater than or equal to the upper limit of the number, the number of text lines included in each text block is the smallest integer greater than or equal to the quotient of the total number of lines and the upper limit of the number, that is,
$$\text{number of text lines per text block} = \left\lceil \frac{\text{total number of lines}}{\text{upper limit of the number}} \right\rceil$$
where $\lceil \cdot \rceil$ indicates rounding up. Those skilled in the art can understand that, when the total number of lines is greater than or equal to the upper limit of the number, if the total number of lines is exactly divisible by the number of text lines per text block, every text block includes strictly the same number of text lines; otherwise, all text blocks except the last include the same number of text lines, and the last text block includes fewer text lines than the other text blocks.
For example, the text object in CSV format given above comprises 9 text lines, i.e., the total number of lines is 9. If the upper limit of the number of text blocks is set to 10, the total number of lines is smaller than the upper limit of the number, and each text block comprises one text line. If the upper limit of the number of text blocks is set to 5, the total number of lines is greater than the upper limit of the number, and the number of text lines included in each text block is the smallest integer greater than or equal to 9/5, that is, $\lceil 9/5 \rceil = 2$.
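The rule for the equal-sized case can be checked with a short Python sketch. The function name is hypothetical; integer arithmetic is used for the rounding-up step to avoid floating-point error on very large line counts.

```python
def lines_per_block(total_lines: int, max_blocks: int) -> int:
    """Number of text lines per text block for the equal-sized embodiment."""
    if total_lines < max_blocks:
        return 1
    # Ceiling division with integers: smallest integer >= total_lines / max_blocks.
    return -(-total_lines // max_blocks)


per_block_limit_10 = lines_per_block(9, 10)  # fewer lines than the limit
per_block_limit_5 = lines_per_block(9, 5)    # rounds 9/5 up
```

With 9 lines, an upper limit of 10 yields one line per block and an upper limit of 5 yields two lines per block, matching the worked example.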
In addition, in order to better illustrate the technical solution of the present invention, the total number of lines, the total number of blocks, and the upper limit of the number are all small in the embodiments given in this specification. However, those skilled in the art can understand that in an actual application scenario these values are much larger than in the embodiments of this specification; in particular, the total number of lines of a text object may reach hundreds of thousands or millions, and the present invention is proposed precisely to enable fast querying of such large text objects.
Step S226: divide the text object into a plurality of text blocks according to the number of text lines included in each text block.
The text object may be segmented according to the number of text lines included in each text block determined in step S224. For example, if every text block includes the same number of text lines and that number is 1, the 1st text line of the text object is taken as the 1st text block, the 2nd text line as the 2nd text block, the 3rd text line as the 3rd text block, and so on. If each text block comprises 5 text lines, the first 5 text lines of the text object are taken as the 1st text block, the 6th to 10th text lines as the 2nd text block, the 11th to 15th text lines as the 3rd text block, and so on. If the text blocks include different numbers of text lines, and the i-th text block includes $n_i$ text lines, then lines $\sum_{j=1}^{i-1} n_j + 1$ through $\sum_{j=1}^{i} n_j$ of the text object are taken as the i-th text block.
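A minimal sketch of this segmentation step, covering both equal and unequal block sizes, might look as follows. The function names are hypothetical; the per-block line counts are assumed to have been determined already in step S224.

```python
from typing import List, Sequence


def uniform_counts(total_lines: int, per_block: int) -> List[int]:
    """Per-block line counts when every block (except possibly the last) is equal."""
    full, rest = divmod(total_lines, per_block)
    return [per_block] * full + ([rest] if rest else [])


def split_into_blocks(lines: Sequence[str], counts: Sequence[int]) -> List[List[str]]:
    """Take consecutive runs of lines; the i-th block gets counts[i] lines."""
    blocks = []
    start = 0
    for n in counts:
        blocks.append(list(lines[start:start + n]))
        start += n
    return blocks


lines = [f"line{i}" for i in range(1, 10)]   # 9 placeholder text lines
counts = uniform_counts(len(lines), 2)       # two lines per block
blocks = split_into_blocks(lines, counts)
```

Passing an arbitrary list of counts instead of `uniform_counts(...)` covers the case where the blocks have different sizes.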
Subsequently, in step S230, the number of text blocks included in the text object and the block information of each text block are stored in association with the text object as the index information of the text object. In one embodiment, the index information may be written in the Meta information (Object Meta) of the text Object.
It should be noted that the block information is any information capable of representing the characteristics of the text block, and can be used for quickly positioning the text block, the text line, and even the character. In some embodiments, the block information may include, for example and without limitation, the number of text lines included in the text block, the line sequence number of the first text line of the text block in the text object, the character sequence number of the first character of the text block in the text object, and the like. It will be understood by those skilled in the art that the block information may include other information items besides the three information items listed above, such as block numbers of text blocks in text objects, the number of characters included in text blocks, and so on, and the present invention does not limit the number and kinds of information items included in the block information.
In one embodiment, the block information is a line number of a first text line of the text block in the text object, and accordingly, the index information of the text object includes the number of text blocks included in the text object and the line number of the first text line of each text block in the text object. For the text object in the CSV format given above, if the number of text lines included in each text block is 2, the 1 st and 2 nd lines of the text object are the 1 st text block, the 3 rd and 4 th lines of the text object are the 2 nd text block, …, and the 9 th line of the text object is the 5 th text block. Accordingly, the index information of the text object includes: the total number of text blocks is 5, the line sequence number of the first text line of the 1 st text block in the text object is 1, the line sequence number of the first text line of the 2 nd text block in the text object is 3, the line sequence number of the first text line of the 3 rd text block in the text object is 5, the line sequence number of the first text line of the 4 th text block in the text object is 7, and the line sequence number of the first text line of the 5 th text block in the text object is 9.
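The index information in this embodiment can be sketched as a small dictionary. The function name and the key names `block_count` and `first_line_numbers` are hypothetical; line sequence numbers are 1-based, as in the text.

```python
def build_index(total_lines: int, per_block: int) -> dict:
    """Build index information: block count plus the first line number of each block."""
    # Blocks start at lines 1, 1 + per_block, 1 + 2 * per_block, ...
    first_lines = list(range(1, total_lines + 1, per_block))
    return {"block_count": len(first_lines), "first_line_numbers": first_lines}


index = build_index(total_lines=9, per_block=2)
```

For the 9-line example with 2 lines per block this yields 5 blocks starting at lines 1, 3, 5, 7 and 9, as described above.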
In another embodiment, the block information includes, in addition to the line sequence number of the first text line of the text block in the text object, the character sequence number of the first character of each text block in the text object. For example, for the text object in the CSV format given above, if the number of text lines included in each text block is 2, the 1st and 2nd lines of the text object are the 1st text block, the 3rd and 4th lines of the text object are the 2nd text block, …, and the 9th line of the text object is the 5th text block. Accordingly, the index information of the text object includes: a total number of text blocks of 5; for the 1st text block, line sequence number 1 of its first text line and character sequence number 1 of its first character in the text object; for the 2nd text block, line sequence number 3 and character sequence number 19; for the 3rd text block, line sequence number 5 and character sequence number 37; for the 4th text block, line sequence number 7 and character sequence number 55; and for the 5th text block, line sequence number 9 and character sequence number 73.
After storing the index information of the text object in association with the text object, according to an embodiment, if the server 120 executes the method 200 based on the index generation request sent by the client 110, after the method 200 is executed, the server 120 preferably returns feedback information for executing the method 200 to the client 110, for example, the total number of lines and the total number of blocks included in the text object are returned to the client.
FIG. 3 shows a flow diagram of a data query method 300 according to one embodiment of the invention. The method 300 is executed on the server 120 and is used for performing a fragmented query on a target object to be queried. As shown in FIG. 3, the method 300 begins at step S310.
In step S310, a query request is received, the query request including a target object.
For example, the client 110 sends a query request to the server 120. The query request may be embodied as an SQL statement, for example:
select * from ossobject where _2 == 'qqq'
In this SQL statement, ossobject is the target object and _2 == 'qqq' is the query condition. The statement means: return all data records (one data record being one text line) whose value in the 2nd column is qqq.
Subsequently, in step S320, index information of the target object is acquired, the index information including the number of text blocks included in the target object and block information of each text block.
Subsequently, in step S330, the target object is divided into a plurality of tiles, each tile including at least one text block.
According to one embodiment, a concurrency degree is set first. The number of text blocks included in each tile is the smallest integer greater than or equal to the quotient of the total number of blocks and the concurrency degree, that is,
$$\text{number of text blocks per tile} = \left\lceil \frac{\text{total number of blocks}}{\text{concurrency degree}} \right\rceil$$
where $\lceil \cdot \rceil$ indicates rounding up. For example, for the text object in the CSV format given above, if the number of text lines included in each text block is 2, the text object includes 5 text blocks, that is, the total number of blocks is 5. If the concurrency degree is set to 3, the number of text blocks included in each tile is $\lceil 5/3 \rceil = 2$.
Then, the target object is divided into a plurality of tiles according to the number of text blocks included in each tile. For example, if the total number of blocks of the text object is 5 and the number of text blocks included in each tile is 2, the 1st and 2nd text blocks are assigned to the 1st tile, the 3rd and 4th text blocks to the 2nd tile, and the 5th text block to the 3rd tile.
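The tile division can be sketched as follows, using 1-based block numbers as in the text. The function name is hypothetical, and the sketch assumes consecutive blocks are grouped in order.

```python
from typing import List


def split_into_tiles(block_count: int, concurrency: int) -> List[List[int]]:
    """Group consecutive block numbers (1-based) into tiles."""
    # Blocks per tile: smallest integer >= block_count / concurrency.
    per_tile = -(-block_count // concurrency)
    return [
        list(range(start, min(start + per_tile, block_count + 1)))
        for start in range(1, block_count + 1, per_tile)
    ]


tiles = split_into_tiles(block_count=5, concurrency=3)
```

For 5 blocks and a concurrency degree of 3 this gives three tiles holding blocks 1 and 2, blocks 3 and 4, and block 5.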
Subsequently, in step S340, the plurality of tiles are queried in parallel. That is, the tiles are queried in parallel for data records that satisfy the query condition, the query results of the tiles are merged, and the merged query result is returned to the client.
For example, for the text object in the CSV format given above, if the number of text lines included in each text block is 2, the text object is divided into 5 text blocks: the 1st text block includes the 1st and 2nd text lines, the 2nd text block includes the 3rd and 4th text lines, …, and the 5th text block includes the 9th text line. The concurrency degree is set to 3, i.e., the text object is divided into 3 tiles: the 1st tile includes the 1st and 2nd text blocks, the 2nd tile includes the 3rd and 4th text blocks, and the 3rd tile includes the 5th text block.
When the SQL statement select * from ossobject where _2 = 'qqq' is executed, the 3 tiles are queried in parallel, that is, each of the 3 tiles is queried for data records satisfying the query condition _2 = 'qqq'. The query finds no data record satisfying the condition in the 1st tile (i.e., the 1st to 4th text lines) or the 3rd tile (i.e., the 9th text line), while the 6th text line, located in the 2nd tile, satisfies the query condition.
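The parallel query and merge of the example above may be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the method 300: the names LINES, TILES, query_tile, and parallel_query are hypothetical, the condition _2 = 'qqq' is assumed to compare the second column of each line, and a thread pool stands in for whatever parallel execution mechanism the server actually uses.

```python
import csv
from concurrent.futures import ThreadPoolExecutor

# The 9 text lines of the example CSV object (line separators removed).
LINES = [
    'aaa,bbb,"ccc"', "ddd,eee,fff", "ggg,hhh,iii",
    "jjj,kkk,lll", "mmm,nnn,ooo", "ppp,qqq,rrr",
    "sss,ttt,uuu", "vvv,www,xxx", "yyy,zzz,000",
]

# The 3 tiles of the example as (first line, last line), 1-based inclusive.
TILES = [(1, 4), (5, 8), (9, 9)]


def query_tile(tile, column, value):
    """Return (line number, parsed row) for each line in the tile whose column equals value."""
    first, last = tile
    hits = []
    for line_no in range(first, last + 1):
        # Parse one line with a comma column separator and double-quote quote character.
        row = next(csv.reader([LINES[line_no - 1]], delimiter=",", quotechar='"'))
        if row[column - 1] == value:
            hits.append((line_no, row))
    return hits


def parallel_query(tiles, column, value):
    """Query all tiles in parallel, then merge the per-tile results in tile order."""
    with ThreadPoolExecutor(max_workers=len(tiles)) as pool:
        per_tile = pool.map(lambda t: query_tile(t, column, value), tiles)
        return [hit for hits in per_tile for hit in hits]


# Corresponds to: select * from ossobject where _2 = 'qqq'
print(parallel_query(TILES, column=2, value="qqq"))  # [(6, ['ppp', 'qqq', 'rrr'])]
```

Since each tile covers a disjoint range of text lines, the per-tile results can be merged by simple concatenation in tile order without deduplication.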
FIGS. 4 to 6 illustrate three embodiments of the data storage method 200 and the data query method 300 according to the present invention. In describing the embodiments shown in FIGS. 4 to 6, the text object is the following CSV file named "ossobject":
aaa,bbb,"ccc" CRLF
ddd,eee,fff CRLF
ggg,hhh,iii CRLF
jjj,kkk,lll CRLF
mmm,nnn,ooo CRLF
ppp,qqq,rrr CRLF
sss,ttt,uuu CRLF
vvv,www,xxx CRLF
yyy,zzz,000 CRLF
The query request in each embodiment is as follows: select * from ossobject where _2 = 'qqq'.
In the embodiment shown in FIG. 4:
Step S410: the client 110 sends an index generation request to the server 120. The index generation request indicates that the text object to be processed is ossobject, and defines the line separator of the text object as CRLF (carriage return and line feed, i.e., '\r\n'), the column separator as a half-width comma (,), the quote character as a double quotation mark ("), and the overwrite flag (which indicates whether existing index information is to be overwritten) as no, that is, the newly generated index information does not overwrite the existing index information.
Step S420: the server 120 determines whether the text object ossobject has index information based on the index generation request sent by the client 110. If ossobject already has index information, the method 200 is not executed again, and the existing index information is read directly and returned to the client in the subsequent step S430. If ossobject does not have index information, the method 200 is executed to generate the index information of the text object.
Step S430: the server 120 returns index information of the text object ossobject to the client. The server 120 may return all index information of the ossobject to the client, or may return a part of the index information to the client. For example, only the total number of lines and the total number of blocks of the text object ossobject are returned to the client.
Step S440: the client 110 sends a data query request to the server 120, the data query request being, for example, the SQL statement select * from ossobject where _2 = 'qqq'.
Step S450: the server 120 executes the aforementioned method 300 to perform a tile-based query on ossobject based on the data query request sent by the client 110.
Step S460: the server 120 returns the query result (the 6th text line) to the client 110.
In the embodiment shown in FIG. 4, the index generation request is sent before the data query request, i.e., method 200 is invoked asynchronously before method 300, which may further save query time.
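The handling of the index generation request in steps S410 to S430 may be sketched as follows. All names here (IndexRequest, INDEX_STORE, generate_index, handle_index_request) are hypothetical, the in-memory dictionary merely stands in for index information stored in association with the text object, and the value of 2 lines per block is taken from the earlier example. Splitting on the line separator is a simplification that ignores separators inside quoted fields.

```python
from dataclasses import dataclass


@dataclass
class IndexRequest:
    """Hypothetical model of the index generation request of step S410."""
    object_name: str
    line_separator: str = "\r\n"
    column_separator: str = ","
    quote_char: str = '"'
    overwrite: bool = False  # "no": keep existing index information


# Assumption: index information is kept in memory, keyed by object name.
INDEX_STORE = {}


def generate_index(data: str, req: IndexRequest, lines_per_block: int = 2) -> dict:
    """Stand-in for method 200: count the lines and derive the block count."""
    lines = [ln for ln in data.split(req.line_separator) if ln]
    total_lines = len(lines)
    total_blocks = -(-total_lines // lines_per_block)  # ceiling division
    return {"total_lines": total_lines, "total_blocks": total_blocks}


def handle_index_request(req: IndexRequest, data: str) -> dict:
    """Steps S420 and S430: reuse the existing index unless overwriting is requested."""
    existing = INDEX_STORE.get(req.object_name)
    if existing is not None and not req.overwrite:
        return existing
    index = generate_index(data, req)
    INDEX_STORE[req.object_name] = index
    return index


DATA = "\r\n".join([
    'aaa,bbb,"ccc"', "ddd,eee,fff", "ggg,hhh,iii", "jjj,kkk,lll", "mmm,nnn,ooo",
    "ppp,qqq,rrr", "sss,ttt,uuu", "vvv,www,xxx", "yyy,zzz,000",
]) + "\r\n"

print(handle_index_request(IndexRequest("ossobject"), DATA))
# {'total_lines': 9, 'total_blocks': 5}
```

A second request for the same object with the overwrite flag set to no returns the stored index without regenerating it, which is why issuing the index generation request ahead of the query can shorten the later query.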
In the embodiment shown in FIG. 5:
Step S510: the client 110 sends an index generation request and a data query request to the server 120.
The index generation request indicates that the text object to be processed is ossobject, and defines the line separator of the text object as CRLF (carriage return and line feed, i.e., '\r\n'), the column separator as a half-width comma (,), the quote character as a double quotation mark ("), and the overwrite flag (which indicates whether existing index information is to be overwritten) as no, i.e., newly generated index information does not overwrite existing index information.
The data query request is the SQL statement select * from ossobject where _2 = 'qqq'.
Step S520: the server 120 receives the index generation request and the data query request sent by the client 110. Based on the index generation request, it determines whether the text object ossobject has index information. If ossobject already has index information, the method 200 is not executed again, and the existing index information is read directly. If ossobject does not have index information, the method 200 is executed to generate the index information of the text object.
Step S530: the server 120 performs a tile-based query on the text object ossobject based on the data query request and the index information.
Step S540: the index generation result (the total number of lines and the total number of blocks of ossobject) and the query result (the 6th text line) are returned to the client 110.
In the embodiment shown in FIG. 5, the index generation request and the data query request are sent together, and the overwrite flag is specified as no in the index generation request. If the text object ossobject already has index information, the method 200 need not be executed to generate index information, and accordingly the query time of the embodiment of FIG. 5 is substantially equivalent to that of the embodiment of FIG. 4. If ossobject has no index information, the method 200 must be executed first to generate the index information, and accordingly the query time of the embodiment of FIG. 5 is longer than that of the embodiment of FIG. 4.
In the embodiment shown in FIG. 6:
Step S610: the server 120 executes the method 200 on its own initiative, without being triggered by the client 110, and generates the index information of the text object ossobject.
Step S620: the client 110 sends a data query request to the server 120, the data query request being the SQL statement select * from ossobject where _2 = 'qqq'.
Step S630: the server 120 executes the aforementioned method 300 to perform a tile-based query on ossobject based on the data query request sent by the client 110.
Step S640: the server 120 returns the query result (the 6th text line) to the client 110.
FIG. 7 shows a schematic diagram of a computing device 700, according to one embodiment of the invention. As shown in fig. 7, in a basic configuration 702, a computing device 700 typically includes a system memory 706 and one or more processors 704. A memory bus 708 may be used for communicating between the processor 704 and the system memory 706.
Depending on the desired configuration, the processor 704 may be any type of processor, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 704 may include one or more levels of cache, such as a level one cache 710 and a level two cache 712, a processor core 714, and registers 716. An example processor core 714 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 718 may be used with the processor 704, or in some implementations the memory controller 718 may be an internal part of the processor 704.
Depending on the desired configuration, the system memory 706 may be any type of memory including, but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. The system memory 706 may include an operating system 720, one or more applications 722, and program data 724. The application 722 is actually a plurality of program instructions that direct the processor 704 to perform corresponding operations. In some embodiments, the application 722 may be arranged to cause the processor 704 to operate with program data 724 on an operating system.
The computing device 700 may also include an interface bus 740 that facilitates communication from various interface devices (e.g., output devices 742, peripheral interfaces 744, and communication devices 746) to the basic configuration 702 via the bus/interface controller 730. The example output devices 742 include a graphics processing unit 748 and an audio processing unit 750. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 752. Example peripheral interfaces 744 can include a serial interface controller 754 and a parallel interface controller 756, which can be configured to facilitate communications with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 758. An example communication device 746 may include a network controller 760, which may be arranged to facilitate communications with one or more other computing devices 762 over a network communication link via one or more communication ports 764.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
In the computing device 700 according to the invention, the application 722 may include, for example, the data storage device 800 and/or the data query device 900, each of the devices 800 or 900 including a plurality of program instructions, but not the specific program instructions included therein. The data storage device 800 may instruct the processor 704 to perform the data storage method 200 of the present invention and the data query device 900 may instruct the processor 704 to perform the data query method 300 of the present invention, such that the computing device 700 may be implemented as the server 120 of the present invention.
FIG. 8 shows a schematic diagram of a data storage device 800 according to one embodiment of the invention. The data storage device 800 resides in the server 120 for performing the data storage method 200 of the present invention. As shown in fig. 8, the data storage device 800 includes an acquisition module 810, a chunking module 820, and an index generation module 830.
The acquisition module 810 is adapted to acquire a text object to be processed, where the text object includes a plurality of text lines. The acquisition module 810 is specifically configured to execute the method of step S210; for the processing logic and functions of the acquisition module 810, reference may be made to the related description of step S210, which is not repeated here.
The chunking module 820 is adapted to divide the text object into a plurality of text blocks, each text block including at least one text line. The chunking module 820 is specifically configured to execute the method of step S220; for the processing logic and functions of the chunking module 820, reference may be made to the related description of step S220, which is not repeated here.
The index generating module 830 is adapted to store the number of text blocks included in the text object and the block information of each text block as the index information of the text object in association with the text object. The index generating module 830 is specifically configured to execute the method of step S230, and for processing logic and functions of the index generating module 830, reference may be made to the related description of step S230, which is not described herein again.
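The cooperation of the chunking module 820 and the index generation module 830 may be illustrated with the following minimal sketch. It follows the rule recited in claims 3 and 4 for choosing the number of text lines per block and records the three items of block information named in claim 2. The names BlockInfo, lines_per_block, and build_index are hypothetical; the upper limit of 5 blocks is chosen only so that the result matches the 2-lines-per-block example; character positions are assumed to be 0-based offsets that count the line separators.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    line_count: int  # number of text lines in the block
    first_line: int  # 1-based line number of the block's first line
    first_char: int  # 0-based character offset of the block's first character (assumption)


def lines_per_block(total_lines: int, max_blocks: int) -> int:
    """One line per block if there are fewer lines than the upper limit,
    otherwise the rounded-up quotient of total lines and the upper limit."""
    if total_lines < max_blocks:
        return 1
    return -(-total_lines // max_blocks)  # ceiling division


def build_index(text: str, max_blocks: int, line_sep: str = "\r\n") -> dict:
    """Divide the text into blocks and collect the index information."""
    lines = text.split(line_sep)
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final separator
    per_block = lines_per_block(len(lines), max_blocks)
    blocks, offset = [], 0
    for start in range(0, len(lines), per_block):
        chunk = lines[start:start + per_block]
        blocks.append(BlockInfo(len(chunk), start + 1, offset))
        # Advance past the chunk's characters, including each line separator.
        offset += sum(len(line) + len(line_sep) for line in chunk)
    return {"total_lines": len(lines), "total_blocks": len(blocks), "blocks": blocks}


TEXT = "\r\n".join([
    'aaa,bbb,"ccc"', "ddd,eee,fff", "ggg,hhh,iii", "jjj,kkk,lll", "mmm,nnn,ooo",
    "ppp,qqq,rrr", "sss,ttt,uuu", "vvv,www,xxx", "yyy,zzz,000",
]) + "\r\n"

index = build_index(TEXT, max_blocks=5)
print(index["total_lines"], index["total_blocks"])  # 9 5
for block in index["blocks"]:
    print(block)
```

Storing the character offset of each block's first character allows a later query to seek directly to the start of a tile instead of scanning the object from the beginning.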
Fig. 9 shows a schematic diagram of a data query apparatus 900 according to an embodiment of the invention. The data query apparatus 900 resides in the server 120 and is used for executing the data query method 300 of the present invention. As shown in fig. 9, the data query apparatus 900 includes a receiving module 910, an index obtaining module 920, a partitioning module 930, and a parallel query module 940.
The receiving module 910 is adapted to receive a query request, where the query request includes a target object. The receiving module 910 is specifically configured to execute the method of step S310, and for processing logic and functions of the receiving module 910, reference may be made to the related description of step S310, which is not described herein again.
The index obtaining module 920 is adapted to obtain index information of the target object, where the index information includes the number of text blocks included in the target object and block information of each text block. The index obtaining module 920 is specifically configured to execute the method in step S320, and for processing logic and functions of the index obtaining module 920, reference may be made to the related description in step S320, which is not described herein again.
The partitioning module 930 is adapted to divide the target object into a plurality of tiles, each tile including at least one text block. The partitioning module 930 is specifically configured to execute the method of step S330; for the processing logic and functions of the partitioning module 930, reference may be made to the related description of step S330, which is not repeated here.
The parallel query module 940 is adapted to query the plurality of tiles in parallel. The parallel query module 940 is specifically configured to execute the method of step S340; for the processing logic and functions of the parallel query module 940, reference may be made to the related description of step S340, which is not repeated here.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard disks, USB flash drives, floppy disks, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to execute the data storage method and/or the data query method of the present invention according to instructions in the program code stored in the memory.
By way of example, and not limitation, readable media may comprise readable storage media and communication media. Readable storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with examples of this invention. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense with respect to the scope of the invention, as defined in the appended claims.

Claims (19)

1. A method of data storage, comprising:
acquiring a text object to be processed, wherein the text object comprises a plurality of text lines;
dividing the text object into a plurality of text blocks, wherein each text block comprises at least one text line;
and taking the number of text blocks included in the text object and the block information of each text block as index information of the text object, and storing the index information in association with the text object.
2. The method of claim 1, wherein the block information comprises at least one of:
the number of text lines included in a text block, the line sequence number of the first text line of the text block in the text object, and the character sequence number of the first character of the text block in the text object.
3. The method of claim 1 or 2, wherein the step of dividing the text object into a plurality of text blocks comprises:
setting an upper limit of the number of text blocks;
determining the number of text lines included in each text block according to the upper limit of the number and the total number of the text lines included in the text object;
and dividing the text object into a plurality of text blocks according to the number of text lines included in each text block.
4. The method of claim 3, wherein the determining the number of text lines included in each text block according to the upper number limit and the total number of lines of text included in the text object comprises:
if the total number of lines is smaller than the upper limit of the number, each text block comprises one text line;
and if the total number of lines is greater than or equal to the upper limit of the number, the number of text lines included in each text block is greater than or equal to the smallest integer that is not less than the quotient of the total number of lines and the upper limit of the number.
5. The method of claim 1, wherein the step of obtaining a text object to be processed comprises:
receiving an index generation request sent by a client, wherein the index generation request comprises a text object to be processed;
and acquiring the text object to be processed.
6. The method of claim 5, wherein the index generation request further includes a line separator, the total number of lines of the text object being determined from the line separator.
7. The method of claim 5, wherein the index generation request further comprises a column separator for separating columns in a line of text, a quote character, and an overwrite flag for indicating whether newly generated index information overwrites existing index information.
8. The method of claim 5, further comprising: and returning the total line number and the total block number included by the text object to the client.
9. The method of claim 1, wherein the text object is in a Comma Separated Value (CSV) file format or in a custom separator file format.
10. A method of data query, comprising:
receiving a query request, the query request including a target object;
acquiring index information of a target object, wherein the index information comprises the number of text blocks included in the target object and block information of each text block;
dividing the target object into a plurality of tiles, the tiles including at least one text block;
the plurality of tiles are queried in parallel.
11. The method of claim 10, wherein the dividing the target object into a plurality of tiles comprises:
setting a concurrency degree, wherein the number of text blocks included in each tile is greater than or equal to the smallest integer that is not less than the quotient of the number of text blocks and the concurrency degree;
and dividing the target object into a plurality of tiles according to the number of text blocks included in each tile.
12. The method of claim 10, wherein the query request further includes a query condition;
the step of querying the plurality of tiles in parallel comprises: querying, in parallel, whether data records satisfying the query condition exist in the plurality of tiles.
13. The method of claim 10, wherein after the step of querying the plurality of tiles in parallel, further comprising:
merging the query results of the plurality of tiles and returning the merged query result to the client.
14. A data storage device comprising:
an acquisition module adapted to acquire a text object to be processed, wherein the text object comprises a plurality of text lines;
a block module adapted to divide the text object into a plurality of text blocks, the text blocks including at least one text line;
and the index generation module is suitable for storing the number of text blocks included in the text object and the block information of each text block as the index information of the text object in a way of being associated with the text object.
15. A data query apparatus, comprising:
a receiving module adapted to receive a query request, the query request including a target object;
the index acquisition module is suitable for acquiring index information of the target object, wherein the index information comprises the number of text blocks included in the target object and block information of each text block;
a partitioning module adapted to partition the target object into a plurality of tiles, the tiles comprising at least one text block;
and a parallel query module adapted to query the plurality of tiles in parallel.
16. A data storage system, comprising a client and a server, wherein
the client is suitable for sending an index generation request to the server, wherein the index generation request comprises a text object to be processed, and the text object comprises a plurality of text lines;
the server divides the text object into a plurality of text blocks based on the index generation request, each text block comprising at least one text line, and stores the number of text blocks included in the text object and the block information of each text block, as the index information of the text object, in association with the text object.
17. A data query system, comprising a client and a server, wherein
the client is suitable for sending a data query request to the server, wherein the query request comprises a target object;
the server acquires index information of the target object based on the query request, wherein the index information comprises the number of text blocks included in the target object and block information of each text block; dividing the target object into a plurality of tiles, the tiles including at least one text block; the plurality of tiles are queried in parallel.
18. A computing device, comprising:
at least one processor; and
a memory storing program instructions configured to be suitable for execution by the at least one processor, the program instructions comprising instructions for performing the method of any of claims 1-9 and/or 10-13.
19. A readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the method of any of claims 1-9 and/or 10-13.
CN201910002235.7A 2019-01-02 2019-01-02 Data storage method, data query method, data storage device, data query device and computing equipment Pending CN111400427A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910002235.7A CN111400427A (en) 2019-01-02 2019-01-02 Data storage method, data query method, data storage device, data query device and computing equipment

Publications (1)

Publication Number Publication Date
CN111400427A true CN111400427A (en) 2020-07-10

Family

ID=71428298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910002235.7A Pending CN111400427A (en) 2019-01-02 2019-01-02 Data storage method, data query method, data storage device, data query device and computing equipment

Country Status (1)

Country Link
CN (1) CN111400427A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination