WO2014036684A1

WO2014036684A1 - Method and device for storing and retrieving data

Info

Publication number: WO2014036684A1
Application number: PCT/CN2012/080963
Authority: WO
Inventors: 曹莉; 吴向阳
Original assignee: 华为技术有限公司
Priority date: 2012-09-04
Filing date: 2012-09-04
Publication date: 2014-03-13
Also published as: CN103891244B; CN103891244A

Abstract

The present invention relates to the information processing field. Disclosed are a method and device for storing and retrieving data, so that distributed storage of data requiring full-text retrieval and data not requiring full-text retrieval in the parallel database is implemented, thereby reducing the storage redundancy; meanwhile, the full-text retrieval and structured retrieval of the data to be retrieved are implemented in the parallel database, thereby improving the retrieval efficiency. The present invention comprises: a control node, in a parallel database system architecture, storing data requiring full-text retrieval and data not requiring full-text retrieval on different types of data nodes respectively; according to a retrieval request sent from a client, sending different types of retrieval instructions to different types of data nodes, so that a retrieval execution node can simultaneously execute, according to different retrieval execution instructions, different types of retrieval operations on the data to be retrieved. Embodiments of the present invention are mainly applicable to data storage and retrieval processes.

Description

Method and device for performing data storage and retrieval

Technical field

The present invention relates to the field of information processing, and in particular, to a method and apparatus for performing data storage and retrieval.

Background art

Different applications have different data retrieval needs. To meet them, existing technical solutions use the structured query language SQL to retrieve structured data and an existing full-text search engine to perform full-text search on unstructured data. However, some applications need a structured search on one portion of the data and a full-text search on another portion to obtain the desired result. The solution provided by the prior art is to send the data to be retrieved to an SQL retrieval system for structured search and to a full-text search engine for full-text search, and then to integrate the retrieval results obtained by the two independent retrieval systems. When the full-text search engine has to retrieve a large amount of data, the full-text retrieval process generates more data than the data to be retrieved, which greatly reduces the efficiency of full-text retrieval and thereby the retrieval efficiency of the entire retrieval process.

Summary of the invention

Embodiments of the present invention provide a method and apparatus for performing data storage and retrieval, which implement distributed storage, in a parallel database, of data that requires full-text search and data that does not, reducing storage redundancy; at the same time, full-text search and structured search of the data to be retrieved can both be performed in the parallel database, improving retrieval efficiency.

In order to achieve the above object, the embodiment of the present invention adopts the following technical solutions:

A method for performing data storage, wherein the method is applied to a parallel database system, the parallel database system comprising a control node and a data node; the data node comprising a full-text search data node and an underlying database node; the method includes:

Receiving, by the control node, a data storage request sent by the client;

Creating a data table according to the data storage request; determining, according to the metadata of the data table, a full-text search field of the data table; sending, according to the full-text search field, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field;

And sending, according to the fields of the data table other than the full-text search field, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.

A method for data retrieval, characterized in that it is applied to a parallel database system, the parallel database system comprising a control node and a data node; the data node comprises a full-text search data node and an underlying database node; the method comprises:

The control node receives a retrieval request sent by a client;

Determining a field to be retrieved and a data table to be retrieved according to the retrieval request;

Obtaining metadata of the data table to be retrieved, and determining, according to the metadata of the data table to be retrieved, a full-text search field in the field to be retrieved;

Sending, according to the full-text search field in the to-be-retrieved fields, a first retrieval instruction to the full-text search data node, where the first retrieval instruction is used to instruct the full-text search data node to retrieve the data corresponding to the full-text search field in the to-be-retrieved fields; sending a second retrieval instruction to the underlying database node, where the second retrieval instruction is used to instruct the underlying database node to retrieve the data corresponding to the to-be-retrieved fields other than the full-text search field; receiving the retrieval results returned by the full-text search data node and the underlying database node, aggregating the retrieval results returned by the full-text search data node and the underlying database node, and returning the aggregated result to the client as the complete retrieval result.

A control node for performing data storage, characterized in that, in a parallel database system, the parallel database system includes the control node and a data node; the data node includes a full-text search data node and an underlying database node; The control node includes:

a receiving unit, configured to receive a data storage request sent by the client; a creating unit, configured to create a data table according to the data storage request received by the receiving unit, and a determining unit, configured to determine a full-text search field of the data table according to the metadata of the data table created by the creating unit;

a sending unit, configured to send, according to the full-text search field determined by the determining unit, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field; and to send, according to the fields of the data table other than the full-text search field determined by the determining unit, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.

A control node for performing data retrieval, characterized in that, in a parallel database system, the parallel database system includes the control node and a data node; the data node includes a full-text search data node and an underlying database node; the control node includes:

a receiving unit, configured to receive a retrieval request sent by the client;

An obtaining unit, configured to obtain, according to the retrieval request received by the receiving unit, a field to be retrieved and a data table to be retrieved;

a determining unit, configured to determine the field to be retrieved and the data table to be retrieved according to the retrieval request, and to determine, according to the metadata of the data table to be retrieved, a full-text search field in the fields to be retrieved;

a sending unit, configured to send a first retrieval instruction to the full-text search data node, where the first retrieval instruction is used to instruct the full-text search data node to retrieve the data corresponding to the full-text search field in the fields to be retrieved, and to send a second retrieval instruction to the underlying database node, where the second retrieval instruction is used to instruct the underlying database node to retrieve the data corresponding to the fields to be retrieved other than the full-text search field;

the receiving unit is further configured to receive the retrieval result returned by each of the full-text search data node and the underlying database node;

An aggregation unit, configured to aggregate the retrieval results that are returned by the full-text search data node and the underlying database node and received by the receiving unit; the sending unit is further configured to return the aggregated result to the client as the complete retrieval result.

With the method and device for data storage and retrieval provided by the embodiments of the present invention, data that requires full-text search and data that does not can be stored on different types of data nodes in a parallel database system architecture. This implements distributed storage of the data and reduces storage redundancy compared with the unified storage method of the prior art. In addition, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can perform different types of retrieval on the data to be retrieved according to the different instructions. Compared with the method provided by the prior art, the method provided by the embodiments of the present invention offers full-text search capability and structured search capability in the same system and does not need two independent retrieval systems, thereby reducing search steps and improving retrieval efficiency.

DRAWINGS

The drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for performing data storage according to Embodiment 1 of the present invention;

FIG. 2 is a flowchart of a method for performing data retrieval according to Embodiment 1 of the present invention;

FIG. 3 is a flowchart of a method for performing data storage according to Embodiment 2 of the present invention;

FIG. 4 is a flowchart of another method for performing data storage according to Embodiment 2 of the present invention;

FIG. 5 is a flowchart of a method for performing data retrieval according to Embodiment 3 of the present invention;

FIG. 6 is a block diagram of a control node for performing data storage in Embodiment 4 of the present invention;

FIG. 7 is a block diagram of another composition of a control node for performing data storage in Embodiment 4 of the present invention;

FIG. 8 is a block diagram of a control node for performing data retrieval according to Embodiment 4 of the present invention;

FIG. 9 is a block diagram of a control node for performing data storage in Embodiment 5 of the present invention;

FIG. 10 is a block diagram of a control node for performing data retrieval in Embodiment 5 of the present invention.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Example 1

An embodiment of the present invention provides a method for performing data storage and retrieval. The method is applied to a parallel database system that includes a control node and data nodes; the data nodes include full-text search data nodes and underlying database nodes.

A full-text search data node is configured to store the data corresponding to full-text search fields and to perform full-text search on the data it stores. An underlying database node is configured to store the data that does not correspond to full-text search fields and to perform structured search on the data it stores.

In this parallel database system architecture, the full-text retrieval data node and the underlying database node can be distributed in the following two ways:

The first distribution method: all the full-text search data nodes are placed in one pool and all the underlying database nodes are placed in another pool. The two pools are logically independent; physically, they can be deployed on two separate devices or integrated on the same device. In this way, the pool of full-text search data nodes stores the data that requires full-text search, and the pool of underlying database nodes stores the data that does not.

The second distribution method: a full-text search data node and an underlying database node are placed together, as a pair, on the same data retrieval node. Each data retrieval node includes only one full-text search data node and one underlying database node, and different data retrieval nodes hold different pairs. In this way, each data retrieval node stores different data. Within one data retrieval node, the full-text search data node stores the data that requires full-text search, the underlying database node stores the data that does not, and the data stored by the two nodes must share corresponding information such as the same distribution key value.

Based on the foregoing parallel database system architecture, an embodiment of the present invention provides a method for performing data storage, which may be implemented by a control node. As shown in FIG. 1, the method includes:

101. Receive a data storage request sent by the client.

102. Create a data table according to the data storage request.

The data table may be represented by a structured data storage form, wherein the stored content is provided by a client. Table 1 below is a specific structure of one embodiment of the data table.

Table 1 Structure of the data table

Name, Provider, and Summary are field names, and each column corresponding to each field stores data corresponding to each field.

103. Determine a full-text search field of the data table according to metadata of the data table.

The metadata indicates which of the fields of the data table are full-text search fields and which are not. The metadata can be kept inside the data table, recording whether the data stored in each field exceeds the maximum length of the field, or it can be kept in a separate table that corresponds to the data table but is stored independently of it. The metadata can use "0"/"1" or "yes"/"no" to indicate which fields require full-text search and which do not: for example, "0" or "no" means that full-text search is not required, and "1" or "yes" means that it is required. The full-text search field can be determined in the following two ways:

The first method: if the data stored in the field of the data table exceeds the maximum length of the field, it is determined that the field is a full-text search field.

The second method: if a field of the data table needs to be searched according to a keyword in the index field, the field is determined to be a full-text search field.

In the first method, the control node collects statistics on the length of the data stored in each field and compares it with the maximum length of the field to distinguish full-text search fields from non-full-text search fields; the control node then marks the type of each field.

In the second method, the keywords in the index field are set by the client, and the control node only stores these markers and interprets their meaning. For example, the client sets the index field to "value", where "value=0" corresponds to a field that does not require full-text search and "value=1" corresponds to a field that does. The control node then marks each field according to this correspondence.
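The two determination methods can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the names `FieldMeta`, `max_length`, `index_flag`, and `full_text_fields` are hypothetical, and the metadata layout is assumed for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldMeta:
    """Hypothetical per-field metadata kept by the control node."""
    max_length: int                   # declared maximum length of the field
    index_flag: Optional[int] = None  # client-set marker: 1 = full-text, 0 = not


def full_text_fields(rows: List[Dict[str, str]],
                     meta: Dict[str, FieldMeta]) -> List[str]:
    """Return the names of the fields treated as full-text search fields."""
    result = []
    for name, m in meta.items():
        if m.index_flag is not None:
            # Second method: rely on the marker set by the client.
            is_full_text = m.index_flag == 1
        else:
            # First method: some stored value exceeds the maximum length.
            is_full_text = any(len(r.get(name, "")) > m.max_length for r in rows)
        if is_full_text:
            result.append(name)
    return result


# Example with the field names of Table 1 and made-up values.
rows = [{"Name": "short", "Provider": "p", "Summary": "x" * 100}]
meta = {"Name": FieldMeta(32), "Provider": FieldMeta(32), "Summary": FieldMeta(64)}
print(full_text_fields(rows, meta))
```

In this sketch a client-set marker takes precedence over the length check; that ordering is a design choice of the example only.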

104. Send, according to the full-text search field, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to indicate that the full-text search data node stores the corresponding full-text search field data.

The first data storage instruction is specifically configured to instruct the full-text search data node to create an index for the full-text search field or update an index stored by the full-text search data node.

105. Send, according to the fields of the data table other than the full-text search field, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.

The second data storage instruction is specifically configured to instruct the underlying database node to perform structured data storage.
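As a rough sketch of steps 104 and 105, the control node can be pictured as splitting each row into a full-text part and a structured part and wrapping each part in its own instruction. The `StorageInstruction` type, its `target` labels, and the function `split_storage` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class StorageInstruction:
    target: str                 # "full_text" or "underlying" (hypothetical labels)
    table: str
    fields: List[str]
    rows: List[Dict[str, str]]


def split_storage(table: str,
                  rows: List[Dict[str, str]],
                  ft_fields: Set[str]) -> Tuple[StorageInstruction, StorageInstruction]:
    """Build the first and second data storage instructions for one table."""
    all_fields = list(rows[0].keys()) if rows else []
    ft = [f for f in all_fields if f in ft_fields]
    other = [f for f in all_fields if f not in ft_fields]
    first = StorageInstruction(
        "full_text", table, ft, [{f: r[f] for f in ft} for r in rows])
    second = StorageInstruction(
        "underlying", table, other, [{f: r[f] for f in other} for r in rows])
    return first, second


first, second = split_storage(
    "T", [{"Name": "a", "Provider": "p", "Summary": "s"}], {"Summary"})
print(first.fields, second.fields)
```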

Based on the foregoing parallel database system architecture, an embodiment of the present invention provides a data retrieval method. The method is implemented by a control node and, as shown in FIG. 2, includes:

201. Receive a retrieval request sent by a client.

The retrieval request may be a standard SQL query statement, for example, "Select id, name from A where comment Like 'roman' group by name". This statement queries table A for the id and name of the rows whose "comment" contains "roman" and groups the results by name.

202. Determine, according to the retrieval request, a field to be retrieved and a data table to be retrieved.

203. Obtain metadata of the data table to be retrieved, and determine, according to the metadata of the data table to be retrieved, a full-text search field in the to-be-searched field.

In the first method, the full-text search field in the to-be-retrieved fields is determined according to the metadata of the to-be-retrieved data table as follows: if the data stored in the to-be-retrieved data table for at least one of the to-be-retrieved fields exceeds the maximum length of that field, that field is determined to be a full-text search field.

In the second method, if there is a field in the field to be retrieved that needs to be retrieved according to a keyword in the index field, it is determined that the field that needs to be retrieved according to the keyword in the index field is a full-text search field.

It should be noted that the identification of the full-text search field by the first method and the second method corresponds to the method of how to mark the full-text search field described in the above step 103.

204. Send, according to the full-text search field in the to-be-retrieved fields, a first retrieval instruction to the full-text search data node, where the first retrieval instruction is used to instruct the full-text search data node to retrieve the data corresponding to the full-text search field in the to-be-retrieved fields.

205. Send a second retrieval instruction to the underlying database node, where the second retrieval instruction is used to instruct the underlying database node to retrieve the data corresponding to the to-be-retrieved fields other than the full-text search field.

The first retrieval instruction here refers to the process in which the full-text search data node, after receiving the search request, scans the inverted index table for the search keywords in the request. The instruction mainly includes the following information:

1) The type of analyzer (tokenizer) currently used, for example, WhitespaceAnalyzer, StandardAnalyzer, SimpleAnalyzer, or ChineseAnalyzer.

2) The keywords carried in the request. For example, for q="what is search engine", the keywords are "search" and "engine"; "what" and "is" are filtered out as stop words.

3) Whether to perform a spell check. For example, for "aplle", if a spell check is performed, the system corrects it to "apple".

4) Whether synonyms are included. For example, for "Dad", if synonyms are included, the system adds "爹", "Dry" to the search.

5) The search hit score calculation. This value determines the ordering of the search results.
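The contents of the first retrieval instruction listed above could be represented along the following lines. This is a minimal sketch: the `FullTextInstruction` type, the `STOP_WORDS` set, and the whitespace tokenization are assumptions made for the example, and score calculation is left out.

```python
from dataclasses import dataclass
from typing import List

# Illustrative stop-word list (an assumption, not taken from the embodiment).
STOP_WORDS = {"what", "is", "a", "the", "of"}


@dataclass
class FullTextInstruction:
    analyzer: str             # analyzer type, e.g. "WhitespaceAnalyzer"
    keywords: List[str]       # keywords left after stop-word filtering
    spell_check: bool = False
    use_synonyms: bool = False


def extract_keywords(query: str) -> List[str]:
    """Split the query on whitespace and drop stop words."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]


def build_instruction(query: str,
                      analyzer: str = "WhitespaceAnalyzer") -> FullTextInstruction:
    return FullTextInstruction(analyzer=analyzer, keywords=extract_keywords(query))


print(build_instruction("what is search engine").keywords)
```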

The second retrieval instruction here refers to a set of operations to be performed on a particular query database. Each step of the execution instruction describes a specific database operation such as table scan, join, aggregation, sort, and the like.

A complete database retrieval instruction mainly includes the following information:

1) cost: the estimated system resource consumption of retrieving the data.

2) rows: the estimated total number of rows returned by the retrieval, which reflects the estimated selectivity of the conditions of any WHERE clause.

3) width: the estimated number of bytes of the rows returned by the retrieval, which reflects the size of the data set that satisfies the search criteria.

206. Receive the search result returned by each of the full-text search data node and the underlying database node.

207. Aggregate the search results returned by the full-text search data node and the underlying database node.

The search results returned by the data nodes may be aggregated by an equi-join on a specific field. For example, as shown in Tables 2, 3, and 4 below, Table 2 shows the search result obtained by the full-text search data node, with id field values 2 and 3; Table 3 shows the search result obtained by the underlying database node, with id field values 1, 2, and 3 and the corresponding Name field information. Joining on the value of the id field, the results with id values 2 and 3 are merged into one table, as shown in Table 4.

Table 2 Full-text search node search results

Id

3

2

Table 3 Underlying database node search results

Table 4 Complete search results

It should be noted that, in the present embodiment, the aggregation processing method described in connection with the above Tables 2, 3, and 4 is only an example.
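The equi-join aggregation of step 207 can be sketched as follows. The function name `aggregate_by_key` and the placeholder Name values are hypothetical; only the id values (2 and 3 from the full-text search node, 1 to 3 from the underlying database node) follow the example in the text.

```python
from typing import Dict, List


def aggregate_by_key(full_text_rows: List[Dict],
                     underlying_rows: List[Dict],
                     key: str = "id") -> List[Dict]:
    """Equi-join two partial results on `key`, keeping the full-text hit order."""
    by_key = {r[key]: r for r in underlying_rows}
    merged = []
    for hit in full_text_rows:
        match = by_key.get(hit[key])
        if match is not None:
            # Only rows present in both partial results are kept.
            merged.append({**match, **hit})
    return merged


full_text_result = [{"id": 3}, {"id": 2}]
underlying_result = [
    {"id": 1, "Name": "name-1"},  # placeholder names, not from the source
    {"id": 2, "Name": "name-2"},
    {"id": 3, "Name": "name-3"},
]
print(aggregate_by_key(full_text_result, underlying_result))
```

Indexing one side by the join key keeps the merge linear in the size of the two partial results; other join strategies would work equally well for this step.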

208. Return the aggregated result to the client as the complete retrieval result.

With the method for data storage and retrieval provided by this embodiment of the present invention, data that requires full-text search and data that does not can be stored on different types of data nodes in a parallel database system architecture. This implements distributed storage of the data and reduces storage redundancy compared with the unified storage method of the prior art. In addition, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can perform different types of retrieval on the data to be retrieved according to the different instructions. Compared with the method provided by the prior art, the method provided by this embodiment offers full-text search capability and structured search capability in the same system and does not need two independent retrieval systems, thereby reducing search steps and improving retrieval efficiency.

Example 2

Based on the first distribution method of the full-text search data nodes and the underlying database nodes described in Embodiment 1, an embodiment of the present invention provides a method for performing data storage, which is implemented by a control node. As shown in FIG. 3, the method includes:

301. Receive a data storage request sent by the client.

302. Create a data table according to the data storage request.

The description of the data table is the same as that described in the foregoing step 102, and details are not described herein again.

303. Determine a full-text search field of the data table according to metadata of the data table.

The description of the metadata and the method for determining the full-text search field of the data table according to the metadata of the data table are the same as those described in the foregoing step 103, and the details are not described herein.

304. Send, according to the full-text search field, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to indicate that the full-text search data node stores the corresponding full-text search field data.

305. Send, according to the fields of the data table other than the full-text search field, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.

It is worth noting that steps 304 and 305 do not have to be executed in any particular order.

It is also worth noting that if, in step 303, it is determined that no field requires full-text search, only step 305 is performed; if it is determined that all fields require full-text search, only step 304 is performed.

In addition, when the number of clients and the amount of data to be stored are relatively large, data loading servers, including a full-text search data loading server and an underlying database loading server, are provided in the parallel database system architecture described in Embodiment 1. The full-text search data loading server is configured to load, under the control of the control node, the data corresponding to the full-text search field of the data table to the full-text search data nodes, so that the full-text search data nodes store the corresponding data. The underlying database loading server is configured to load, under the control of the control node, the data corresponding to the fields of the data table other than the full-text search field to the underlying database nodes, so that the underlying database nodes store the corresponding data.

In this case, step 304 specifically includes: the control node splits the data table by column, separates out the column fields that require full-text search, generates the corresponding data loading tasks for those fields, and sends the tasks to the full-text search data loading server; the full-text search data loading server then allocates the data corresponding to those column fields to different full-text search data nodes for storage according to a preset distribution policy.

The full-text search data loading task is the process of storing an application's unstructured data, or data that needs to support full-text search, on a full-text search server; it generally refers to the process of creating an inverted index. For structured data, the inverted index table is created directly through word segmentation and filtering; otherwise, the task also includes a data information extraction step. There is no strict industry standard for data loading in full-text search; generally, a private API is exposed externally, through which the data index is created.
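The word segmentation, filtering, and inverted index creation described above can be illustrated with a very small sketch. It assumes whitespace segmentation and a made-up stop-word list; `build_inverted_index` and `STOP_WORDS` are hypothetical names, and a production full-text engine would do considerably more.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Illustrative stop-word list (an assumption for this example).
STOP_WORDS = {"a", "an", "and", "of", "the", "is"}


def build_inverted_index(docs: Iterable[Tuple[int, str]]) -> Dict[str, Set[int]]:
    """Map each remaining token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs:
        # Word segmentation: plain whitespace split on lower-cased text.
        for token in text.lower().split():
            token = token.strip(".,;:'\"()")
            # Filtering: drop empty tokens and stop words.
            if token and token not in STOP_WORDS:
                index[token].add(doc_id)
    return dict(index)


index = build_inverted_index([(1, "The search engine"), (2, "A database engine.")])
print(sorted(index))
```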

The pre-set distribution policy may be set to store different information of different regions by different data nodes, for example, node 1 stores information in Beijing, node 2 stores information in Shanghai, and the like. The distribution policy can be set according to actual needs, which is not limited by the embodiment of the present invention.

Similarly, step 305 specifically includes: the control node splits the data table by column, separates out the column fields that do not require full-text search, generates the corresponding data loading tasks for those fields, and sends the tasks to the underlying database loading server; the underlying database loading server then allocates the data corresponding to those column fields to different underlying database nodes for storage according to a preset distribution policy.

The underlying database data loading task stores the application's data in the underlying database through the standard SQL loading statement COPY FROM, for example:

COPY tableA FROM '/opt/data/datainfo.tbl' DELIMITERS '\'

where tableA is the data table to be loaded, /opt/data/datainfo.tbl is the path of the data to be loaded, and the DELIMITERS value is the separator between the fields of a row of data. See SQL Standard 99 for details. If an index was created when the table was created, the index of the data table must also be generated during data loading.

In addition, to ensure that the retrieval result of the full-text search data node and the retrieval result of the underlying database node can be merged, the process of generating the data loading tasks must also determine which fields are the primary key fields, that is, the master key fields; these fields represent the association between data stored on different nodes. Specifically, when storing data, the full-text search data node must store the data corresponding to the master key fields in addition to the data corresponding to the fields that require full-text search; the underlying database node does likewise.
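The column split with a shared master key, followed by allocation according to a preset distribution policy, might look roughly like this. The helper names `split_columns` and `distribute` are hypothetical, and the policy used here (master key modulo node count) is only an assumed stand-in for whatever policy is actually configured, such as the region-based one mentioned above.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, str]


def split_columns(rows: List[Row],
                  ft_fields: Set[str],
                  key_fields: List[str]) -> Tuple[List[Row], List[Row]]:
    """Return (full_text_rows, underlying_rows); both keep the master key fields."""
    ft_rows, db_rows = [], []
    for r in rows:
        key = {k: r[k] for k in key_fields}
        ft_rows.append({**key, **{f: v for f, v in r.items() if f in ft_fields}})
        db_rows.append({**key, **{f: v for f, v in r.items()
                                  if f not in ft_fields and f not in key_fields}})
    return ft_rows, db_rows


def distribute(rows: List[Row],
               node_count: int,
               policy: Callable[[Row], int]) -> Dict[int, List[Row]]:
    """Assign each row to the node index chosen by the distribution policy."""
    buckets: Dict[int, List[Row]] = {n: [] for n in range(node_count)}
    for r in rows:
        buckets[policy(r) % node_count].append(r)
    return buckets


rows = [{"id": "1", "name": "n1", "comment": "c1"},
        {"id": "2", "name": "n2", "comment": "c2"}]
ft_rows, db_rows = split_columns(rows, {"comment"}, ["id"])
by_id = lambda r: int(r["id"])  # assumed policy: master key modulo node count
print(distribute(ft_rows, 2, by_id))
print(distribute(db_rows, 2, by_id))
```

Applying the same key-based policy to both halves places rows with the same master key on the same node index, which is one way to keep paired data together.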

An embodiment of the present invention further provides a method for performing data storage that applies to the second distribution method of the full-text search data nodes and the underlying database nodes described in Embodiment 1. As shown in FIG. 4, in this method, before the full-text search field of the data table is determined according to the metadata of the data table in step 303, the method further includes step 306: split the data table by row. Step 303 is then replaced by step 307: for the data to be stored in each row, determine the full-text search field of the row according to the metadata of the data table. The other steps are performed in the same way as in the data storage method of this embodiment that corresponds to the first distribution method.

With the method for performing data storage provided by this embodiment of the present invention, data that requires full-text search and data that does not can be stored on different types of data nodes in a parallel database system architecture. This implements distributed storage of the data and reduces storage redundancy compared with the unified storage method of the prior art.

Example 3

In this embodiment, the data table created by the control node is as shown in Table 5 below. The data table is named A and has three fields: id, name, and comment, where id is the master key field and the comment field requires full-text search. The id field and the name field are stored on the underlying database node, and the id field and the comment field are stored on the full-text search data node.

Table 5 Data table to be retrieved

Id | Name | Comment
1 | Notorious | Following the conviction of her German father for treason against the US, Alicia Huberman takes to drink and men. She is approached by a government agent (T.R. Devlin) who asks her to spy on a group of her father's Nazi friends operating out of Rio de Janeiro.
2 | Titanic | Fictional romantic tale of a rich girl and poor boy who meet on the ill-fated voyage of the 'unsinkable' ship, Kate Winslet, Leonardo DiCaprio, Billy Zane
3 | Gift | The story of how an economic French shop keeper and amateur film maker attempted to locate, only have the artist turn the camera back on its owner. A flawless chime of romantic and reality

Based on the data to be retrieved, the method for performing data retrieval provided by the embodiment of the present invention, as shown in FIG. 5, includes:

401. The control node receives a retrieval request sent by the client. In this embodiment, the query corresponding to the retrieval request is: Select id, name from A where comment Like 'roman' group by name. The statement queries table A for the id and name of the rows whose comment contains "roman", and groups the results by name.

402. The control node determines, according to the retrieval request, a field to be retrieved and a data table to be retrieved.

403. The control node acquires metadata of the data table to be retrieved, and determines a full-text search field in the to-be-searched field according to the metadata of the data table to be retrieved.

404. The control node sends a first retrieval instruction to the full-text search data node according to the full-text search field in the to-be-retrieved field.

405. The control node sends a second retrieval instruction to the underlying database node.

406. The full-text search data node searches the to-be-retrieved data stored in the full-text search data node according to the first retrieval instruction, and obtains a full-text search node search result.

407. The underlying database node searches the to-be-retrieved data stored in the underlying database node according to the second retrieval instruction, and obtains an underlying database node search result.

In the present embodiment, the retrieval results of the full-text search nodes are shown in Table 6 below, and the results of the underlying database nodes are as shown in Table 7 below.

Table 6 Full-text search node search results

Id

3

2

Table 7 Underlying database node search results

408. The control node receives the full-text search node search result sent by the full-text search data node and the underlying database node search result sent by the underlying database node.

409. The control node aggregates the received full-text search node search result and underlying database node search result to obtain a complete search result. In this embodiment, the complete search result is shown in Table 8 below.

Table 8 Complete search results

410. The control node sends the complete search result to the client.
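Steps 401 to 410 can be illustrated with a small in-memory Python sketch using the rows of Table 5 (comments abbreviated). The function names, the dictionary stores, and the use of substring matching for the "roman" condition are assumptions for illustration; the source does not specify how either node implements its search.

```python
from collections import defaultdict

# Data of Table 5 as already split across the two node types.
FULLTEXT_NODE = {   # id -> comment (full-text search data node)
    1: "Following the conviction of her German father for treason ...",
    2: "Fictional romantic tale of a rich girl and poor boy ...",
    3: "... A flawless chime of romantic and reality",
}
DATABASE_NODE = {1: "Notorious", 2: "Titanic", 3: "Gift"}  # id -> name


def fulltext_search(keyword):
    """First retrieval instruction: ids whose comment contains the keyword."""
    return [row_id for row_id, text in FULLTEXT_NODE.items() if keyword in text]


def database_search():
    """Second retrieval instruction: (id, name) pairs from the structured store."""
    return list(DATABASE_NODE.items())


def aggregate(ids, id_name_pairs):
    """Control node: join both results on the master key id, then group by name."""
    wanted = set(ids)
    groups = defaultdict(list)
    for row_id, name in id_name_pairs:
        if row_id in wanted:
            groups[name].append(row_id)
    return dict(groups)


result = aggregate(fulltext_search("roman"), database_search())
print(result)
```

The join on id is what makes the master key field necessary on both node types: without it, the control node could not match a full-text hit to its name.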

It should be noted that the search results in Tables 6 to 8 above are generated based on the first distribution manner described in Embodiment 1: the pool corresponding to the full-text search data nodes returns a full-text search result to the control node, the pool corresponding to the underlying database nodes returns an underlying data retrieval result to the control node, and the control node aggregates the two types of results. Under the second distribution manner described in Embodiment 1, each data retrieval node returns its own retrieval result. For example, if data retrieval node 1, data retrieval node 2, and data retrieval node 3 are provided, and each data retrieval node has a full-text search data node and an underlying database node, the retrieval results of the three data retrieval nodes can be as shown in Tables 9 to 11.

Table 9 Data retrieval node 1 search results:

Table 10 Data Retrieval Node 2 Search Results:

Table 11 Data Retrieval Node 3 Search Results:

At this time, after receiving the search results sent by the three data retrieval nodes, the control node aggregates the three search results to obtain a complete search result. The search results are the same as in Table 8 above.
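The aggregation under the second distribution manner can be sketched as a union of the per-node results. The per-node values below are hypothetical placeholders, since the contents of Tables 9 to 11 are not reproduced in this text, and merge_node_results is an illustrative name.

```python
def merge_node_results(*node_results):
    """Union the (id, name) rows returned by each data retrieval node, keyed by id."""
    merged = {}
    for rows in node_results:
        for row_id, name in rows:
            merged[row_id] = name
    return sorted(merged.items())


# Hypothetical per-node results: each node holds some rows of table A and
# returns only the rows that matched locally.
node1 = []
node2 = [(2, "Titanic")]
node3 = [(3, "Gift")]
complete = merge_node_results(node1, node2, node3)
```

In this arrangement each node has already joined its own full-text and structured results, so the control node only concatenates and deduplicates.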

The method for data retrieval provided by this embodiment of the present invention sends, in a parallel database system architecture, different types of retrieval instructions to different types of data nodes according to a retrieval request sent by a client, so that the retrieval execution nodes can simultaneously perform different types of retrieval on the data to be retrieved according to the different retrieval instructions. Compared with the method provided by the prior art, the method provided by this embodiment offers both full-text search capability and structured search capability in the same system and does not need two independent retrieval systems, thereby reducing the number of search steps and improving retrieval efficiency.

Example 4

An embodiment of the present invention provides a control node for performing data storage, which may be applied to a parallel database system, where the parallel database system includes the control node and a data node, and the data node includes a full-text search data node and an underlying database node. As shown in FIG. 6, the control node includes: a receiving unit 51, a creating unit 52, a determining unit 53, and a sending unit 54.

The receiving unit 51 is configured to receive a data storage request sent by the client.

The creating unit 52 is configured to create a data table according to the data storage request received by the receiving unit 51.

The determining unit 53 is configured to determine a full-text search field of the data table according to the metadata of the data table created by the creating unit 52.

The sending unit 54 is configured to send, according to the full-text search field determined by the determining unit 53, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field; and to send, according to the fields other than the full-text search field of the data table determined by the determining unit 53, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields other than the full-text search field of the data table.

Optionally, the determining unit 53 is specifically configured to: determine that a field is a full-text search field when the data stored in the field of the data table exceeds the maximum length of the field; or determine that the field is a full-text search field when the data table is searched according to a keyword in the index field.

Optionally, as shown in FIG. 7, the control node further includes a segmentation unit 55, configured to split the data table by row before the determining unit 53 determines the full-text search field of the data table according to the metadata of the data table.

The determining unit 53 is further configured to determine, for each row of data, a full-text search field of each row according to metadata of the data table.

Optionally, that the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field specifically includes: instructing the full-text search data node to create an index for the full-text search field or to update an index stored by the full-text search data node.
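The two conditions used by the determining unit 53 can be sketched as follows. The FieldMeta structure, its attribute names, and the reading of "exceeds the maximum length" as a character-count comparison are assumptions for illustration; the source does not define the metadata layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldMeta:
    """Hypothetical per-field metadata of a data table."""
    max_length: int                # declared maximum length of the field
    keyword_indexed: bool = False  # True if searched by keyword in an index field


def determine_fulltext_fields(metadata: Dict[str, FieldMeta],
                              row: Dict[str, object]) -> List[str]:
    """Return the fields treated as full-text search fields for this row."""
    selected = []
    for name, meta in metadata.items():
        # Condition 1: the stored data exceeds the field's maximum length.
        exceeds = len(str(row.get(name, ""))) > meta.max_length
        # Condition 2: the field is searched by keyword through an index.
        if exceeds or meta.keyword_indexed:
            selected.append(name)
    return selected


metadata = {
    "id": FieldMeta(max_length=10),
    "name": FieldMeta(max_length=64),
    "comment": FieldMeta(max_length=32),
}
row = {"id": 3, "name": "Gift", "comment": "A flawless chime of romantic and reality"}
fields = determine_fulltext_fields(metadata, row)
```

Evaluating the conditions per row, as here, corresponds to the row-wise variant with the segmentation unit 55; a table-level variant would evaluate them once over the table's metadata.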

The embodiment of the present invention further provides a control node for performing data retrieval, which can be applied to a parallel database system, where the parallel database system includes a control node and a data node, and the data node includes a full-text search data node and an underlying database node. As shown in FIG. 8, the control node includes: a receiving unit 61, an obtaining unit 62, a determining unit 63, a sending unit 64, a receiving unit 65, and an aggregation unit 66.

The receiving unit 61 is configured to receive a retrieval request sent by the client.

The obtaining unit 62 is configured to obtain a to-be-retrieved field and a to-be-retrieved data table according to the retrieval request received by the receiving unit 61.

The determining unit 63 is configured to determine, according to the retrieval request, a field to be retrieved and a data table to be retrieved, and to determine, according to the metadata of the data table to be retrieved, a full-text search field in the field to be retrieved.

The sending unit 64 is configured to send, according to the full-text search field in the to-be-retrieved field determined by the determining unit 63, a first retrieval instruction to the full-text search data node, where the first retrieval instruction is used to instruct the full-text search data node to retrieve the data corresponding to the full-text search field in the to-be-retrieved field; and to send a second retrieval instruction to the underlying database node, where the second retrieval instruction is used to instruct the underlying database node to retrieve the corresponding data.

The receiving unit 65 is configured to receive the search results returned by the full-text search data node and the underlying database node.

The aggregation unit 66 is configured to aggregate the search results returned by the full-text search data node and the underlying database node and received by the receiving unit 65.

The sending unit 64 is further configured to return the aggregated result to the client as a complete retrieval result.

Optionally, the determining unit 63 is specifically configured to: when the data stored in the to-be-retrieved data table for at least one field in the to-be-retrieved field exceeds the maximum length of the at least one field, determine that the at least one field is a full-text search field; or, when the to-be-retrieved field contains a field that needs to be retrieved according to a keyword in the index field, determine that the field that needs to be retrieved according to the keyword in the index field is a full-text search field.

The device for storing and retrieving data provided by this embodiment of the present invention stores, in a parallel database system architecture, data that requires full-text retrieval and data that does not require full-text retrieval on different types of data nodes. Compared with the unified storage method of the prior art, this distributed storage reduces the redundancy of data storage. Meanwhile, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can perform different types of retrieval on the data to be retrieved according to the different retrieval instructions. Compared with the method provided by the prior art, the method provided by this embodiment offers both full-text search capability and structured search capability in the same system and does not need two independent retrieval systems, thereby reducing the number of search steps and improving retrieval efficiency.

Example 5

An embodiment of the present invention provides a control node for performing data storage, which may be applied to a parallel database system, where the parallel database system includes the control node and a data node, and the data node includes a full-text search data node and an underlying database node. As shown in FIG. 9, the control node includes: a processor 71 and a memory 72.

The processor 71 is configured to: receive a data storage request sent by the client; create a data table according to the data storage request; determine a full-text search field of the data table according to the metadata of the data table; send, according to the full-text search field, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field; and send, according to the fields other than the full-text search field of the data table, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields other than the full-text search field of the data table.

The memory 72 is configured to store a data storage request, metadata of the data table, a first data storage instruction, and a second data storage instruction.

Optionally, the processor 71 is specifically configured to: determine that a field is a full-text search field when the data stored in the field of the data table exceeds the maximum length of the field; or determine that the field is a full-text search field when the data table is searched according to a keyword in the index field.

Optionally, the processor 71 is further configured to split the data table by row before determining the full-text search field of the data table according to the metadata of the data table, and, for each row of data to be stored, to determine the full-text search field of that row according to the metadata of the data table.

Optionally, that the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field specifically includes: instructing the full-text search data node to create an index for the full-text search field or to update an index stored by the full-text search data node.
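Creating and updating an index on the full-text search data node can be sketched with a small inverted index. The source does not state which index structure is used; the inverted index, the tokenizer, and the prefix matching for a keyword such as "roman" are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Hypothetical index held by a full-text search data node."""

    def __init__(self):
        # token -> set of master key values whose text contains the token
        self.postings = defaultdict(set)

    def add(self, key, text):
        """Create index entries for a new row, or extend the existing index."""
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[token].add(key)

    def search_prefix(self, prefix):
        """Return the sorted keys of rows containing a token that starts with prefix."""
        prefix = prefix.lower()
        hits = set()
        for token, keys in self.postings.items():
            if token.startswith(prefix):
                hits |= keys
        return sorted(hits)


index = InvertedIndex()
index.add(1, "Alicia Huberman takes to drink and men")
index.add(2, "Fictional romantic tale of a rich girl and poor boy")
index.add(3, "A flawless chime of romantic and reality")
before_update = index.search_prefix("roman")
index.add(4, "A Roman holiday")  # updating the stored index with a new row
after_update = index.search_prefix("roman")
```

The index stores only master key values, which is consistent with the full-text search node returning ids that the control node later joins with the structured result.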

The embodiment of the present invention further provides a control node for performing data retrieval, which can be applied to a parallel database system, where the parallel database system includes a control node and a data node, and the data node includes a full-text search data node and an underlying database node. As shown in FIG. 10, the control node includes: a processor 73 and a memory 74.

The processor 73 is configured to: receive a retrieval request sent by the client; determine, according to the retrieval request, a field to be retrieved and a data table to be retrieved; obtain metadata of the data table to be retrieved, and determine, according to the metadata, a full-text search field in the to-be-retrieved field; send, according to the full-text search field in the to-be-retrieved field, a first retrieval instruction to the full-text search data node, where the first retrieval instruction is used to instruct the full-text search data node to retrieve the data corresponding to the full-text search field in the to-be-retrieved field; send a second retrieval instruction to the underlying database node, where the second retrieval instruction is used to instruct the underlying database node to retrieve the corresponding data; receive the search results returned by the full-text search data node and the underlying database node; aggregate the search results returned by the full-text search data node and the underlying database node; and return the aggregated result to the client as a complete search result.

The memory 74 is configured to store a retrieval request, metadata of the data table to be retrieved, a first retrieval instruction, a second retrieval instruction, and a retrieval result.

Optionally, the processor 73 is specifically configured to: when the data stored in the to-be-retrieved data table for at least one field in the to-be-retrieved field exceeds the maximum length of the at least one field, determine that the at least one field is a full-text search field; or, when the to-be-retrieved field contains a field that needs to be retrieved according to a keyword in the index field, determine that the field that needs to be retrieved according to the keyword in the index field is a full-text search field.

Through the description of the above embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, and of course also by hardware alone, but in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present invention that is essential or that contributes to the prior art may be embodied in the form of a software product. The software product is stored in a readable storage medium, such as a floppy disk, a hard disk, or an optical disk of a computer, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

The above describes only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention. Therefore, the scope of protection of the invention shall be determined by the scope of the appended claims.

Claims

1. A method for performing data storage, the method being applied to a parallel database system, the parallel database system comprising a control node and a data node; the data node comprising a full-text search data node and an underlying database node; the method includes:

Receiving, by the control node, a data storage request sent by the client;

Creating a data table according to the data storage request;

Determining a full-text search field of the data table according to metadata of the data table;

Transmitting, according to the full-text search field, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to indicate that the full-text search data node stores data corresponding to the full-text search field;

Sending, according to the fields other than the full-text search field of the data table, a second data storage instruction to the underlying database node; the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields other than the full-text search field of the data table.

2. The method according to claim 1, wherein the determining the full-text search field of the data table according to the metadata of the data table comprises:

If the data stored in the field of the data table exceeds the maximum length of the field, determining that the field is a full-text search field; or

If the data table is searched according to a keyword in the index field, determining that the field is a full-text search field.

3. The method according to claim 1 or 2, further comprising: dividing the data table by row;

Determining the full-text search field of the data table according to the metadata of the data table, specifically comprising: determining, for each row of data, a full-text search field of each row according to metadata of the data table.

4. The method according to any one of claims 1 to 3, wherein that the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field specifically includes:

Instructing the full-text search data node to create an index for the full-text search field or to update an index stored by the full-text search data node.

5. A method for performing data retrieval, characterized in that the method is applied to a parallel database system, the parallel database system comprising a control node and a data node; the data node comprising a full-text search data node and an underlying database node; the method includes:

The control node receives a retrieval request sent by a client;

Obtaining metadata of the data table to be retrieved, and determining a full-text search field in the to-be-retrieved field according to the metadata of the data table to be retrieved;

Sending, according to the full-text search field in the to-be-retrieved field, a first retrieval instruction to the full-text search data node, where the first retrieval instruction is used to instruct the full-text search data node to retrieve the data corresponding to the full-text search field in the to-be-retrieved field;

Sending a second retrieval instruction to the underlying database node, where the second retrieval instruction is used to instruct the underlying database node to retrieve the corresponding data;

Receiving the retrieval results returned by the full-text search data node and the underlying database node, aggregating the retrieval results returned by the full-text search data node and the underlying database node, and returning the aggregated result to the client as a complete retrieval result.

6. The method according to claim 5, wherein the determining, according to the metadata of the data table to be retrieved, the full-text search field in the to-be-retrieved field comprises:

If the data stored in the to-be-retrieved data table for at least one field in the to-be-retrieved field exceeds the maximum length of the at least one field, determining that the at least one field is a full-text search field; or

If there is a field in the to-be-searched field that needs to be retrieved according to a keyword in the index field, it is determined that the field that needs to be retrieved according to the keyword in the index field is a full-text search field.

7. A control node for performing data storage, characterized in that it is applied to a parallel database system, the parallel database system includes the control node and a data node; the data node includes a full-text search data node and an underlying database node; and the control node includes:

a receiving unit, configured to receive a data storage request sent by the client;

a creating unit, configured to create a data table according to the data storage request received by the receiving unit;

a determining unit, configured to determine a full-text search field of the data table according to the metadata of the data table created by the creating unit;

a sending unit, configured to send, according to the full-text search field determined by the determining unit, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field; and to send, according to the fields other than the full-text search field of the data table determined by the determining unit, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields other than the full-text search field of the data table.

8. The control node according to claim 7, wherein the determining unit is specifically configured to: determine that a field is a full-text search field when the data stored in the field of the data table exceeds the maximum length of the field; or determine that the field is a full-text search field when the data table is searched according to a keyword in the index field.

9. The control node according to claim 7 or 8, wherein the control node further comprises:

a segmentation unit, configured to slice the data table by rows before determining, by the determining unit, the full-text search field of the data table according to the metadata of the data table;

The determining unit is further configured to determine, for each row of data, a full-text search field of each row according to metadata of the data table.

10. The control node according to any one of claims 7 to 9, wherein that the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field specifically includes: instructing the full-text search data node to create an index for the full-text search field or to update an index stored by the full-text search data node.

11. A control node for performing data retrieval, characterized in that it is applied to a parallel database system, the parallel database system includes a control node and a data node; the data node includes a full-text search data node and an underlying database node; and the control node includes:

a receiving unit, configured to receive a retrieval request sent by the client;

a determining unit, configured to determine a field to be retrieved and a data table to be retrieved according to the retrieval request, and to determine, according to the metadata of the data table to be retrieved, a full-text search field in the field to be retrieved;

a sending unit, configured to send a first retrieval instruction to the full-text search data node, where the first retrieval instruction is used to instruct the full-text search data node to retrieve the data corresponding to the full-text search field in the to-be-retrieved field; and to send a second retrieval instruction to the underlying database node, where the second retrieval instruction is used to instruct the underlying database node to retrieve the corresponding data;

a receiving unit, configured to receive the retrieval results returned by the full-text search data node and the underlying database node;

an aggregation unit, configured to aggregate the retrieval results returned by the full-text search data node and the underlying database node and received by the receiving unit;

where the sending unit is further configured to return the aggregated result to the client as a complete retrieval result.

12. The control node according to claim 11, wherein the determining unit is specifically configured to: when the data stored in the to-be-retrieved data table for at least one field in the to-be-retrieved field exceeds the maximum length of the at least one field, determine that the at least one field is a full-text search field; or, when the to-be-retrieved field contains a field that needs to be retrieved according to a keyword in the index field, determine that the field that needs to be retrieved according to the keyword in the index field is a full-text search field.