WO2014036684A1 - Method and device for storing and retrieving data - Google Patents

Method and device for storing and retrieving data

Info

Publication number
WO2014036684A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
full
field
node
text search
Prior art date
Application number
PCT/CN2012/080963
Other languages
French (fr)
Chinese (zh)
Inventor
曹莉
吴向阳
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2012/080963 priority Critical patent/WO2014036684A1/en
Priority to CN201280001730.2A priority patent/CN103891244B/en
Publication of WO2014036684A1 publication Critical patent/WO2014036684A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Definitions

  • the present invention relates to the field of information processing, and in particular, to a method and apparatus for performing data storage and retrieval.
  • In the existing technical solution, structured data is retrieved using Structured Query Language (SQL), and unstructured data is retrieved using an existing full-text search engine.
  • Specifically, the prior art sends the data to be retrieved to an SQL retrieval system for structured search and to a full-text search engine for full-text search, and then integrates the retrieval results obtained from the two independent retrieval systems.
  • Embodiments of the present invention provide a method and apparatus for data storage and retrieval that store data requiring full-text search and data not requiring full-text search in a distributed manner in a parallel database, which reduces storage redundancy. At the same time, both full-text search and structured search can be performed on the data within the parallel database, which improves retrieval efficiency.
  • A method for performing data storage, wherein the method is applied to a parallel database system, the parallel database system comprising a control node and data nodes, and the data nodes comprising a full-text search data node and an underlying database node. The method includes:
  • creating a data table according to a data storage request; determining, according to the metadata of the data table, a full-text search field of the data table; sending, according to the full-text search field, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to instruct the full-text search data node to store data corresponding to the full-text search field; and sending, according to the fields of the data table other than the full-text search field, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store data corresponding to those other fields.
  • A method for data retrieval, characterized in that it is applied to a parallel database system, the parallel database system comprising a control node and data nodes, and the data nodes comprising a full-text search data node and an underlying database node. The method comprises:
  • the control node receives a retrieval request sent by a client, determines the field to be retrieved and the data table to be retrieved according to the retrieval request, and determines a full-text search field among the fields to be retrieved according to the metadata of the data table to be retrieved;
  • the control node sends a first retrieval instruction to the full-text search data node and a second retrieval instruction to the underlying database node, and then receives the retrieval results returned by the full-text search data node and the underlying database node;
  • the retrieval results returned by the full-text search data node and the underlying database node are aggregated, and the aggregated result is returned to the client as the complete search result.
  • a control node for performing data storage characterized in that, in a parallel database system, the parallel database system includes the control node and a data node; the data node includes a full-text search data node and an underlying database node;
  • the control node includes:
  • a receiving unit configured to receive a data storage request sent by the client;
  • a creating unit configured to create a data table according to the data storage request received by the receiving unit;
  • a determining unit configured to determine a full-text search field of the data table according to the metadata of the data table created by the creating unit;
  • a sending unit configured to send, according to the full-text search field determined by the determining unit, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to instruct the full-text search data node to store data corresponding to the full-text search field; and to send, according to the fields of the data table other than the full-text search field determined by the determining unit, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store data corresponding to the fields other than the full-text search field of the data table.
  • A control node for performing data retrieval, characterized in that it is applied in a parallel database system, the parallel database system including the control node and data nodes, and the data nodes including a full-text search data node and an underlying database node. The control node includes:
  • a receiving unit configured to receive a retrieval request sent by the client
  • An obtaining unit configured to obtain, according to the retrieval request received by the receiving unit, a field to be retrieved and a data table to be retrieved;
  • a determining unit configured to determine a field to be retrieved and a data table to be retrieved according to the retrieval request, and to determine, according to the metadata of the data table to be retrieved, a full-text search field among the fields to be retrieved;
  • a sending unit configured to send a first retrieval instruction to the full-text search data node and a second retrieval instruction to the underlying database node;
  • a receiving unit configured to receive the retrieval result returned by each of the full-text search data node and the underlying database node;
  • an aggregation unit configured to aggregate the retrieval results, received by the receiving unit, that were returned by the full-text search data node and the underlying database node;
  • the sending unit is further configured to return the aggregated result to the client as the complete retrieval result.
  • The method and device for data storage and retrieval provided by the embodiments of the present invention can, in a parallel database system architecture, store data that requires full-text search and data that does not require full-text search on different types of data nodes.
  • This realizes distributed storage of data and reduces storage redundancy compared with the unified storage method of the prior art. Meanwhile, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that each node executing a retrieval can perform a different type of retrieval on the data to be retrieved according to the instruction it receives.
  • The method provided by the embodiments of the present invention thus offers full-text search capability and structured search capability in the same system, without using two independent retrieval systems, which simplifies the search steps and improves retrieval efficiency.
  • FIG. 1 is a flowchart of a method for performing data storage according to Embodiment 1 of the present invention
  • FIG. 2 is a flowchart of a method for performing data retrieval according to Embodiment 1 of the present invention
  • FIG. 3 is a flowchart of a method for performing data storage according to Embodiment 2 of the present invention.
  • FIG. 4 is a flowchart of another method for performing data storage according to Embodiment 2 of the present invention
  • FIG. 5 is a flowchart of a method for performing data retrieval according to Embodiment 3 of the present invention
  • FIG. 6 is a block diagram of a control node for performing data storage in Embodiment 4 of the present invention.
  • FIG. 7 is a block diagram showing another composition of a control node for performing data storage in Embodiment 4 of the present invention.
  • FIG. 8 is a block diagram of a control node for performing data retrieval according to Embodiment 4 of the present invention.
  • FIG. 9 is a block diagram showing a composition of a control node for performing data storage in Embodiment 5 of the present invention.
  • FIG. 10 is a block diagram showing the composition of a control node for performing data retrieval in Embodiment 5 of the present invention.
  • An embodiment of the present invention provides a method for performing data storage and retrieval. The method is applied to a parallel database system that includes a control node and data nodes; the data nodes include a full-text search data node and an underlying database node.
  • The full-text search data node is configured to store the data corresponding to the full-text search field and to perform full-text search on the data it stores;
  • the underlying database node is configured to store the data other than the data corresponding to the full-text search field and to perform structured search on the data it stores.
  • the full-text retrieval data node and the underlying database node can be distributed in the following two ways:
  • First distribution method: all the full-text search data nodes are placed in one pool, and all the underlying database nodes are placed in another pool.
  • The two pools are logically independent and may also be physically separate: they can be deployed on two devices or integrated on the same device. In this arrangement, the pool of full-text search data nodes stores the data that requires full-text search, and the pool of underlying database nodes stores the data that does not.
  • Second distribution method: one full-text search data node and one underlying database node are placed together, as a pair, on the same data retrieval node.
  • Each data retrieval node contains exactly one full-text search data node and one underlying database node, and different data retrieval nodes hold different pairs. In this arrangement, each data retrieval node stores different data:
  • the full-text search data node stores the data that requires full-text search;
  • the underlying database node stores the data that does not require full-text search;
  • the data stored by the full-text search data node and by the underlying database node within the same data retrieval node must carry corresponding information, such as the same distribution key value.
  • an embodiment of the present invention provides a method for performing data storage, which may be implemented by a control node. As shown in FIG. 1, the method includes:
  • the data table may be represented by a structured data storage form, wherein the stored content is provided by a client.
  • Table 1 below is a specific structure of one embodiment of the data table.
  • Name, Provider, and Summary are field names, and each column corresponding to each field stores data corresponding to each field.
  • the metadata is used to indicate which of the all fields of the data table are full-text search fields, and which are not full-text search fields.
  • The metadata can be stored inside the data table, identifying for each field whether the stored data exceeds the maximum length of the field; alternatively, the metadata can correspond to the data table but be stored independently, outside the data table.
  • The metadata can use "0"/"1" or "yes"/"no" values to indicate which fields need full-text search and which do not: for example, "0" means not needed and "1" means needed, or "no" means not needed and "yes" means needed. Full-text search fields can be determined in the following two ways:
  • First method: if the data stored in a field of the data table exceeds the maximum length of the field, the field is determined to be a full-text search field.
  • Second method: if the data table is searched according to a keyword in an index field, that field is determined to be a full-text search field.
  • In the first method, the control node measures the length of the data stored in each field and compares it with the maximum length of that field to distinguish full-text search fields from non-full-text search fields; the control node then marks the type of each field.
  • In the second method, the keywords in the index field are marked by the client; the control node only stores these marks and interprets their meaning,
  • and identifies each field according to this correspondence.
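  • The two ways of marking full-text search fields can be illustrated with a minimal Python sketch. The function names, the sample row, and the maximum lengths are hypothetical; only the field names Name, Provider, and Summary and the "0"/"1" and "yes"/"no" conventions come from the description above.

```python
from typing import Dict, List


def flags_by_length(rows: List[Dict[str, str]], max_len: Dict[str, int]) -> Dict[str, int]:
    """First method: mark a field 1 if any stored value exceeds its maximum length."""
    flags = {field: 0 for field in max_len}
    for row in rows:
        for field, limit in max_len.items():
            if len(row.get(field, "")) > limit:
                flags[field] = 1
    return flags


def flags_from_client(marks: Dict[str, str]) -> Dict[str, int]:
    """Second method: the client supplies "yes"/"no" marks; the control node only interprets them."""
    return {field: 1 if mark == "yes" else 0 for field, mark in marks.items()}


# Hypothetical row and length limits for a Name/Provider/Summary table.
rows = [{"Name": "db-a", "Provider": "vendor-1", "Summary": "x" * 300}]
max_len = {"Name": 32, "Provider": 32, "Summary": 255}
metadata = flags_by_length(rows, max_len)
```

  • In this sketch, a value of 1 in `metadata` plays the role of the "needs full-text search" mark; how a real control node stores that mark is not specified here.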
  • the first data storage instruction is specifically configured to instruct the full-text search data node to create an index for the full-text search field or update an index stored by the full-text search data node.
  • the second data storage instruction is specifically configured to instruct the underlying database node to perform structured data storage.
  • an embodiment of the present invention provides a data retrieval method.
  • the method is implemented by a control node, as shown in FIG. 2, the method includes:
  • The search request may be a standard SQL query statement, for example, "Select id, name from A where comment Like 'roman' group by name"; the statement queries table A for the id and name of the rows whose "comment" contains "roman", and groups the results by name.
  • First method of determining the full-text search field among the fields to be retrieved according to the metadata of the data table to be retrieved: if the data stored in the data table to be retrieved for at least one of the fields to be retrieved exceeds the maximum length of that field, the at least one field is determined to be a full-text search field.
  • Second method: if a field among the fields to be retrieved needs to be retrieved according to a keyword in an index field, that field is determined to be a full-text search field.
  • The identification of full-text search fields by the first and second methods corresponds to the way full-text search fields are marked in step 103 above.
  • The search instruction mainly includes the following information:
  • The second retrieval instruction here refers to a set of operations to be performed on the database for a particular query.
  • Each step of the execution instruction describes a specific database operation, such as a table scan, join, aggregation, or sort.
  • A complete database retrieval instruction mainly includes the following information:
  • cost: an estimate of the system resources consumed when the data is retrieved;
  • rows: an estimate of the total number of rows returned by the retrieval, reflecting the estimated selectivity of the conditions in any WHERE clause;
  • width: an estimate of the total number of bytes in the rows returned by the retrieval, reflecting the size of the data set that satisfies the search criteria.
  • The retrieval results returned by the data nodes may be aggregated by performing an equi-join on a specific field.
  • For example, Table 2 shows the search result obtained by the full-text search data node, namely the id field values 2 and 3; Table 3 shows the search result obtained by the underlying database node, namely the id field values 1, 2, and 3 together with the corresponding Name field information. Joining on the value of the id field merges the results with id values 2 and 3 into one table, as shown in Table 4.
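  • The equi-join on the id field described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name and the Name values are hypothetical placeholders (the contents of Tables 2 to 4 are not reproduced here), and the sketch assumes the join key is unique in the underlying database result.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def equi_join(left: List[Row], right: List[Row], key: str = "id") -> List[Row]:
    """Keep only rows whose key value appears on both sides and merge their columns."""
    right_by_key = {row[key]: row for row in right}  # assumes unique keys on the right
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


# Full-text search data node result: id values 2 and 3.
fulltext_result = [{"id": 2}, {"id": 3}]
# Underlying database node result: id values 1, 2, 3 with placeholder Name values.
database_result = [
    {"id": 1, "Name": "name-1"},
    {"id": 2, "Name": "name-2"},
    {"id": 3, "Name": "name-3"},
]
merged = equi_join(fulltext_result, database_result)
```

  • Here `merged` contains only the rows with id 2 and 3, each carrying its Name value, which corresponds to the merged table described above.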
  • The method for data storage and retrieval provided by this embodiment of the present invention can, in a parallel database system architecture, store data that requires full-text search and data that does not require full-text search separately on different types of data nodes.
  • This distributed storage reduces storage redundancy compared with the unified storage method of the prior art. Meanwhile, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client,
  • so that each node executing a retrieval can perform a different type of retrieval on the data to be retrieved according to the instruction it receives.
  • The method provided by this embodiment thus offers full-text search capability and structured search capability in the same system, without using two independent retrieval systems, which simplifies the search steps and improves retrieval efficiency.
  • Embodiment 2. Based on the first distribution manner of the full-text search data node and the underlying database node described in Embodiment 1, this embodiment of the present invention provides a method for performing data storage, which is implemented by a control node. As shown in FIG. 3, the method includes:
  • The second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.
  • Steps 304 and 305 described here do not need to be executed in any particular order.
  • In step 303, if it is determined that no field requires full-text search, only step 305 is performed; if it is determined that all fields require full-text search, only step 304 is performed.
  • a data loading server including a full-text retrieval data loading server, is provided in the parallel database system framework as described in Embodiment 1.
  • the underlying database load server is configured to load data corresponding to the full-text search field in the data table to the full-text search data node under the control of the control node, so that the full-text search data node stores corresponding
  • the underlying database loading server is configured to load data corresponding to the full-text search field in the data table to the underlying database node under the control of the control node, so that the underlying database node stores the corresponding data.
  • Step 304 specifically includes: the control node divides the data table by column, separates out the column fields that require full-text search, generates corresponding data loading tasks according to the separated fields, and sends the tasks to the full-text search data loading server.
  • The full-text search data loading server then allocates the data corresponding to the column fields that require full-text search to different full-text search data nodes for storage according to a preset distribution strategy.
  • A full-text search data loading task stores the unstructured data of an application, or data that must support full-text search, on a full-text search server; it generally refers to the process of creating an inverted index. For structured data, the inverted index table is created directly through word segmentation and filtering; otherwise the task also includes a data information extraction step. There is no strict industry standard for data loading in full-text search; typically a private API is exposed externally and used to create the data index.
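  • As a rough illustration of the "word segmentation, filtering, inverted index" sequence mentioned above, the following Python sketch builds a toy inverted index. The tokenizer, the stop-word list, and the sample documents are assumptions made for the example; a real full-text search engine would use its own analysis pipeline and API.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Hypothetical stop-word list used for the filtering step.
STOP_WORDS = {"the", "a", "of", "and"}


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each retained token to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Word segmentation: lowercase and split into alphanumeric tokens.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # Filtering: drop stop words before indexing.
            if token not in STOP_WORDS:
                index[token].add(doc_id)
    return index


# Hypothetical data for a full-text search field, keyed by id.
docs = {1: "History of the Greek city", 2: "The Roman empire", 3: "Roman roads and bridges"}
index = build_inverted_index(docs)
```

  • Looking up a keyword such as "roman" in `index` then yields the ids of the matching rows, which is the kind of result the full-text search data node returns to the control node.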
  • the pre-set distribution policy may be set to store different information of different regions by different data nodes, for example, node 1 stores information in Beijing, node 2 stores information in Shanghai, and the like.
  • the distribution policy can be set according to actual needs, which is not limited by the embodiment of the present invention.
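  • A region-based distribution policy of the kind described above might look like the following sketch. The mapping table, the `region` field name, and the choice to reject unmapped regions are all assumptions for illustration; the patent leaves the policy open.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical policy: node 1 stores Beijing data, node 2 stores Shanghai data.
REGION_TO_NODE = {"Beijing": "node 1", "Shanghai": "node 2"}


def distribute(rows: List[Row], region_field: str = "region") -> Dict[str, List[Row]]:
    """Group rows by the data node that the preset distribution policy assigns them to."""
    placement: Dict[str, List[Row]] = {node: [] for node in REGION_TO_NODE.values()}
    for row in rows:
        region = row[region_field]
        if region not in REGION_TO_NODE:
            # Assumption: unmapped regions are treated as a configuration error.
            raise ValueError(f"no data node configured for region {region!r}")
        placement[REGION_TO_NODE[region]].append(row)
    return placement


rows = [
    {"id": 1, "region": "Beijing"},
    {"id": 2, "region": "Shanghai"},
    {"id": 3, "region": "Beijing"},
]
placement = distribute(rows)
```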
  • Step 305 specifically includes: the control node divides the data table by column, separates out the column fields that do not require full-text search, and generates corresponding data loading tasks according to the separated fields.
  • The underlying database loading server then allocates the data corresponding to the column fields that do not require full-text search to different underlying database nodes for storage according to a preset distribution strategy.
  • the underlying database data loading task is to store the specific data of the application in the underlying database through the standard SQL loading statement COPY FROM, for example:
  • In order to ensure that the retrieval result of the full-text search data node and the retrieval result of the underlying database node can be merged, the process of generating the data loading tasks must also determine which fields are primary key fields, that is, master key fields; these fields represent the association between data stored on different nodes. Specifically, when storing data, the full-text search data node must store the data corresponding to the master key field in addition to the data corresponding to the fields that require full-text search; the underlying database node stores data in a similar way.
  • An embodiment of the present invention further provides a method for performing data storage that is applied to the second distribution manner of the full-text search data node and the underlying database node described in Embodiment 1, as shown in FIG. 4.
  • Compared with the method above, this method further includes step 306, in which the data table is divided by row; the determining step is then replaced by step 307, in which, for each row of data to be stored, the full-text search field of the row is determined according to the metadata of the data table.
  • The other steps are performed in the same way as in the data storage method corresponding to the first distribution manner of the full-text search data node and the underlying database node in this embodiment.
  • The method for performing data storage according to this embodiment of the present invention can, in a parallel database system architecture, store the data that requires full-text search and the data that does not on different types of data nodes.
  • This distributed storage reduces storage redundancy compared with the unified storage method of the prior art.
  • The data table created by the control node is shown in Table 5 below.
  • The data table is named A and has three fields: id, name, and comment, where id is the master key field and the comment field requires full-text search.
  • The id field and the name field are stored in the underlying database node, and the id field and the comment field are stored in the full-text search data node.
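  • The column split of table A can be sketched as follows: the master key field id is copied to both sides so that the two retrieval results can later be joined. The function name and the sample row are hypothetical (the contents of Table 5 are not reproduced here); the metadata uses the 0/1 convention described in Embodiment 1.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_column(rows: List[Row], metadata: Dict[str, int], master_key: str) -> Tuple[List[Row], List[Row]]:
    """Split rows into a full-text part and a database part, keeping the master key in both."""
    fulltext_cols = [f for f, flag in metadata.items() if flag == 1]
    other_cols = [f for f, flag in metadata.items() if flag == 0 and f != master_key]
    # If no field requires full-text search, nothing is sent to the full-text search data node.
    fulltext_part = (
        [{master_key: r[master_key], **{f: r[f] for f in fulltext_cols}} for r in rows]
        if fulltext_cols else []
    )
    # If every non-key field requires full-text search, nothing is sent to the database node.
    database_part = (
        [{master_key: r[master_key], **{f: r[f] for f in other_cols}} for r in rows]
        if other_cols else []
    )
    return fulltext_part, database_part


metadata = {"id": 0, "name": 0, "comment": 1}  # 1 = full-text search field
rows = [{"id": 1, "name": "n1", "comment": "roman empire"}]  # hypothetical row of table A
fulltext_part, database_part = split_by_column(rows, metadata, "id")
```

  • With this input, `fulltext_part` holds the id and comment values and `database_part` holds the id and name values, matching the placement described above.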
  • the method for performing data retrieval provided by the embodiment of the present invention, as shown in FIG. 5, includes:
  • the control node receives a retrieval request sent by the client.
  • The query corresponding to the search request is Select id, name from A where comment Like 'roman' group by name; the statement queries table A for the id and name of the rows whose "comment" contains "roman", and groups the results by name.
  • the control node determines, according to the retrieval request, a field to be retrieved and a data table to be retrieved.
  • the control node acquires metadata of the data table to be retrieved, and determines a full-text search field in the to-be-searched field according to the metadata of the data table to be retrieved.
  • The control node sends a first retrieval instruction to the full-text search data node according to the full-text search field among the fields to be retrieved.
  • The control node sends a second retrieval instruction to the underlying database node.
  • The full-text search data node searches the data to be retrieved that it stores according to the first retrieval instruction, and obtains the full-text search node retrieval result.
  • The underlying database node searches the data to be retrieved that it stores according to the second retrieval instruction, and obtains the underlying database node retrieval result.
  • the retrieval results of the full-text search nodes are shown in Table 6 below, and the results of the underlying database nodes are as shown in Table 7 below.
  • The control node receives the full-text search node retrieval result sent by the full-text search data node and the underlying database node retrieval result sent by the underlying database node.
  • The control node aggregates the two received retrieval results to obtain the complete retrieval result.
  • the complete search results obtained here are shown in Table 8 below.
  • the control node sends the complete search result to the client.
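  • The way a control node might turn the example query into two retrieval instructions can be sketched as below. This is an assumption-laden illustration: the instruction dictionaries, their keys, and the function name are invented for the example and do not reflect any instruction format defined in the patent.

```python
from typing import Any, Dict, List


def plan_retrieval(select_fields: List[str], where_field: str, keyword: str,
                   metadata: Dict[str, int], master_key: str = "id") -> List[Dict[str, Any]]:
    """Build one instruction per node type for a single-condition query."""
    instructions: List[Dict[str, Any]] = []
    structured_filter = None
    if metadata.get(where_field) == 1:
        # The condition targets a full-text search field: route it to the full-text node,
        # which only needs to return master key values for the later join.
        instructions.append({
            "target": "full-text search data node",
            "field": where_field,
            "keyword": keyword,
            "return": [master_key],
        })
    else:
        # Otherwise the condition stays with the structured search.
        structured_filter = (where_field, keyword)
    fields = [master_key] + [f for f in select_fields
                             if f != master_key and metadata.get(f) != 1]
    instructions.append({
        "target": "underlying database node",
        "fields": fields,
        "filter": structured_filter,
    })
    return instructions


# Select id, name from A where comment Like 'roman' ...
plan = plan_retrieval(["id", "name"], "comment", "roman", {"id": 0, "name": 0, "comment": 1})
```

  • The first entry of `plan` corresponds to the first retrieval instruction and the second entry to the second retrieval instruction; grouping and aggregation of the returned results are left to the control node.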
  • The search results shown in Tables 6 to 8 above are generated under the first distribution manner described in Embodiment 1; that is, the pool of full-text search data nodes returns a full-text search result to the control node,
  • the pool of underlying database nodes returns an underlying database retrieval result to the control node, and the control node aggregates the two types of retrieval results.
  • Under the second distribution manner, each data retrieval node returns its own retrieval result.
  • For example, the retrieval results of three data retrieval nodes can be as shown in Tables 9 to 11; Table 9 shows the search results of data retrieval node 1.
  • the control node aggregates the three search results to obtain a complete search result.
  • the search results are the same as in Table 8 above.
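  • For the second distribution manner, the aggregation of per-node results can be sketched as follows. The sketch assumes that each data retrieval node joins its own full-text hits with its own structured rows before replying, which the description does not state explicitly; the node contents and function names are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def local_search(node: Dict[str, Dict[int, str]], keyword: str) -> List[Row]:
    """One data retrieval node: full-text match on comment, then local lookup of name by id."""
    hits = [i for i, comment in node["fulltext"].items() if keyword in comment.lower().split()]
    return [{"id": i, "name": node["database"][i]} for i in hits]


def aggregate(partial_results: List[List[Row]]) -> List[Row]:
    """Control node: concatenate the per-node results into one complete result."""
    merged = [row for part in partial_results for row in part]
    return sorted(merged, key=lambda row: row["id"])


# Three hypothetical data retrieval nodes, each holding one row of table A.
nodes = [
    {"fulltext": {1: "Greek history"}, "database": {1: "n1"}},
    {"fulltext": {2: "Roman empire"}, "database": {2: "n2"}},
    {"fulltext": {3: "Roman roads"}, "database": {3: "n3"}},
]
complete = aggregate([local_search(node, "roman") for node in nodes])
```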
  • The method for data retrieval provided by this embodiment of the present invention can, in a parallel database system architecture, send different types of retrieval instructions to different types of data nodes according to the retrieval request sent by a client, so that the nodes executing the retrieval can simultaneously perform different types of retrieval on the data to be retrieved according to the different instructions.
  • The method provided by this embodiment thus offers full-text search capability and structured search capability in the same system, without using two independent retrieval systems, which simplifies the search steps and improves retrieval efficiency.
  • An embodiment of the present invention provides a control node for performing data storage, which may be applied to a parallel database system, where the parallel database system includes the control node and data nodes, and the data nodes include a full-text search data node and an underlying database node.
  • the control node includes: a receiving unit 51, a creating unit 52, a determining unit 53, and a sending unit 54.
  • the receiving unit 51 is configured to receive a data storage request sent by the client.
  • the creating unit 52 is configured to create a data table according to the data storage request received by the receiving unit 51.
  • the determining unit 53 is configured to determine a full-text search field of the data table according to the metadata of the data table created by the creating unit 52.
  • The sending unit 54 is configured to send, according to the full-text search field determined by the determining unit 53, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to instruct the full-text search data node to store data corresponding to the full-text search field; and to send, according to the fields of the data table other than the full-text search field determined by the determining unit 53, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store data corresponding to the fields other than the full-text search field of the data table.
  • The determining unit 53 is specifically configured to: when the data stored in a field of the data table exceeds the maximum length of the field, determine that the field is a full-text search field; and when the data table is searched according to a keyword in an index field, determine that the field is a full-text search field.
  • The control node further includes a segmentation unit 55, configured to segment the data table by row before the full-text search field of the data table is determined.
  • the determining unit 53 is further configured to determine, for each row of data, a full-text search field of each row according to metadata of the data table.
  • The first data storage instruction being used to instruct the full-text search data node to store data corresponding to the full-text search field specifically includes: instructing the full-text search data node to create an index for the full-text search field or to update an index stored by the full-text search data node.
  • This embodiment of the present invention further provides a control node for performing data retrieval, which can be applied to a parallel database system, where the parallel database system includes the control node and data nodes, and the data nodes include a full-text search data node and an underlying database node.
  • The control node includes: a receiving unit 61, an obtaining unit 62, a determining unit 63, a sending unit 64, a receiving unit 65, and an aggregation unit 66.
  • the receiving unit 61 is configured to receive a retrieval request sent by the client.
  • the obtaining unit 62 is configured to obtain a to-be-retrieved field and a to-be-retrieved data table according to the retrieval request received by the receiving unit 61.
  • The determining unit 63 is configured to determine, according to the retrieval request, the field to be retrieved and the data table to be retrieved, and to determine, according to the metadata of the data table to be retrieved, a full-text search field among the fields to be retrieved.
  • The sending unit 64 is configured to send, according to the full-text search field among the fields to be retrieved determined by the determining unit 63, a first retrieval instruction to the full-text search data node, and to send a second retrieval instruction to the underlying database node.
  • The receiving unit 65 is configured to receive the retrieval result returned by each of the full-text search data node and the underlying database node.
  • The aggregation unit 66 is configured to aggregate the retrieval results, received by the receiving unit 65, that were returned by the full-text search data node and the underlying database node.
  • The sending unit 64 is further configured to return the aggregated result to the client as the complete retrieval result.
  • The determining unit 63 is specifically configured to: when, for at least one of the fields to be retrieved, the data stored in the data table to be retrieved exceeds the maximum length of that field, determine that the at least one field is a full-text search field; and when a field among the fields to be retrieved needs to be retrieved according to a keyword in an index field, determine that that field is a full-text search field.
  • the device for storing and retrieving data provided by the embodiments of the present invention can, in a parallel database system architecture, store data that requires full-text search and data that does not require full-text search on different types of data nodes.
  • this distributed storage reduces the redundancy of data storage compared with the unified storage method of the prior art.
  • meanwhile, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can simultaneously perform different types of retrieval on the data to be retrieved according to the different retrieval instructions.
  • compared with the method provided by the prior art, the method provided by the embodiment of the present invention provides both full-text search capability and structured search capability in the same system and does not need two independent retrieval systems, thereby simplifying the retrieval steps and improving retrieval efficiency.
  • An embodiment of the present invention provides a control node for performing data storage, which may be applied to a parallel database system, where the parallel database system includes the control node and a data node; and the data node includes a full-text search data node and an underlying database node.
  • the processor 71 is configured to: receive a data storage request sent by the client, create a data table according to the data storage request, and determine a full-text search field of the data table according to the metadata of the data table; send, according to the full-text search field, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field; and send, according to the fields of the data table other than the full-text search field, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.
  • the memory 72 is configured to store a data storage request, metadata of the data table, a first data storage instruction, and a second data storage instruction.
  • the processor 71 is specifically configured to: when the data stored in a field of the data table exceeds the maximum length of the field, determine that the field is a full-text search field; and when a field of the data table is retrieved according to a keyword in the index field, determine that the field is a full-text search field.
  • the processor is further configured to divide the data table into rows before determining the full-text search field of the data table according to the metadata of the data table. For each row of data that needs to be stored, the full-text search field of each row is determined based on the metadata of the data table.
  • the first data storage instruction is used to indicate that the full-text search data node stores the data corresponding to the full-text search field, and specifically includes: indicating that the full-text search data node is the full-text search field Create an index or update an index stored by the full-text search data node.
  • the embodiment of the present invention further provides a control node for performing data retrieval, which can be applied to a parallel database system, where the parallel database system includes a control node and a data node; and the data node includes a full-text search data node and an underlying database node.
  • the control node includes: a processor 73 and a memory 74.
  • the processor 73 is configured to: receive a retrieval request sent by the client; determine, according to the retrieval request, the to-be-retrieved fields and the to-be-retrieved data table; obtain the metadata of the to-be-retrieved data table and determine, according to that metadata, the full-text search field among the to-be-retrieved fields; send, according to the full-text search field among the to-be-retrieved fields, a first retrieval instruction to the full-text search data node, where the first retrieval instruction is used to instruct the full-text search data node to retrieve the data corresponding to the full-text search field among the to-be-retrieved fields; send a second retrieval instruction to the underlying database node, where the second retrieval instruction is used to instruct the underlying database node to retrieve the corresponding data; and receive the search results returned by the full-text search data node and by the underlying database node, aggregate them, and return the aggregated result to the client as the complete search result.
  • the memory 74 is configured to store a retrieval request, metadata of the data table to be retrieved, a first retrieval instruction, a second retrieval instruction, and a retrieval result.
  • the processor 73 is specifically configured to: when, for at least one field among the to-be-retrieved fields, the data stored in the to-be-retrieved data table exceeds the maximum length of that field, determine that the at least one field is a full-text search field; and when the to-be-retrieved fields include a field that needs to be retrieved according to a keyword in the index field, determine that this field is a full-text search field.
  • the device for storing and retrieving data provided by the embodiments of the present invention can, in a parallel database system architecture, store data that requires full-text search and data that does not require full-text search on different types of data nodes.
  • this distributed storage reduces the redundancy of data storage compared with the unified storage method of the prior art.
  • meanwhile, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can simultaneously perform different types of retrieval on the data to be retrieved according to the different retrieval instructions.
  • compared with the method provided by the prior art, the method provided by the embodiment of the present invention provides both full-text search capability and structured search capability in the same system and does not need two independent retrieval systems, thereby simplifying the retrieval steps and improving retrieval efficiency.
  • the present invention can be implemented by software plus the necessary general-purpose hardware, or of course by hardware alone, but in many cases the former is the better implementation.
  • the part of the technical solution of the present invention that is essential, or that contributes over the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a computer's floppy disk, hard disk, or optical disc, and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the information processing field. Disclosed are a method and device for storing and retrieving data, so that distributed storage of data requiring full-text retrieval and data not requiring full-text retrieval in the parallel database is implemented, thereby reducing the storage redundancy; meanwhile, the full-text retrieval and structured retrieval of the data to be retrieved are implemented in the parallel database, thereby improving the retrieval efficiency. The present invention comprises: a control node, in a parallel database system architecture, storing data requiring full-text retrieval and data not requiring full-text retrieval on different types of data nodes respectively; according to a retrieval request sent from a client, sending different types of retrieval instructions to different types of data nodes, so that a retrieval execution node can simultaneously execute, according to different retrieval execution instructions, different types of retrieval operations on the data to be retrieved. Embodiments of the present invention are mainly applicable to data storage and retrieval processes.

Description

Method and device for performing data storage and retrieval
Technical field
The present invention relates to the field of information processing, and in particular to a method and device for performing data storage and retrieval.
Background
Different applications have different data retrieval needs. To meet them, the existing technical solutions use the Structured Query Language (SQL) to retrieve structured data and use an existing full-text search engine to perform full-text search on unstructured data. However, some applications must perform a structured search on one part of the data and a full-text search on another part of the same data before they obtain the desired result. The solution provided by the prior art sends the data to be retrieved to an SQL retrieval system for structured search, sends the data to be retrieved to a full-text search engine system for full-text search, and then merges the results obtained by these two independent retrieval systems. When the full-text search engine has to search a large amount of data, the full-text search process generates far more data than the data to be retrieved, which greatly reduces full-text search efficiency and therefore the efficiency of the whole retrieval process.
Summary of the invention
Embodiments of the present invention provide a method and device for performing data storage and retrieval that store data requiring full-text search and data not requiring full-text search in a distributed manner in a parallel database, which reduces storage redundancy; at the same time, both full-text search and structured search of the data to be retrieved can be performed within the parallel database, which improves retrieval efficiency.
To achieve the above object, the embodiments of the present invention adopt the following technical solutions:
A method for performing data storage, characterized in that the method is applied to a parallel database system, the parallel database system includes a control node and data nodes, and the data nodes include a full-text search data node and an underlying database node; the method includes:
the control node receiving a data storage request sent by a client;
creating a data table according to the data storage request;
determining a full-text search field of the data table according to metadata of the data table;
sending a first data storage instruction to the full-text search data node according to the full-text search field, where the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field;
sending a second data storage instruction to the underlying database node according to the fields of the data table other than the full-text search field, where the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.
A method for performing data retrieval, characterized in that the method is applied to a parallel database system, the parallel database system includes a control node and data nodes, and the data nodes include a full-text search data node and an underlying database node; the method includes:
the control node receiving a retrieval request sent by a client;
determining, according to the retrieval request, the to-be-retrieved fields and the to-be-retrieved data table;
obtaining metadata of the to-be-retrieved data table, and determining, according to that metadata, the full-text search field among the to-be-retrieved fields;
sending a first retrieval instruction to the full-text search data node according to the full-text search field among the to-be-retrieved fields, where the first retrieval instruction is used to instruct the full-text search data node to retrieve the data corresponding to the full-text search field among the to-be-retrieved fields;
sending a second retrieval instruction to the underlying database node, where the second retrieval instruction is used to instruct the underlying database node to retrieve the corresponding data;
receiving the search results returned by the full-text search data node and by the underlying database node, aggregating them, and returning the aggregated result to the client as the complete search result.
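The retrieval method above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the classes `FullTextNode` and `DatabaseNode`, the function `retrieve`, and the idea of aggregating the two result sets by a shared row key are all assumptions introduced here, and the classification of fields through metadata is taken as already done.

```python
class FullTextNode:
    """Stand-in for a full-text search data node (hypothetical)."""

    def __init__(self):
        self.index = {}  # word -> set of row keys (a tiny inverted index)

    def add(self, key, text):
        for word in text.lower().split():
            self.index.setdefault(word, set()).add(key)

    def search(self, keyword):
        return set(self.index.get(keyword.lower(), ()))


class DatabaseNode:
    """Stand-in for an underlying database node (hypothetical)."""

    def __init__(self):
        self.rows = {}  # row key -> structured fields

    def add(self, key, fields):
        self.rows[key] = dict(fields)

    def select(self, columns):
        return {key: {c: row[c] for c in columns} for key, row in self.rows.items()}


def retrieve(ft_node, db_node, columns, keyword):
    """Control-node side: issue both retrievals, then aggregate by row key."""
    matched_keys = ft_node.search(keyword)   # plays the role of the first retrieval instruction
    structured = db_node.select(columns)     # plays the role of the second retrieval instruction
    # Assumed aggregation rule: keep structured rows whose key matched the full-text search.
    return [structured[k] for k in sorted(matched_keys) if k in structured]


ft_node, db_node = FullTextNode(), DatabaseNode()
ft_node.add(1, "a roman general")
db_node.add(1, {"id": 1, "name": "Caesar"})
ft_node.add(2, "a greek philosopher")
db_node.add(2, {"id": 2, "name": "Plato"})

result = retrieve(ft_node, db_node, ["id", "name"], "roman")
print(result)
```

Joining on a row key is only one possible aggregation rule; the text itself specifies no more than that the two result sets are aggregated before being returned to the client.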
A control node for performing data storage, characterized in that it is applied to a parallel database system, the parallel database system includes the control node and data nodes, and the data nodes include a full-text search data node and an underlying database node; the control node includes:
a receiving unit, configured to receive a data storage request sent by a client;
a creating unit, configured to create a data table according to the data storage request received by the receiving unit;
a determining unit, configured to determine a full-text search field of the data table according to metadata of the data table created by the creating unit;
a sending unit, configured to send a first data storage instruction to the full-text search data node according to the full-text search field determined by the determining unit, where the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field; and to send a second data storage instruction to the underlying database node according to the fields of the data table other than the full-text search field determined by the determining unit, where the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.
A control node for performing data retrieval, characterized in that it is applied to a parallel database system, the parallel database system includes the control node and data nodes, and the data nodes include a full-text search data node and an underlying database node; the control node includes:
a receiving unit, configured to receive a retrieval request sent by a client;
an obtaining unit, configured to obtain the to-be-retrieved fields and the to-be-retrieved data table according to the retrieval request received by the receiving unit;
a determining unit, configured to determine, according to the retrieval request, the to-be-retrieved fields and the to-be-retrieved data table, and to determine, according to metadata of the to-be-retrieved data table, the full-text search field among the to-be-retrieved fields;
a sending unit, configured to send a first retrieval instruction to the full-text search data node, where the first retrieval instruction is used to instruct the full-text search data node to retrieve the data corresponding to the full-text search field among the to-be-retrieved fields; and to send a second retrieval instruction to the underlying database node, where the second retrieval instruction is used to instruct the underlying database node to retrieve the corresponding data;
a receiving unit, configured to receive the search results returned by the full-text search data node and by the underlying database node;
an aggregation unit, configured to aggregate the search results, received by the receiving unit, that were returned by the full-text search data node and by the underlying database node;
the sending unit being further configured to return the aggregated result to the client as the complete search result.
The method and device for data storage and retrieval provided by the embodiments of the present invention can, in a parallel database system architecture, store data that requires full-text search and data that does not require full-text search on different types of data nodes. This distributed storage reduces the redundancy of data storage compared with the unified storage method of the prior art. Meanwhile, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can simultaneously perform different types of retrieval on the data to be retrieved according to the different retrieval instructions. Compared with the method provided by the prior art, the method provided by the embodiments of the present invention provides both full-text search capability and structured search capability in the same system and does not need two independent retrieval systems, thereby simplifying the retrieval steps and improving retrieval efficiency.
Brief description of the drawings
The accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for performing data storage according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of a method for performing data retrieval according to Embodiment 1 of the present invention;
FIG. 3 is a flowchart of a method for performing data storage according to Embodiment 2 of the present invention;
FIG. 4 is a flowchart of another method for performing data storage according to Embodiment 2 of the present invention;
FIG. 5 is a flowchart of a method for performing data retrieval according to Embodiment 3 of the present invention;
FIG. 6 is a block diagram of a control node for performing data storage according to Embodiment 4 of the present invention;
FIG. 7 is a block diagram of another control node for performing data storage according to Embodiment 4 of the present invention;
FIG. 8 is a block diagram of a control node for performing data retrieval according to Embodiment 4 of the present invention;
FIG. 9 is a block diagram of a control node for performing data storage according to Embodiment 5 of the present invention;
FIG. 10 is a block diagram of a control node for performing data retrieval according to Embodiment 5 of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment 1
An embodiment of the present invention provides a method for performing data storage and retrieval, characterized in that the method is applied to a parallel database system, the parallel database system includes a control node and data nodes, and the data nodes include a full-text search data node and an underlying database node.
Among the data nodes, the full-text search data node is used to store the data corresponding to fields that require full-text search and to perform full-text search on the data it stores; the underlying database node is used to store the data corresponding to fields that do not require full-text search and to perform structured retrieval on the data it stores.
In this parallel database system architecture, the full-text search data nodes and the underlying database nodes can be distributed in either of the following two ways:
First distribution: all full-text search data nodes are placed in one pool and all underlying database nodes are placed in another pool. The two pools are logically independent; physically they may be deployed on two separate devices or integrated on the same device. In this arrangement, the pool of full-text search data nodes stores the data that requires full-text search, and the pool of underlying database nodes stores the data that does not.
Second distribution: one full-text search data node and one underlying database node are treated as a pair and placed together on the same data retrieval node. Each data retrieval node contains exactly one full-text search data node and one underlying database node, and different data retrieval nodes hold different full-text search data nodes and underlying database nodes. In this arrangement, each data retrieval node stores different data. Within one data retrieval node, the full-text search data node stores the data that requires full-text search and the underlying database node stores the data that does not, and the data stored by the two nodes of the same data retrieval node must correspond to the same identifying information, such as the same distribution key value.
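The second distribution can be pictured with a small sketch. It is only an illustration under stated assumptions: the class `DataRetrievalNode`, the functions `pick_node` and `store`, and the choice of a CRC32 hash modulo the node count to route a distribution key are hypothetical; the text only requires that the paired nodes hold data with the same distribution key value.

```python
import zlib


class DataRetrievalNode:
    """One data retrieval node holding a paired full-text node and database node (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.full_text = {}  # distribution key -> data requiring full-text search
        self.database = {}   # distribution key -> data not requiring full-text search


def pick_node(nodes, distribution_key):
    # Assumed routing rule: a stable hash of the key, modulo the number of nodes.
    digest = zlib.crc32(str(distribution_key).encode("utf-8"))
    return nodes[digest % len(nodes)]


def store(nodes, distribution_key, full_text_data, structured_data):
    """Place both halves of a row on the same data retrieval node."""
    node = pick_node(nodes, distribution_key)
    node.full_text[distribution_key] = full_text_data
    node.database[distribution_key] = structured_data
    return node


nodes = [DataRetrievalNode(f"node-{i}") for i in range(3)]
chosen = store(nodes, "row-1", {"Summary": "free text"}, {"Name": "reader"})
print(chosen.name)
```

Because both halves of a row land on the same node under the same key, the results of a full-text search and a structured search can later be matched locally by that key.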
Based on the above parallel database system architecture, an embodiment of the present invention provides a method for performing data storage, which may be implemented by the control node. As shown in FIG. 1, the method includes:
101. Receive a data storage request sent by the client.
102. Create a data table according to the data storage request.
The data table may be represented in a structured data storage form, and its content is provided by the client. Table 1 below shows the structure of one embodiment of the data table.
Table 1: Structure of the data table
Figure imgf000008_0001
Here, Name, Provider, and Summary are field names, and the column corresponding to each field stores the data of that field.
103. Determine the full-text search field of the data table according to the metadata of the data table.
The metadata indicates which of the fields of the data table are full-text search fields and which are not. The metadata may be placed inside the data table, marking whether the data stored in each field exceeds the maximum length of that field, or it may be a separate table that corresponds to the data table but is stored independently outside it. The metadata may use paired values such as "0"/"1" or "yes"/"no" to indicate which fields require full-text search: for example, "0" (or "no") means not required and "1" (or "yes") means required. Determining the full-text search field according to the metadata can be implemented by either of the following two methods:
First method: if the data stored in a field of the data table exceeds the maximum length of that field, the field is determined to be a full-text search field.
Second method: if a field of the data table is retrieved according to a keyword in the index field, the field is determined to be a full-text search field.
In the first method, the control node measures the length of the data stored in each field and compares it with the maximum length of that field to distinguish full-text search fields from non-full-text search fields, and the control node marks the type of each field.
In the second method, the keywords in the index field are marked by the client; the control node only stores these marks and interprets their meaning. For example, the client sets the index field to "value", where "value=0" corresponds to a field that does not require full-text search and "value=1" corresponds to a field that requires full-text search. The control node then identifies each field according to this correspondence.
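The two methods can be sketched together as follows. This is a minimal illustration under assumptions: the `FieldMeta` record, its `max_length` and `value` attributes, and the function `full_text_fields` are hypothetical names, and the sample lengths are made up; only the two decision rules come from the text.

```python
from dataclasses import dataclass


@dataclass
class FieldMeta:
    """Hypothetical per-field metadata kept by the control node."""
    name: str
    max_length: int  # maximum length declared for the field
    value: int = 0   # client-set index flag: 1 = full-text search, 0 = not


def full_text_fields(rows, metadata):
    """Return the names of the fields classified as full-text search fields."""
    selected = set()
    for meta in metadata:
        # Second method: the client marked the field through the index field.
        if meta.value == 1:
            selected.add(meta.name)
            continue
        # First method: some stored value exceeds the field's maximum length.
        if any(len(str(row.get(meta.name, ""))) > meta.max_length for row in rows):
            selected.add(meta.name)
    return selected


metadata = [
    FieldMeta("Name", max_length=32),
    FieldMeta("Provider", max_length=32),
    FieldMeta("Summary", max_length=64, value=1),
]
rows = [{"Name": "A" * 40, "Provider": "Acme", "Summary": "short text"}]

selected = full_text_fields(rows, metadata)
print(sorted(selected))
```

In this sample, Summary is selected because the client flagged it, and Name is selected because one stored value is longer than the declared maximum.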
104. Send a first data storage instruction to the full-text search data node according to the full-text search field, where the first data storage instruction is used to instruct the full-text search data node to store the data corresponding to the full-text search field.
Specifically, the first data storage instruction is used to instruct the full-text search data node to create an index for the full-text search field or to update the index stored by the full-text search data node.
105. Send a second data storage instruction to the underlying database node according to the fields of the data table other than the full-text search field, where the second data storage instruction is used to instruct the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.
Specifically, the second data storage instruction is used to instruct the underlying database node to perform structured data storage.
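Steps 104 and 105 amount to splitting each row by field type and addressing each part to a different kind of node. The sketch below illustrates that split only; the function `build_storage_instructions`, the dictionary layout of an instruction, the `target` and `action` labels, and the use of a `key_field` to tie the two parts together are assumptions made for illustration.

```python
def build_storage_instructions(row, full_text_fields, key_field):
    """Split one row into a first and a second data storage instruction."""
    key = row[key_field]  # assumed shared key so both parts can be matched later
    full_text_data = {f: v for f, v in row.items() if f in full_text_fields}
    other_data = {f: v for f, v in row.items() if f not in full_text_fields}

    first = {
        "target": "full_text_search_data_node",
        "action": "create_or_update_index",
        "key": key,
        "data": full_text_data,
    }
    second = {
        "target": "underlying_database_node",
        "action": "structured_store",
        "key": key,
        "data": other_data,
    }
    return first, second


row = {"Name": "reader", "Provider": "Acme", "Summary": "a long free-text description"}
first, second = build_storage_instructions(row, {"Summary"}, key_field="Name")
print(first["data"])
print(second["data"])
```

Every field of the row ends up in exactly one of the two instructions, which is what keeps the same data from being stored twice.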
Based on the above parallel database system architecture, an embodiment of the present invention provides a method for performing data retrieval, which is implemented by the control node. As shown in FIG. 2, the method includes:
201. Receive a retrieval request sent by the client.
The retrieval request may be a standard query statement as defined by SQL, for example "Select id, name from A where comment Like 'roman' group by name", which queries table A for the id and name of the rows whose "comment" contains "roman" and groups the results by name.
202. Determine, according to the retrieval request, the to-be-retrieved fields and the to-be-retrieved data table.
203. Obtain the metadata of the to-be-retrieved data table, and determine, according to that metadata, the full-text search field among the to-be-retrieved fields.
The full-text search fields among the fields to be retrieved may be determined from the metadata of the data table to be retrieved in either of two ways. First method: if, for at least one of the fields to be retrieved, the data stored in the data table to be retrieved exceeds the maximum length of that field, that field is determined to be a full-text search field.
Second method: if the fields to be retrieved include a field that must be searched by keyword in an index field, that field is determined to be a full-text search field.
Note that the way the first and second methods identify full-text search fields corresponds to the way full-text search fields are marked in step 103 above.
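The two identification methods can be sketched as follows. This is a minimal illustration only: the `FieldMeta` structure, its attribute names, and the `stored_lengths` mapping are assumptions introduced here, not structures defined by the source.

```python
from dataclasses import dataclass


@dataclass
class FieldMeta:
    """Hypothetical per-field metadata kept by the control node."""
    name: str
    max_length: int        # assumed: declared maximum length of the field
    keyword_indexed: bool  # assumed: field is searched by keyword in an index field


def full_text_fields(requested, metadata, stored_lengths):
    """Return the requested fields that qualify as full-text search fields."""
    selected = []
    for name in requested:
        meta = metadata[name]
        # First method: stored data exceeds the field's maximum length.
        too_long = stored_lengths.get(name, 0) > meta.max_length
        # Second method: the field is searched by keyword in an index field.
        if too_long or meta.keyword_indexed:
            selected.append(name)
    return selected


metadata = {
    "id": FieldMeta("id", 8, False),
    "name": FieldMeta("name", 64, False),
    "comment": FieldMeta("comment", 255, True),
}
lengths = {"id": 1, "name": 9, "comment": 300}
print(full_text_fields(["id", "name", "comment"], metadata, lengths))  # ['comment']
```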
204. Send a first retrieval instruction to the full-text search data node according to the full-text search fields among the fields to be retrieved. The first retrieval instruction instructs the full-text search data node to retrieve the data corresponding to those full-text search fields.
205. Send a second retrieval instruction to the underlying database node according to the fields to be retrieved other than the full-text search fields. The second retrieval instruction instructs the underlying database node to retrieve the data corresponding to those other fields.
Here, the first retrieval instruction refers to the process in which the full-text search data node, after receiving the search request, scans the inverted index table for the search keywords in the request. The retrieval instruction mainly contains the following information:
1) The type of tokenizer currently in use, for example WhitespaceAnalyzer, StandardAnalyzer, SimpleAnalyzer, or ChineseAnalyzer.
2) The keywords carried in the request. For example, for q="what is search engine", the keywords are "search" and "engine", while "what" and "is" are filtered out as stop words.
3) Whether to perform spell checking. For example, with spell checking enabled, "aplle" is corrected to "apple".
4) Whether to include synonyms. For example, for "爸爸" (dad), if synonyms are included, the system also searches for "爹" and "干爹".
5) The search hit score calculation. The magnitude of this value determines the ordering of the retrieval results.
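The contents of the first retrieval instruction listed above can be pictured as a small record. The sketch below is an assumption-laden illustration: the `FullTextInstruction` class, its field names, and the tiny stop-word list are hypothetical, and hit scoring (item 5) is not modeled.

```python
from dataclasses import dataclass

# Illustrative stop-word list; a real analyzer ships its own.
STOP_WORDS = {"what", "is", "a", "the"}


@dataclass
class FullTextInstruction:
    """Hypothetical shape of the first retrieval instruction."""
    analyzer: str               # item 1: tokenizer type
    keywords: list              # item 2: keywords after stop-word filtering
    spell_check: bool = False   # item 3
    use_synonyms: bool = False  # item 4


def build_instruction(query, analyzer="WhitespaceAnalyzer"):
    """Split the query on whitespace and drop stop words."""
    tokens = query.lower().split()
    keywords = [t for t in tokens if t not in STOP_WORDS]
    return FullTextInstruction(analyzer=analyzer, keywords=keywords)


instr = build_instruction("what is search engine")
print(instr.keywords)  # ['search', 'engine']
```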
The second retrieval instruction here refers to the set of operations to be executed against the database for a particular query. Each step of the instruction describes a specific database operation, such as table scan, join, aggregation, or sort.
A complete database retrieval instruction mainly includes the following information:
1) cost: an estimate of the system resources consumed when the data is retrieved;
2) rows: an estimate of the total number of rows returned by the retrieval, which reflects the estimated selectivity of any WHERE clause conditions;
3) width: an estimate of the total number of bytes in the rows returned, which reflects the size of the data set that satisfies the retrieval conditions.
206. Receive the retrieval results returned by the full-text search data node and by the underlying database node.
207. Aggregate the retrieval results returned by the full-text search data node and the underlying database node.
The aggregation of the retrieval results returned by the data nodes may be implemented as an equi-join on a specific field. For example, as shown in Tables 2, 3, and 4 below, Table 2 shows that the retrieval result from the full-text search data node consists of the id values 2 and 3, and Table 3 shows that the retrieval result from the underlying database node consists of the id values 1, 2, and 3 together with the corresponding Name field. Based on the id values, the results with id 2 and 3 are merged into one table, as shown in Table 4.
Table 2: Full-text search node retrieval results
Id
3
2
Table 3: Underlying database node retrieval results
[Figure imgf000012_0001]
Table 4: Complete retrieval results
[Figure imgf000012_0002]
Note that in this embodiment, the aggregation method described with reference to Tables 2, 3, and 4 above is only an example.
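The equi-join used for aggregation can be sketched as below. The id values follow Tables 2 and 3 as described in the text; the name values are placeholders, because the source gives Table 3 only as an image, and the function name `equi_join` is an assumption.

```python
def equi_join(full_text_rows, database_rows, key="id"):
    """Keep only the database rows whose key also appears in the full-text result."""
    by_key = {row[key]: row for row in database_rows}
    return [{**by_key[r[key]], **r} for r in full_text_rows if r[key] in by_key]


# Table 2: ids returned by the full-text search data node.
full_text_result = [{"id": 3}, {"id": 2}]
# Table 3: ids and names returned by the underlying database node (names are placeholders).
database_result = [
    {"id": 1, "name": "name-1"},
    {"id": 2, "name": "name-2"},
    {"id": 3, "name": "name-3"},
]

complete = equi_join(full_text_result, database_result)
print([row["id"] for row in complete])  # [3, 2]
```

The row with id 1 is dropped because the full-text search data node did not return it.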
208. Return the aggregated result to the client as the complete retrieval result.
The method for data storage and retrieval provided by this embodiment of the present invention can, in a parallel database system architecture, store data that requires full-text search and data that does not on different types of data nodes. This achieves distributed storage of the data and, compared with the unified storage of the prior art, reduces data storage redundancy. In addition, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can perform different types of retrieval on the data concurrently, each according to its own instruction. Compared with the prior-art method, the method provided by this embodiment offers both full-text search capability and structured retrieval capability in a single system, without requiring two independent retrieval systems, which simplifies the retrieval steps and improves retrieval efficiency.
Embodiment 2
Based on the first distribution mode of full-text search data nodes and underlying database nodes described in Embodiment 1, this embodiment of the present invention provides a data storage method. The method is implemented by the control node and, as shown in FIG. 3, includes:
301. Receive a data storage request sent by a client.
302. Create a data table according to the data storage request.
The description of the data table is the same as in step 102 and is not repeated here.
303. Determine the full-text search fields of the data table according to the metadata of the data table.
The description of the metadata, and the method for determining the full-text search fields of the data table from the metadata, are the same as in step 103 above and are not repeated here.
304. Send a first data storage instruction to the full-text search data node according to the full-text search fields. The first data storage instruction instructs the full-text search data node to store the data corresponding to the full-text search fields.
305. Send a second data storage instruction to the underlying database node according to the fields of the data table other than the full-text search fields. The second data storage instruction instructs the underlying database node to store the data corresponding to those other fields.
Note further that steps 304 and 305 need not be executed in any particular order.
Note further that if step 303 determines that no field requires full-text search, only step 305 is executed. If it determines that all fields require full-text search, only step 304 is executed.
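The choice between steps 304 and 305 can be sketched as a small planning function. The target labels `"full_text_node"` and `"database_node"` and the function name are assumptions made for illustration; the source does not define a concrete instruction format.

```python
def plan_storage(fields, full_text_fields):
    """Decide which storage instructions the control node needs to send."""
    ft = [f for f in fields if f in full_text_fields]
    other = [f for f in fields if f not in full_text_fields]
    instructions = []
    if ft:
        instructions.append(("full_text_node", ft))    # step 304
    if other:
        instructions.append(("database_node", other))  # step 305
    return instructions


print(plan_storage(["id", "name", "comment"], {"comment"}))
# [('full_text_node', ['comment']), ('database_node', ['id', 'name'])]
print(plan_storage(["id", "name"], set()))
# [('database_node', ['id', 'name'])]
```

Because the two instructions are independent, a caller could send them in either order or concurrently.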
Note also that when the number of clients and the volume of data to be stored are both large, data loading servers are deployed in the parallel database system framework described in Embodiment 1, including a full-text search data loading server and an underlying database loading server. Under the control of the control node, the full-text search data loading server loads the data corresponding to the fields of the data table that require full-text search onto the full-text search data nodes, so that those nodes store the data. Likewise under the control of the control node, the underlying database loading server loads the data corresponding to the fields that do not require full-text search onto the underlying database nodes, so that those nodes store the data.
In this case, step 304 specifically includes: the control node splits the data table by column and groups together the column fields that require full-text search; it then generates a corresponding data loading task from the grouped fields and sends it to the full-text search data loading server. According to a preset distribution policy, the full-text search data loading server assigns the data corresponding to those column fields to different full-text search data nodes for storage.
The full-text search data loading task is the process of storing an application's unstructured data, or data that must support full-text search, on the full-text search server; it usually means building an inverted index. For structured data, the inverted index table is built directly through tokenization and filtering; otherwise the process also includes a data information extraction step. There is no strict industry standard for full-text data loading; the data index is usually created through a proprietary API that is exposed externally.
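Building an inverted index through tokenization and filtering can be sketched as follows. This is a simplified illustration rather than the loading API of any particular search engine: whitespace tokenization, the small stop-word list, and the two sample strings are all assumptions.

```python
from collections import defaultdict

# Illustrative stop-word list used for the filtering step.
STOP_WORDS = {"a", "of", "the", "and", "on", "who"}


def build_inverted_index(documents):
    """Map each remaining token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():         # tokenization
            token = token.strip(".,'\"")
            if token and token not in STOP_WORDS:  # filtering
                index[token].add(doc_id)
    return index


docs = {
    2: "Fictional romantic tale of a rich girl",
    3: "A flawless chime of romantic and reality",
}
index = build_inverted_index(docs)
print(sorted(index["romantic"]))  # [2, 3]
```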
The preset distribution policy may, for example, have different data nodes store information for different regions: node 1 stores information for Beijing, node 2 stores information for Shanghai, and so on. The distribution policy can be set as needed and is not limited by this embodiment.
Similarly, step 305 specifically includes: the control node splits the data table by column and groups together the column fields that do not require full-text search; it then generates a corresponding data loading task from the grouped fields and sends it to the underlying database loading server. According to a preset distribution policy, the underlying database loading server assigns the data corresponding to those column fields to different underlying database nodes for storage.
The underlying database data loading task stores the application's data in the underlying database by means of the standard SQL load statement COPY FROM, for example:
COPY tableA FROM '/opt/data/datainfo.tbl' DELIMITERS '\'
Here, tableA is the data table to be loaded, /opt/data/datainfo.tbl is the path of the data to be loaded, and the DELIMITERS argument specifies the separator between the fields in a row of data. See the SQL-99 standard for details. If an index was created when the table was created, index generation for the data table must also be completed during data loading.
Note also that, to ensure that the retrieval results of the full-text search data nodes and of the underlying database nodes can be merged, the generation of the data loading tasks must also determine which fields are master key fields. These fields express the association between data stored on different nodes. Specifically, when storing data, a full-text search data node must store the data for the master key fields in addition to the data for the fields that require full-text search; the underlying database nodes do likewise.
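A column split that keeps the master key on both sides might look like the sketch below. The function name, the use of Python dictionaries for rows, and the sample row are assumptions for illustration.

```python
def split_columns(row, full_text_fields, master_keys):
    """Split one row into the part for the full-text node and the part for the database node.

    The master key fields are copied into both parts so that the results can be joined later.
    """
    ft_part = {k: v for k, v in row.items()
               if k in master_keys or k in full_text_fields}
    db_part = {k: v for k, v in row.items()
               if k in master_keys or k not in full_text_fields}
    return ft_part, db_part


# Illustrative row: id is the master key, comment needs full-text search.
row = {"id": 7, "name": "example", "comment": "some long free text"}
ft_part, db_part = split_columns(row, {"comment"}, {"id"})
print(ft_part)  # {'id': 7, 'comment': 'some long free text'}
print(db_part)  # {'id': 7, 'name': 'example'}
```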
This embodiment of the present invention further provides a data storage method applied to the second distribution mode of full-text search data nodes and underlying database nodes described in Embodiment 1, as shown in FIG. 4. In this method, before step 303 determines the full-text search fields of the data table according to the metadata of the data table, the method further includes step 306: split the data table by row. Step 303 is then replaced by step 307: for the data to be stored in each row, determine the full-text search fields of that row according to the metadata of the data table. The remaining steps are the same as in the data storage method of this embodiment for the first distribution mode.
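The row-first variant can be sketched as below. The round-robin assignment of rows to data retrieval nodes is an assumption made only to keep the example small (the source leaves the distribution policy open), and the function and variable names are hypothetical.

```python
def plan_row_storage(rows, node_count, full_text_fields, master_keys):
    """Assign each row to a data retrieval node (step 306), then split its columns (step 307)."""
    plan = []
    for i, row in enumerate(rows):
        node = i % node_count  # assumed round-robin row distribution
        ft_part = {k: v for k, v in row.items()
                   if k in master_keys or k in full_text_fields}
        db_part = {k: v for k, v in row.items()
                   if k in master_keys or k not in full_text_fields}
        plan.append((node, ft_part, db_part))
    return plan


rows = [
    {"id": 1, "name": "n1", "comment": "c1"},
    {"id": 2, "name": "n2", "comment": "c2"},
    {"id": 3, "name": "n3", "comment": "c3"},
]
plan = plan_row_storage(rows, 2, {"comment"}, {"id"})
print([node for node, _, _ in plan])  # [0, 1, 0]
```

Each tuple names the data retrieval node, the part for its full-text search data node, and the part for its underlying database node.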
The data storage method provided by this embodiment of the present invention can, in a parallel database system architecture, store data that requires full-text search and data that does not on different types of data nodes. This achieves distributed storage of the data and, compared with the unified storage of the prior art, reduces data storage redundancy.
Embodiment 3
In this embodiment, the data table created by the control node is shown in Table 5 below. The table is named A and has three fields: id, name, and comment. The id field is the master key field, and the comment field requires full-text search. The id and name fields are stored on the underlying database node, and the id and comment fields are stored on the full-text search node.
Table 5: Data table to be retrieved
Id | Name | Comment
1 | Notorious | Following the conviction of her German father for treason against the U.S., Alicia Huberman takes to drink and men. She is approached by a government agent (T.R. Devlin) who asks her to spy on a group of her father's Nazi friends operating out of Rio de Janeiro.
2 | Titanic | Fictional romantic tale of a rich girl and poor boy who meet on the ill-fated voyage of the 'unsinkable' ship, Kate Winslet, Leonardo DiCaprio, Billy Zane
3 | Gift | The story of how an economic French shop keeper and amateur film maker attempted to locate, only have the artist turn the camera back on its owner. A flawless chime of romantic and reality
Based on this data table, the data retrieval method provided by this embodiment of the present invention, as shown in FIG. 5, includes:
401. The control node receives a retrieval request sent by the client. In this embodiment, the statement corresponding to the retrieval request is Select id, name from A where comment Like 'roman' group by name, which queries table A for the id and name of the rows whose "comment" contains "roman" and groups the results by name.
402. The control node determines, according to the retrieval request, the fields to be retrieved and the data table to be retrieved.
403. The control node obtains the metadata of the data table to be retrieved and, according to that metadata, determines the full-text search fields among the fields to be retrieved.
404. The control node sends a first retrieval instruction to the full-text search data node according to the full-text search fields among the fields to be retrieved.
405. The control node sends a second retrieval instruction to the underlying database node according to the fields to be retrieved other than the full-text search fields.
406. The full-text search data node searches the data stored on it according to the first retrieval instruction and obtains the full-text search node retrieval results.
407. The underlying database node searches the data stored on it according to the second retrieval instruction and obtains the underlying database node retrieval results.
In this embodiment, the full-text search node retrieval results are shown in Table 6 below, and the underlying database node retrieval results are shown in Table 7 below.
Table 6: Full-text search node retrieval results
Id
3
2
Table 7: Underlying database node retrieval results
[Figure imgf000017_0001]
408. The control node receives the full-text search node retrieval results sent by the full-text search data node and the underlying database node retrieval results sent by the underlying database node.
409. The control node aggregates the received full-text search node retrieval results and underlying database node retrieval results to obtain the complete retrieval results. In this embodiment, the complete retrieval results are shown in Table 8 below.
Table 8: Complete retrieval results
[Figure imgf000017_0002]
410. The control node sends the complete retrieval results to the client.
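Steps 401 to 410 can be traced end to end with a toy in-memory model. Everything below is a sketch under assumptions: the two node stores are plain Python lists, the comments are shortened excerpts of the sample rows, keyword matching is a simple substring test, and the final ordering by name is chosen only to make the output deterministic.

```python
# Sample rows of table A with shortened comment text.
table_a = [
    {"id": 1, "name": "Notorious", "comment": "Following the conviction of her German father for treason"},
    {"id": 2, "name": "Titanic", "comment": "Fictional romantic tale of a rich girl and poor boy"},
    {"id": 3, "name": "Gift", "comment": "A flawless chime of romantic and reality"},
]

# Storage: id + comment on the full-text node, id + name on the database node.
full_text_store = [{"id": r["id"], "comment": r["comment"]} for r in table_a]
database_store = [{"id": r["id"], "name": r["name"]} for r in table_a]


def full_text_search(store, keyword):
    """Step 406: return the ids whose comment contains the keyword."""
    return [{"id": r["id"]} for r in store if keyword in r["comment"].lower()]


def database_scan(store):
    """Step 407: return id and name for every stored row."""
    return [dict(r) for r in store]


def aggregate(ft_rows, db_rows):
    """Step 409: equi-join on id, then order by name (ordering is an assumption)."""
    names = {r["id"]: r["name"] for r in db_rows}
    joined = [{"id": r["id"], "name": names[r["id"]]} for r in ft_rows if r["id"] in names]
    return sorted(joined, key=lambda r: r["name"])


result = aggregate(full_text_search(full_text_store, "roman"), database_scan(database_store))
print(result)  # [{'id': 3, 'name': 'Gift'}, {'id': 2, 'name': 'Titanic'}]
```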
Note that the retrieval results in Tables 6 to 8 above are based on the first distribution mode described in Embodiment 1: the pool of full-text search nodes returns one full-text retrieval result to the control node, the pool of underlying database nodes returns one underlying database retrieval result to the control node, and the control node aggregates these two kinds of results.
Under the second distribution mode described in Embodiment 1, each data retrieval node returns its own retrieval result. If data retrieval node 1, data retrieval node 2, and data retrieval node 3 are deployed, each containing one full-text search data node and one underlying database node, the retrieval results of the three data retrieval nodes may be as shown in Tables 9 to 11 below.
Table 9: Retrieval results of data retrieval node 1
[Figure imgf000018_0001]
Table 10: Retrieval results of data retrieval node 2
[Figure imgf000018_0002]
Table 11: Retrieval results of data retrieval node 3
[Figure imgf000018_0003]
In this case, after receiving the retrieval results sent by the three data retrieval nodes, the control node aggregates them to obtain the complete retrieval results, which are the same as in Table 8 above.
The data retrieval method provided by this embodiment of the present invention can, in a parallel database system architecture, send different types of retrieval instructions to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can perform different types of retrieval on the data concurrently, each according to its own instruction. Compared with the prior-art method, the method provided by this embodiment offers both full-text search capability and structured retrieval capability in a single system, without requiring two independent retrieval systems, which simplifies the retrieval steps and improves retrieval efficiency.
Embodiment 4
This embodiment of the present invention provides a control node for data storage that can be applied in a parallel database system. The parallel database system includes the control node and data nodes, and the data nodes include full-text search data nodes and underlying database nodes. As shown in FIG. 6, the control node includes a receiving unit 51, a creating unit 52, a determining unit 53, and a sending unit 54.
The receiving unit 51 is configured to receive a data storage request sent by a client.
The creating unit 52 is configured to create a data table according to the data storage request received by the receiving unit 51.
The determining unit 53 is configured to determine the full-text search fields of the data table according to the metadata of the data table created by the creating unit 52.
The sending unit 54 is configured to send a first data storage instruction to the full-text search data node according to the full-text search fields determined by the determining unit 53, where the first data storage instruction instructs the full-text search data node to store the data corresponding to the full-text search fields; and to send a second data storage instruction to the underlying database node according to the fields of the data table other than the full-text search fields determined by the determining unit 53, where the second data storage instruction instructs the underlying database node to store the data corresponding to those other fields.
Optionally, the determining unit 53 is specifically configured to determine that a field of the data table is a full-text search field when the data stored in the field exceeds the maximum length of the field, and to determine that a field of the data table is a full-text search field when the field is to be searched by keyword in an index field.
Optionally, as shown in FIG. 7, the control node further includes a splitting unit 55. The splitting unit 55 is configured to split the data table by row before the full-text search fields of the data table are determined.
The determining unit 53 is further configured to determine, for the data to be stored in each row, the full-text search fields of that row according to the metadata of the data table.
Optionally, the first data storage instruction instructing the full-text search data node to store the data corresponding to the full-text search fields specifically includes: instructing the full-text search data node to create an index for the full-text search fields, or to update the index stored on the full-text search data node.
This embodiment of the present invention further provides a control node for data retrieval that can be applied in a parallel database system. The parallel database system includes the control node and data nodes, and the data nodes include full-text search data nodes and underlying database nodes. As shown in FIG. 8, the control node includes a receiving unit 61, an obtaining unit 62, a determining unit 63, a sending unit 64, a receiving unit 65, and an aggregation unit 66.
The receiving unit 61 is configured to receive a retrieval request sent by a client.
The obtaining unit 62 is configured to obtain the fields to be retrieved and the data table to be retrieved according to the retrieval request received by the receiving unit 61.
The determining unit 63 is configured to determine, according to the retrieval request, the fields to be retrieved and the data table to be retrieved, and to determine, according to the metadata of the data table to be retrieved, the full-text search fields among the fields to be retrieved.
The sending unit 64 is configured to send a first retrieval instruction to the full-text search data node according to the full-text search fields among the fields to be retrieved determined by the determining unit 63, where the first retrieval instruction instructs the full-text search data node to retrieve the data corresponding to those full-text search fields; and to send a second retrieval instruction to the underlying database node, where the second retrieval instruction instructs the underlying database node to retrieve the data corresponding to the fields to be retrieved other than the full-text search fields.
The receiving unit 65 is configured to receive the retrieval results returned by the full-text search data node and by the underlying database node.
The aggregation unit 66 is configured to aggregate the retrieval results, received by the receiving unit 65, that were returned by the full-text search data node and the underlying database node.
The sending unit 64 is further configured to return the aggregated result to the client as the complete retrieval result.
Optionally, the determining unit 63 is specifically configured to determine that at least one of the fields to be retrieved is a full-text search field when the data stored for that field in the data table to be retrieved exceeds the maximum length of the field, and to determine that a field is a full-text search field when the fields to be retrieved include a field that must be searched by keyword in an index field.
The apparatus for data storage and retrieval provided by this embodiment of the present invention can, in a parallel database system architecture, store data that requires full-text search and data that does not on different types of data nodes. This achieves distributed storage of the data and, compared with the unified storage of the prior art, reduces data storage redundancy. In addition, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can perform different types of retrieval on the data concurrently, each according to its own instruction. Compared with the prior-art method, the method provided by this embodiment offers both full-text search capability and structured retrieval capability in a single system, without requiring two independent retrieval systems, which simplifies the retrieval steps and improves retrieval efficiency.
Embodiment 5
This embodiment of the present invention provides a control node for data storage, which may be applied in a parallel database system. The parallel database system includes the control node and data nodes, and the data nodes include a full-text search data node and an underlying database node. As shown in FIG. 9, the control node includes a processor 71 and a memory 72.
The processor 71 is configured to: receive a data storage request sent by a client; create a data table according to the data storage request; determine the full-text search fields of the data table according to the metadata of the data table; send, according to the full-text search fields, a first data storage instruction to the full-text search data node, where the first data storage instruction instructs the full-text search data node to store the data corresponding to the full-text search fields; and send, according to the fields of the data table other than the full-text search fields, a second data storage instruction to the underlying database node, where the second data storage instruction instructs the underlying database node to store the data corresponding to those other fields.
The memory 72 is configured to store the data storage request, the metadata of the data table, the first data storage instruction, and the second data storage instruction.
Optionally, the processor 71 is specifically configured to: when the data stored in a field of the data table exceeds the maximum length of the field, determine that the field is a full-text search field; and when a field of the data table is retrieved by keywords in an index field, determine that the field is a full-text search field.
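The two criteria above can be illustrated with a minimal Python sketch. The `FieldMeta` layout, the `keyword_indexed` flag, and the function name are assumptions made for illustration; the source does not specify how the metadata is represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical per-field metadata; not a layout defined by the source."""
    name: str
    max_length: int        # declared maximum length of the field
    keyword_indexed: bool  # True if the field is retrieved by index keywords


def full_text_fields(metadata, row):
    """Return the names of the fields treated as full-text search fields."""
    selected = []
    for meta in metadata:
        value = row.get(meta.name, "")
        # Criterion 1: the stored data exceeds the field's maximum length.
        too_long = len(str(value)) > meta.max_length
        # Criterion 2: the field is retrieved by keywords in an index field.
        if too_long or meta.keyword_indexed:
            selected.append(meta.name)
    return selected


metadata = [
    FieldMeta("id", 16, False),
    FieldMeta("title", 64, True),
    FieldMeta("body", 32, False),
]
row = {"id": "42", "title": "Parallel databases", "body": "x" * 100}
print(full_text_fields(metadata, row))  # ['title', 'body']
```

Here `title` qualifies because it is keyword-indexed, and `body` qualifies because its 100-character value exceeds the declared maximum of 32.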
Optionally, the processor is further configured to split the data table by rows before determining the full-text search fields of the data table according to the metadata of the data table, and, for the data to be stored in each row, to determine the full-text search fields of that row according to the metadata of the data table.
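As a rough sketch of the row-wise variant, the Python below splits a small table into rows and decides the full-text search fields separately for each row. It applies only the maximum-length criterion, and the `max_lengths` mapping stands in for the table metadata; both are simplifying assumptions rather than details from the source.

```python
def split_by_rows(table):
    """Yield (row_number, row) pairs so each row is handled independently."""
    for number, row in enumerate(table):
        yield number, row


def per_row_full_text_fields(table, max_lengths):
    """Map each row number to the fields whose data exceeds the maximum length."""
    result = {}
    for number, row in split_by_rows(table):
        result[number] = [
            name for name, value in row.items()
            if len(str(value)) > max_lengths[name]
        ]
    return result


table = [
    {"id": "1", "note": "ok"},
    {"id": "2", "note": "a much longer note"},
]
max_lengths = {"id": 8, "note": 10}  # hypothetical metadata
print(per_row_full_text_fields(table, max_lengths))  # {0: [], 1: ['note']}
```

The same field can therefore be routed differently from row to row: `note` stays with the structured data in the first row and becomes a full-text search field in the second.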
Optionally, the first data storage instruction instructing the full-text search data node to store the data corresponding to the full-text search fields specifically includes: instructing the full-text search data node to create an index for the full-text search fields, or to update the index stored on the full-text search data node.
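The storage path described in this embodiment can be sketched as follows. All class and method names are hypothetical, the two data nodes are modelled as in-process objects rather than networked nodes, and the whitespace-tokenised inverted index is an assumed stand-in for whatever index the full-text search data node actually maintains.

```python
from collections import defaultdict


class FullTextNode:
    """Stand-in for the full-text search data node (hypothetical API)."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> ids of rows containing it

    def store(self, row_id, fields):
        # Creates index entries on first use and updates them afterwards.
        for text in fields.values():
            for term in str(text).lower().split():
                self.index[term].add(row_id)


class DatabaseNode:
    """Stand-in for the underlying database node (hypothetical API)."""

    def __init__(self):
        self.rows = {}

    def store(self, row_id, fields):
        self.rows[row_id] = dict(fields)


class ControlNode:
    """Splits each row and issues the two storage instructions."""

    def __init__(self, ft_node, db_node, full_text_fields):
        self.ft_node = ft_node
        self.db_node = db_node
        self.full_text_fields = set(full_text_fields)

    def store(self, row_id, row):
        ft_part = {k: v for k, v in row.items() if k in self.full_text_fields}
        db_part = {k: v for k, v in row.items() if k not in self.full_text_fields}
        self.ft_node.store(row_id, ft_part)  # first data storage instruction
        self.db_node.store(row_id, db_part)  # second data storage instruction


ft_node, db_node = FullTextNode(), DatabaseNode()
control = ControlNode(ft_node, db_node, full_text_fields=["body"])
control.store(1, {"author": "alice", "body": "Parallel database storage"})
control.store(2, {"author": "bob", "body": "Full-text storage"})
print(sorted(ft_node.index["storage"]))  # [1, 2]
print(db_node.rows[1])                   # {'author': 'alice'}
```

The sketch keys both nodes by the same `row_id` so that results can later be rejoined; the source does not say how rows are correlated across nodes, so this shared key is an assumption.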
This embodiment of the present invention further provides a control node for data retrieval, which may be applied in a parallel database system. The parallel database system includes the control node and data nodes, and the data nodes include a full-text search data node and an underlying database node. As shown in FIG. 10, the control node includes a processor 73 and a memory 74.
The processor 73 is configured to: receive a retrieval request sent by a client; determine the to-be-retrieved fields and the to-be-retrieved data table according to the retrieval request; obtain the metadata of the to-be-retrieved data table and, according to that metadata, determine the full-text search fields among the to-be-retrieved fields; send, according to the full-text search fields among the to-be-retrieved fields, a first retrieval instruction to the full-text search data node, where the first retrieval instruction instructs the full-text search data node to retrieve the data corresponding to the full-text search fields among the to-be-retrieved fields; send, according to the to-be-retrieved fields other than the full-text search fields, a second retrieval instruction to the underlying database node, where the second retrieval instruction instructs the underlying database node to retrieve the data corresponding to those other fields; and receive the retrieval results returned by the full-text search data node and the underlying database node, aggregate them, and return the aggregated result to the client as the complete retrieval result.
The memory 74 is configured to store the retrieval request, the metadata of the to-be-retrieved data table, the first retrieval instruction, the second retrieval instruction, and the retrieval results.
Optionally, the processor 73 is specifically configured to: when the data stored in the to-be-retrieved data table for at least one of the to-be-retrieved fields exceeds the maximum length of that field, determine that the at least one field is a full-text search field; and when the to-be-retrieved fields include a field that needs to be retrieved by keywords in an index field, determine that this field is a full-text search field.
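The retrieval path handled by the processor 73 can be sketched in the same spirit. The function names and in-memory data are hypothetical, and the aggregation step is assumed here to be an intersection on a shared row id; the source only states that the two result sets are aggregated. A thread pool is used to show the two retrieval instructions being issued at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def search_full_text(index, keyword):
    """Full-text node: look up the row ids indexed under a keyword."""
    return set(index.get(keyword.lower(), set()))


def search_database(rows, conditions):
    """Database node: return ids of rows matching all equality conditions."""
    return {
        row_id for row_id, row in rows.items()
        if all(row.get(k) == v for k, v in conditions.items())
    }


def retrieve(index, rows, keyword, conditions):
    """Control node: dispatch both instructions, then aggregate the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ft_future = pool.submit(search_full_text, index, keyword)  # first instruction
        db_future = pool.submit(search_database, rows, conditions)  # second instruction
        ft_hits, db_hits = ft_future.result(), db_future.result()
    # Assumed aggregation: keep rows that satisfy both parts of the request.
    return [{"row_id": rid, **rows[rid]} for rid in sorted(ft_hits & db_hits)]


index = {"parallel": {1, 2}, "storage": {2}}
rows = {1: {"author": "alice"}, 2: {"author": "bob"}, 3: {"author": "alice"}}
print(retrieve(index, rows, "Parallel", {"author": "alice"}))
# [{'row_id': 1, 'author': 'alice'}]
```

Other aggregation policies, such as a union or a ranked merge, would fit the description equally well; the intersection is chosen only to keep the example small.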
The apparatus for data storage and retrieval provided in this embodiment of the present invention can, in a parallel database system architecture, store data that requires full-text search and data that does not on different types of data nodes. This achieves distributed storage of the data and, compared with the unified storage of the prior art, reduces data storage redundancy. In addition, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the nodes executing the retrieval can perform different types of retrieval on the to-be-retrieved data at the same time, each according to the instruction it receives. Compared with the prior-art method, the method provided in this embodiment offers both full-text search and structured retrieval in a single system and does not require two independent retrieval systems, which simplifies the retrieval steps and improves retrieval efficiency.
From the description of the foregoing embodiments, a person skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, hard disk, or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The foregoing is merely a specific implementation of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for data storage, wherein the method is applied in a parallel database system, the parallel database system comprises a control node and data nodes, and the data nodes comprise a full-text search data node and an underlying database node; the method comprising:
receiving, by the control node, a data storage request sent by a client;
creating a data table according to the data storage request;
determining a full-text search field of the data table according to metadata of the data table;
sending, according to the full-text search field, a first data storage instruction to the full-text search data node, wherein the first data storage instruction instructs the full-text search data node to store data corresponding to the full-text search field; and
sending, according to fields of the data table other than the full-text search field, a second data storage instruction to the underlying database node, wherein the second data storage instruction instructs the underlying database node to store data corresponding to the fields of the data table other than the full-text search field.
2. The method according to claim 1, wherein the determining a full-text search field of the data table according to metadata of the data table specifically comprises:
if data stored in a field of the data table exceeds the maximum length of the field, determining that the field is a full-text search field; or
if a field of the data table is retrieved by keywords in an index field, determining that the field is a full-text search field.
3. The method according to claim 1 or 2, further comprising: before the determining a full-text search field of the data table according to metadata of the data table, splitting the data table by rows;
wherein the determining a full-text search field of the data table according to metadata of the data table specifically comprises: for the data to be stored in each row, determining the full-text search field of that row according to the metadata of the data table.
4. The method according to any one of claims 1 to 3, wherein the first data storage instruction instructing the full-text search data node to store data corresponding to the full-text search field specifically comprises:
instructing the full-text search data node to create an index for the full-text search field, or to update an index stored on the full-text search data node.
5. A method for data retrieval, wherein the method is applied in a parallel database system, the parallel database system comprises a control node and data nodes, and the data nodes comprise a full-text search data node and an underlying database node; the method comprising:
receiving, by the control node, a retrieval request sent by a client;
determining to-be-retrieved fields and a to-be-retrieved data table according to the retrieval request;
obtaining metadata of the to-be-retrieved data table, and determining a full-text search field among the to-be-retrieved fields according to the metadata of the to-be-retrieved data table;
sending, according to the full-text search field among the to-be-retrieved fields, a first retrieval instruction to the full-text search data node, wherein the first retrieval instruction instructs the full-text search data node to retrieve data corresponding to the full-text search field among the to-be-retrieved fields;
sending, according to the to-be-retrieved fields other than the full-text search field, a second retrieval instruction to the underlying database node, wherein the second retrieval instruction instructs the underlying database node to retrieve data corresponding to the to-be-retrieved fields other than the full-text search field; and
receiving retrieval results returned by the full-text search data node and the underlying database node, aggregating the retrieval results, and returning the aggregated result to the client as a complete retrieval result.
6. The method according to claim 5, wherein the determining a full-text search field among the to-be-retrieved fields according to the metadata of the to-be-retrieved data table comprises:
if data stored in the to-be-retrieved data table for at least one of the to-be-retrieved fields exceeds the maximum length of the at least one field, determining that the at least one field is a full-text search field; or
if the to-be-retrieved fields include a field that needs to be retrieved by keywords in an index field, determining that the field that needs to be retrieved by keywords in the index field is a full-text search field.
7. A control node for data storage, wherein the control node is applied in a parallel database system, the parallel database system comprises the control node and data nodes, and the data nodes comprise a full-text search data node and an underlying database node; the control node comprising:
a receiving unit, configured to receive a data storage request sent by a client;
a creating unit, configured to create a data table according to the data storage request received by the receiving unit;
a determining unit, configured to determine a full-text search field of the data table according to metadata of the data table created by the creating unit; and
a sending unit, configured to: send, according to the full-text search field determined by the determining unit, a first data storage instruction to the full-text search data node, wherein the first data storage instruction instructs the full-text search data node to store data corresponding to the full-text search field; and send, according to fields of the data table other than the full-text search field determined by the determining unit, a second data storage instruction to the underlying database node, wherein the second data storage instruction instructs the underlying database node to store data corresponding to the fields of the data table other than the full-text search field.
8. The control node according to claim 7, wherein the determining unit is specifically configured to: when data stored in a field of the data table exceeds the maximum length of the field, determine that the field is a full-text search field; and when a field of the data table is retrieved by keywords in an index field, determine that the field is a full-text search field.
9. The control node according to claim 7 or 8, wherein the control node further comprises:
a splitting unit, configured to split the data table by rows before the determining unit determines the full-text search field of the data table according to the metadata of the data table;
wherein the determining unit is further configured to determine, for the data to be stored in each row, the full-text search field of that row according to the metadata of the data table.
10. The control node according to any one of claims 7 to 9, wherein the first data storage instruction instructing the full-text search data node to store data corresponding to the full-text search field specifically comprises: instructing the full-text search data node to create an index for the full-text search field, or to update an index stored on the full-text search data node.
11. A control node for data retrieval, wherein the control node is applied in a parallel database system, the parallel database system comprises the control node and data nodes, and the data nodes comprise a full-text search data node and an underlying database node; the control node comprising:
a receiving unit, configured to receive a retrieval request sent by a client;
an obtaining unit, configured to obtain to-be-retrieved fields and a to-be-retrieved data table according to the retrieval request received by the receiving unit;
a determining unit, configured to determine the to-be-retrieved fields and the to-be-retrieved data table according to the retrieval request, and to determine a full-text search field among the to-be-retrieved fields according to metadata of the to-be-retrieved data table;
a sending unit, configured to: send a first retrieval instruction to the full-text search data node, wherein the first retrieval instruction instructs the full-text search data node to retrieve data corresponding to the full-text search field among the to-be-retrieved fields; and send a second retrieval instruction to the underlying database node, wherein the second retrieval instruction instructs the underlying database node to retrieve data corresponding to the to-be-retrieved fields other than the full-text search field;
the receiving unit being further configured to receive retrieval results returned by the full-text search data node and the underlying database node; and
an aggregating unit, configured to aggregate the retrieval results returned by the full-text search data node and the underlying database node and received by the receiving unit; the sending unit being further configured to return the aggregated result to the client as a complete retrieval result.
12. The control node according to claim 11, wherein the determining unit is specifically configured to: when data stored in the to-be-retrieved data table for at least one of the to-be-retrieved fields exceeds the maximum length of the at least one field, determine that the at least one field is a full-text search field; and when the to-be-retrieved fields include a field that needs to be retrieved by keywords in an index field, determine that the field that needs to be retrieved by keywords in the index field is a full-text search field.
PCT/CN2012/080963 2012-09-04 2012-09-04 Method and device for storing and retrieving data WO2014036684A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2012/080963 WO2014036684A1 (en) 2012-09-04 2012-09-04 Method and device for storing and retrieving data
CN201280001730.2A CN103891244B (en) 2012-09-04 2012-09-04 A kind of method and device carrying out data storage and search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/080963 WO2014036684A1 (en) 2012-09-04 2012-09-04 Method and device for storing and retrieving data

Publications (1)

Publication Number Publication Date
WO2014036684A1 true WO2014036684A1 (en) 2014-03-13

Family

ID=50236434

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/080963 WO2014036684A1 (en) 2012-09-04 2012-09-04 Method and device for storing and retrieving data

Country Status (2)

Country Link
CN (1) CN103891244B (en)
WO (1) WO2014036684A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239568B (en) * 2017-06-27 2020-04-14 石化盈科信息技术有限责任公司 Distributed index implementation method and device
CN110019231B (en) * 2017-12-26 2021-06-04 中国移动通信集团山东有限公司 Method and node for dynamic association of parallel databases

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001092831A (en) * 1999-09-21 2001-04-06 Toshiba Corp Device and method for document retrieval
CN101916280A (en) * 2010-08-17 2010-12-15 上海云数信息科技有限公司 Parallel computing system and method for carrying out load balance according to query contents
CN102054007A (en) * 2009-11-10 2011-05-11 北大方正集团有限公司 Searching method and searching device
CN102136003A (en) * 2011-03-25 2011-07-27 上海交通大学 Large-scale distributed storage system
CN102265277A (en) * 2011-06-01 2011-11-30 华为技术有限公司 Operation method and device for data memory system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5701459A (en) * 1993-01-13 1997-12-23 Novell, Inc. Method and apparatus for rapid full text index creation
CN100481076C (en) * 2005-12-23 2009-04-22 北大方正集团有限公司 Searching method for relational data base and full text searching combination
CN101894143A (en) * 2010-06-28 2010-11-24 北京用友政务软件有限公司 Federated search and search result integrated display method and system
CN102025550A (en) * 2010-12-20 2011-04-20 中兴通讯股份有限公司 System and method for managing data in distributed cluster

Also Published As

Publication number Publication date
CN103891244B (en) 2016-11-16
CN103891244A (en) 2014-06-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12884201

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12884201

Country of ref document: EP

Kind code of ref document: A1