WO2014036684A1 - Method and device for data storage and retrieval

Info

Publication number
WO2014036684A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
full
field
node
text search
Prior art date
Application number
PCT/CN2012/080963
Other languages
English (en)
Chinese (zh)
Inventor
曹莉
吴向阳
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to CN201280001730.2A priority Critical patent/CN103891244B/zh
Priority to PCT/CN2012/080963 priority patent/WO2014036684A1/fr
Publication of WO2014036684A1 publication Critical patent/WO2014036684A1/fr

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network

Definitions

  • The present invention relates to the field of information processing, and in particular to a method and apparatus for data storage and retrieval.
  • The existing technical solution uses Structured Query Language (SQL) to retrieve structured data, and an existing full-text search engine to perform full-text search on unstructured data.
  • Specifically, the prior art sends the data to be retrieved to an SQL retrieval system for structured search and to a full-text search engine for full-text search, and then integrates the retrieval results obtained by the two independent retrieval systems.
  • Embodiments of the present invention provide a method and apparatus for data storage and retrieval that store data requiring full-text search and data not requiring full-text search in a distributed manner in a parallel database, which reduces storage redundancy. At the same time, both full-text search and structured search of the data can be performed within the parallel database, which improves retrieval efficiency.
  • A method for data storage, wherein the method is applied to a parallel database system, the parallel database system comprising a control node and data nodes, and the data nodes comprising a full-text search data node and an underlying database node. The method includes:
  • creating a data table according to a data storage request; determining, according to the metadata of the data table, a full-text search field of the data table; and sending, according to the full-text search field, a first data storage instruction to the full-text search data node, where the first data storage instruction instructs the full-text search data node to store the data corresponding to the full-text search field;
  • A method for data retrieval, wherein the method is applied to a parallel database system, the parallel database system comprising a control node and data nodes, and the data nodes comprising a full-text search data node and an underlying database node. The method includes:
  • the control node receives a retrieval request sent by a client;
  • the control node sends a first retrieval instruction to the full-text search data node and a second retrieval instruction to the underlying database node;
  • the control node receives the retrieval results returned by the full-text search data node and the underlying database node, aggregates them, and returns the aggregated result to the client as the complete search result.
  • A control node for data storage, wherein the control node is in a parallel database system, the parallel database system includes the control node and data nodes, and the data nodes include a full-text search data node and an underlying database node;
  • the control node includes:
  • a receiving unit configured to receive a data storage request sent by the client;
  • a creating unit configured to create a data table according to the data storage request received by the receiving unit;
  • a determining unit configured to determine a full-text search field of the data table according to the metadata of the data table created by the creating unit;
  • a sending unit configured to send, according to the full-text search field determined by the determining unit, a first data storage instruction to the full-text search data node, where the first data storage instruction instructs the full-text search data node to store the data corresponding to the full-text search field; and to send, according to the fields of the data table other than the full-text search field determined by the determining unit, a second data storage instruction to the underlying database node, where the second data storage instruction instructs the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.
  • A control node for data retrieval, wherein the control node is in a parallel database system, the parallel database system includes the control node and data nodes, and the data nodes include a full-text search data node and an underlying database node. The control node includes:
  • a receiving unit configured to receive a retrieval request sent by the client;
  • an obtaining unit configured to obtain, according to the retrieval request received by the receiving unit, a field to be retrieved and a data table to be retrieved;
  • a determining unit configured to determine, according to the metadata of the data table to be retrieved, a full-text search field in the field to be retrieved;
  • a sending unit configured to send a first retrieval instruction to the full-text search data node and a second retrieval instruction to the underlying database node, so that each node searches the data it stores according to the instruction it receives;
  • the receiving unit is further configured to receive the retrieval result returned by each of the full-text search data node and the underlying database node;
  • an aggregation unit configured to aggregate the search results returned by the full-text search data node and the underlying database node and received by the receiving unit;
  • the sending unit is further configured to return the aggregated result to the client as the complete retrieval result.
  • The method and device for data storage and retrieval provided by the embodiments of the present invention can store data that needs full-text search and data that does not need full-text search on different types of data nodes in a parallel database system architecture.
  • This achieves distributed storage of the data and reduces storage redundancy compared with the unified storage method of the prior art. Meanwhile, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can perform different types of retrieval on the data according to the different retrieval instructions.
  • The method provided by the embodiments of the present invention provides both the full-text search capability and the structured search capability in the same system, and does not need two independent retrieval systems, thereby simplifying the search steps and improving retrieval efficiency.
  • FIG. 1 is a flowchart of a method for performing data storage according to Embodiment 1 of the present invention.
  • FIG. 2 is a flowchart of a method for performing data retrieval according to Embodiment 1 of the present invention.
  • FIG. 3 is a flowchart of a method for performing data storage according to Embodiment 2 of the present invention.
  • FIG. 4 is a flowchart of another method for performing data storage according to Embodiment 2 of the present invention.
  • FIG. 5 is a flowchart of a method for performing data retrieval according to Embodiment 3 of the present invention.
  • FIG. 6 is a block diagram of a control node for performing data storage in Embodiment 4 of the present invention.
  • FIG. 7 is a block diagram showing another composition of a control node for performing data storage in Embodiment 4 of the present invention.
  • FIG. 8 is a block diagram of a control node for performing data retrieval according to Embodiment 4 of the present invention.
  • FIG. 9 is a block diagram showing a composition of a control node for performing data storage in Embodiment 5 of the present invention.
  • FIG. 10 is a block diagram showing the composition of a control node for performing data retrieval in Embodiment 5 of the present invention.
  • An embodiment of the present invention provides a method for data storage and retrieval, wherein the method is applied to a parallel database system, the parallel database system includes a control node and data nodes, and the data nodes include a full-text search data node and an underlying database node.
  • The full-text search data node is configured to store the data corresponding to the full-text search field and to perform full-text search on the data it stores; the underlying database node does not store the data corresponding to the full-text search field, and performs structured search on the data it stores.
  • The full-text search data nodes and the underlying database nodes can be distributed in the following two ways:
  • First distribution method: all the full-text search data nodes are set in one pool, and all the underlying database nodes are set in another pool.
  • The two pools are logically independent. Physically, they can be deployed separately on two devices or integrated on the same device. In this way, the pool of full-text search data nodes stores the data that needs full-text search, and the pool of underlying database nodes stores the data that does not.
  • Second distribution method: a full-text search data node and an underlying database node are set together, as a pair of nodes, on the same data retrieval node.
  • Each data retrieval node includes only one full-text search data node and one underlying database node, and different data retrieval nodes hold different full-text search data nodes and underlying database nodes. In this way, each data retrieval node internally stores different data.
  • Within a data retrieval node, the full-text search data node stores the data that needs full-text search, and the underlying database node stores the data that does not. The data stored by the full-text search data node and the underlying database node of the same data retrieval node must carry corresponding information, such as the same distribution key value.
  • An embodiment of the present invention provides a method for data storage, which may be implemented by a control node. As shown in FIG. 1, the method includes:
  • The data table may be represented in a structured data storage form, in which the stored content is provided by a client.
  • Table 1 below shows the specific structure of one embodiment of the data table.
  • Name, Provider, and Summary are field names, and the column under each field stores the data corresponding to that field.
  • The metadata indicates which of the fields of the data table are full-text search fields and which are not.
  • The metadata may be set inside the data table, identifying whether the data stored in each field exceeds the maximum length of the field; alternatively, the metadata may correspond to the data table but be stored independently outside it.
  • The metadata may use "0"/"1" or "yes"/"no" to indicate which fields require full-text search and which do not: for example, "0" or "no" means full-text search is not required, and "1" or "yes" means it is required. The full-text search field can be determined in the following two ways:
  • First method: if the data stored in a field of the data table exceeds the maximum length of the field, the field is determined to be a full-text search field.
  • Second method: if a field of the data table needs to be searched according to a keyword in the index field, that field is determined to be a full-text search field.
  • In the first method, the control node measures the length of the data stored in each field and compares it with the maximum length of that field to distinguish full-text search fields from non-full-text search fields, and the control node identifies the type of each field.
  • In the second method, the keywords in the index field are identified by the client; the control node only stores these identifiers and recognizes their meaning, and identifies each field according to this correspondence.
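  • The two determination methods above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `FieldMeta` record, its `max_length` and `keyword_indexed` attributes, and the sample rows are all hypothetical names and data introduced here.

```python
from dataclasses import dataclass


@dataclass
class FieldMeta:
    """Hypothetical per-field metadata; the attribute names are illustrative."""
    name: str
    max_length: int                # maximum length declared for the field
    keyword_indexed: bool = False  # client-side mark: searched by keyword


def full_text_fields(rows, metadata):
    """Return the names of the fields treated as full-text search fields."""
    selected = []
    for meta in metadata:
        # First method: a stored value exceeds the field's maximum length.
        too_long = any(
            len(str(row.get(meta.name, ""))) > meta.max_length for row in rows
        )
        # Second method: the client marked the field for keyword retrieval.
        if too_long or meta.keyword_indexed:
            selected.append(meta.name)
    return selected


# Invented sample data, for illustration only.
rows = [
    {"id": 1, "name": "alpha", "comment": "a long free-text note about roman history"},
    {"id": 2, "name": "beta", "comment": "short"},
]
metadata = [
    FieldMeta("id", max_length=10),
    FieldMeta("name", max_length=20),
    FieldMeta("comment", max_length=16),
]
print(full_text_fields(rows, metadata))
```

  • Either condition alone is enough to select a field here; whether a real system combines the two methods or uses only one of them is left open by the description.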
  • The first data storage instruction specifically instructs the full-text search data node to create an index for the full-text search field or to update an index already stored by the full-text search data node.
  • The second data storage instruction specifically instructs the underlying database node to perform structured data storage.
  • An embodiment of the present invention provides a data retrieval method, implemented by a control node. As shown in FIG. 2, the method includes:
  • The search request may be a standard SQL query statement, for example "Select id, name from A where comment Like 'roman' group by name". This statement queries table A for the id and name of the rows whose comment contains "roman", and groups the results by name.
  • First method for determining the full-text search field in the to-be-retrieved field according to the metadata of the to-be-retrieved data table: if the data stored in the to-be-retrieved data table for at least one of the to-be-retrieved fields exceeds the maximum length of that field, the at least one field is determined to be a full-text search field.
  • Second method: if a field among the to-be-retrieved fields needs to be retrieved according to a keyword in the index field, that field is determined to be a full-text search field.
  • Identifying the full-text search field by the first method and the second method corresponds to the way of marking the full-text search field described in step 103 above.
  • The search instruction mainly includes the following information:
  • The second retrieval instruction here refers to a set of operations to be performed on the database for a particular query.
  • Each step of the execution instruction describes a specific database operation, such as a table scan, join, aggregation, or sort.
  • A complete database retrieval instruction mainly includes the following information:
  • cost: the estimated system resource consumption when the data is retrieved.
  • rows: the estimated total number of rows returned by the retrieval, reflecting the estimated selectivity of the conditions in any WHERE clause.
  • width: the estimated total number of bytes of the rows returned by the retrieval, reflecting the size of the data set that satisfies the search criteria.
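  • A minimal sketch of how one step of such a retrieval instruction might be represented. The `PlanStep` class, the `total_cost` helper, and the sample figures are hypothetical; only the three attribute names cost, rows, and width come from the description above.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    """One step of a retrieval instruction (hypothetical representation)."""
    operation: str  # e.g. "table scan", "join", "aggregation", "sort"
    cost: float     # estimated system resource consumption
    rows: int       # estimated number of rows returned
    width: int      # estimated size in bytes, as described above


def total_cost(steps):
    """Sum the estimated cost over all steps of an instruction."""
    return sum(step.cost for step in steps)


# Made-up numbers, for illustration only.
plan = [
    PlanStep("table scan", cost=12.5, rows=300, width=48),
    PlanStep("sort", cost=4.0, rows=300, width=48),
]
print(total_cost(plan))
```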
  • The retrieval results returned by the data nodes may be aggregated by performing an equi-join on a specific field.
  • For example, Table 2 shows that the search result obtained by the full-text search data node contains the id field values 2 and 3, and Table 3 shows that the search result obtained by the underlying database node contains the id field values 1, 2, and 3 together with the corresponding Name field information. The two results are joined on the value of the id field, and the rows with id values 2 and 3 are merged into one table, as shown in Table 4.
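  • The equi-join described above can be sketched as a simple hash join. The function name `equi_join` and the Name values `n1` to `n3` are placeholders assumed for illustration, since the contents of Tables 2 to 4 are not reproduced in this text; only the id values follow the example.

```python
def equi_join(left, right, key):
    """Join two lists of row dictionaries on equal values of `key`."""
    # Index the right-hand rows by their key value.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Emit one merged row per matching pair.
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


# Result of the full-text search data node: ids 2 and 3.
full_text_result = [{"id": 2}, {"id": 3}]
# Result of the underlying database node: ids 1, 2 and 3 with placeholder names.
database_result = [
    {"id": 1, "Name": "n1"},
    {"id": 2, "Name": "n2"},
    {"id": 3, "Name": "n3"},
]
print(equi_join(full_text_result, database_result, "id"))
```

  • The row with id 1 is dropped because it has no counterpart in the full-text result, which matches the merge of id values 2 and 3 described above.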
  • The method for data storage and retrieval provided by this embodiment of the present invention can store data that needs full-text search and data that does not need full-text search separately on different types of data nodes in a parallel database system architecture.
  • This distributed storage reduces storage redundancy compared with the unified storage method of the prior art. Meanwhile, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can perform different types of retrieval on the data to be retrieved according to the different retrieval instructions.
  • The method provided by this embodiment provides both the full-text search capability and the structured search capability in the same system, and does not need two independent retrieval systems, thereby simplifying the search steps and improving retrieval efficiency.
  • Embodiment 2: Based on the first distribution manner of the full-text search data node and the underlying database node described in Embodiment 1, this embodiment of the present invention provides a method for data storage, implemented by a control node. As shown in FIG. 3, the method includes:
  • The second data storage instruction instructs the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.
  • Steps 304 and 305 described here need not be performed in any particular order.
  • In step 303, if it is determined that no field requires full-text search, only step 305 is performed; if it is determined that all fields require full-text search, only step 304 is performed.
  • A data loading server, including a full-text search data loading server and an underlying database loading server, is provided in the parallel database system framework described in Embodiment 1.
  • The full-text search data loading server is configured to load, under the control of the control node, the data corresponding to the full-text search field in the data table to the full-text search data node, so that the full-text search data node stores the corresponding data.
  • The underlying database loading server is configured to load, under the control of the control node, the data corresponding to the fields other than the full-text search field in the data table to the underlying database node, so that the underlying database node stores the corresponding data.
  • Step 304 specifically includes: the control node divides the data table by columns, separates out the column fields that need full-text search, and then generates the corresponding data loading tasks according to the separated fields and sends them to the full-text search data loading server.
  • The full-text search data loading server allocates the data corresponding to the column fields that need full-text search to different full-text search data nodes for storage, according to a preset distribution policy.
  • The full-text search data loading task is the process of storing an application's unstructured data, or data that needs to support full-text search, on a full-text search server; it generally means creating an inverted index. For structured data, the inverted index table is created directly through word segmentation and filtering; otherwise, the task also includes a data information extraction step. There is no strict industry standard for data loading in full-text search; generally, a private API is exposed externally, through which the data index is created.
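  • As an illustration of the word segmentation, filtering, and inverted-index creation mentioned above, the following is a minimal sketch. The tokenizer, the stop-word list, and the sample documents are assumptions made for the example and do not correspond to any particular full-text search engine.

```python
import re
from collections import defaultdict

# Illustrative filter list; a real engine would use its own.
STOP_WORDS = {"a", "an", "the", "of", "in", "and"}


def tokenize(text):
    """Word segmentation plus filtering: lowercase terms minus stop words."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    return [term for term in terms if term not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each term to the sorted list of keys of the documents containing it."""
    index = defaultdict(set)
    for key, text in docs.items():
        for term in tokenize(text):
            index[term].add(key)
    return {term: sorted(keys) for term, keys in index.items()}


# Invented documents keyed by an id, for illustration only.
docs = {
    1: "History of the Greek city",
    2: "A roman road",
    3: "Roman law and the city",
}
index = build_inverted_index(docs)
print(index["roman"])  # [2, 3]
print(index["city"])   # [1, 3]
```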
  • The preset distribution policy may, for example, have different data nodes store the information of different regions: node 1 stores the information for Beijing, node 2 stores the information for Shanghai, and so on.
  • The distribution policy can be set according to actual needs and is not limited by the embodiments of the present invention.
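  • A region-based distribution policy of this kind can be sketched as a lookup table. The `distribute` function, the `region` field name, and the sample rows are hypothetical; only the Beijing and Shanghai assignment comes from the example above.

```python
def distribute(rows, policy, field="region"):
    """Group rows by the data node that the policy assigns to their region."""
    placement = {}
    for row in rows:
        node = policy[row[field]]  # raises KeyError for a region the policy omits
        placement.setdefault(node, []).append(row)
    return placement


# Policy from the example: node 1 holds Beijing, node 2 holds Shanghai.
policy = {"Beijing": "node 1", "Shanghai": "node 2"}
# Invented rows, for illustration only.
rows = [
    {"id": 1, "region": "Beijing"},
    {"id": 2, "region": "Shanghai"},
    {"id": 3, "region": "Beijing"},
]
placement = distribute(rows, policy)
print(placement)
```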
  • Step 305 specifically includes: the control node divides the data table by columns, separates out the column fields that do not need full-text search, and then generates the corresponding data loading tasks according to the separated fields and sends them to the underlying database loading server.
  • The underlying database loading server allocates the data corresponding to the column fields that do not need full-text search to different underlying database nodes for storage, according to a preset distribution policy.
  • the underlying database data loading task is to store the specific data of the application in the underlying database through the standard SQL loading statement COPY FROM, for example:
  • To ensure that the retrieval result of the full-text search data node and the retrieval result of the underlying database node can be merged, the process of generating the data loading tasks must also determine which fields are primary key fields, that is, master key fields; these fields represent the association between the data stored on different nodes. Specifically, when storing data, the full-text search data node must store the data corresponding to the master key field in addition to the data corresponding to the fields that need full-text search; the underlying database node stores data in a similar way.
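  • The column-wise division with a shared master key can be sketched as follows. The function `split_columns` and the sample row are illustrative assumptions; the point shown is only that the master key field is copied to both sides so that the two partial results can later be joined.

```python
def split_columns(rows, full_text_fields, master_keys):
    """Split each row into a full-text part and an underlying-database part.

    Master key fields are kept in both parts so the parts can be re-associated.
    """
    full_text_part, database_part = [], []
    for row in rows:
        full_text_part.append(
            {f: v for f, v in row.items()
             if f in full_text_fields or f in master_keys}
        )
        database_part.append(
            {f: v for f, v in row.items()
             if f not in full_text_fields or f in master_keys}
        )
    return full_text_part, database_part


# Invented row, for illustration only.
rows = [{"id": 1, "name": "n1", "comment": "some long free text"}]
ft_part, db_part = split_columns(rows, full_text_fields={"comment"}, master_keys={"id"})
print(ft_part)  # [{'id': 1, 'comment': 'some long free text'}]
print(db_part)  # [{'id': 1, 'name': 'n1'}]
```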
  • An embodiment of the present invention further provides a method for data storage that is applied to the second distribution manner of the full-text search data node and the underlying database node described in Embodiment 1, as shown in FIG. 4.
  • This method further includes step 306: dividing the data table by rows. The determining step (step 303) is then replaced by step 307: for the data to be stored in each row, determining the full-text search field of that row according to the metadata of the data table.
  • The other steps are performed in the same way as in the data storage method of this embodiment that corresponds to the first distribution manner of the full-text search data node and the underlying database node.
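  • A sketch of row-wise storage for the second distribution manner, in which each data retrieval node holds a paired full-text part and database part for the same rows. The modulo placement rule, the function `store_by_rows`, and the sample rows are assumptions for illustration; the description only requires that the two parts on one node share the same distribution key values.

```python
def store_by_rows(rows, node_count, full_text_fields, key="id"):
    """Place each row on one data retrieval node and split it between the pair."""
    nodes = [{"full_text": [], "database": []} for _ in range(node_count)]
    for row in rows:  # the table is handled one row at a time
        # Assumed policy: integer distribution key modulo the node count.
        node = nodes[row[key] % node_count]
        node["full_text"].append(
            {f: v for f, v in row.items() if f == key or f in full_text_fields}
        )
        node["database"].append(
            {f: v for f, v in row.items() if f == key or f not in full_text_fields}
        )
    return nodes


# Invented rows, for illustration only.
rows = [
    {"id": 1, "name": "n1", "comment": "c1"},
    {"id": 2, "name": "n2", "comment": "c2"},
    {"id": 3, "name": "n3", "comment": "c3"},
    {"id": 4, "name": "n4", "comment": "c4"},
]
nodes = store_by_rows(rows, node_count=2, full_text_fields={"comment"})
print([r["id"] for r in nodes[0]["database"]])  # [2, 4]
```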
  • With the method for data storage according to this embodiment of the present invention, data that requires full-text search and data that does not are stored separately on different types of data nodes in a parallel database system architecture.
  • This distributed storage reduces storage redundancy compared with the unified storage method of the prior art.
  • The data table created by the control node is shown in Table 5 below.
  • The data table is named A and has three fields: id, name, and comment, where id is the master key field and the comment field requires full-text search.
  • The id field and the name field are stored in the underlying database node, and the id field and the comment field are stored in the full-text search data node.
  • The method for data retrieval provided by this embodiment of the present invention, as shown in FIG. 5, includes:
  • The control node receives a retrieval request sent by the client.
  • The query corresponding to the search request is Select id, name from A where comment Like 'roman' group by name. This statement queries table A for the id and name of the rows whose comment contains "roman", and groups the results by name.
  • The control node determines, according to the retrieval request, a field to be retrieved and a data table to be retrieved.
  • The control node acquires the metadata of the data table to be retrieved, and determines the full-text search field among the to-be-retrieved fields according to that metadata.
  • The control node sends a first retrieval instruction to the full-text search data node according to the full-text search field among the to-be-retrieved fields, and sends a second retrieval instruction to the underlying database node.
  • The full-text search data node searches the data it stores according to the first retrieval instruction, and obtains a full-text search node retrieval result.
  • The underlying database node searches the data it stores according to the second retrieval instruction, and obtains an underlying database node retrieval result.
  • The retrieval result of the full-text search data node is shown in Table 6 below, and the retrieval result of the underlying database node is shown in Table 7 below.
  • The control node receives the full-text search node retrieval result sent by the full-text search data node and the underlying database node retrieval result sent by the underlying database node.
  • The control node aggregates the two received retrieval results to obtain a complete retrieval result.
  • The complete search result obtained here is shown in Table 8 below.
  • The control node sends the complete search result to the client.
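  • The retrieval flow above can be simulated in a few lines. This is a sketch under stated assumptions: the classes `FullTextNode` and `DatabaseNode`, the function `retrieve`, and the sample rows are all hypothetical, a substring match stands in for a real full-text index lookup, and the grouping by name is omitted.

```python
class FullTextNode:
    """Stands in for a full-text search data node holding (id, comment)."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self, field, keyword):
        # First retrieval instruction: keyword match on the full-text field.
        needle = keyword.lower()
        return [{"id": r["id"]} for r in self.rows if needle in r[field].lower()]


class DatabaseNode:
    """Stands in for an underlying database node holding (id, name)."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self, columns):
        # Second retrieval instruction: structured projection of stored columns.
        return [{c: r[c] for c in columns} for r in self.rows]


def retrieve(ft_node, db_node, columns, ft_field, keyword):
    """Control node: dispatch both instructions, then aggregate on id."""
    ft_result = ft_node.execute(ft_field, keyword)
    db_result = db_node.execute(columns)
    hit_ids = {r["id"] for r in ft_result}
    return [r for r in db_result if r["id"] in hit_ids]


# Invented contents for table A, for illustration only.
ft_node = FullTextNode([
    {"id": 1, "comment": "greek temple"},
    {"id": 2, "comment": "Roman road"},
    {"id": 3, "comment": "roman law"},
])
db_node = DatabaseNode([
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
])
print(retrieve(ft_node, db_node, ["id", "name"], "comment", "roman"))
```

  • The master key field id is what allows the control node to merge the two partial results, which is why both node types store it.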
  • The search results shown in Tables 6 to 8 above are generated based on the first distribution manner described in Embodiment 1: the pool corresponding to the full-text search data nodes returns a full-text search result to the control node, the pool corresponding to the underlying database nodes returns an underlying database retrieval result to the control node, and the control node aggregates the two types of retrieval results.
  • In the second distribution manner, each data retrieval node returns its own retrieval result.
  • For example, the retrieval results of three data retrieval nodes can be as shown in Tables 9 to 11, where Table 9 gives the search results of data retrieval node 1.
  • The control node aggregates the three search results to obtain a complete search result, which is the same as Table 8 above.
  • With the method for data retrieval provided by this embodiment of the present invention, different types of retrieval instructions can be sent to different types of data nodes in a parallel database system architecture according to the retrieval request sent by the client, so that the retrieval execution nodes can simultaneously perform different types of retrieval on the data according to the different retrieval instructions.
  • The method provided by this embodiment provides both the full-text search capability and the structured search capability in the same system, and does not need two independent retrieval systems, thereby simplifying the search steps and improving retrieval efficiency.
  • An embodiment of the present invention provides a control node for data storage, which may be applied to a parallel database system, where the parallel database system includes the control node and data nodes, and the data nodes include a full-text search data node and an underlying database node.
  • The control node includes: a receiving unit 51, a creating unit 52, a determining unit 53, and a sending unit 54.
  • The receiving unit 51 is configured to receive a data storage request sent by the client.
  • The creating unit 52 is configured to create a data table according to the data storage request received by the receiving unit 51.
  • The determining unit 53 is configured to determine a full-text search field of the data table according to the metadata of the data table created by the creating unit 52.
  • The sending unit 54 is configured to send, according to the full-text search field determined by the determining unit 53, a first data storage instruction to the full-text search data node, where the first data storage instruction instructs the full-text search data node to store the data corresponding to the full-text search field; and to send, according to the fields of the data table other than the full-text search field determined by the determining unit 53, a second data storage instruction to the underlying database node, where the second data storage instruction instructs the underlying database node to store the data corresponding to the fields of the data table other than the full-text search field.
  • The determining unit 53 is specifically configured to: when the data stored in a field of the data table exceeds the maximum length of the field, determine that the field is a full-text search field; and when a field of the data table is retrieved according to a keyword in the index field, determine that the field is a full-text search field.
  • The control node further includes a segmentation unit 55, configured to segment the data table by row before the full-text search field of the data table is determined.
  • The determining unit 53 is further configured to determine, for each row of data, the full-text search field of that row according to the metadata of the data table.
  • That the first data storage instruction instructs the full-text search data node to store the data corresponding to the full-text search field specifically includes: instructing the full-text search data node to create an index for the full-text search field or to update an index already stored by the full-text search data node.
  • the embodiment of the present invention further provides a control node for performing data retrieval, which can be applied to a parallel database system, where the parallel database system includes a control node and a data node; and the data node includes a full-text search data node and an underlying database.
  • the control node includes: a receiving unit 61, an obtaining unit 62, a determining unit 63, a sending unit 64, a receiving unit 65, and a convergence unit 66.
  • the receiving unit 61 is configured to receive a retrieval request sent by the client.
  • the obtaining unit 62 is configured to obtain a to-be-retrieved field and a to-be-retrieved data table according to the retrieval request received by the receiving unit 61.
  • a determining unit 63 configured to determine, according to the retrieval request, a field to be retrieved and a data table to be retrieved; and determine, according to the metadata of the data table to be retrieved, a full-text search field in the field to be retrieved.
  • the sending unit 64 is configured to send, according to the full-text search field in the to-be-retrieved field determined by the determining unit 63, a first retrieving instruction, where the first retrieving instruction is used,
  • the receiving unit 65 is configured to receive the search results returned by the full-text search data node and the underlying database node.
  • the aggregation unit 66 is configured to aggregate the search results returned by the full-text search data node and the underlying database node and received by the receiving unit 65.
  • the sending unit 64 is further configured to return the aggregated result to the client as the complete retrieval result.
  • the determining unit 63 is specifically configured to: when there is at least one field among the to-be-retrieved fields whose data stored in the to-be-retrieved data table exceeds the maximum length of that field, determine that the at least one field is a full-text search field; and when there is a field among the to-be-retrieved fields that needs to be retrieved according to a keyword in the index field, determine that this field is a full-text search field.
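The two conditions used by the determining unit can be illustrated with a minimal Python sketch. The patent does not define a concrete metadata schema, so the `FieldMeta` class, its attribute names, and the `stored_lengths` mapping are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical per-field metadata; not a schema defined by the patent."""
    name: str
    max_length: int        # maximum length the field is declared to hold
    keyword_indexed: bool  # True if the field is retrieved by keyword in an index


def is_full_text_field(meta: FieldMeta, stored_length: int) -> bool:
    """A field qualifies if its data exceeds the maximum length
    or if it must be retrieved by keyword."""
    return stored_length > meta.max_length or meta.keyword_indexed


def full_text_fields(requested: List[str],
                     metadata: Dict[str, FieldMeta],
                     stored_lengths: Dict[str, int]) -> List[str]:
    """Return the requested fields that belong to the full-text search data node."""
    return [name for name in requested
            if is_full_text_field(metadata[name], stored_lengths.get(name, 0))]


# Example metadata for an assumed three-column table.
metadata = {
    "id": FieldMeta("id", 16, False),
    "title": FieldMeta("title", 64, True),
    "body": FieldMeta("body", 255, False),
}
stored_lengths = {"id": 8, "title": 40, "body": 10000}
selected = full_text_fields(["id", "title", "body"], metadata, stored_lengths)
```

In this example `title` is selected because it is keyword-indexed and `body` because its stored data exceeds the declared maximum length, while `id` would be left to the underlying database node.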
  • the device for storing and retrieving data can store data that requires full-text search and data that does not in different types of data nodes within a parallel database system architecture.
  • this distributed storage of data reduces storage redundancy compared with the unified storage method of the prior art; meanwhile, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can perform different types of retrieval on the data to be retrieved according to the different retrieval instructions.
  • the method provided by the embodiment of the present invention thus offers full-text search capability and structured search capability in the same system, without using two independent retrieval systems, thereby simplifying the search steps and improving retrieval efficiency.
  • An embodiment of the present invention provides a control node for performing data storage, which may be applied to a parallel database system, where the parallel database system includes the control node and a data node, and the data node includes a full-text search data node and an underlying database node.
  • the control node includes: a processor 71 and a memory 72.
  • the processor 71 is configured to: receive a data storage request sent by the client; create a data table according to the data storage request; determine a full-text search field of the data table according to the metadata of the data table; send, according to the full-text search field, a first data storage instruction to the full-text search data node, where the first data storage instruction is used to instruct the full-text search data node to store data corresponding to the full-text search field; and send, according to the fields of the data table other than the full-text search field, a second data storage instruction to the underlying database node, where the second data storage instruction is used to instruct the underlying database node to store data corresponding to the fields of the data table other than the full-text search field.
  • the memory 72 is configured to store a data storage request, metadata of the data table, a first data storage instruction, and a second data storage instruction.
  • the processor 71 is specifically configured to: when the data stored in a field of the data table exceeds the maximum length of the field, determine that the field is a full-text search field; and when a field of the data table needs to be searched according to a keyword in the index field, determine that the field is a full-text search field.
  • the processor 71 is further configured to divide the data table by rows before determining the full-text search field of the data table according to the metadata of the data table; for each row of data that needs to be stored, the full-text search field of the row is determined according to the metadata of the data table.
  • the first data storage instruction is used to indicate that the full-text search data node stores the data corresponding to the full-text search field, and specifically includes: instructing the full-text search data node to create an index for the full-text search field or to update an index stored by the full-text search data node.
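The storage flow performed by the processor 71 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the instruction dictionaries, the `target` and `action` labels, and the `row_id` key carried in both instructions are hypothetical, and the full-text field set is passed in once rather than recomputed per row.

```python
from typing import Any, Dict, List, Set, Tuple

Instruction = Dict[str, Any]


def split_row(row: Dict[str, Any],
              full_text_names: Set[str],
              key: str = "row_id") -> Tuple[Instruction, Instruction]:
    """Build the first (full-text) and second (underlying database)
    storage instructions for one row."""
    first = {
        "target": "full_text_search_data_node",
        "action": "create_or_update_index",
        "key": row[key],  # assumed row key, used to relate the two parts later
        "data": {k: v for k, v in row.items() if k in full_text_names},
    }
    second = {
        "target": "underlying_database_node",
        "action": "store",
        "key": row[key],
        "data": {k: v for k, v in row.items()
                 if k not in full_text_names and k != key},
    }
    return first, second


def plan_storage(rows: List[Dict[str, Any]],
                 full_text_names: Set[str]) -> List[Tuple[Instruction, Instruction]]:
    """Divide the table by rows and produce one instruction pair per row."""
    return [split_row(row, full_text_names) for row in rows]


rows = [
    {"row_id": 1, "title": "Parallel databases", "body": "A long article ...", "price": 10},
    {"row_id": 2, "title": "Full-text search", "body": "Another long article ...", "price": 25},
]
plan = plan_storage(rows, {"title", "body"})
first, second = plan[0]
```

Each field value appears in exactly one of the two instructions, which reflects the reduced storage redundancy that the description attributes to this split.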
  • the embodiment of the present invention further provides a control node for performing data retrieval, which can be applied to a parallel database system, where the parallel database system includes a control node and a data node, and the data node includes a full-text search data node and an underlying database node.
  • the control node includes: a processor 73 and a memory 74.
  • the processor 73 is configured to: receive a retrieval request sent by the client; determine, according to the retrieval request, a field to be retrieved and a data table to be retrieved; obtain metadata of the data table to be retrieved; determine, according to the metadata of the data table to be retrieved, a full-text search field in the to-be-retrieved field; send, according to the full-text search field in the to-be-retrieved field, a first retrieval instruction to the full-text search data node, where the first retrieval instruction is used to instruct the full-text search data node to retrieve the full-text search field in the to-be-retrieved field; send a second retrieval instruction to the underlying database node, where the second retrieval instruction is used to instruct the underlying database node to retrieve the corresponding data; receive the search results returned by the full-text search data node and the underlying database node; aggregate those search results; and return the aggregated result to the client as the complete search result.
  • the memory 74 is configured to store a retrieval request, metadata of the data table to be retrieved, a first retrieval instruction, a second retrieval instruction, and a retrieval result.
  • the processor 73 is specifically configured to: when there is at least one field among the to-be-retrieved fields whose data stored in the data table to be retrieved exceeds the maximum length of that field, determine that the at least one field is a full-text search field; and when there is a field among the to-be-retrieved fields that needs to be retrieved according to a keyword in the index field, determine that this field is a full-text search field.
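The retrieval flow of the processor 73 can be sketched in Python as follows. The description does not specify how the results are aggregated, so intersecting the two result sets on a shared row key is an assumption made for this sketch; the in-memory `FT_INDEX` and `DB_ROWS` stores and the two search functions are hypothetical stand-ins for the full-text search data node and the underlying database node.

```python
from typing import Any, Callable, Dict, Set

Result = Dict[int, Dict[str, Any]]  # row key -> partial row


def aggregate(full_text_hits: Result, database_hits: Result) -> Result:
    """Keep rows returned by both nodes and merge their partial columns."""
    merged: Result = {}
    for key in sorted(full_text_hits.keys() & database_hits.keys()):
        merged[key] = {**database_hits[key], **full_text_hits[key]}
    return merged


def retrieve(conditions: Dict[str, Any],
             full_text_names: Set[str],
             search_full_text: Callable[[Dict[str, Any]], Result],
             search_database: Callable[[Dict[str, Any]], Result]) -> Result:
    """Route each condition to the matching node type, then aggregate."""
    first = {f: v for f, v in conditions.items() if f in full_text_names}
    second = {f: v for f, v in conditions.items() if f not in full_text_names}
    return aggregate(search_full_text(first), search_database(second))


# Hypothetical in-memory stand-ins for the two kinds of data node.
FT_INDEX: Result = {1: {"body": "parallel database"}, 2: {"body": "full-text search"}}
DB_ROWS: Result = {1: {"price": 10}, 2: {"price": 25}}


def search_full_text(cond: Dict[str, Any]) -> Result:
    # Keyword match: every keyword must occur in the corresponding field.
    return {k: r for k, r in FT_INDEX.items()
            if all(word in r[f] for f, word in cond.items())}


def search_database(cond: Dict[str, Any]) -> Result:
    # Structured match: every field must equal the requested value.
    return {k: r for k, r in DB_ROWS.items()
            if all(r[f] == v for f, v in cond.items())}


result = retrieve({"body": "search", "price": 25}, {"body"},
                  search_full_text, search_database)
```

The client sees a single merged result even though the keyword condition and the structured condition were evaluated on different node types.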
  • the device for storing and retrieving data can store data that requires full-text search and data that does not in different types of data nodes within a parallel database system architecture.
  • this distributed storage of data reduces storage redundancy compared with the unified storage method of the prior art; meanwhile, different types of retrieval instructions are sent to different types of data nodes according to the retrieval request sent by the client, so that the retrieval execution nodes can perform different types of retrieval on the data to be retrieved according to the different retrieval instructions.
  • the method provided by the embodiment of the present invention thus offers full-text search capability and structured search capability in the same system, without using two independent retrieval systems, thereby simplifying the search steps and improving retrieval efficiency.
  • the present invention can be implemented by software plus the necessary general-purpose hardware, or of course by hardware alone, but in many cases the former is the better implementation.
  • the part of the technical solution of the present invention that is essential or that contributes to the prior art may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a hard disk, or an optical disc of a computer, and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of information processing and, in particular, to a method and device for data storage and retrieval, in order to implement distributed storage, in a parallel database, of data requiring full-text retrieval and data not requiring full-text retrieval, thereby reducing storage redundancy; in addition, full-text retrieval and structured retrieval of the data to be retrieved are both implemented in the parallel database, thereby improving retrieval efficiency. The present invention comprises: a control node, in a system architecture of the parallel database, which stores data requiring full-text retrieval and data not requiring full-text retrieval on different types of data nodes respectively, and which sends different types of retrieval instructions to different types of data nodes according to a retrieval request sent by a client, so that a retrieval execution node can simultaneously perform, according to the different retrieval execution instructions, different types of retrieval operations on the data to be retrieved. The embodiments of the present invention apply mainly to data storage and retrieval processes.
PCT/CN2012/080963 2012-09-04 2012-09-04 Procédé et dispositif de stockage et de récupération de données WO2014036684A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201280001730.2A CN103891244B (zh) 2012-09-04 2012-09-04 一种进行数据存储和检索的方法及装置
PCT/CN2012/080963 WO2014036684A1 (fr) 2012-09-04 2012-09-04 Procédé et dispositif de stockage et de récupération de données

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/080963 WO2014036684A1 (fr) 2012-09-04 2012-09-04 Procédé et dispositif de stockage et de récupération de données

Publications (1)

Publication Number Publication Date
WO2014036684A1 true WO2014036684A1 (fr) 2014-03-13

Family

ID=50236434

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/080963 WO2014036684A1 (fr) 2012-09-04 2012-09-04 Procédé et dispositif de stockage et de récupération de données

Country Status (2)

Country Link
CN (1) CN103891244B (fr)
WO (1) WO2014036684A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239568B (zh) * 2017-06-27 2020-04-14 石化盈科信息技术有限责任公司 分布式索引实现方法及装置
CN110019231B (zh) * 2017-12-26 2021-06-04 中国移动通信集团山东有限公司 一种并行数据库动态关联的方法及节点

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001092831A (ja) * 1999-09-21 2001-04-06 Toshiba Corp 文書検索装置及び文書検索方法
CN101916280A (zh) * 2010-08-17 2010-12-15 上海云数信息科技有限公司 并行计算系统及按查询内容进行负载均衡的方法
CN102054007A (zh) * 2009-11-10 2011-05-11 北大方正集团有限公司 一种检索方法及检索装置
CN102136003A (zh) * 2011-03-25 2011-07-27 上海交通大学 大规模分布式存储系统
CN102265277A (zh) * 2011-06-01 2011-11-30 华为技术有限公司 数据存储系统的操作方法和装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5701459A (en) * 1993-01-13 1997-12-23 Novell, Inc. Method and apparatus for rapid full text index creation
CN100481076C (zh) * 2005-12-23 2009-04-22 北大方正集团有限公司 关系型数据库与全文检索相结合的检索方法
CN101894143A (zh) * 2010-06-28 2010-11-24 北京用友政务软件有限公司 一种联邦检索及检索结果集成展现方法及系统
CN102025550A (zh) * 2010-12-20 2011-04-20 中兴通讯股份有限公司 一种分布式集群中数据管理的系统和方法

Also Published As

Publication number Publication date
CN103891244B (zh) 2016-11-16
CN103891244A (zh) 2014-06-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12884201

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12884201

Country of ref document: EP

Kind code of ref document: A1