CN117763078A

CN117763078A - Data processing method and device, electronic equipment and storage medium

Info

Publication number: CN117763078A
Application number: CN202410139299.2A
Authority: CN
Inventors: 龚振晗; 谢振江; 赵裕众
Original assignee: Beijing Oceanbase Technology Co Ltd
Current assignee: Beijing Oceanbase Technology Co Ltd
Priority date: 2024-01-31
Filing date: 2024-01-31
Publication date: 2024-03-26

Abstract

One or more embodiments of the present disclosure provide a data processing method and apparatus, an electronic device, and a storage medium, where the method includes: in the process of writing target data in a memory into a disk, responding to a query instruction aiming at the target data, respectively querying data in at least one first local data and at least one second local data according to the query instruction and the intermediate index layer information of the at least one first local data and the intermediate index layer information of the at least one second local data, and sequencing the queried data to be used as a data query result; the first local data comprises part of data stored in a magnetic disk in the target data, the second local data comprises part of data stored in a memory in the target data, the target data is in a column storage form, and each column group of the target data corresponds to one second local data.

Description

Data processing method and device, electronic equipment and storage medium

Technical Field

One or more embodiments of the present disclosure relate to the field of database technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.

Background

With the rapid development of the internet and informatization, the volume of generated data is growing explosively, and the requirements placed on databases and their management are growing with it. During data processing, a data table must be queried using DQL (Data Query Language) (hereinafter referred to as a DQL operation); a data table must also be operated on using DML (Data Manipulation Language) (hereinafter referred to as a DML operation), for example to add, delete, query, or modify data; and the data must be reformed using DDL (Data Definition Language), for example by creating new tables, adding columns, deleting columns, or changing column types.

In the related art, while data is being reformed using DDL, the data concerned cannot provide services such as query, addition, deletion, and modification, so business processing of the database is affected and business problems such as request timeouts arise.

Disclosure of Invention

In view of this, one or more embodiments of the present disclosure provide a data processing method and apparatus, an electronic device, and a storage medium.

In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:

According to a first aspect of one or more embodiments of the present specification, there is provided a data processing method, the method comprising:

and constructing index data according to a line key field and a text field of the indexed data, wherein the index data comprises a line key field, a text identification field, a word segmentation field and a word frequency field, the line key field of the index data is identical to the line key field of the indexed data, the line key field of the index data is a main key of the index data, the line key field of the indexed data is a main key of the indexed data, and the text identification field, the word segmentation field and the word frequency field of the index data correspond to the text field of the indexed data.

In one embodiment of the present disclosure, the index data includes first index sub-data and second index sub-data, wherein the first index sub-data includes the line key field and the text identification field, and the line key field is the primary key of the first index sub-data; and the second index sub-data includes the text identification field, the word segmentation field, and the word frequency field, and the text identification field and the word segmentation field together form the primary key of the second index sub-data.

In one embodiment of the present specification, the constructing the index data according to the line key field and the text field of the indexed data includes:

constructing first index sub-data according to the line key field and the text field of the indexed data;

and constructing second index sub-data according to the line key field and the text field of the indexed data and the line key field and the text identification field of the first index sub-data.

In one embodiment of the present specification, the index data further includes third index sub-data, the third index sub-data including the line key field and the text identification field, the text identification field being a primary key of the third index sub-data.

In one embodiment of the present disclosure, the constructing the index data according to the line key field and the text field of the indexed data further includes:

and constructing third index sub-data according to the line key field and the text identification field of the first index sub-data.

In one embodiment of the present specification, the index data further includes fourth index sub-data, the fourth index sub-data including the word segmentation field, the text identification field, and the word frequency field, the word segmentation field and the text identification field together being the primary key of the fourth index sub-data.

In one embodiment of the present specification, the constructing the index data according to the line key field and the text field of the indexed data further includes:

constructing fourth index sub-data according to the line key field and the text field of the indexed data and the line key field and the text identification field of the first index sub-data.

In one embodiment of the present specification, the method further comprises:

in the process of constructing index data, in response to receiving a DQL operation for the indexed data, the DQL operation is performed for the indexed data.

In one embodiment of the present specification, the method further comprises:

in the process of constructing index data, responding to receiving the DML operation for the indexed data, executing the DML operation for the indexed data, and synchronizing increment data generated by the DML operation to the index data.

In one embodiment of the present specification, said synchronizing incremental data generated by said DML operation to said index data includes:

in the case that the DML operation is an insert operation, generating, according to the row key and the text of the inserted data in the indexed data, the row key, text identifier, word segments, and word frequencies of the inserted data in the index data, and writing into the index data an insert operation carrying the row key, text identifier, word segments, and word frequencies of the inserted data in the index data;

in the case that the DML operation is a delete operation, generating, according to the row key and/or the text of the deleted data in the indexed data, the primary key of the deleted data in the index data, and writing into the index data a delete operation carrying the primary key of the deleted data in the index data.

According to a second aspect of one or more embodiments of the present specification, there is provided a data processing apparatus, the apparatus comprising:

the index construction module is used for constructing index data according to a line key field and a text field of the indexed data, wherein the index data comprises a line key field, a text identification field, a word segmentation field and a word frequency field, the line key field of the index data is identical to the line key field of the indexed data, the line key field of the index data is a main key of the index data, the line key field of the indexed data is a main key of the indexed data, and the text identification field, the word segmentation field and the word frequency field of the index data correspond to the text field of the indexed data.

In one embodiment of the present specification, the index building module is configured to:

In one embodiment of the present specification, the index building module is further configured to:

In one embodiment of the present specification, the apparatus further includes a query module configured to:

In one embodiment of the present description, the apparatus further comprises a reforming module for:

In one embodiment of the present disclosure, the reforming module is configured to, when synchronizing incremental data generated by the DML operation to the index data:

According to a third aspect of one or more embodiments of the present description, a computer program product is presented, comprising a computer program/instruction which, when executed by a processor, implements the steps of the method provided by any of the embodiments of the first aspect.

According to a fourth aspect of one or more embodiments of the present specification, there is provided an electronic device comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor implements the method of the first aspect by executing the executable instructions.

According to a fifth aspect of one or more embodiments of the present description, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to the first aspect.

The technical scheme provided by the embodiment of the specification can comprise the following beneficial effects:

According to the data processing method provided by the embodiments of this specification, index data can be constructed according to the line key field and the text field of the indexed data, wherein the index data comprises a line key field, a text identification field, a word segmentation field, and a word frequency field, the line key field of the index data is identical to the line key field of the indexed data, the line key field of the index data is the primary key of the index data, the line key field of the indexed data is the primary key of the indexed data, and the text identification field, the word segmentation field, and the word frequency field of the index data correspond to the text field of the indexed data. That is, the method builds a full-text index for the indexed data by constructing separate index data, and the process involves no DDL operation on the indexed data such as adding an index column. The indexed data can therefore continue to provide services such as query, addition, deletion, and modification while the index is built, so business processing is not affected by full-text index construction, the response speed of the database's business processing is maintained, and problems such as business request timeouts are avoided.

Drawings

Fig. 1 is a flow chart of a data processing method according to an exemplary embodiment.

Fig. 2 is a schematic diagram of an apparatus according to an exemplary embodiment.

Fig. 3 is a block diagram of a data processing apparatus according to an exemplary embodiment.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.

It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.

In the related art, while data is being reformed using DDL, the data concerned cannot provide query services externally, so business processing of the database is affected and business problems such as request timeouts arise. In particular, reforming data using DDL is generally time consuming, so the database's business is paused for a long time. For example, when a full-text index is built on some data, a reforming operation such as adding an index column must be performed on the data, which leaves the data unable to provide services such as query, addition, deletion, and modification.

Based on this, in a first aspect, at least one embodiment of the present disclosure provides a data processing method that does not affect services such as query, addition, deletion, and modification of the data while a full-text index is built for it. That is, the full-text index can be built online rather than offline, where online means that externally provided services are not suspended and offline means that they are suspended.

Referring to fig. 1, a flow of the data processing method is shown, which includes step S101.

In step S101, index data is constructed according to a line key field and a text field of the indexed data, where the index data includes a line key field, a text identification field, a word segmentation field, and a word frequency field, the line key field of the index data is the same as the line key field of the indexed data, the line key field of the index data is a primary key of the index data, the line key field of the indexed data is a primary key of the indexed data, and the text identification field, the word segmentation field, and the word frequency field of the index data correspond to the text field of the indexed data (i.e., the text identification field, the word segmentation field, and the word frequency field of the index data are used to represent a text structure of the text field of the indexed data).

The indexed data is the data for which the full text index is constructed by the method, namely the method aims at constructing the full text index for the indexed data. The indexed data may be a table (i.e., a master table), or other data form, and this description is not intended to be limiting. The row key field of the indexed data is used to indicate the identity of the row, that is, each element in the row key field is used to represent the identity of the row in which it is located; the elements in the text field of the indexed data are text content; the indexed data may contain other fields in addition to the line key field and the text field, or may not contain other fields, which this specification is not intended to limit. For example, the indexed data is a paper statistics table that contains fields for a row key, author, date of release, name, abstract, etc., where the name and abstract fields are text fields.

The index data and the indexed data are independent of each other; that is, constructing the index data does not affect the ability of the indexed data to provide services such as query, addition, deletion, and modification. The index data may take the form of one table or multiple tables, or another data format, which this specification does not limit.

Illustratively, the index data is a table with a row key field ROW KEY, a text identification field DOC_ID, a word segmentation field WORD, and a word frequency field WORD_COUNT. When the index data is generated, an index auxiliary table, namely the table with the row key field, text identification field, word segmentation field, and word frequency field, is first constructed. A snapshot point of the indexed data is then obtained, and once all active transactions started before the snapshot point have ended, the following operations are executed for each row of the indexed data whose text field is non-empty, to complete data completion: a text identifier is generated for the text content in the text field; the text content is segmented into words and the word frequency of each word segment is counted; and the row key of the row in the indexed data, the text identifier of the row, each word segment, and the word frequency of each word segment are added to the index auxiliary table as multiple row records (that is, a row key, the text identifier, one word segment, and its word frequency form one row record). In this example, word segments and the like can be queried in the index auxiliary table through the row key as the primary key, which improves the query efficiency of the full-text index.
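The data completion described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the regular-expression word segmenter, and the UUID-based text identifier are assumptions made for the example, and the snapshot is modeled simply as an iterable of (row key, text) pairs.

```python
import re
import uuid
from collections import Counter


def tokenize(text):
    """Hypothetical word segmenter: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index_rows(indexed_rows, new_doc_id=lambda: uuid.uuid4().hex):
    """Fill a single index auxiliary table from a snapshot of the indexed data.

    indexed_rows: iterable of (row_key, text) pairs (assumed snapshot view).
    Returns rows of (ROW KEY, DOC_ID, WORD, WORD_COUNT).
    """
    index_rows = []
    for row_key, text in indexed_rows:
        if not text:  # only rows with a non-empty text field are indexed
            continue
        doc_id = new_doc_id()  # one text identifier per text
        for word, word_count in Counter(tokenize(text)).items():
            index_rows.append((row_key, doc_id, word, word_count))
    return index_rows
```

The indexed data itself is only read here; no column is added to it, which is the property the method relies on.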

Still further exemplary, the index data includes first index sub-data and second index sub-data, wherein the first index sub-data includes the line key field and the text identification field, and the line key field is the primary key of the first index sub-data; and the second index sub-data includes the text identification field, the word segmentation field, and the word frequency field, and the text identification field and the word segmentation field together form the primary key of the second index sub-data. For example, the first index sub-data and the second index sub-data may be two index auxiliary tables. The first index sub-data in this example may be used to find a text identifier according to a line key, and the second index sub-data may be used to find, according to a primary key composed of the text identifier and a word, the word frequency of the word (i.e., the number of occurrences of the word) in the text corresponding to the text identifier.

In this example, the index data may be generated as follows:

First, first index sub-data is constructed according to the line key field and the text field of the indexed data. For example, an index auxiliary table FTS_ROWKEY_DOC is first constructed, that is, a table having the above-mentioned line key field and text identification field and using the line key field as the primary key. A snapshot point of the indexed data is then obtained, and once all active transactions started before the snapshot point have ended, the following operations are performed for each row of the indexed data whose text field is non-empty, to complete data completion of the index auxiliary table FTS_ROWKEY_DOC: a text identifier is generated for the text content in the text field, and the row key of the row in the indexed data and the text identifier of the row are added to the index auxiliary table FTS_ROWKEY_DOC as one row record.

Next, second index sub-data is constructed according to the line key field and the text field of the indexed data and the line key field and the text identification field of the first index sub-data. For example, an index auxiliary table FTS_INDEX is first constructed, that is, a table having the text identification field, the word segmentation field, and the word frequency field and using the text identification field and the word segmentation field as the primary key. Snapshot points of the indexed data and the index auxiliary table FTS_ROWKEY_DOC are then obtained, and once all active transactions started before the snapshot points have ended, the following operations are executed for each row in the index auxiliary table FTS_ROWKEY_DOC, to complete data completion of the index auxiliary table FTS_INDEX: the text content in the text field of the row with the same line key is found in the indexed data, the text content is segmented into words and the word frequency of each word segment is counted, and the text identifier of the row, each word segment, and the word frequency of each word segment are added to the index auxiliary table FTS_INDEX as multiple row records (that is, the text identifier, one word segment, and its word frequency form one row record).
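The two construction steps above can be sketched as follows, under stated assumptions: each table is modeled as a Python dictionary keyed by its primary key, the main-table snapshot is a dictionary from row key to text, and the function names, the sequential text identifiers, and the word segmenter are hypothetical choices made for illustration only.

```python
import re
from collections import Counter
from itertools import count


def tokenize(text):
    """Hypothetical word segmenter: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_rowkey_doc(main_snapshot):
    """Step 1: FTS_ROWKEY_DOC, primary key = row key, value = DOC_ID."""
    next_id = count(1)
    return {
        row_key: f"doc{next(next_id)}"
        for row_key, text in main_snapshot.items()
        if text  # rows with an empty text field are skipped
    }


def build_fts_index(main_snapshot, rowkey_doc):
    """Step 2: FTS_INDEX, primary key = (DOC_ID, WORD), value = word frequency."""
    fts_index = {}
    for row_key, doc_id in rowkey_doc.items():
        text = main_snapshot[row_key]  # look up the text by the same row key
        for word, freq in Counter(tokenize(text)).items():
            fts_index[(doc_id, word)] = freq
    return fts_index
```

With these two tables, the word frequency of a word in a given row can be found by first reading the text identifier from FTS_ROWKEY_DOC and then reading the (text identifier, word) entry from FTS_INDEX.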

In this example, the structural relationship and construction order of the first index sub-data and the second index sub-data allow DML operations to remain continuously online during data completion; that is, when a DML operation is performed on the indexed data, the incremental data can be synchronized to the first index sub-data and the second index sub-data. For example, while data completion of the index auxiliary table FTS_ROWKEY_DOC is in progress, the corresponding main table can execute DML operations and DQL operations. A delete operation on the main table completes the synchronous update of the index auxiliary table FTS_ROWKEY_DOC simply by writing a delete row for the primary key into the memtable of the index auxiliary table FTS_ROWKEY_DOC; an insert operation on the main table obtains a DOC_ID and then inserts the data into the memtable of the index auxiliary table FTS_ROWKEY_DOC; and an update operation may be regarded as a delete operation plus an insert operation. In particular, the primary key of the first index sub-data comprises only the row key field ROW KEY and does not comprise the text identification field DOC_ID, so concurrent delete operations can be supported while data completion of the first index sub-data is in progress: a delete operation does not need to generate a new text identifier DOC_ID and only needs to write a delete row for the corresponding ROW KEY into the memtable, which avoids conflicts between DML operations and data completion during construction of the full-text index.
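The synchronization of incremental data to FTS_ROWKEY_DOC can be illustrated with a toy model. The sketch below assumes an LSM-style read path in which entries in the memtable take precedence over rows written by data completion; that precedence rule, the class and method names, and the sequential text identifiers are assumptions for illustration and are not stated in this specification.

```python
from itertools import count

TOMBSTONE = object()  # marker standing in for a "delete row"


class RowkeyDocTable:
    """Toy FTS_ROWKEY_DOC: completed rows plus a memtable of newer DML writes."""

    def __init__(self):
        self.base = {}      # rows written by data completion (snapshot data)
        self.memtable = {}  # incremental rows written by concurrent DML
        self._ids = count(1)

    def backfill(self, row_key):
        """Data completion: generate a DOC_ID for a snapshot row."""
        self.base[row_key] = f"doc{next(self._ids)}"

    def on_insert(self, row_key):
        """Main-table insert: obtain a DOC_ID, then write to the memtable."""
        doc_id = f"doc{next(self._ids)}"
        self.memtable[row_key] = doc_id
        return doc_id

    def on_delete(self, row_key):
        """Main-table delete: only the row key is needed, no DOC_ID."""
        self.memtable[row_key] = TOMBSTONE

    def on_update(self, row_key):
        """Main-table update, treated as delete plus insert."""
        self.on_delete(row_key)
        return self.on_insert(row_key)

    def lookup(self, row_key):
        """Read path (assumed): memtable entries override completed rows."""
        if row_key in self.memtable:
            value = self.memtable[row_key]
            return None if value is TOMBSTONE else value
        return self.base.get(row_key)
```

Because the delete row is keyed by ROW KEY alone, a delete can be recorded even before data completion has reached that row, and the later completed row remains hidden by the delete row in this model.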

Also exemplary, the index data includes first index sub-data, second index sub-data, and third index sub-data, wherein the first index sub-data includes the line key field and the text identification field, and the line key field is the primary key of the first index sub-data; the second index sub-data includes the text identification field, the word segmentation field, and the word frequency field, and the text identification field and the word segmentation field together form the primary key of the second index sub-data; and the third index sub-data includes the line key field and the text identification field, and the text identification field is the primary key of the third index sub-data. For example, the first index sub-data, the second index sub-data, and the third index sub-data may be three index auxiliary tables. The first index sub-data in this example may be used to find a text identifier according to a line key; the second index sub-data may be used to find, according to a primary key composed of the text identifier and a word, the word frequency of the word (i.e., the number of occurrences of the word) in the text corresponding to the text identifier; and the third index sub-data may be used to find a line key according to the text identifier.

In this example, the index data may be generated as follows:

After the first index sub-data is constructed as in the preceding example, second index sub-data is constructed according to the line key field and the text field of the indexed data and the line key field and the text identification field of the first index sub-data. For example, an index auxiliary table FTS_INDEX is first constructed, that is, a table having the text identification field, the word segmentation field, and the word frequency field and using the text identification field and the word segmentation field as the primary key. Snapshot points of the indexed data and the index auxiliary table FTS_ROWKEY_DOC are then obtained, and once all active transactions started before the snapshot points have ended, the following operations are executed for each row in the index auxiliary table FTS_ROWKEY_DOC, to complete data completion of the index auxiliary table FTS_INDEX: the text content in the text field of the row with the same line key is found in the indexed data, the text content is segmented into words and the word frequency of each word segment is counted, and the text identifier of the row, each word segment, and the word frequency of each word segment are added to the index auxiliary table FTS_INDEX as multiple row records (that is, the text identifier, one word segment, and its word frequency form one row record).

Third index sub-data is constructed according to the line key field and the text identification field of the first index sub-data. For example, an index auxiliary table FTS_DOC_ROWKEY is first constructed, that is, a table having the above-mentioned line key field and text identification field and using the text identification field as the primary key. A snapshot point of the index auxiliary table FTS_ROWKEY_DOC is then obtained, and once all active transactions started before the snapshot point have ended, each row record in the index auxiliary table FTS_ROWKEY_DOC is added to the index auxiliary table FTS_DOC_ROWKEY as a row record, to complete data completion of the index auxiliary table FTS_DOC_ROWKEY.

It should be appreciated that the construction process of the second index sub-data and the third index sub-data may be parallel, that is, the second index sub-data and the third index sub-data may be constructed in parallel after the first index sub-data is constructed, thereby improving the construction speed of the index data.
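The parallel construction described above can be sketched as follows. The sketch assumes that the first index sub-data already exists as a dictionary from row key to text identifier, and uses a thread pool only to illustrate that the two builds are independent of each other; the function names and the word segmenter are hypothetical.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    """Hypothetical word segmenter: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_fts_index(main_snapshot, rowkey_doc):
    """FTS_INDEX: primary key (DOC_ID, WORD) -> word frequency."""
    return {
        (doc_id, word): freq
        for row_key, doc_id in rowkey_doc.items()
        for word, freq in Counter(tokenize(main_snapshot[row_key])).items()
    }


def build_doc_rowkey(rowkey_doc):
    """FTS_DOC_ROWKEY: the same rows as FTS_ROWKEY_DOC, keyed by DOC_ID."""
    return {doc_id: row_key for row_key, doc_id in rowkey_doc.items()}


def build_in_parallel(main_snapshot, rowkey_doc):
    """Build the second and third index sub-data concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        index_future = pool.submit(build_fts_index, main_snapshot, rowkey_doc)
        doc_future = pool.submit(build_doc_rowkey, rowkey_doc)
        return index_future.result(), doc_future.result()
```

Both builds only read the first index sub-data and the main-table snapshot, so neither depends on the other's output.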

In this example, the structural relationship and construction order of the first, second, and third index sub-data allow DML operations to remain continuously online during data completion; that is, when a DML operation is performed on the indexed data, the incremental data can be synchronized to the first, second, and third index sub-data. For example, while data completion of the index auxiliary table FTS_ROWKEY_DOC is in progress, the corresponding main table can execute DML operations and DQL operations. A delete operation on the main table completes the synchronous update of the index auxiliary table FTS_ROWKEY_DOC simply by writing a delete row for the primary key into the memtable of the index auxiliary table FTS_ROWKEY_DOC; an insert operation on the main table obtains a DOC_ID and then inserts the data into the memtable of the index auxiliary table FTS_ROWKEY_DOC; and an update operation may be regarded as a delete operation plus an insert operation. In particular, the primary key of the first index sub-data comprises only the row key field ROW KEY and does not comprise the text identification field DOC_ID, so concurrent delete operations can be supported while data completion of the first index sub-data is in progress: a delete operation does not need to generate a new text identifier DOC_ID and only needs to write a delete row for the corresponding ROW KEY into the memtable, which avoids conflicts between DML operations and data completion during construction of the full-text index.

Further exemplary, the index data includes first index sub-data, second index sub-data, and fourth index sub-data, wherein the first index sub-data includes the line key field and the text identification field, and the line key field is the primary key of the first index sub-data; the second index sub-data includes the text identification field, the word segmentation field, and the word frequency field, and the text identification field and the word segmentation field together form the primary key of the second index sub-data; and the fourth index sub-data includes the word segmentation field, the text identification field, and the word frequency field, and the word segmentation field and the text identification field together form the primary key of the fourth index sub-data. For example, the first index sub-data, the second index sub-data, and the fourth index sub-data may be three index auxiliary tables. The first index sub-data in this example may be used to find a text identifier according to a line key; the second index sub-data may be used to find, according to a primary key composed of the text identifier and a word, the word frequency of the word (i.e., the number of occurrences of the word) in the text corresponding to the text identifier; and the fourth index sub-data may be used to find the same word frequency according to a primary key composed of the word and the text identifier.

In this example, the index data may be generated as follows:

With the first index sub-data constructed as in the preceding examples, fourth index sub-data is constructed according to the line key field and the text field of the indexed data and the line key field and the text identification field of the first index sub-data. For example, an index auxiliary table FTS_DOC_WORD is first constructed, that is, a table having the word segmentation field, the text identification field, and the word frequency field and using the word segmentation field and the text identification field as the primary key. Snapshot points of the indexed data and the index auxiliary table FTS_ROWKEY_DOC are then obtained, and once all active transactions started before the snapshot points have ended, the following operations are executed for each row in the index auxiliary table FTS_ROWKEY_DOC, to complete data completion of the index auxiliary table FTS_DOC_WORD: the text content in the text field of the row with the same line key is found in the indexed data, the text content is segmented into words and the word frequency of each word segment is counted, and each word segment, the text identifier of the row, and the word frequency of the word segment are added to the index auxiliary table FTS_DOC_WORD as multiple row records (that is, one word segment, the text identifier of the row, and the word frequency of that word segment form one row record).

It should be appreciated that the construction process of the second index sub-data and the fourth index sub-data may be parallel, that is, the second index sub-data and the fourth index sub-data may be constructed in parallel after the first index sub-data is constructed, thereby improving the construction speed of the index data.
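The benefit of keying the fourth index sub-data by word first can be sketched as follows. The example assumes that the table is kept sorted by its primary key (word, text identifier), so that all rows for one word are contiguous and can be read with a range scan; the sorted-list representation, the function names, and the word segmenter are assumptions made for illustration.

```python
import re
from bisect import bisect_left
from collections import Counter


def tokenize(text):
    """Hypothetical word segmenter: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_doc_word(main_snapshot, rowkey_doc):
    """FTS_DOC_WORD rows (WORD, DOC_ID, WORD_COUNT), sorted by primary key."""
    rows = []
    for row_key, doc_id in rowkey_doc.items():
        for word, freq in Counter(tokenize(main_snapshot[row_key])).items():
            rows.append((word, doc_id, freq))
    return sorted(rows)


def docs_containing(doc_word_rows, word):
    """Range scan: every (DOC_ID, WORD_COUNT) whose primary key starts with word."""
    start = bisect_left(doc_word_rows, (word,))  # first row with WORD >= word
    hits = []
    for row_word, doc_id, freq in doc_word_rows[start:]:
        if row_word != word:
            break
        hits.append((doc_id, freq))
    return hits
```

The second index sub-data answers "which words occur in this text", while this ordering answers "which texts contain this word" without scanning the whole table.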

In this example, the structural relationship and construction order of the first index sub-data, the second index sub-data, and the fourth index sub-data allow DML operations to remain continuously online during data completion; that is, when a DML operation is performed on the indexed data, the incremental data can be synchronized to the first index sub-data, the second index sub-data, and the fourth index sub-data. For example, while data completion of the index auxiliary table FTS_ROWKEY_DOC is in progress, the corresponding main table can execute DML operations and DQL operations. A delete operation on the main table completes the synchronous update of the index auxiliary table FTS_ROWKEY_DOC simply by writing a delete row for the primary key into the memtable of the index auxiliary table FTS_ROWKEY_DOC; an insert operation on the main table obtains a DOC_ID and then inserts the data into the memtable of the index auxiliary table FTS_ROWKEY_DOC; and an update operation may be regarded as a delete operation plus an insert operation. In particular, the primary key of the first index sub-data comprises only the row key field ROW KEY and does not comprise the text identification field DOC_ID, so concurrent delete operations can be supported while data completion of the first index sub-data is in progress: a delete operation does not need to generate a new text identifier DOC_ID and only needs to write a delete row for the corresponding ROW KEY into the memtable, which avoids conflicts between DML operations and data completion during construction of the full-text index.

As a further example, the index data includes first index sub-data, second index sub-data, third index sub-data, and fourth index sub-data. The first index sub-data includes the line key field and the text identification field, and the line key field is the primary key of the first index sub-data. The second index sub-data includes the text identification field, the word segmentation field, and the word frequency field, and the text identification field and the word segmentation field together form the primary key of the second index sub-data. The third index sub-data includes the line key field and the text identification field, and the text identification field is the primary key of the third index sub-data. The fourth index sub-data includes the word segmentation field, the text identification field, and the word frequency field, and the word segmentation field and the text identification field together form the primary key of the fourth index sub-data. For example, the first index sub-data, the second index sub-data, the third index sub-data, and the fourth index sub-data may be four index auxiliary tables. In this example, the first index sub-data may be used to find a text identifier according to a line key; the second index sub-data may be used to find, according to a primary key composed of the text identifier and a word, the word frequency of that word (i.e., the number of occurrences of the word) in the text corresponding to the text identifier; the third index sub-data may be used to find a line key according to the text identifier; and the fourth index sub-data may be used to find, according to a primary key composed of a word and the text identifier, the word frequency of that word in the text corresponding to the text identifier.
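To make the four lookup directions concrete, the sketch below models each index sub-data as an in-memory Python dictionary keyed by its primary key. The variable names (`row_to_doc`, `doc_word`, `doc_to_row`, `word_doc`), the helper functions, and the sample values are all hypothetical; real index auxiliary tables would of course be stored tables rather than dictionaries.

```python
# Hypothetical in-memory model of the four index sub-data (illustrative values).

# First index sub-data: primary key = row key -> text identifier
row_to_doc = {"rk1": 101, "rk2": 102}

# Second index sub-data: primary key = (text identifier, word) -> word frequency
doc_word = {(101, "index"): 2, (101, "text"): 1, (102, "index"): 1}

# Third index sub-data: primary key = text identifier -> row key
doc_to_row = {101: "rk1", 102: "rk2"}

# Fourth index sub-data: primary key = (word, text identifier) -> word frequency
word_doc = {("index", 101): 2, ("text", 101): 1, ("index", 102): 1}


def rows_containing(word):
    """Return (row key, word frequency) pairs for every text containing `word`."""
    hits = []
    # Scan the fourth sub-data in primary-key order, then map DOC_ID to row key.
    for (w, doc_id), freq in sorted(word_doc.items()):
        if w == word:
            hits.append((doc_to_row[doc_id], freq))
    return hits


def frequency_in_row(row_key, word):
    """Return how often `word` occurs in the text stored under `row_key`."""
    doc_id = row_to_doc[row_key]                # first sub-data lookup
    return doc_word.get((doc_id, word), 0)      # second sub-data lookup
```

A word-to-rows query goes through the fourth and third sub-data, while a row-to-frequency query goes through the first and second, which is why both key orders are kept.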

In this example, the index data may be generated as follows:

It should be appreciated that the construction processes of the second index sub data, the third index sub data, and the fourth index sub data may be parallel, that is, the second index sub data, the third index sub data, and the fourth index sub data may be constructed in parallel after the first index sub data is constructed, thereby improving the construction speed of the index data.
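The dependency order described above (first index sub-data first, the remaining three afterwards and independently of one another) might be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names are hypothetical, whitespace splitting stands in for a real word segmenter, sequential integers stand in for DOC_ID generation, and a thread pool is used only to show that the three builds have no dependency on each other.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import count


def build_first(indexed_rows):
    """First sub-data: assign a text identifier to every row key."""
    ids = count(1)  # assumption: sequential integers as DOC_ID
    return {row_key: next(ids) for row_key in indexed_rows}


def build_second(indexed_rows, first):
    """Second sub-data: (text identifier, word) -> word frequency."""
    return {(first[rk], w): n
            for rk, text in indexed_rows.items()
            for w, n in Counter(text.split()).items()}


def build_third(first):
    """Third sub-data: text identifier -> row key."""
    return {doc_id: rk for rk, doc_id in first.items()}


def build_fourth(indexed_rows, first):
    """Fourth sub-data: (word, text identifier) -> word frequency."""
    return {(w, first[rk]): n
            for rk, text in indexed_rows.items()
            for w, n in Counter(text.split()).items()}


def build_index(indexed_rows):
    # The first sub-data must exist before the others, since they all need DOC_ID.
    first = build_first(indexed_rows)
    with ThreadPoolExecutor(max_workers=3) as pool:
        second = pool.submit(build_second, indexed_rows, first)
        third = pool.submit(build_third, first)
        fourth = pool.submit(build_fourth, indexed_rows, first)
        return first, second.result(), third.result(), fourth.result()
```

In CPython, threads do not speed up this CPU-bound toy workload; the sketch is meant only to show which steps can be scheduled concurrently.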

In this example, the structural relationship and construction order of the first, second, third, and fourth index sub-data allow DML operations to remain online throughout the data completion process; that is, when a DML operation is performed on the indexed data, the incremental data can be synchronized to the first, second, third, and fourth index sub-data. For example, while data completion is in progress for the index auxiliary table fts_row_doc, the corresponding main table can still execute DML operations and DQL operations. A delete operation on the main table is synchronized to the index auxiliary table fts_row_doc simply by writing a delete row carrying the primary key into the memtable of the index auxiliary table fts_row_doc; an insert operation on the main table obtains a DOC_ID and then inserts the data into the memtable of the index auxiliary table FTS_ROWKEY_DOC; an update operation may be regarded as a delete operation followed by an insert operation. In particular, the primary key of the first index sub-data comprises only the row key field ROW KEY and does not comprise the text identification field DOC_ID, so concurrent delete operations can be supported while the first index sub-data is being completed: a delete operation does not need to generate a new text identifier DOC_ID, and only a delete row for the corresponding ROW KEY needs to be written into the memtable. This avoids conflicts between DML operations and data completion while the full-text index is being constructed.

The construction process of the index data shown in the above examples does not affect DQL operations on the indexed data. That is, during construction of the index data, the method may also execute a DQL operation on the indexed data in response to receiving that DQL operation.

The construction process of the index data shown in each example does not affect DML operations on the indexed data, and the incremental data generated by a DML operation can be synchronized into the index data. That is, during construction of the index data, the method may further execute a DML operation on the indexed data in response to receiving that DML operation, and synchronize the incremental data generated by the DML operation to the index data.
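A minimal sketch of this flow is given below, assuming a hypothetical tuple encoding for DML operations and a plain list, `index_log`, standing in for the incremental data to be synchronized to the index data. It also reflects the earlier remark that an update may be regarded as a delete operation plus an insert operation.

```python
def apply_dml(op, table, index_log):
    """Apply one DML operation to the indexed table and record the index delta.

    `op` is a hypothetical tuple: ("insert", row_key, text),
    ("delete", row_key) or ("update", row_key, new_text).
    `table` maps row key -> text and stands in for the indexed data.
    """
    kind, row_key = op[0], op[1]
    if kind == "insert":
        table[row_key] = op[2]
        index_log.append(("insert", row_key, op[2]))
    elif kind == "delete":
        # The old text is kept in the delta so its words can be located later.
        old_text = table.pop(row_key)
        index_log.append(("delete", row_key, old_text))
    elif kind == "update":
        # An update is treated as a delete followed by an insert.
        apply_dml(("delete", row_key), table, index_log)
        apply_dml(("insert", row_key, op[2]), table, index_log)
    else:
        raise ValueError(f"unsupported DML operation: {kind}")
```

For instance, inserting `"a b"` under row key `"rk1"` and then updating it to `"c"` leaves one insert entry followed by a delete entry and a second insert entry in `index_log`.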

In the case where the DML operation is an insert operation, the row key, text identifier, words, and word frequencies of the inserted data in the index data are generated from the row key and the text of the inserted data in the indexed data, and an insert operation carrying the row key, text identifier, words, and word frequencies of the inserted data in the index data is written into the index data. Here, the inserted data in the indexed data is the data inserted by the insert operation, and the inserted data in the index data is the data to be inserted by the insert operation.

For example, in the process of constructing the first index sub-data, the row key of the inserted data in the indexed data is used as the row key of the inserted data in the first index sub-data, a text identifier of the inserted data in the first index sub-data is generated for the text of the inserted data in the indexed data, and the row key and the text identifier of the inserted data are written into the first index sub-data as one row record.

For example, in the process of constructing the second index sub-data, the row key of the inserted data in the indexed data is used as the row key of the inserted data in the first index sub-data, a text identifier of the inserted data in the first index sub-data is generated for the text of the inserted data in the indexed data, and the row key and the text identifier of the inserted data are written into the first index sub-data as one row record; then, the text content of the inserted data in the indexed data is segmented into words, the word frequency of each word is counted, and the text identifier of the inserted data in the first index sub-data is combined with each word and its word frequency to form multiple row records (that is, the text identifier, one word, and that word's word frequency form one row record), which are inserted into the second index sub-data.

For example, in the process of constructing the third index sub-data, the row key of the inserted data in the indexed data is used as the row key of the inserted data in the first index sub-data, a text identifier of the inserted data in the first index sub-data is generated for the text of the inserted data in the indexed data, and the row key and the text identifier of the inserted data are written into the first index sub-data as one row record; then, the inserted data in the first index sub-data is written into the third index sub-data as one row record.

For example, in the process of constructing the fourth index sub-data, the row key of the inserted data in the indexed data is used as the row key of the inserted data in the first index sub-data, a text identifier of the inserted data in the first index sub-data is generated for the text of the inserted data in the indexed data, and the row key and the text identifier of the inserted data are written into the first index sub-data as one row record; then, the text content of the inserted data in the indexed data is segmented into words, the word frequency of each word is counted, and each word, its word frequency, and the text identifier of the inserted data in the first index sub-data form multiple row records (that is, one word, its word frequency, and the text identifier form one row record), which are inserted into the fourth index sub-data.
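The insert cases above can be summarized in one sketch that derives, from a single inserted row, the row records for each of the four index sub-data. The function names, the module-level DOC_ID counter, and the lower-cased whitespace tokenizer are assumptions made for illustration; the specification does not prescribe how text identifiers are generated or how word segmentation is performed.

```python
from collections import Counter
from itertools import count

_next_doc_id = count(1)  # hypothetical DOC_ID generator


def tokenize(text):
    # Assumption: lower-cased whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def insert_records(row_key, text):
    """Return the row records to write into each of the four index sub-data."""
    doc_id = next(_next_doc_id)          # text identifier for the inserted text
    freqs = Counter(tokenize(text))      # word -> word frequency
    return {
        # (row key, text identifier); primary key is the row key
        "first": [(row_key, doc_id)],
        # (text identifier, word, word frequency); one record per distinct word
        "second": [(doc_id, word, n) for word, n in freqs.items()],
        # (text identifier, row key); primary key is the text identifier
        "third": [(doc_id, row_key)],
        # (word, text identifier, word frequency); one record per distinct word
        "fourth": [(word, doc_id, n) for word, n in freqs.items()],
    }
```

The first record is produced before the others because the second, third, and fourth sub-data all need the text identifier it introduces.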

For example, in the process of constructing the first index sub-data, the row key of the deleted data in the indexed data may be used as the primary key of the deleted data in the first index sub-data, and a delete row carrying the primary key may be written into the first index sub-data.

For example, in the process of constructing the second index sub-data, the row key of the deleted data in the indexed data is used as the primary key of the deleted data in the first index sub-data, and a delete row carrying the primary key is written into the first index sub-data; then, the text identifier of the deleted data in the first index sub-data is determined, the text of the deleted data in the indexed data is segmented into words, the word frequency of each word is counted, the text identifier is combined with each word to form multiple primary keys, and delete rows carrying these primary keys are written into the second index sub-data.

For example, in the process of constructing the third index sub-data, the row key of the deleted data in the indexed data is used as the primary key of the deleted data in the first index sub-data, and a delete row carrying the primary key is written into the first index sub-data; then, the text identifier of the deleted data in the first index sub-data is determined and used as the primary key of the deleted data in the third index sub-data, and a delete row carrying that primary key is written into the third index sub-data.

For example, in the process of constructing the fourth index sub-data, the row key of the deleted data in the indexed data may be used as the primary key of the deleted data in the first index sub-data, and a delete row carrying the primary key may be written into the first index sub-data; then, the text identifier of the deleted data in the first index sub-data is determined, the text of the deleted data in the indexed data is segmented into words, the word frequency of each word is counted, each word is combined with the text identifier to form multiple primary keys, and delete rows carrying these primary keys are written into the fourth index sub-data.
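The delete cases can be sketched in the same way: given the row key and text of the deleted row, compute the primary keys that the delete rows must carry in each index sub-data. The function name `delete_rows`, the `row_to_doc` lookup (standing in for the first index sub-data), and the whitespace tokenizer are hypothetical. Because a delete row carries only a primary key, this sketch uses the distinct words and does not need their counts.

```python
from collections import Counter


def delete_rows(row_key, text, row_to_doc):
    """Return the primary keys of the delete rows for each index sub-data.

    `row_to_doc` is a hypothetical lookup standing in for the first index
    sub-data (row key -> text identifier).
    """
    doc_id = row_to_doc[row_key]  # text identifier of the deleted data
    # Assumption: lower-cased whitespace splitting stands in for word segmentation.
    words = list(Counter(text.lower().split()))  # distinct words, first-seen order
    return {
        "first": [row_key],                       # primary key: row key
        "second": [(doc_id, w) for w in words],   # primary key: (text identifier, word)
        "third": [doc_id],                        # primary key: text identifier
        "fourth": [(w, doc_id) for w in words],   # primary key: (word, text identifier)
    }
```

Only the delete row for the first index sub-data can be produced from the row key alone; the others require the text identifier to be looked up first.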

Fig. 2 is a schematic block diagram of a device according to an exemplary embodiment. Referring to Fig. 2, at the hardware level, the device includes a processor 202, an internal bus 204, a network interface 206, a memory 208, and a nonvolatile storage 210, and may of course also include hardware required by other services. One or more embodiments of the present specification may be implemented in software, for example by the processor 202 reading a corresponding computer program from the nonvolatile storage 210 into the memory 208 and then running it. Of course, in addition to a software implementation, one or more embodiments of the present specification do not exclude other implementations, such as a logic device or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units, but may also be hardware or a logic device.

Referring to Fig. 3, the data processing apparatus may be applied to the device shown in Fig. 2 to implement the technical solution of the present specification. The apparatus comprises:

the index construction module 301 is configured to construct index data according to a line key field and a text field of indexed data, where the index data includes a line key field, a text identification field, a word segmentation field, and a word frequency field, the line key field of the index data is the same as the line key field of the indexed data, the line key field of the index data is a primary key of the index data, the line key field of the indexed data is a primary key of the indexed data, and the text identification field, the word segmentation field, and the word frequency field of the index data correspond to the text field of the indexed data.

One or more embodiments of the present specification also propose a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method provided by any of the embodiments of the first aspect.

One or more embodiments of the present specification also provide a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to the first aspect.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.

In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

The user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to herein are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to grant or refuse authorization.

It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of the present specification. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.

The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the present invention to the particular embodiments described.

Claims

1. A method of data processing, the method comprising:

2. The data processing method of claim 1, the index data comprising first index sub-data and second index sub-data, wherein the first index sub-data comprises the line key field and the text identification field, the line key field is a primary key of the first index sub-data, the second index sub-data comprises the text identification field, the word segmentation field, and the word frequency field, and the text identification field and the word segmentation field form a primary key of the second index sub-data.

3. The data processing method according to claim 2, wherein the constructing the index data from the line key field and the text field of the indexed data includes:

4. The data processing method of claim 3, the index data further comprising third index sub-data, the third index sub-data comprising the line key field and the text identification field, the text identification field being a primary key of the third index sub-data.

5. The data processing method according to claim 4, wherein the constructing the index data from the line key field and the text field of the indexed data further comprises:

6. The data processing method of claim 3, the index data further comprising fourth index sub-data, the fourth index sub-data comprising the word segmentation field, the text identification field, and the word frequency field, the word segmentation field and the text identification field forming a primary key of the fourth index sub-data.

7. The data processing method according to claim 6, wherein the constructing the index data from the line key field and the text field of the indexed data further comprises:

8. The data processing method according to any one of claims 1 to 7, the method further comprising:

9. The data processing method according to any one of claims 1 to 7, the method further comprising:

10. The data processing method of claim 9, the synchronizing delta data generated by the DML operation to the index data, comprising:

11. A data processing apparatus, the apparatus comprising:

12. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 10.

13. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to implement the method of any of claims 1-10 by executing the executable instructions.

14. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any of claims 1-10.