CN108897874B

CN108897874B - Method and apparatus for processing data

Info

Publication number: CN108897874B
Application number: CN201810715769.XA
Authority: CN
Inventors: 陈星
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Douyin Vision Co Ltd; Douyin Vision Beijing Co Ltd
Priority date: 2018-07-03
Filing date: 2018-07-03
Publication date: 2020-10-30
Anticipated expiration: 2038-07-03
Also published as: CN108897874A

Abstract

The embodiment of the application discloses a method and a device for processing data. One embodiment of the method comprises: extracting a key-value pair set of a target data table to be added to a target database, and summarizing keywords of key-value pairs in the key-value pair set into a first keyword set, wherein the key-value pairs comprise keywords and values of the keywords, and each column of the target data table corresponds to one keyword; summarizing the keywords corresponding to each column of the target data table into a second keyword set; adding values in the set of key-value pairs to the target data table based on a match of the first set of keys and the second set of keys. This embodiment increases the flexibility of storage of the set of key-value pairs.

Description

Method and apparatus for processing data

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a method and a device for processing data.

Background

Map is an associative container of STL (Standard Template Library) that provides one-to-one data processing capability. In general, maps may be viewed as a collection for storing key-value pairs.

Currently, Map data is typically stored in the storage layer by serializing the keys of the Map's key-value pairs into a key sequence and their values into a value sequence. To access the value of a particular key, both sequences must first be deserialized and a Map data structure rebuilt in memory; only then can the value of that key be read.
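As an illustration of this prior-art access pattern, the following minimal Python sketch stores a map as a key sequence plus a parallel value sequence. Reading a single value rebuilds the whole map first. The function names are hypothetical and the sketch is not taken from any particular storage system.

```python
from typing import Any, Dict, List, Optional, Tuple


def serialize_map(m: Dict[str, Any]) -> Tuple[List[str], List[Any]]:
    """Store a map as a key sequence and a parallel value sequence."""
    keys = list(m.keys())
    values = [m[k] for k in keys]
    return keys, values


def read_value(keys: List[str], values: List[Any], key: str) -> Optional[Any]:
    """Reading one key first rebuilds the entire map in memory."""
    rebuilt = dict(zip(keys, values))  # "deserialize" both sequences
    return rebuilt.get(key)


keys, values = serialize_map({"a": 1, "b": 2})
print(keys, values)                   # ['a', 'b'] [1, 2]
print(read_value(keys, values, "b"))  # 2
```

The cost of the lookup grows with the size of the whole map, which is the inefficiency the method below aims to avoid.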

Disclosure of Invention

The embodiment of the application provides a method and a device for processing data.

In a first aspect, an embodiment of the present application provides a method for processing data, where the method includes: extracting a key-value pair set of a target data table to be added to a target database, and summarizing the keywords of the key-value pairs in the key-value pair set into a first keyword set, where the key-value pairs comprise keywords and values of the keywords, and each column of the target data table corresponds to one keyword; summarizing the keywords corresponding to each column of the target data table into a second keyword set; and adding the values of the keywords in the key-value pair set to the target data table based on the matching of the first keyword set and the second keyword set.

In some embodiments, the method further comprises: in response to receiving an access request for a value of a target key in a target data table, determining whether a target column corresponding to the target key exists in the target data table; and in response to determining that the target column exists in the target data table, extracting values of all rows in the target column from the target data table, and returning the extracted values of all rows.

In some embodiments, the method further comprises: returning a null value in response to determining that the target column does not exist in the target data table.

In some embodiments, the method further comprises: in response to receiving an access request for the target data table, extracting the values in the target data table by column; and summarizing the values of the same row among the extracted column values in the form of key-value pairs to generate at least one key-value pair set, and returning the at least one key-value pair set, where, for each extracted value, the keyword corresponding to the value's column is the keyword of the key-value pair, and the value is the value of that keyword.

In some embodiments, summarizing the values of the same row among the extracted column values in the form of key-value pairs to generate at least one key-value pair set comprises: summarizing the values of the same row among the extracted column values in the form of key-value pairs, deleting the key-value pairs whose values are null, and generating the at least one key-value pair set.

In some embodiments, adding values in the set of key-value pairs to the target data table based on a match of the first set of keys and the second set of keys comprises: taking keywords which simultaneously belong to the first keyword set and the second keyword set as coincident keywords, and summarizing the keywords into a coincident keyword set; summarizing the keywords which belong to the first keyword set and do not belong to the second keyword set into a third keyword set; summarizing the keywords which belong to the second keyword set and do not belong to the first keyword set into a fourth keyword set; inserting columns corresponding to the keywords in the third keyword set into the target data table, wherein each row of the inserted columns is a null value; for the keywords in the first keyword set, adding the values corresponding to the keywords in the key-value pair set to the columns of the target data table corresponding to the keywords; and writing a null value in the column corresponding to the key in the fourth key set in the target data table.

In a second aspect, an embodiment of the present application provides an apparatus for processing data, the apparatus including: a first summarizing unit configured to extract a key-value pair set of a target data table to be added to a target database, and summarize the keywords of the key-value pairs in the key-value pair set into a first keyword set, where the key-value pairs comprise keywords and values of the keywords, and each column of the target data table corresponds to one keyword; a second summarizing unit configured to summarize the keywords corresponding to the columns of the target data table into a second keyword set; and an adding unit configured to add the values of the keywords in the key-value pair set to the target data table based on the matching of the first keyword set and the second keyword set.

In some embodiments, the apparatus further comprises: a first receiving unit configured to, in response to receiving an access request for a value of a target keyword in a target data table, determine whether a target column corresponding to the target keyword exists in the target data table; and a first extraction unit configured to, in response to determining that the target column exists in the target data table, extract the values of all rows in the target column from the target data table and return the extracted values.

In some embodiments, the apparatus further comprises: a return unit configured to return a null value in response to determining that the target column does not exist in the target data table.

In some embodiments, the apparatus further comprises: a second receiving unit configured to extract the values in the target data table by column in response to receiving an access request for the target data table; and a second extraction unit configured to summarize the values of the same row among the extracted column values in the form of key-value pairs, generate at least one key-value pair set, and return the at least one key-value pair set, where, for each extracted value, the keyword corresponding to the value's column is the keyword of the key-value pair, and the value is the value of that keyword.

In some embodiments, the second extraction unit is further configured to: summarize the values of the same row among the extracted column values in the form of key-value pairs, delete the key-value pairs whose values are null, and generate the at least one key-value pair set.

In some embodiments, the adding unit comprises: a first summarizing module configured to summarize the keywords belonging to both the first keyword set and the second keyword set, as coincident keywords, into a coincident keyword set; a second summarizing module configured to summarize the keywords belonging to the first keyword set and not to the second keyword set into a third keyword set; a third summarizing module configured to summarize the keywords belonging to the second keyword set and not to the first keyword set into a fourth keyword set; an inserting module configured to insert, in the target data table, the columns corresponding to the keywords in the third keyword set, where each row of the inserted columns is a null value; an adding module configured to, for each keyword in the first keyword set, add the value corresponding to that keyword in the key-value pair set to the column of the target data table corresponding to that keyword; and a writing module configured to write a null value in the columns of the target data table corresponding to the keywords in the fourth keyword set.

In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any embodiment of a method for processing data.

In a fourth aspect, embodiments of the present application provide a computer-readable medium on which a computer program is stored, which program, when executed by a processor, implements a method as in any one of the embodiments of the method for processing data.

According to the method and apparatus for processing data provided by the embodiments of the present application, a key-value pair set of a target data table to be added to a target database is extracted, and the keywords of its key-value pairs are summarized into a first keyword set; the keywords corresponding to the columns of the target data table are then summarized into a second keyword set; finally, the values of the keywords in the key-value pair set are added to the target data table based on the matching of the first keyword set and the second keyword set. The contents of the key-value pair set are thereby stored in the target database, which improves the flexibility of storing key-value pair sets and facilitates subsequent access to the key-value pair sets and to the key-value pairs in them.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for processing data according to the present application;

FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to the present application;

FIG. 4 is a flow diagram of yet another embodiment of a method for processing data according to the present application;

FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for processing data according to the present application;

FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other as long as they do not conflict. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Fig. 1 shows an exemplary system architecture 100 to which the method for processing data or the apparatus for processing data of the present application may be applied.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages (e.g., access requests for data). Various communication client applications, such as data processing applications, instant messaging tools, mailbox clients, and social platform software, may be installed on the terminal devices 101, 102, and 103.

The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have a display screen and communicate over a network, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.

The server 105 may be a server that provides various services, such as a data storage server that stores data transmitted by the terminal devices 101, 102, 103. Various types of databases (e.g., relational databases and non-relational databases) may be installed in the data storage server. The data storage server may extract data (e.g., a set of key-value pairs) to be added to a target data table of a target database (e.g., a columnar database among the installed databases). It may also process the data to be stored together with the target data table, and add the values in the key-value pair set to the target data table.

The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not particularly limited herein.

It should be noted that the method for processing data provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for processing data is generally disposed in the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for processing data in accordance with the present application is shown. The method for processing data comprises the following steps:

Step 201, extracting a key-value pair set of a target data table to be added to a target database, and summarizing keywords of key-value pairs in the key-value pair set into a first keyword set.

In this embodiment, the execution body of the method for processing data (e.g., the server 105 shown in fig. 1) may first extract the set of key-value pairs to be added to the target data table of the target database. The set of key-value pairs may contain one or more key-value pairs, and each key-value pair may include a keyword (key) and the value of that keyword (value). After extracting the set of key-value pairs, the execution body may group the keywords of its key-value pairs into a first keyword set.

It should be noted that various types of databases, such as relational databases and non-relational databases, may be installed in the execution body. The relational databases may include row-oriented databases, columnar databases, and the like. In practice, row-oriented databases (e.g., Oracle, DB2, MySQL) store data by row, while columnar databases (e.g., HBase, Sybase IQ, Infobright, InfiniDB) store data by column. Here, the target database may be a columnar database installed in the execution body. If multiple columnar databases are installed in the execution body, the target database may be a designated one of them.

Note that a plurality of data tables may be stored in the target database. The target data table may be the specific data table, among those data tables, in which the key-value pair set is to be stored. Alternatively, the target data table may be a specific section of one of those data tables, in which case that section is referred to as the target data table.

Here, each column of the target data table may correspond to one keyword, and different columns correspond to different keywords. For each column, the keyword corresponding to the column and the value of each row in that column may form a key-value pair. As an example, suppose the first column of the target data table corresponds to the keyword "a" and has two rows, with a value of 1 in the first row and 2 in the second row. Two key-value pairs can then be formed: "a": 1, corresponding to the value in the first row of the first column, and "a": 2, corresponding to the value in the second row of the first column. Thus, the value in each row of each column of the target data table can be represented by a key-value pair. In practice, each row in a relational database may be referred to as a tuple, and each row of the target data table may be viewed as a set of key-value pairs.
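The example above can be sketched in Python, assuming (for illustration only) a hypothetical in-memory column-oriented table modelled as a dict that maps each column keyword to the list of that column's row values. The names `Table`, `cell_pairs`, and `row_as_pairs` are hypothetical and do not come from the application.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical columnar table: column keyword -> values of each row.
Table = Dict[str, List[Any]]


def cell_pairs(table: Table) -> List[Tuple[str, Any]]:
    """View every cell as a (column keyword, row value) key-value pair."""
    return [(key, value) for key, column in table.items() for value in column]


def row_as_pairs(table: Table, row: int) -> Dict[str, Any]:
    """View one row (tuple) as a set of key-value pairs."""
    return {key: column[row] for key, column in table.items()}


table: Table = {"a": [1, 2]}
print(cell_pairs(table))       # [('a', 1), ('a', 2)]
print(row_as_pairs(table, 1))  # {'a': 2}
```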

In one scenario, the set of key-value pairs may be pre-stored locally on the execution body, in which case the execution body may extract it directly from local storage.

In another scenario, the execution body may receive, through a wired or wireless connection, a data storage request sent by a terminal device (e.g., the terminal devices 101, 102, 103 shown in fig. 1). The data storage request may include the set of key-value pairs, in which case the execution body may extract the set of key-value pairs from the data storage request. It should be noted that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (ultra-wideband) connections, as well as other wireless connection means now known or developed in the future.

Step 202, summarizing the keywords corresponding to each column of the target data table into a second keyword set.

In this embodiment, the execution subject may collect keywords corresponding to each column in the target data table to obtain a second keyword set.

Step 203, based on the matching of the first keyword set and the second keyword set, adding the values of the keywords in the key-value pair set to the target data table.

In this embodiment, the execution subject may first match the first keyword set with the second keyword set. Here, if the first keyword set is equal to the second keyword set, the first keyword set may be considered to match the second keyword set. If the first keyword set is not equal to the second keyword set, the first keyword set may be considered as not matching the second keyword set.

In response to determining that the first keyword set matches the second keyword set, because each column of the target data table corresponds to one keyword, the execution body may determine the column of the target data table corresponding to the keyword of each key-value pair in the key-value pair set and add the value of each keyword to its column. Thus, storage of the key-value pairs in the key-value pair set is achieved.

In response to determining that the first keyword set does not match the second keyword set, the execution body may determine, for each key-value pair in the key-value pair set, whether the keyword of that pair belongs to the second keyword set. If it does not, a column corresponding to that keyword is inserted into the target data table, and a NULL value (NULL) is written in each row of the inserted column. After this operation has been performed for every key-value pair in the set (that is, checking whether its keyword belongs to the second keyword set and, if not, inserting a column into the target data table), the keywords corresponding to the columns of the target data table form a set that contains the first keyword set. The execution body may then determine, for each key-value pair in the set, the column of the target data table corresponding to its keyword and add the value to that column. Thus, storage of the key-value pairs in the key-value pair set is achieved.
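Steps 201 to 203 can be sketched as follows. This is a minimal illustration, not the patented implementation, assuming a hypothetical dict-of-lists columnar table in which None stands for NULL and "adding values" means appending one new row. Columns that receive no value in the new row get NULL, which corresponds to the optional NULL-filling implementation described below; the function names are hypothetical.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]  # hypothetical: column keyword -> row values; None = NULL


def num_rows(table: Table) -> int:
    """Number of rows (all columns are kept the same length)."""
    return len(next(iter(table.values()), []))


def add_kv_set(table: Table, kv: Dict[str, Any]) -> None:
    """Store one key-value pair set as a new row of the table."""
    first = set(kv)      # step 201: keywords of the key-value pair set
    second = set(table)  # step 202: keywords of the table's columns
    if first != second:  # step 203: the sets do not match
        rows = num_rows(table)
        for key in first - second:
            table[key] = [None] * rows  # insert a NULL-filled column
    # Every keyword in the set now has a column; write each value there.
    # Columns without a value in this set receive NULL to stay aligned.
    for key, column in table.items():
        column.append(kv.get(key))


t: Table = {"a": [1]}
add_kv_set(t, {"a": 2, "b": 3})
print(t)  # {'a': [1, 2], 'b': [None, 3]}
```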

In some optional implementations of this embodiment, in response to determining that the first keyword set matches the second keyword set, the execution body may extract the key-value pairs from the key-value pair set one by one. For each extracted key-value pair, it may determine the column of the target data table corresponding to the pair's keyword and add the pair's value to that column, repeating until the value of every keyword in the key-value pair set has been added to the target data table. Thus, storage of the key-value pairs in the key-value pair set is achieved.

In some optional implementations of this embodiment, in response to determining that the first keyword set matches the second keyword set, the execution body may extract the key-value pairs from the key-value pair set in parallel. For each extracted key-value pair, it may determine the column of the target data table corresponding to the pair's keyword and add the pair's value to that column, until the value of every keyword in the key-value pair set has been added to the target data table. Thus, storage of the key-value pairs in the key-value pair set is achieved.
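A possible parallel variant is sketched below using Python's standard `concurrent.futures.ThreadPoolExecutor`. It assumes a hypothetical dict-of-lists table and that the first keyword set equals the second, so every keyword already has a column. Because the keywords of a key-value pair set are distinct, each task appends to a different column list. This is an illustration only; the application does not specify how the parallelism is realized.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Tuple

Table = Dict[str, List[Any]]  # hypothetical: column keyword -> row values


def add_matched_parallel(table: Table, kv: Dict[str, Any]) -> None:
    """Route each key-value pair to its own column concurrently.

    Assumes set(kv) == set(table), i.e. the keyword sets match.
    """
    def store(item: Tuple[str, Any]) -> None:
        key, value = item
        table[key].append(value)  # the column corresponding to this keyword

    with ThreadPoolExecutor() as pool:
        # list() consumes the results so any exception in a task is raised here
        list(pool.map(store, kv.items()))


t: Table = {"a": [1], "b": [2]}
add_matched_parallel(t, {"a": 3, "b": 4})
print(t)  # {'a': [1, 3], 'b': [2, 4]}
```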

In some optional implementations of this embodiment, in response to determining that the first set of keys does not match the second set of keys, the execution principal may add values in the set of key-value pairs in other manners. For example, for each key-value pair in the set of key-value pairs, the executing agent may determine whether the key of that key-value pair belongs to the second set of keys. In response to determining to belong to the second set of keys, a column in the target data table corresponding to a key of the key-value pair may be determined. The value of the key-value pair is then added to the column. After writing the values corresponding to the keys belonging to the second key set in the key-value pair set into the target data table, the execution body may insert columns into the target data table, where each inserted column corresponds to a key not belonging to the second key set in the key-value pair set, and write the value of the key corresponding to the column in the last row of each inserted column. Thus, storage of key-value pairs in the set of key-value pairs described above is achieved.

In some optional implementations of this embodiment, the executing entity may further determine whether the first keyword set matches the second keyword set by using the following method: if the first keyword set is a subset of the second keyword set, the first keyword set may be considered to match the second keyword set. If the first keyword set is not a subset of the second keyword set, the first keyword set and the second keyword set may be considered to be unmatched.
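The two matching rules can be compared in a short sketch. The function name and the `subset_rule` flag are hypothetical, introduced only for illustration. Under the subset rule, a match means every keyword of the key-value pair set already has a column, so no column needs to be inserted.

```python
from typing import Set


def keys_match(first: Set[str], second: Set[str], subset_rule: bool = False) -> bool:
    """Equality rule by default; subset rule when subset_rule is True."""
    if subset_rule:
        return first <= second  # every keyword already has a column
    return first == second


print(keys_match({"a"}, {"a", "b"}))                    # False
print(keys_match({"a"}, {"a", "b"}, subset_rule=True))  # True
```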

In some optional implementations of the embodiment, after adding the value of the key-value pair in the set of key-value pairs to the target data table, the execution body may further write a NULL value (NULL) to a position in the target data table where no value is added.

In some optional implementations of this embodiment, the method for processing data may further include the steps of: in response to receiving an access request for a value of a target key in a target data table, it is determined whether a target column corresponding to the target key exists in the target data table. And in response to determining that the target column exists in the target data table, extracting values of all rows in the target column from the target data table, and returning the extracted values of all rows.

In the foregoing optional implementation manner, the method may further include: a NULL value (NULL) is returned in response to determining that the target column does not exist in the target data table.
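Reading the value of a target keyword then reduces to a single column lookup, with no deserialization of the whole map. A minimal sketch, assuming a hypothetical dict-of-lists table in which None stands for the returned NULL value (the function name is hypothetical):

```python
from typing import Any, Dict, List, Optional

Table = Dict[str, List[Any]]  # hypothetical: column keyword -> row values


def read_column(table: Table, target_key: str) -> Optional[List[Any]]:
    """Return the values of all rows in the target column, or None (NULL)."""
    column = table.get(target_key)
    if column is None:
        return None  # no column corresponds to the target keyword
    return list(column)  # a copy, so callers cannot modify the stored column


t: Table = {"a": [1, 2]}
print(read_column(t, "a"))  # [1, 2]
print(read_column(t, "b"))  # None
```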

In some optional implementations of this embodiment, the method for processing data may further include the following steps: in response to receiving an access request for the target data table, extracting the values in the target data table by column; then summarizing the values of the same row among the extracted column values in the form of key-value pairs to generate at least one key-value pair set, and returning the at least one key-value pair set.

In the above optional implementation manner, the summarizing the values of the same row in the extracted values of each column in the form of key value pairs to generate at least one key value pair set may be performed according to the following steps: first, values of the same row among the extracted values of the respective columns are summarized in the form of key value pairs. Then, key-value pairs whose values are NULL values (NULL) are deleted. Finally, at least one set of key-value pairs is generated.
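The column-by-column extraction and the regrouping into per-row key-value pair sets, including the deletion of NULL-valued pairs, might look like the following sketch. It assumes a hypothetical dict-of-lists table with None as NULL; the function name is hypothetical.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]  # hypothetical: column keyword -> row values; None = NULL


def table_to_kv_sets(table: Table) -> List[Dict[str, Any]]:
    """Rebuild one key-value pair set per row, dropping NULL-valued pairs."""
    # Extract the values by column, one column at a time.
    columns = [(key, list(values)) for key, values in table.items()]
    rows = len(columns[0][1]) if columns else 0
    kv_sets = []
    for row in range(rows):
        # Summarize the values of the same row as key-value pairs,
        # deleting the pairs whose value is NULL.
        pairs = {key: values[row] for key, values in columns if values[row] is not None}
        kv_sets.append(pairs)
    return kv_sets


print(table_to_kv_sets({"a": [1, 2], "b": [None, 3]}))  # [{'a': 1}, {'a': 2, 'b': 3}]
```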

In the method provided by the above embodiment of the present application, first, a key-value pair set of a target data table to be added to a target database is extracted, and keywords of key-value pairs in the key-value pair set are summarized into a first keyword set. And then summarizing the keywords corresponding to each column of the target data table into a second keyword set. Finally, based on the matching of the first keyword set and the second keyword set, the values of the keywords in the key-value pair set are added to the target data table. Therefore, the storage of the content in the key-value pair set by using the target database is realized, and the flexibility of the storage of the key-value pair set is improved. Meanwhile, in this way, when the key value pair set and the key value pairs in the key value pair set need to be accessed, the data can be acquired without constructing a key word sequence and a value sequence and without performing deserialization operation on the key word sequence and the value sequence. Therefore, compared with the prior art, the method provided by the embodiment can provide convenience for subsequent access to the key-value pair set and the key-value pairs in the key-value pair set, so that the subsequent access efficiency to the data is improved.

With further reference to FIG. 3, a flow 300 of yet another embodiment of a method for processing data is shown. The flow 300 of the method for processing data includes the steps of:

Step 301, extracting a key-value pair set of a target data table to be added to a target database, and summarizing keywords of key-value pairs in the key-value pair set into a first keyword set.

In this embodiment, the executing agent of the method for processing data (e.g., server 105 shown in fig. 1) may first extract the set of key-value pairs to be added to the target data table of the target database. Here, the above-mentioned set of key-value pairs to be stored may contain one or more key-value pairs. The key-value pair may include a key and a value of the key, among other things. Each column of the target data table corresponds to a key.

Step 302, the keywords belonging to the first keyword set and the second keyword set at the same time are taken as the coincident keywords and are collected into a coincident keyword set.

In this embodiment, the execution body may group the keywords belonging to both the first keyword set and the second keyword set, as coincident keywords, into a coincident keyword set. In practice, the keywords belonging to both sets are exactly the keywords in the intersection of the first keyword set and the second keyword set.

Step 303, summarizing the keywords belonging to the first keyword set and not belonging to the second keyword set into a third keyword set.

In this embodiment, the execution subject may group keywords that belong to the first keyword set and do not belong to the second keyword set into a third keyword set.

Step 304, summarizing the keywords which belong to the second keyword set and do not belong to the first keyword set into a fourth keyword set.

In this embodiment, the execution subject may group keywords that belong to the second keyword set and do not belong to the first keyword set into a fourth keyword set.
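Steps 302 to 304 amount to three standard set operations. A minimal Python sketch (the function name is hypothetical):

```python
from typing import Set, Tuple


def partition_keys(first: Set[str], second: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split the two keyword sets as in steps 302 to 304."""
    coincident = first & second  # step 302: in both sets (intersection)
    third = first - second       # step 303: only in the key-value pair set
    fourth = second - first      # step 304: only among the table's columns
    return coincident, third, fourth


print(partition_keys({"a", "b"}, {"b", "c"}))  # ({'b'}, {'a'}, {'c'})
```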

Step 305, insert the column corresponding to the key in the third set of keys in the target data table.

In this embodiment, the execution body may insert a column corresponding to a keyword in the third keyword set in the target data table. The inserted columns correspond to the keywords in the third keyword set one by one. Here, each row of the inserted column is NULL (NULL).

Step 306, for the keyword in the first keyword set, adding the value corresponding to the keyword in the above-mentioned key-value pair set to the column of the target data table corresponding to the keyword.

In this embodiment, for each keyword in the first keyword set, the execution body may add the value corresponding to that keyword in the key-value pair set to the column of the target data table (i.e., the target data table after the columns of step 305 have been inserted) that corresponds to the keyword.

Step 307, write null values in the columns in the target data table corresponding to the keys in the fourth key set.

In this embodiment, the execution body writes a NULL value (NULL) in the columns of the target data table (i.e., the target data table formed after step 306 is executed) that correspond to the keywords in the fourth keyword set.
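Steps 305 to 307 can be sketched together as follows, assuming a hypothetical dict-of-lists columnar table with None as NULL and treating the key-value pair set as one new row. The coincident keywords from step 302 already have columns, so they need no special handling here. This is an illustration under those assumptions, not the application's implementation; the function name is hypothetical.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]  # hypothetical: column keyword -> row values; None = NULL


def add_by_partition(table: Table, kv: Dict[str, Any]) -> None:
    """Append one key-value pair set as a row following steps 305 to 307."""
    first, second = set(kv), set(table)
    third = first - second   # keywords that still need a column
    fourth = second - first  # columns with no value in this set
    rows = len(next(iter(table.values()), []))
    for key in sorted(third):  # step 305: insert NULL-filled columns
        table[key] = [None] * rows
    for key in first:          # step 306: add each keyword's value
        table[key].append(kv[key])
    for key in fourth:         # step 307: write NULL in the remaining columns
        table[key].append(None)


t: Table = {"a": [1], "c": [7]}
add_by_partition(t, {"a": 2, "b": 3})
print(t)  # {'a': [1, 2], 'c': [7, None], 'b': [None, 3]}
```

Sorting the third keyword set only makes the order of the inserted columns deterministic; the method itself does not prescribe an order.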

In some optional implementations of this embodiment, the method for processing data may further include step 308 and step 309. The method comprises the following specific steps:

Step 308, in response to receiving an access request for a value of a target key in the target data table, determining whether a target column corresponding to the target key exists in the target data table.

Here, the access request may include information such as the name or identifier of the target data table and the target keyword. Since each column in the target data table corresponds to a keyword, the execution body may match the target keyword against the keyword of each column. If one of the column keywords matches the target keyword, it may be determined that a target column corresponding to the target keyword exists in the target data table. If none of the column keywords matches the target keyword, it may be determined that no such target column exists. Here, two keywords are determined to match if they are identical, and not to match if they differ.

Step 309, in response to determining that the target column exists in the target data table, extracting the values of each row in the target column from the target data table, and returning the extracted values.

Here, the execution body may read values of respective rows in the target column, and may return the read values in the form of an array.

Optionally, the execution body may further display the read values.

Optionally, after executing step 308, the execution body may further return a null value (NULL) in response to determining that the target column does not exist in the target data table.
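Steps 308 and 309, together with the optional NULL return, amount to a column lookup. The following is a minimal sketch under the same assumptions as a hypothetical dict-of-lists model of the target data table (`None` as NULL); the function name `read_column` is illustrative and not part of the embodiment.

```python
from typing import Any, Dict, List, Optional

# Hypothetical model: column keyword -> values of that column, one per row.
Table = Dict[str, List[Optional[Any]]]


def read_column(table: Table, target_key: str) -> Optional[List[Optional[Any]]]:
    """Return the values of every row in the target column, or None if absent."""
    # Step 308: keywords match only when identical, so a dict lookup suffices.
    if target_key not in table:
        return None  # optional branch: no target column, so return NULL
    # Step 309: return the values of each row in the target column as an array.
    return list(table[target_key])
```

Because the column is stored directly, the lookup reads one list without rebuilding any map. One consequence of returning `None` for a missing column is that callers must tell it apart from an existing column whose entries are all NULL, which still returns a list.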

In some optional implementations of this embodiment, the method for processing data may further include step 310 and step 311, as follows:

Step 310, in response to receiving an access request for the target data table, extracting the values in the target data table column by column.

Here, the access request may include the name or identifier of the target data table. After receiving the access request, the execution body may extract the values in the target data table one column at a time: first extracting the values of the first column, then all values of the second column, and so on.

Step 311, aggregating, in the form of key-value pairs, the values in the same row among the extracted values of the columns to generate at least one key-value pair set, and returning the at least one key-value pair set.

Here, since each value in the target data table can be represented as a key-value pair, the execution body may aggregate the key-value pairs of the values in the same row, so that each row corresponds to one key-value pair set. For each extracted value, the keyword corresponding to the column containing the value is the key of the key-value pair, and the extracted value is the value of that key.

Optionally, the aggregation in step 311 may be performed as follows: first, the values in the same row among the extracted values of the columns are aggregated in the form of key-value pairs; then, the key-value pairs whose values are null (NULL) are deleted; finally, the at least one key-value pair set is generated.

Optionally, the execution body may further display the generated at least one key-value pair set.
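Steps 310 and 311, including the optional deletion of NULL-valued pairs, can be sketched as a column-to-row regrouping. This again assumes a hypothetical dict-of-lists model in which every column has the same number of rows and `None` stands for NULL; `read_key_value_sets` and its `drop_nulls` flag are illustrative names, not part of the embodiment.

```python
from typing import Any, Dict, List, Optional

# Hypothetical model: column keyword -> values of that column, one per row.
Table = Dict[str, List[Optional[Any]]]


def read_key_value_sets(table: Table, drop_nulls: bool = True) -> List[Dict[str, Any]]:
    """Rebuild one key-value pair set per row from a column-oriented table."""
    # Step 310: extract the values column by column.
    columns = [(key, list(values)) for key, values in table.items()]
    row_count = len(columns[0][1]) if columns else 0

    result: List[Dict[str, Any]] = []
    for row in range(row_count):
        # Step 311: the column keyword becomes the key, the cell its value.
        kv_set = {key: values[row] for key, values in columns}
        if drop_nulls:
            # Optional refinement: delete key-value pairs whose value is NULL.
            kv_set = {k: v for k, v in kv_set.items() if v is not None}
        result.append(kv_set)
    return result
```

Dropping NULL-valued pairs restores each row to the keys its original set contained, because column insertion and step 307 fill absent keys with NULL. The trade-off is that a key whose stored value was itself null cannot be distinguished from an absent key.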

With continued reference to fig. 4, which is a schematic diagram of an application scenario of the method for processing data according to this embodiment.

In the application scenario of fig. 4, the data storage server first extracts a key-value pair set (reference numeral 402) to be added to a target data table (reference numeral 401) of a target database, and summarizes the keywords of the key-value pairs in the set into a first keyword set (reference numeral 403). In this example, the target data table has three columns, whose keywords are 'a', 'b', and 'c', respectively. The first column holds the values 1 and 1; the second column holds 2 and NULL; the third column holds NULL and 3.

The key-value pair set includes the key-value pairs "'a': 5" and "'d': 4" (the content within each pair of double quotation marks is one key-value pair). The resulting first keyword set is {'a', 'd'}.

Next, the data storage server summarizes the keywords corresponding to each column of the target data table into a second keyword set (reference numeral 404). Continuing with the above example, the resulting second keyword set is {'a', 'b', 'c'}.

Next, the data storage server summarizes the keywords belonging to both the first keyword set and the second keyword set, as coincident keywords, into a coincident keyword set (reference numeral 405). Continuing with the above example, the coincident keyword set is {'a'}.

Then, the data storage server summarizes the keywords belonging to the first keyword set but not to the second keyword set into a third keyword set (reference numeral 406). Continuing with the above example, the third keyword set is {'d'}.

Next, the data storage server summarizes the keywords belonging to the second keyword set but not to the first keyword set into a fourth keyword set (reference numeral 407). Continuing with the above example, the fourth keyword set is {'b', 'c'}.
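The four keyword sets in this example follow from ordinary set intersection and difference. The short Python check below reproduces them from the fig. 4 data; the variable names are illustrative, and the comments give the corresponding reference numerals.

```python
# Keyword sets from the fig. 4 example.
first_keywords = {"a", "d"}          # keys of the key-value pair set (403)
second_keywords = {"a", "b", "c"}    # keywords of the table's columns (404)

coincident = first_keywords & second_keywords   # in both sets (405)
third = first_keywords - second_keywords        # first set only (406)
fourth = second_keywords - first_keywords       # second set only (407)

print(sorted(coincident), sorted(third), sorted(fourth))
# prints: ['a'] ['d'] ['b', 'c']
```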

Next, the data storage server inserts into the target data table a column corresponding to the keyword 'd' in the third keyword set, with every row of the inserted column set to a null value. The target data table after the column insertion is shown as reference numeral 408.

Then, for each keyword in the first keyword set, the data storage server adds the value corresponding to that keyword in the key-value pair set to the column of the target data table corresponding to the keyword. The target data table after this addition is shown as reference numeral 409.

Next, the data storage server writes a null value in the columns of the target data table corresponding to the keywords in the fourth keyword set, obtaining the updated target data table shown as reference numeral 410.

The method of this embodiment highlights the step of adding data to the target data table by means of the constructed coincident keyword set, third keyword set, and fourth keyword set, performing a different operation on the target data table for each set. The scheme described in this embodiment can therefore flexibly add the values of the keywords in the key-value pair set to the target database while avoiding repeated operations such as keyword matching, thereby improving data storage efficiency.

In addition, an optional implementation of this embodiment highlights the steps of accessing the key-value pair sets and the key-value pairs within them. When they are accessed, the data can be obtained without constructing a keyword sequence and a value sequence and without deserializing those sequences. This reduces the input/output (IO) overhead of data access and improves data access efficiency.

With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for processing data, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.

As shown in fig. 5, the apparatus 500 for processing data according to the present embodiment includes: a first summarizing unit 501, configured to extract a set of key-value pairs of a target data table to be added to a target database, summarize keywords of the key-value pairs in the set of key-value pairs into a first keyword set, where the key-value pairs include keywords and values of the keywords, and each column of the target data table corresponds to one keyword; a second summarizing unit 502 configured to summarize the keywords corresponding to each column of the target data table into a second keyword set; an adding unit 503 configured to add the value of the key in the key-value pair set to the target data table based on the matching of the first key set and the second key set.

In some optional implementations of this embodiment, the apparatus may further include a first receiving unit and a first extracting unit (not shown in the figure). The first receiving unit may be configured to determine, in response to receiving an access request for the value of a target keyword in the target data table, whether a target column corresponding to the target keyword exists in the target data table. The first extracting unit may be configured to extract the values of each row in the target column from the target data table and return the extracted values in response to determining that the target column exists in the target data table.

In some optional implementations of this embodiment, the apparatus may further include a return unit (not shown in the figure). The return unit may be configured to return a null value in response to determining that the target column does not exist in the target data table.

In some optional implementations of this embodiment, the apparatus may further include a second receiving unit and a second extracting unit (not shown in the figure). The second receiving unit may be configured to extract the values in the target data table column by column in response to receiving an access request for the target data table. The second extracting unit may be configured to aggregate, in the form of key-value pairs, the values in the same row among the extracted values of the columns, generate at least one key-value pair set, and return the at least one key-value pair set, where, for each extracted value, the keyword corresponding to the column containing the value is the key of the key-value pair, and the extracted value is the value of that key.

In some optional implementations of the embodiment, the second extraction unit may be further configured to group values of the same row in the extracted values of the columns in the form of key-value pairs, delete the key-value pairs whose values are null values, and generate at least one key-value-pair set.

In some optional implementations of this embodiment, the adding unit 503 may include a first summarizing module, a second summarizing module, a third summarizing module, an inserting module, an adding module, and a writing module (not shown in the figure). The first summarizing module may be configured to summarize the keywords belonging to both the first keyword set and the second keyword set, as coincident keywords, into a coincident keyword set. The second summarizing module may be configured to summarize the keywords belonging to the first keyword set but not to the second keyword set into a third keyword set. The third summarizing module may be configured to summarize the keywords belonging to the second keyword set but not to the first keyword set into a fourth keyword set. The inserting module may be configured to insert into the target data table a column corresponding to each keyword in the third keyword set, wherein each row of the inserted column is a null value. The adding module may be configured to add, for each keyword in the first keyword set, the value corresponding to that keyword in the key-value pair set to the column of the target data table corresponding to the keyword. The writing module may be configured to write a null value in the columns of the target data table corresponding to the keywords in the fourth keyword set.

In the apparatus provided in the foregoing embodiment of the present application, first, the first summarizing unit 501 extracts a key-value pair set to be added to a target data table of a target database, and summarizes the keywords of the key-value pairs in the set into a first keyword set. Then, the second summarizing unit 502 summarizes the keywords corresponding to each column of the target data table into a second keyword set. Finally, the adding unit 503 adds the values of the keys in the key-value pair set to the target data table based on the matching of the first keyword set and the second keyword set. The content of the key-value pair set is thus stored by means of the target database, which improves the flexibility of storing key-value pair sets. Moreover, when the key-value pair sets and the key-value pairs within them need to be accessed, the data can be obtained without constructing a keyword sequence and a value sequence and without deserializing those sequences. Compared with the prior art, the apparatus provided by this embodiment therefore makes subsequent access to the key-value pair sets and their key-value pairs more convenient and improves subsequent data access efficiency.

Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a first summarizing unit, a second summarizing unit, and an adding unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the first summarizing unit may also be described as a "unit that summarizes keywords into a first keyword set".

As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: extracting a key-value pair set of a target data table to be added to a target database, and summarizing keywords of key-value pairs in the key-value pair set into a first keyword set, wherein the key-value pairs comprise keywords and values of the keywords, and each column of the target data table corresponds to one keyword; summarizing the keywords corresponding to each column of the target data table into a second keyword set; adding values in the set of key-value pairs to the target data table based on a match of the first set of keys and the second set of keys.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A method for processing data, comprising:

extracting a key-value pair set of a target data table to be added to a target database, and summarizing keywords of key-value pairs in the key-value pair set into a first keyword set, wherein the key-value pairs comprise keywords and values of the keywords, and each column of the target data table corresponds to one keyword;

summarizing the keywords corresponding to each column of the target data table into a second keyword set;

adding values of keys in the set of key-value pairs to the target data table based on a match of the first set of keys and the second set of keys, including: in response to determining that the first set of keywords does not match the second set of keywords, adding keywords belonging to the first set of keywords and not belonging to the second set of keywords in the target data table, and adding values of the keywords of each key-value pair in the set of key-value pairs to the target data table after the keywords are added.

2. The method for processing data according to claim 1, wherein the method further comprises:

in response to receiving an access request for a value of a target key in the target data table, determining whether a target column corresponding to the target key exists in the target data table;

in response to determining that the target column exists in the target data table, extracting values of rows in the target column from the target data table, and returning the extracted values of the rows.

3. The method for processing data according to claim 2, wherein the method further comprises:

returning a null value in response to determining that the target column is not present in the target data table.

4. The method for processing data according to claim 1, wherein the method further comprises:

in response to receiving an access request to the target data table, extracting values in the target data table by column;

summarizing the values of the same row in the extracted values of each column in a key-value pair mode to generate at least one key-value pair set, and returning the at least one key-value pair set, wherein for each extracted value, the key corresponding to the column of the value is the key of the key-value pair, and the value is the value of the key.

5. The method for processing data according to claim 4, wherein said aggregating values of a same row of the extracted values of columns in the form of key-value pairs, generating at least one set of key-value pairs, comprises:

summarizing the values of the same row in the extracted values of each column in the form of key-value pairs, deleting the key-value pairs whose values are null, and generating at least one key-value pair set.

6. The method for processing data according to one of claims 1-5, wherein said adding values of keys in said set of key-value pairs to said target data table based on a match of said first set of keys and said second set of keys comprises:

taking keywords which simultaneously belong to the first keyword set and the second keyword set as coincident keywords, and summarizing the keywords into a coincident keyword set;

summarizing keywords belonging to the first keyword set and not belonging to the second keyword set into a third keyword set;

summarizing keywords belonging to the second keyword set and not belonging to the first keyword set into a fourth keyword set;

inserting columns corresponding to the keywords in the third keyword set into the target data table, wherein each row of the inserted columns is a null value;

for the keywords in the first keyword set, adding the value corresponding to the keyword in the key-value pair set to the column of the target data table corresponding to the keyword;

and writing a null value in a column corresponding to the key in the fourth key set in the target data table.

7. An apparatus for processing data, comprising:

a first summarizing unit configured to extract a key-value pair set to be added to a target data table of a target database, and to summarize the keywords of the key-value pairs in the key-value pair set into a first keyword set, wherein the key-value pairs comprise keywords and values of the keywords, and each column of the target data table corresponds to one keyword;

a second summarizing unit configured to summarize the keywords corresponding to each column of the target data table into a second keyword set;

an adding unit configured to add values of keys in the set of key-value pairs to the target data table based on a match of the first set of keys and the second set of keys, including: in response to determining that the first set of keywords does not match the second set of keywords, adding keywords belonging to the first set of keywords and not belonging to the second set of keywords in the target data table, and adding values of the keywords of each key-value pair in the set of key-value pairs to the target data table after the keywords are added.

8. The apparatus for processing data of claim 7, wherein the apparatus further comprises:

a first receiving unit configured to determine whether a target column corresponding to a target key exists in the target data table in response to receiving an access request for a value of the target key in the target data table;

a first extraction unit configured to extract values of respective rows in the target column from the target data table and return the extracted values of the respective rows in response to determining that the target column exists in the target data table.

9. The apparatus for processing data of claim 8, wherein the apparatus further comprises:

a return unit configured to return a null value in response to determining that the target column is not present in the target data table.

10. The apparatus for processing data of claim 7, wherein the apparatus further comprises:

a second receiving unit configured to extract values in the target data table by column in response to receiving an access request to the target data table;

a second extraction unit configured to aggregate the values of the same row in the extracted values of the columns in the form of key-value pairs, generate at least one key-value pair set, and return the at least one key-value pair set, wherein for each extracted value, the key corresponding to the column of the value is the key of the key-value pair, and the value is the value of that key.

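11. The apparatus for processing data of claim 10, wherein the second extraction unit is further configured to:

summarize the values of the same row in the extracted values of the columns in the form of key-value pairs, delete the key-value pairs whose values are null, and generate at least one key-value pair set.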

12. Apparatus for processing data according to one of claims 7 to 11, wherein the adding unit comprises:

a first summarizing module configured to summarize keywords belonging to both the first keyword set and the second keyword set as coincident keywords into a coincident keyword set;

a second summarizing module configured to summarize keywords belonging to the first keyword set and not belonging to the second keyword set into a third keyword set;

a third summarizing module configured to summarize keywords belonging to the second keyword set and not belonging to the first keyword set into a fourth keyword set;

an insertion module configured to insert columns in the target data table corresponding to the keywords in the third set of keywords, wherein each row of the inserted columns is a null value;

an adding module configured to add, for a keyword in the first keyword set, a value corresponding to the keyword in the key-value pair set to a column of the target data table corresponding to the keyword;

a write module configured to write a null value in a column in the target data table corresponding to a key in the fourth set of keys.

13. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.

14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.