CN106815238B

CN106815238B - Serialization and deserialization method and device for structured data

Info

Publication number: CN106815238B
Application number: CN201510857451.1A
Authority: CN
Inventors: 李勇勇; 蔡瀛; 王升功
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2015-11-30
Filing date: 2015-11-30
Publication date: 2020-10-20
Anticipated expiration: 2035-11-30
Also published as: CN106815238A; WO2017092580A1

Abstract

The application provides a serialization and deserialization method and device for structured data. The method comprises: acquiring the attribute value groups respectively corresponding to n pieces of structured data; and generating a set of serialized data according to those attribute value groups. The serialized data includes n records, wherein the ith record includes m_i+1 fields, the value information of the first m_i fields stores the attribute value group corresponding to the ith structured data, and the tag information of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record. The serialized data further includes a specific domain after the n records, and the tag information of the specific domain stores a second specific identifier for identifying the end of the serialized data. Because the structure of a message object is not adopted in the application, the problem that serialization cannot be realized due to the size limitation of the message object is solved.

Description

Serialization and deserialization method and device for structured data

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for serialization and deserialization of structured data.

Background

With the advent of the big data age, more and more services have released processing platforms for structured data that provide both serialization and deserialization of structured data. Serialization of structured data refers to converting the original structured data into serialized data having a particular format, so that the serialized data can be transmitted or otherwise processed. Deserialization is the reverse process of serialization: it restores serialized data having a particular format to the original structured data.

At present, a commonly used serialization method is based on the protobuf protocol proposed by Google. When at least one piece of structured data is serialized based on the protobuf protocol, the serialization process is as follows: each piece of structured data has a corresponding attribute value group, and all the structured data are converted into one message object. The message object comprises at least one field, each field comprises tag information (tag) and value information (value), and the attribute values of each piece of structured data are stored in the values of the fields.

For example, structured data order1 has a set of attribute values: a1, a2, and a3, structured data order2 has a set of attribute values: b1, b2 and b3, according to the rules of protobuf protocol, converting the structured data order1 and order2 into a message object a, wherein the message object a has 6 fields, and the 6 fields respectively store a1, a2, a3, b1, b2 and b3 in values.

However, this serialization method must convert the structured data into a message object, and the size of a message object is limited (for example, it must be smaller than a certain number of bytes). This limitation may make serialization impossible.

Disclosure of Invention

The technical problem to be solved by the present application is to provide a method and an apparatus for serializing and deserializing structured data, so as to solve the problem that serialization cannot be realized due to size limitation of a message object.

Therefore, the technical scheme for solving the technical problem is as follows:

the application provides a method for serializing structured data, which comprises the following steps:

acquiring n structured data, wherein n is more than or equal to 1;

acquiring attribute value groups corresponding to the n structured data respectively;

generating a set of serialized data according to the attribute value groups corresponding to the n structured data respectively;

the serialized data includes n records, wherein the ith record includes m_i+1 fields, i is more than or equal to 1 and less than or equal to n, and m_i is not less than 1; the value information of the first m_i fields stores the data set of the ith record, wherein the data set is the attribute value group corresponding to the ith structured data in the n structured data, and the tag information of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record; the serialized data further includes a specific domain after the n records, and the tag information of the specific domain stores a second specific identifier for identifying the end of the serialized data.

Optionally, the n records respectively store check values corresponding to the records; wherein, in the ith record, the value information of the (m_i+1)-th field stores a check value corresponding to the ith record, and the check value corresponding to the ith record is obtained according to the attribute value group corresponding to the ith structured data.

Optionally, the specific domain includes a first specific domain and/or a second specific domain;

the label information of the first specific domain stores a first sub-identifier, the value information stores a total check value, and the total check value is obtained according to the check values respectively corresponding to the n records;

the label information of the second specific domain stores a second sub-identifier, and the value information stores a total record number, wherein the total record number is n.

Optionally, the first specific identifier and the second specific identifier are both numerical values greater than a preset threshold, and the preset threshold is determined according to a maximum available range of the domain identifier of the tag information.

The application provides a deserialization method of serialized data, which comprises the following steps:

obtaining a set of serialized data, the serialized data comprising n records, wherein the ith record comprises m_i+1 fields, i is more than or equal to 1 and less than or equal to n, n is more than or equal to 1, and m_i is not less than 1; the value information of the first m_i fields stores the data group of the ith record, and the tag information of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record; the serialized data further includes a specific domain after the n records, and the tag information of the specific domain stores a second specific identifier for identifying the end of the serialized data;

acquiring attribute value groups corresponding to n structured data from the serialized data, wherein the attribute value group corresponding to the ith structured data is acquired from the data group of the ith record;

and generating the n structured data according to the attribute value groups corresponding to the n structured data respectively.

Optionally, the method further includes:

obtaining check values respectively corresponding to the n records from the serialized data, wherein the check value corresponding to the ith record is acquired from the value information of the (m_i+1)-th field in the ith record;

and verifying the serialized data according to the attribute value groups corresponding to the n structured data respectively and the verification values corresponding to the n records respectively.

Optionally, if the specific domain after the n records includes a first specific domain, the method further includes: acquiring a total check value from the value information of the first specific domain, and checking the serialized data according to the check values respectively corresponding to the n records and the total check value;

if the particular domain after the n records includes a second particular domain, the method further comprises: acquiring a total record number from the value information of the second specific domain, and verifying the serialized data according to the total record number and the record number included in the serialized data;

the label information of the first specific domain stores a first sub-identifier, and the label information of the second specific domain stores a second sub-identifier.
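By way of illustration only, the three optional checks described above (per-record check values, the total check value, and the total record number) might be sketched as follows. The sketch borrows the CRC32-sum example given later in the detailed description; the function names `record_check` and `verify`, the in-memory representation of a record as a pair of attribute values and stored check value, and the wrap-around to 32 bits are assumptions and are not taken from the application.

```python
import zlib
from typing import Optional, Sequence, Tuple

MASK32 = 0xFFFFFFFF  # assumption: check values wrap to 32 bits


def record_check(values: Sequence[bytes]) -> int:
    """Sum of the CRC32 codes of the attribute values, wrapped to 32 bits."""
    return sum(zlib.crc32(v) for v in values) & MASK32


def verify(records: Sequence[Tuple[Sequence[bytes], int]],
           total_check: Optional[int] = None,
           total_count: Optional[int] = None) -> bool:
    """Return True only if every check that is present passes."""
    # Per-record check: recompute from the attribute values and compare.
    for values, stored in records:
        if record_check(values) != stored:
            return False
    # Total check value (first specific domain), if present.
    if total_check is not None:
        if (sum(stored for _, stored in records) & MASK32) != total_check:
            return False
    # Total record number (second specific domain), if present.
    if total_count is not None and total_count != len(records):
        return False
    return True
```

A check that is absent (passed as `None`) is simply skipped, which mirrors the "and/or" wording used for the first and second specific domains.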

The application provides a storage method of serialized data, which comprises the following steps:

the server creates a session;

the server sends the session identification of the session to a client;

the server receives a plurality of data blocks sent by the client in a distributed manner, wherein each data block is associated with the session identification and comprises one or more groups of serialized data;

the server receives a data block storage list sent by the client, wherein the data block storage list is used for identifying all data blocks to be stored;

and if the plurality of data blocks are matched with the data block storage list, the server stores the plurality of data blocks.

Optionally, any one of the one or more sets of serialized data includes n records, where the ith record includes m_i+1 fields, i is more than or equal to 1 and less than or equal to n, n is more than or equal to 1, and m_i is not less than 1; the value information of the first m_i fields stores the data set of the ith record, and the tag information of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record; the set of serialized data further includes a specific domain after the n records, and the tag information of the specific domain stores a second specific identifier for identifying the end of the set of serialized data.

Optionally, the method further includes:

if the data blocks are not matched with the data block storage list, the server sends a data block missing list to the client, wherein the data block missing list is used for identifying the data blocks which belong to the data block storage list and do not belong to the data blocks;

and the server receives the data blocks which are sent by the client in a distributed mode and identified in the data block missing list.
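By way of illustration only, the matching of received data blocks against the data block storage list might be sketched as follows. The class name `SessionStore`, the use of string block identifiers, and the in-memory dictionaries are hypothetical. The sketch also assumes that "matched" means every block named in the storage list has been received, and that received blocks not named in the list are ignored; the application does not spell out either point.

```python
from typing import Dict, Iterable, List


class SessionStore:
    """Hypothetical in-memory server state for one upload session."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.received: Dict[str, bytes] = {}  # block id -> block payload
        self.saved: Dict[str, bytes] = {}

    def receive_block(self, session_id: str, block_id: str, payload: bytes) -> None:
        """Accept one data block associated with this session identification."""
        if session_id != self.session_id:
            raise ValueError("block belongs to a different session")
        self.received[block_id] = payload

    def commit(self, save_list: Iterable[str]) -> List[str]:
        """Save the listed blocks if all arrived; otherwise return the missing list."""
        wanted = list(save_list)
        missing = [block_id for block_id in wanted if block_id not in self.received]
        if not missing:
            self.saved = {block_id: self.received[block_id] for block_id in wanted}
        return missing
```

An empty return value means the blocks were stored; a non-empty one plays the role of the data block missing list, after which the client can resend those blocks and submit the storage list again.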

The application provides a method for downloading serialized data, which comprises the following steps: the server sends the total number of the stored serialized data to the client;

the server receives download information sent by the client, wherein the download information indicates the group number identification of the serialized data to be downloaded by the client;

and the server sends the serialized data corresponding to the group number identification to the client in a distributed manner.

Optionally, any one set of the stored serialized data includes n records, where the ith record includes m_i+1 fields, i is more than or equal to 1 and less than or equal to n, n is more than or equal to 1, and m_i is not less than 1; the value information of the first m_i fields stores the data set of the ith record, and the tag information of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record; the set of serialized data further includes a specific domain after the n records, and the tag information of the specific domain stores a second specific identifier for identifying the end of the set of serialized data.
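By way of illustration only, the download exchange described above might be sketched as follows. The class name `DownloadServer` and its methods are hypothetical. The sketch assumes that group number identifications are zero-based positions in the stored list and that the sets are handed back one at a time; the application does not define either detail.

```python
from typing import Iterable, Iterator, List


class DownloadServer:
    """Hypothetical server holding stored sets of serialized data by group number."""

    def __init__(self, groups: List[bytes]) -> None:
        self.groups = groups

    def total_groups(self) -> int:
        """Total number of stored sets, as reported to the client."""
        return len(self.groups)

    def send(self, group_numbers: Iterable[int]) -> Iterator[bytes]:
        """Yield the requested sets one at a time, in the order requested."""
        for number in group_numbers:
            if not 0 <= number < len(self.groups):
                raise IndexError(f"unknown group number {number}")
            yield self.groups[number]
```

The client would first read `total_groups()` to learn which group numbers exist, and then request any subset of them.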

The application provides a serialization apparatus, comprising:

the data acquisition unit is used for acquiring n structured data, wherein n is more than or equal to 1;

an attribute value acquisition unit, configured to acquire attribute value groups corresponding to the n pieces of structured data, respectively;

the data generating unit is used for generating a group of serialized data according to the attribute value groups corresponding to the n structured data respectively;

the serialized data includes n records, wherein the ith record includes m_i+1 fields, i is more than or equal to 1 and less than or equal to n, and m_i is not less than 1; the value information of the first m_i fields stores the data set of the ith record, wherein the data set is the attribute value group corresponding to the ith structured data in the n structured data, and the tag information of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record; the serialized data further includes a specific domain after the n records, and the tag information of the specific domain stores a second specific identifier for identifying the end of the serialized data.

Optionally, the n records respectively store check values corresponding to the records; wherein, in the ith record, the value information of the (m_i+1)-th field stores the check value corresponding to the ith record, and the check value corresponding to the ith record is obtained according to the attribute value group corresponding to the ith structured data.

The application provides a deserialization device, including:

a data acquisition unit for acquiring a set of serialized data comprising n records, wherein the ith record comprises m_i+1 fields, i is more than or equal to 1 and less than or equal to n, n is more than or equal to 1, and m_i is not less than 1; the value information of the first m_i fields stores the data group of the ith record, and the tag information of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record; the serialized data further includes a specific domain after the n records, and the tag information of the specific domain stores a second specific identifier for identifying the end of the serialized data;

an attribute value acquisition unit, configured to acquire attribute value groups corresponding to n pieces of structured data from the serialized data, where an attribute value group corresponding to an ith piece of structured data is acquired from a data group of an ith record;

and the data generation unit is used for generating the n structured data according to the attribute value groups corresponding to the n structured data respectively.

Optionally, the device further includes:

a check value obtaining unit, configured to obtain check values respectively corresponding to the n records from the serialized data, wherein the check value corresponding to the ith record is acquired from the value information of the (m_i+1)-th field in the ith record;

and the first checking unit is used for checking the serialized data according to the attribute value groups respectively corresponding to the n structured data and the checking values respectively corresponding to the n records.

Optionally, if the specific domain after the n records includes a first specific domain, the apparatus further includes: the second checking unit is used for acquiring a total checking value from the value information of the first specific domain and checking the serialized data according to the checking values respectively corresponding to the n records and the total checking value;

if the specific domain after the n records includes a second specific domain, the apparatus further comprises: a third checking unit, configured to obtain a total number of records from the value information of the second specific domain, and check the serialized data according to the total number of records and the number of records included in the serialized data.

The application provides a server, including:

a creating unit configured to create a session;

a sending unit, configured to send a session identifier of the session to a client;

a receiving unit, configured to receive multiple data blocks sent by the client in a distributed manner, where each data block is associated with the session identifier, and each data block includes one or more sets of serialized data, and receive a data block save list sent by the client, where the data block save list is used to identify all data blocks to be saved;

and the storage unit is used for storing the plurality of data blocks if the plurality of data blocks are matched with the data block storage list.

Optionally, the sending unit is further configured to send a data block missing list to the client if the plurality of data blocks are not matched with the data block storage list, where the data block missing list is used to identify a data block that belongs to the data block storage list and does not belong to the plurality of data blocks;

the receiving unit is further configured to receive the data block that is sent by the client in a distributed manner and identified in the data block missing list.

The application provides a server, including: a transmitting unit and a receiving unit;

the sending unit is used for sending the total number of the stored serialized data to the client;

the receiving unit is used for receiving download information sent by the client, and the download information indicates the group number identifier of the serialized data to be downloaded by the client;

the sending unit is further configured to send the serialized data corresponding to the group number identifier to the client in a distributed manner.

Optionally, any one set of the stored serialized data includes n records, where the ith record includes m_i+1 fields, i is more than or equal to 1 and less than or equal to n, n is more than or equal to 1, and m_i is not less than 1; the value information of the first m_i fields stores the data group of the ith record, and the tag information of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record; the set of serialized data further includes a specific domain after the n records, and the tag information of the specific domain stores a second specific identifier for identifying the end of the set of serialized data.

According to the technical scheme, in the embodiment of the application, n pieces of structured data are converted into one set of serialized data, and the set of serialized data comprises two parts. The first part comprises n records, each record corresponds to one piece of structured data, and the ith record comprises m_i+1 fields: the values of the first m_i fields store the attribute value group corresponding to the ith structured data, and the tag of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record. The second part is a specific field after the n records, whose tag stores a second specific identifier for identifying the end of the serialized data. It can be seen that the structure of the message object is no longer adopted in this application; instead, the n structured data are converted into n records, the last field of each record is used to identify the end of the record, so that different records can be distinguished, and the end of the serialized data is identified by a specific field after the n records. Therefore, the problem that serialization cannot be realized due to the size limitation of the message object is solved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art according to the drawings.

FIG. 1 is a diagram illustrating the structure of a field in the protobuf protocol;

FIG. 2 is a schematic flow chart diagram illustrating one embodiment of a serialization method provided herein;

FIG. 3 is a schematic diagram of the structure of serialized data provided herein;

FIG. 4 is a schematic flow chart diagram illustrating one embodiment of a deserialization process provided herein;

FIG. 5 is a schematic flow chart diagram illustrating an embodiment of a saving method provided herein;

FIG. 6 is a schematic flow chart diagram illustrating an embodiment of a downloading method provided herein;

FIG. 7 is a schematic diagram of an apparatus embodiment of a serialization apparatus provided herein;

FIG. 8 is a schematic diagram of an apparatus embodiment of a deserialization apparatus provided herein;

FIG. 9 is a schematic diagram of an embodiment of an apparatus of a server provided in the present application;

FIG. 10 is a schematic structural diagram of another embodiment of a server provided in the present application.

Detailed Description

One commonly used serialization method is based on the protobuf protocol proposed by Google, which is explained below.

The protobuf protocol is based on the structure of a message object, and the message object includes at least one field, as shown in fig. 1, where one field includes a tag and a value.

The tag is 4 bytes long and includes a field identifier (field_number) and a value type (value_type). The field_number is used to identify the field, the value_type is used to specify the data type of the value, and the value is used to store data.
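By way of illustration only, a 4-byte tag of the kind described above might be packed and unpacked as follows. The text states only that the tag is 4 bytes long and carries a field_number and a value_type; the bit split used here (value_type in the low 3 bits, field_number in the remaining 29 bits), the big-endian byte order, and the function names `pack_tag` and `unpack_tag` are assumptions.

```python
import struct
from typing import Tuple

VALUE_TYPE_BITS = 3  # assumption: the low 3 bits of the tag hold value_type


def pack_tag(field_number: int, value_type: int) -> bytes:
    """Pack field_number and value_type into a 4-byte tag (assumed layout)."""
    if not 0 <= value_type < (1 << VALUE_TYPE_BITS):
        raise ValueError("value_type out of range")
    if not 0 <= field_number < (1 << (32 - VALUE_TYPE_BITS)):
        raise ValueError("field_number out of range")
    return struct.pack(">I", (field_number << VALUE_TYPE_BITS) | value_type)


def unpack_tag(tag: bytes) -> Tuple[int, int]:
    """Recover (field_number, value_type) from a 4-byte tag."""
    (raw,) = struct.unpack(">I", tag)
    return raw >> VALUE_TYPE_BITS, raw & ((1 << VALUE_TYPE_BITS) - 1)
```

Under this assumed layout, a field_number as large as (2^25)-1 still fits in the tag, so large reserved identifiers can share the same tag format as ordinary data fields.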

When at least one piece of structured data is serialized on the basis of the protobuf protocol, all the structured data are converted into one message object, and the attribute values of each piece of structured data are stored in the values of the fields of that message object. For example, structured data order1 has a set of attribute values a1, a2 and a3, and structured data order2 has a set of attribute values b1, b2 and b3. According to the rules of the protobuf protocol, the structured data order1 and order2 are converted into a message object a, wherein the message object a has 6 fields, and the values of the 6 fields respectively store a1, a2, a3, b1, b2 and b3.

In addition, the protobuf protocol does not support data verification in the serialization process, so that the security of data cannot be guaranteed.

The embodiment of the application provides a method and a device for serialization and deserialization of structured data, which are used for solving the problem that serialization cannot be realized due to the size limitation of a message object.

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 2, the present application provides a method embodiment of a method for serializing structured data, the method of the present embodiment includes:

201: acquiring n structured data, wherein n is more than or equal to 1.

For example, two pieces of structured data, order1 and order2, are obtained.

202: acquiring attribute value groups corresponding to the n structured data respectively, wherein each attribute value group includes at least one attribute value.

For example, the structured data order1 corresponds to the attribute value group a, and the attribute value group a specifically includes: a1, a2 and a3; the structured data order2 corresponds to the attribute value group b, and the attribute value group b specifically includes: b1, b2 and b3.

203: generating a set of serialized data according to the attribute value groups corresponding to the n structured data respectively.

As shown in FIG. 3, the serialized data includes two parts. The first part includes n records, i.e., Row 1, Row 2, ..., Row n in FIG. 3. Each record corresponds to one piece of structured data, that is, each record stores the attribute value group of the structured data corresponding to the record.

Taking the ith record as an example, where i is more than or equal to 1 and less than or equal to n: the ith record comprises m_i+1 fields, where m_i is not less than 1. The values of the first m_i fields store the data group of the ith record, wherein the data group of the ith record is the attribute value group corresponding to the ith structured data in the n structured data, and the tag of the (m_i+1)-th field stores the first specific identifier, which is used for identifying the end of the ith record.

For example, as shown in FIG. 3, record 1 (i.e., Row 1) includes m₁+1 columns, i.e., Column 1, Column 2, ..., Column m₁ and Checksum 1 shown in FIG. 3. Each column corresponds to one field, and the identification of a column can be represented by the field_number of the corresponding field. The identifications of the m₁ columns may increase arithmetically from 0 to m₁-1, or may be out of order. In record 1, the values of the first m₁ columns store the data set of record 1, i.e. the attribute value group (e.g. a1, a2 and a3) corresponding to the 1st structured data. The tag of the (m₁+1)-th column (i.e. Checksum 1 shown in FIG. 3) stores a first specific identifier for identifying the end of the 1st record.

The first specific identifier is stored in the field_number of the (m₁+1)-th field, so that it can be distinguished from the field_number of a field used for storing data. For example, in the ith record, the field_number of the first m_i fields increases arithmetically from 0, e.g. 0, 1, 2, ..., m_i-1, and the first specific identifier may be a relatively large number, i.e. a number that the field_number of a field used for storing data cannot reach. Specifically, the first specific identifier may be a value greater than a preset threshold, where the preset threshold is determined according to the maximum available range of the field_number of the tag. For example, the first specific identifier may be 2 to the power of 25 minus 1, i.e., (2^25)-1.
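By way of illustration only, the separation between data field numbers and the reserved identifier might be sketched as follows. The value (2^25)-1 is the example given in the text; the concrete threshold `PRESET_THRESHOLD` and the function name `classify_field_number` are hypothetical, since the text says only that the threshold follows from the maximum available range of field_number.

```python
END_OF_RECORD = 2**25 - 1   # first specific identifier (example value from the text)
PRESET_THRESHOLD = 2**24    # hypothetical; the text does not fix a number


def classify_field_number(field_number: int) -> str:
    """Tell ordinary data fields apart from reserved identifiers."""
    if field_number == END_OF_RECORD:
        return "end_of_record"
    if field_number > PRESET_THRESHOLD:
        # Above the threshold but not the end-of-record marker:
        # reserved for other special fields.
        return "reserved"
    return "data"
```

Because data fields are numbered from 0 upward, a record would need more than `PRESET_THRESHOLD` attribute values before a data field could collide with a reserved identifier.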

After the first part, i.e. the n records, the serialized data also includes a second part, i.e. a specific field, where the tag of the specific field stores a second specific identifier for identifying the end of the serialized data. The first specific identifier and the second specific identifier are different. Specifically, the second specific identifier may be a value greater than a preset threshold, where the preset threshold is determined according to the maximum available range of the field_number of the tag. For example, the second specific identifier may be (2^25)-1024 and/or (2^25)-2.

For example, as shown in FIG. 3, after the n records, i.e., Row 1, Row 2, ..., Row n, the serialized data also includes specific fields, i.e., record Checksum n and record Total Row Count, each of which corresponds to one field. The field_number of record Checksum n and the field_number of record Total Row Count are used as the second specific identifier. The specific field shown in FIG. 3 may also include only one of record Checksum n and record Total Row Count.

It should be noted that, in the embodiment of the present application, the number of fields of each record in the serialized data may be different, but the field that is missing is complemented when the serialized data is deserialized, so that the number of fields of each record is consistent, wherein the value in the complemented field is set to null and is not encoded.
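By way of illustration only, the completion of missing fields at deserialization time might be sketched as follows. The sketch assumes that each deserialized record is held as a mapping from field_number to value and that Python's `None` stands in for the null value; the function name `pad_records` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def pad_records(records: List[Dict[int, Any]]) -> List[List[Optional[Any]]]:
    """Fill fields missing from a record with None so all rows have equal width."""
    # Collect every field_number seen in any record, in ascending order.
    all_numbers = sorted({number for record in records for number in record})
    # dict.get returns None for a field_number the record does not contain.
    return [[record.get(number) for number in all_numbers] for record in records]
```

For instance, if the second record lacks field 1, that position is returned as `None` while the other positions keep their decoded values.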

According to the technical scheme, in the embodiment of the application, n pieces of structured data are converted into one set of serialized data, and the set of serialized data comprises two parts. The first part comprises n records, each record corresponds to one piece of structured data, and the ith record comprises m_i+1 fields: the values of the first m_i fields store the attribute value group corresponding to the ith structured data, and the tag of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record. The second part is a specific field after the n records, whose tag stores a second specific identifier for identifying the end of the serialized data. It can be seen that the message-object structure is no longer used in this application; instead, the n structured data are converted into n records, the last field of each record is used to identify the end of the record, so that different records can be distinguished, and the n records are followed by a specific field identifying the end of the serialized data. Therefore, the problem that serialization cannot be realized due to the size limitation of the message object is solved.
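By way of illustration only, the record layout summarized above might be sketched as follows. The sketch works on an in-memory list of (field_number, value) pairs rather than on encoded bytes; that representation, the function name `serialize`, and the choice to use the Total Row Count field as the sole specific field are assumptions. The identifier values are the examples given in the text.

```python
from typing import Any, List, Sequence, Tuple

Field = Tuple[int, Any]      # (field_number, value): stand-in for tag and value

END_OF_RECORD = 2**25 - 1    # first specific identifier (example from the text)
TOTAL_ROW_COUNT = 2**25 - 2  # second specific identifier (example from the text)


def serialize(attribute_groups: Sequence[Sequence[Any]]) -> List[Field]:
    """Turn n attribute value groups into n records plus one trailing field."""
    if len(attribute_groups) < 1:
        raise ValueError("n must be at least 1")
    fields: List[Field] = []
    for group in attribute_groups:
        if len(group) < 1:
            raise ValueError("each record needs at least one attribute value")
        # First m_i fields: numbered from 0, one attribute value each.
        for field_number, value in enumerate(group):
            fields.append((field_number, value))
        # (m_i+1)-th field: its tag marks the end of this record.
        fields.append((END_OF_RECORD, None))
    # Specific field after the n records: marks the end of the serialized data.
    fields.append((TOTAL_ROW_COUNT, len(attribute_groups)))
    return fields
```

For order1 with a1, a2, a3 and order2 with b1, b2, b3, this produces nine fields: three data fields and one end-of-record field per record, followed by the trailing field. No single enclosing message object is built, so records can be appended one after another.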

In the embodiment of the application, when the serialized data is generated, a check code can be added into the serialized data, so that data check in a serialization process is supported, and the data security is improved. This will be explained in detail below.

Optionally, the n records respectively store check values corresponding to the records. In the ith record, the value of the (m_i+1)-th field stores the check value corresponding to the ith record, and this check value is obtained according to the attribute value group corresponding to the ith structured data. For example, a redundancy check code, such as crc32, is calculated for each attribute value in the attribute value group corresponding to the ith structured data, and the sum of the calculated redundancy check codes is used as the check value corresponding to the ith record and is stored in the value of the (m_i+1)-th field, such as the value of Checksum 1 shown in FIG. 3. The value_type of the (m_i+1)-th field may be 32-bit as defined by the protobuf protocol.
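By way of illustration only, the per-record check value described above might be computed as follows. The sketch assumes that each attribute value is already available as bytes and that the sum wraps modulo 2^32 so that it fits a 32-bit value; the text specifies only that the crc32 codes are summed. The function name `record_checksum` is hypothetical.

```python
import zlib
from typing import Sequence


def record_checksum(attribute_values: Sequence[bytes]) -> int:
    """Sum the CRC32 code of every attribute value in one record."""
    total = 0
    for value in attribute_values:
        total += zlib.crc32(value)
    return total & 0xFFFFFFFF  # assumption: wrap the sum to 32 bits
```

Changing any single attribute value changes its CRC32 code and therefore the sum, which is what allows the deserializing side to detect a damaged record.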

Optionally, a check value is stored in the specific field. It should be noted that the specific field in the serialized data may be one field or a plurality of fields. For example, the particular field includes a first field and/or a second field. The tag of the first field stores a first sub-identifier, and the value stores a total check value, where the total check value is obtained according to check values respectively corresponding to the n records, for example, a sum of the check values corresponding to each record. The tag of the second field stores a second sub identifier, and the value stores a total number of records, wherein the total number of records is n.

For example, as shown in FIG. 3, the first field may be the field corresponding to record Checksum n; the field_number of this field may be (2^25)-1024, and its value_type may be 32-bit as defined by the protobuf protocol. The second field may be the field corresponding to record Total Row Count, which may have a field_number of (2^25)-2 and a value_type of 64-bit as defined by the protobuf protocol.

Wherein the first sub-identifier is used as the second specific identifier if the specific field includes a first field and does not include a second field, the second sub-identifier is used as the second specific identifier if the specific field includes a second field and does not include the first field, and the first sub-identifier and the second sub-identifier are used as the second specific identifier if the specific field includes the first field and includes the second field.
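By way of illustration only, the specific fields that follow the n records might be built as follows. The identifier values are the examples given in the text; the function name `build_trailer`, the two boolean switches, the (field_number, value) pair representation, and the wrap-around of the total check value to 32 bits are assumptions.

```python
from typing import List, Sequence, Tuple

TOTAL_CHECKSUM = 2**25 - 1024  # first sub-identifier (example from the text)
TOTAL_ROW_COUNT = 2**25 - 2    # second sub-identifier (example from the text)


def build_trailer(record_checksums: Sequence[int],
                  with_checksum: bool = True,
                  with_count: bool = True) -> List[Tuple[int, int]]:
    """Build the first and/or second specific field placed after the records."""
    if not (with_checksum or with_count):
        raise ValueError("at least one specific field is required")
    trailer: List[Tuple[int, int]] = []
    if with_checksum:
        # Total check value: sum of the per-record check values (assumed 32-bit wrap).
        trailer.append((TOTAL_CHECKSUM, sum(record_checksums) & 0xFFFFFFFF))
    if with_count:
        # Total number of records n.
        trailer.append((TOTAL_ROW_COUNT, len(record_checksums)))
    return trailer
```

Whichever fields are emitted, their field_number values act as the second specific identifier, so a reader that meets either one knows the records have ended.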

The deserialization process corresponding to the above embodiment is explained below.

Referring to fig. 4, the present application provides a method embodiment of a deserialization method for serialized data. The method of this embodiment includes:

401: a set of serialized data is obtained.

The serialized data is specifically the serialized data generated in the embodiment corresponding to fig. 2. As shown in fig. 3, the serialized data includes two parts. The first part includes n records, wherein the ith record includes m_i+1 fields, 1 ≤ i ≤ n, n ≥ 1, and m_i ≥ 1; the values of the first m_i fields store the data set of the ith record, and the tag of the (m_i+1)-th field stores the first specific identifier, which is used for identifying the end of the ith record. After the first part, that is, after the n records, the serialized data further includes a second part, namely a specific field, whose tag stores a second specific identifier for identifying the end of the serialized data.

402: Attribute value groups respectively corresponding to the n structured data are acquired from the serialized data, wherein the attribute value group corresponding to the ith structured data is acquired from the data set of the ith record.

In this embodiment, the data set of the ith record may be directly used as the attribute value set corresponding to the ith structured data.

403: The n structured data are generated according to the attribute value groups respectively corresponding to the n structured data.

According to the above technical solution, the serialized data in this embodiment does not adopt the structure of a message object. Instead, the attribute value groups of the n structured data are stored in n records, the last field of each record identifies the end of that record so that different records can be distinguished, and the end of the serialized data is identified by a specific field after the n records. Therefore, the number of attribute values in the attribute value group of the structured data is no longer limited, and the problem that serialization cannot be realized due to the size limitation of the message object is solved.
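Steps 401 to 403 can be sketched on an already decoded sequence of (field_number, value) pairs. This is a simplified illustration, not the patented implementation: it takes (2^25)-1 as the first specific identifier and (2^25)-2 as the second specific identifier, both of which are only example values given in the description, and the function name `split_records` is hypothetical. The check values carried by the end-of-record fields are ignored here.

```python
END_OF_RECORD = (2 ** 25) - 1  # example first specific identifier
END_OF_DATA = (2 ** 25) - 2    # example second specific identifier


def split_records(fields):
    """Group a flat sequence of (field_number, value) pairs into records.

    Returns (records, trailer_value); each record is the list of data
    values that preceded an end-of-record field.
    """
    records, current = [], []
    for field_number, value in fields:
        if field_number == END_OF_RECORD:
            records.append(current)  # the marker closes the current record
            current = []
        elif field_number == END_OF_DATA:
            return records, value    # the marker closes the whole data set
        else:
            current.append(value)
    raise ValueError("serialized data ended without an end-of-data field")


fields = [
    (1, "1001"), (2, "book"), (END_OF_RECORD, 111),
    (1, "1002"), (2, "pen"), (3, "blue"), (END_OF_RECORD, 222),
    (END_OF_DATA, 2),
]
records, total_rows = split_records(fields)
```

Because record boundaries are found by scanning for the marker, the two records above can hold different numbers of attribute values without any enclosing message object.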

Optionally, both the first specific identifier and the second specific identifier are values greater than a preset threshold, and the preset threshold is determined according to the maximum available range of the field_number of the tag.

In the embodiment of the application, the data can be verified according to the check values in the serialized data, which improves the security of the data. This is explained in detail below.

Optionally, the method further includes: obtaining check values respectively corresponding to the n records from the serialized data, wherein the check value corresponding to the ith record is obtained from the value of the (m_i+1)-th field of the ith record; and verifying the serialized data according to the attribute value groups respectively corresponding to the n structured data and the check values respectively corresponding to the n records. If all the n records are successfully verified, the verification of the serialized data is successful.

The verification of the ith record is explained as follows: a redundancy check code is calculated for each attribute value in the attribute value group corresponding to the ith structured data, and the sum of these redundancy check codes is calculated. If the calculated sum is consistent with the check value corresponding to the ith record, the verification of the ith record is successful.

Optionally, if the specific field after the n records includes a first specific field, the method further includes: acquiring a total check value from the value of the first specific field, and checking the serialized data according to the check values respectively corresponding to the n records and the total check value. For example, the sum of the check values respectively corresponding to the n records is calculated, and if the sum is consistent with the total check value, the serialized data is successfully checked.

If the specific field after the n records includes a second specific field, the method further includes: acquiring a total record number from the value of the second specific field, and checking the serialized data according to the total record number and the number of records included in the serialized data. For example, the number of records included in the serialized data is obtained, and if this number matches the total record number, the serialized data is successfully checked.

Wherein, the tag of the first specific field stores a first sub-identifier, and the tag of the second specific field stores a second sub-identifier.
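The two optional trailer checks can be sketched together. The sketch assumes, as in the example given in the text, that the total check value is the sum of the per-record check values; truncating that sum to 32 bits is an additional assumption made here, and the function name `verify_trailer` is hypothetical.

```python
def verify_trailer(record_check_values, total_check_value=None,
                   total_record_count=None):
    """Check the optional trailer fields against the parsed records.

    Assumption (not from the source): the total check value is the sum of
    the per-record check values truncated to 32 bits.
    """
    if total_check_value is not None:
        if (sum(record_check_values) & 0xFFFFFFFF) != total_check_value:
            return False
    if total_record_count is not None:
        if len(record_check_values) != total_record_count:
            return False
    return True


ok = verify_trailer([111, 222], total_check_value=333, total_record_count=2)
bad = verify_trailer([111, 222], total_check_value=333, total_record_count=3)
```

Passing `None` for either argument models serialized data whose specific field contains only one of the two fields.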

With the advent of the big data age, more and more services have released platforms for processing structured data, such as the bigquery platform of google corporation and the redshift platform of amazon corporation. The platforms can convert the structured data into the serialized data, so that transmission operations such as uploading and downloading can be carried out.

At present, these platforms cannot implement the all-or-nothing principle when uploading serialized data, that is, either all of the data is uploaded successfully or none of it is uploaded. As a result, the consistency of the data during uploading cannot be ensured.

Therefore, the embodiment of the application also provides a storage method of the serialized data and a server, so as to realize parallel uploading of the serialized data, thereby reducing the transmission time.

Referring to fig. 5, the present application provides a method embodiment of a method for storing serialized data, which is applied to a server. The method of this embodiment includes steps 501, 502, 503, 504 and 505.

501: the server creates a session.

Wherein the server may set the state of the session to open. When the state of the session is open, the storage operation of the data corresponding to the session can be performed.

502: the server sends a session identification (session ID) of the session to the client.

The state of the session can be shared between the server and the client, so the server can also notify the client of the state of the session.

601: The client receives the session identification sent by the server.

602: the client divides the serialized data to obtain a plurality of divided data blocks (blocks).

Wherein each data block is associated with the session identification, in fact with the session. Each data block comprises one or more groups of serialized data, and if each data block comprises multiple groups of serialized data, an end identifier needs to be added at the end of the multiple groups of serialized data to identify the end of the data block.
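A minimal sketch of step 602 follows. The dictionary keys, the `block_id` numbering, and the function name are illustrative choices, not part of the source. The end identifier that the text requires for blocks holding multiple groups is omitted because its format is not specified.

```python
def split_into_blocks(session_id, groups, groups_per_block):
    """Split groups of serialized data into blocks tied to one session.

    Each block is a dict; the keys and the block_id numbering are
    illustrative, not defined by the source.
    """
    blocks = []
    for start in range(0, len(groups), groups_per_block):
        blocks.append({
            "session_id": session_id,
            "block_id": len(blocks),
            "groups": groups[start:start + groups_per_block],
        })
    return blocks


groups = [b"group-0", b"group-1", b"group-2", b"group-3", b"group-4"]
blocks = split_into_blocks("session-42", groups, groups_per_block=2)
# The list of block IDs is what the client would later send as its
# list of all data blocks to be stored.
save_list = [block["block_id"] for block in blocks]
```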

603: The client sends the segmented data blocks in a distributed manner.

In this embodiment, since the server establishes the session and the session is associated with the plurality of data blocks, the client can be supported to send the plurality of data blocks in a distributed manner, thereby reducing the transmission time.
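One way to read "sending in a distributed manner" is that the blocks are sent concurrently. The sketch below shows that reading with a thread pool. It is an assumption rather than the method of the source, and `fake_send` is a stand-in for a real network call.

```python
from concurrent.futures import ThreadPoolExecutor


def send_blocks_in_parallel(blocks, send_block, max_workers=4):
    """Send every block concurrently and return the per-block results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_block, blocks))


# Stand-in for a network call; a real client would issue a request here.
received = {}


def fake_send(block):
    received[block["block_id"]] = block["payload"]
    return block["block_id"]


blocks = [{"block_id": i, "payload": b"data-%d" % i} for i in range(5)]
sent_ids = send_blocks_in_parallel(blocks, fake_send)
```

Because every block carries the session identification, the order in which the blocks arrive does not matter to the server.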

503: the server receives a plurality of data blocks sent by the client in a distributed mode.

604: The client sends a data block storage list.

The data block storage list is used for identifying all data blocks to be stored. In effect, the data block storage list identifies all data blocks obtained after the client has segmented the serialized data.

504: The server receives the data block storage list sent by the client.

It should be noted that the execution order of steps 503 and 504 is not limited; that is, step 503 may be executed before step 504, step 504 may be executed before step 503, or steps 503 and 504 may be executed simultaneously.

505: If the received data blocks match the data block storage list, the server stores the received data blocks.

505 may include sub-steps 5051 and 5052.

5051: The data must be strongly consistent, that is, either all of the received data blocks are saved or none of them is saved. Therefore, after receiving the plurality of data blocks and the data block storage list, the server matches the received data blocks against the list, that is, determines whether the data blocks identified by the list are consistent with the received data blocks. If so, the matching is considered successful, and 5052 is executed.

5052: The server commits the received plurality of data blocks, which in effect saves them.

The server may also set the state of the session to closed. When the session is closed, the storage operation of the data corresponding to the session is not allowed. The server may also notify the client of the state of the session.

If it is determined in step 5051 that the plurality of data blocks do not match the list, the server sends a data block missing list to the client, where the data block missing list is used to identify the data blocks that belong to the data block storage list but do not belong to the received plurality of data blocks. The server then receives the data blocks identified in the data block missing list, which the client sends in a distributed manner, and may perform the matching again based on the newly received data blocks. If the same data block is received multiple times, the most recently received copy prevails.
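The server-side behavior of steps 503 to 505, including the missing list and the rule for repeated blocks, can be sketched as follows. The class and method names are hypothetical. The sketch also assumes that blocks not named in the list are simply ignored at commit time, which the source does not specify.

```python
class UploadSession:
    """Server-side state for one upload session (illustrative names)."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.state = "open"
        self.received = {}   # block_id -> most recently received payload
        self.committed = None

    def receive_block(self, block_id, payload):
        if self.state != "open":
            raise RuntimeError("session is closed")
        # A block received again overwrites the earlier copy.
        self.received[block_id] = payload

    def try_commit(self, storage_list):
        """Commit all listed blocks, or return the IDs that are still missing.

        Assumption: received blocks that are not in the list are ignored.
        """
        missing = [b for b in storage_list if b not in self.received]
        if missing:
            return missing   # nothing is saved: all or nothing
        self.committed = [self.received[b] for b in storage_list]
        self.state = "closed"
        return []


session = UploadSession("session-42")
session.receive_block(0, b"a")
session.receive_block(2, b"c")
first_try = session.try_commit([0, 1, 2])   # block 1 has not arrived yet
session.receive_block(1, b"b")
session.receive_block(2, b"c2")             # resent block replaces b"c"
second_try = session.try_commit([0, 1, 2])
```

The first attempt returns the missing list and saves nothing; the second attempt commits all three blocks at once and closes the session.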

According to the above technical solution, in this embodiment a session is established and associated with the plurality of data blocks into which the serialized data is divided, and the server receives a data block storage list indicating all data blocks to be stored. The server can therefore receive the plurality of data blocks sent by the client in a distributed manner and determine, from the received data blocks and the list, whether they match, that is, whether the server has received all data blocks identified by the list. Only when the received data blocks match the data block storage list does the server store them, which ensures the consistency of the data blocks.

This embodiment can be used for uploading the serialized data provided by the embodiments of the application, and can also be used for uploading any existing serialized data. In particular, any one of the one or more sets of serialized data may have the structure shown in fig. 3. Specifically, the set of serialized data includes n records, wherein the ith record includes m_i+1 fields, 1 ≤ i ≤ n, n ≥ 1, and m_i ≥ 1; the values of the first m_i fields store the data set of the ith record, and the tag of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record. The set of serialized data further includes a specific field after the n records, and the tag of the specific field stores a second specific identifier for identifying the end of the set of serialized data. For details of the set of serialized data, refer to the relevant content in the method embodiment corresponding to fig. 2, which is not repeated here.

The embodiment of the application also provides a method for downloading the serialized data, which can realize parallel transmission.

Referring to fig. 6, the present application provides a method embodiment of a method for downloading serialized data, which is applied to a server. The method of this embodiment includes steps 701, 702, 703 and 704.

701: the server creates a session.

Wherein the session can be associated with an index generated by a subsequent process, thereby avoiding the need to repeatedly establish an index.

The server may also set the state of the session to open. When the session is open, it indicates that the data corresponding to the session can be read. The state of the session can be shared between the server and the client, so the server can also notify the client of the state of the session.

It should be noted that, in the embodiment of the present application, the server may not establish a session.

702: The server sends the total number of groups of stored serialized data to the client.

For example, the server sends to the client information that a total of 50 sets of serialized data are currently stored.

801: The client receives the total group number sent by the server and generates download information according to the total group number, wherein the download information indicates the group number identification of the serialized data to be downloaded.

For example, the download information may include a two-tuple (offset, count), where offset indicates the starting group number of the serialized data to be downloaded, and count indicates the total number of groups of serialized data to be downloaded. For example, the two-tuple (10, 10) indicates that the 10th to the 19th groups of serialized data are to be downloaded.
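The (offset, count) download information can be illustrated with a short sketch. It shows how a client might cover all 50 stored groups with several requests and which group numbers a single pair selects. The function names are hypothetical, and starting the offsets at 0 is an assumption, since the source does not say where group numbering begins.

```python
def download_ranges(total_groups, groups_per_request):
    """Cover all stored groups with (offset, count) pairs, offsets from 0."""
    ranges = []
    for offset in range(0, total_groups, groups_per_request):
        count = min(groups_per_request, total_groups - offset)
        ranges.append((offset, count))
    return ranges


def groups_for(offset, count):
    """Group numbers selected by one (offset, count) pair."""
    return list(range(offset, offset + count))


ranges = download_ranges(50, 20)
selected = groups_for(10, 10)
```

Each pair can be sent as a separate request, so the groups it selects can be transferred independently of the others.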

The server can establish an index that orders all the serialized data, so that the client can specify the (offset, count) two-tuple for downloading. The index is associated with the session established by the server.

802: The client sends the download information to the server.

703: The server receives the download information sent by the client.

704: The server sends the serialized data corresponding to the group number identification to the client in a distributed manner.

In the embodiment of the application, the server sends the serialized data to the client in a distributed manner, so that the time for data transmission is reduced.

This embodiment can be used for downloading the serialized data provided by the embodiments of the application, and can also be used for downloading any existing serialized data. Specifically, any one set of the stored serialized data may have the structure shown in fig. 3, that is, the set of serialized data includes n records, wherein the ith record includes m_i+1 fields, 1 ≤ i ≤ n, n ≥ 1, and m_i ≥ 1; the values of the first m_i fields store the data set of the ith record, and the tag of the (m_i+1)-th field stores a first specific identifier for identifying the end of the ith record. The set of serialized data further includes a specific field after the n records, and the tag of the specific field stores a second specific identifier for identifying the end of the set of serialized data. For details of the set of serialized data, refer to the relevant content in the method embodiment corresponding to fig. 2, which is not repeated here.

At present, the bigquery platform and the redshift platform provide only part of the data transmission functions. For example, the redshift platform does not support uploading serialized data directly to the platform, and neither the bigquery platform nor the redshift platform supports downloading serialized data from the platform. The embodiments corresponding to fig. 5 and fig. 6 implement uploading and downloading of serialized data, respectively, and make up for these deficiencies. The embodiments corresponding to fig. 5 and fig. 6 may be implemented based on the HTTP protocol, that is, the client and the server communicate with each other through HTTP requests.

The embodiment of the present application further provides an embodiment of an apparatus corresponding to the embodiment of the method, which is specifically described below.

Referring to fig. 7, an apparatus embodiment of a serialization apparatus is provided in the present application, which corresponds to the method embodiment shown in fig. 2. The apparatus of this embodiment includes: a data acquisition unit 701, an attribute value acquisition unit 702, and a data generation unit 703.

The data acquisition unit 701 is used for acquiring n pieces of structured data, wherein n is larger than or equal to 1.

For example, two structured data, order1 and order2, are obtained.

The attribute value obtaining unit 702 is configured to obtain attribute value groups corresponding to the n pieces of structured data, respectively.

The data generating unit 703 is configured to generate a set of serialized data according to the attribute value groups corresponding to the n pieces of structured data, respectively.

The first specific identifier is stored in the field_number of the (m_i+1)-th field and can be distinguished from the field_number of the fields used for storing data. For example, the first specific identifier may be a relatively large number, that is, a number that the field_number of a field used for storing data cannot reach. Specifically, the first specific identifier may be a value greater than a preset threshold, where the preset threshold is determined according to the maximum available range of the field_number of the tag. For example, the first specific identifier may be 2 to the power of 25 minus 1, that is, (2^25)-1.

Optionally, the specific domain includes a first specific domain and/or a second specific domain.

The tag information of the first specific domain stores a first sub-identifier, and its value information stores a total check value, where the total check value is obtained according to the check values respectively corresponding to the n records. The tag information of the second specific domain stores a second sub-identifier, and its value information stores a total record number, where the total record number is n.

Referring to fig. 8, the present application provides an apparatus embodiment of a deserialization apparatus, which corresponds to the method embodiment shown in fig. 4. The apparatus of this embodiment includes: a data acquisition unit 801, an attribute value acquisition unit 802, and a data generation unit 803.

A data acquiring unit 801, configured to acquire a set of serialized data.

The serialized data is specifically the serialized data generated in the embodiment corresponding to fig. 2. As shown in fig. 3, the serialized data includes two parts. The first part includes n records, wherein the ith record includes m_i+1 fields, 1 ≤ i ≤ n, n ≥ 1, and m_i ≥ 1; the values of the first m_i fields store the data set of the ith record, and the tag of the (m_i+1)-th field stores the first specific identifier, which is used for identifying the end of the ith record. After the first part, that is, after the n records, the serialized data further includes a second part, namely a specific field, whose tag stores a second specific identifier for identifying the end of the serialized data.

An attribute value obtaining unit 802, configured to obtain attribute value sets corresponding to n pieces of structured data from the serialized data, where an attribute value set corresponding to an ith piece of structured data is obtained from a data set of the ith record.

A data generating unit 803, configured to generate the n pieces of structured data according to the attribute value groups corresponding to the n pieces of structured data, respectively.

Optionally, this embodiment further includes: a check value obtaining unit, configured to obtain check values respectively corresponding to the n records from the serialized data, wherein the check value corresponding to the ith record is obtained from the value information of the (m_i+1)-th domain of the ith record; and a first checking unit, configured to check the serialized data according to the attribute value groups respectively corresponding to the n structured data and the check values respectively corresponding to the n records. If all the n records are successfully verified, the verification of the serialized data is successful.

Optionally, if the specific domain after the n records includes a first specific domain, the apparatus further includes: and the second checking unit is used for acquiring a total checking value from the value information of the first specific domain and checking the serialized data according to the checking values respectively corresponding to the n records and the total checking value.

Referring to fig. 9, an embodiment of an apparatus of a server is provided in the present application, which corresponds to the embodiment of the method shown in fig. 5. The server of this embodiment includes: a creating unit 901, a transmitting unit 902, a receiving unit 903, and a saving unit 904.

A creating unit 901 configured to create a session.

A sending unit 902, configured to send the session identifier of the session to the client.

A receiving unit 903, configured to receive multiple data blocks sent by the client in a distributed manner.

The receiving unit 903 is further configured to receive a data block storage list sent by the client.

A saving unit 904, configured to save the plurality of data blocks if the plurality of data blocks match the data block storage list.

The data must be strongly consistent, that is, either all of the received data blocks are saved or none of them is saved. Therefore, after the receiving unit 903 receives the plurality of data blocks and the data block storage list, the server may match the received data blocks against the list, that is, determine whether the data blocks identified by the list are consistent with the received data blocks. If so, the server considers the matching successful, and the saving unit 904 commits the received data blocks, which in effect saves them.

Optionally, if the plurality of data blocks do not match the list, the sending unit 902 is further configured to send a data block missing list to the client, where the data block missing list is used to identify the data blocks that belong to the data block storage list but do not belong to the plurality of data blocks; the receiving unit 903 is further configured to receive the data blocks identified in the data block missing list, which the client sends in a distributed manner. The server may perform the matching again based on the newly received data blocks. If the same data block is received multiple times, the most recently received copy prevails.

Referring to fig. 10, an embodiment of the present application provides another apparatus embodiment of a server, and the embodiment corresponds to the method embodiment shown in fig. 6. The server of this embodiment includes: a transmitting unit 1001 and a receiving unit 1002.

A sending unit 1001, configured to send the total number of groups of stored serialized data to the client.

For example, the sending unit 1001 sends, to the client, information that 50 sets of serialized data are currently stored.

The receiving unit 1002 is configured to receive download information sent by the client, where the download information indicates a group number identifier of serialized data to be downloaded by the client.

The sending unit 1001 is further configured to distributively send the serialized data corresponding to the group number identifier to the client.

The embodiments corresponding to fig. 9 and fig. 10, respectively, may be implemented based on the HTTP protocol, that is, the client and the server communicate with each other through HTTP requests.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A method for serializing structured data, comprising:

acquiring n structured data, wherein n is more than or equal to 1;

2. The serialization method according to claim 1, wherein check values corresponding to the respective records are stored in the n records; wherein, in the ith record, the value information of the (m_i+1)-th domain stores the check value corresponding to the ith record, and the check value corresponding to the ith record is obtained according to the attribute value group corresponding to the ith structured data.

3. The serialization method according to claim 2, wherein the specific domain comprises a first specific domain and/or a second specific domain;

4. The serialization method according to claim 1, wherein said first specific identifier and said second specific identifier are both numerical values greater than a preset threshold, and said preset threshold is determined according to a maximum available range of the domain identifier of the tag information.

5. A method of deserializing serialized data comprising:

obtaining a set of serialized data, the serialized data comprising n records, wherein the ith record comprises m_i+1 domains, 1 ≤ i ≤ n, n ≥ 1, m_i ≥ 1, the value information of the first m_i domains stores the data set of the ith record, the tag information of the (m_i+1)-th domain stores a first specific identifier for identifying the end of the ith record, the serialized data further comprises a specific domain after the n records, and the tag information of the specific domain stores a second specific identifier for identifying the end of the serialized data;

6. The deserialization method of claim 5, further comprising:

7. The deserialization method of claim 5,

if the specific domain after the n records comprises a first specific domain, the method further comprises: acquiring a total check value from the value information of the first specific domain, and checking the serialized data according to the check values respectively corresponding to the n records and the total check value;

8. The deserialization method of claim 5, wherein the first specific identifier and the second specific identifier are both numerical values greater than a preset threshold, and the preset threshold is determined according to a maximum available range of the domain identifier of the tag information.

9. A method for storing serialized data, comprising:

the server creates a session;

the server sends the session identification of the session to a client;

the server receives a plurality of data blocks sent by the client in a distributed manner, wherein each data block is associated with the session identification and comprises one or more sets of serialized data; any one of the one or more sets of serialized data comprises n records, wherein the ith record comprises m_i+1 domains, 1 ≤ i ≤ n, n ≥ 1, m_i ≥ 1, the value information of the first m_i domains stores the data set of the ith record, the tag information of the (m_i+1)-th domain stores a first specific identifier for identifying the end of the ith record, the set of serialized data further comprises a specific domain after the n records, and the tag information of the specific domain stores a second specific identifier for identifying the end of the set of serialized data;

10. The saving method according to claim 9, further comprising:

11. A method for downloading serialized data, comprising: the server sends the total number of the stored serialized data to the client; any set of serialized data in the stored serialized data comprises n records, wherein the ith record comprises m_i+1 domains, 1 ≤ i ≤ n, n ≥ 1, m_i ≥ 1, the value information of the first m_i domains stores the data set of the ith record, the tag information of the (m_i+1)-th domain stores a first specific identifier for identifying the end of the ith record, the set of serialized data further comprises a specific domain after the n records, and the tag information of the specific domain stores a second specific identifier for identifying the end of the set of serialized data;

12. A serialization apparatus, comprising:

13. The apparatus according to claim 12, wherein the n records respectively store check values corresponding to the records; wherein, in the ith record, the value information of the (m_i + 1)th field stores a check value corresponding to the ith record, and the check value corresponding to the ith record is obtained according to the attribute value group corresponding to the ith structured data.
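By way of non-limiting illustration, storing a per-record check value in the value information of the end-of-record field might be sketched as follows. The choice of CRC-32, the use of `repr` to turn attribute values into bytes, the marker value, and the function names are assumptions of this example; the claim only requires that the check value be derived from the attribute value group.

```python
import zlib
from typing import Any, List, Sequence, Tuple

# Hypothetical marker value standing in for the first specific identifier.
END_OF_RECORD = 0x1FFFFFFE


def record_check_value(values: Sequence[Any]) -> int:
    """Compute a running CRC-32 over the attribute values of one record."""
    crc = 0
    for value in values:
        crc = zlib.crc32(repr(value).encode("utf-8"), crc)
    return crc


def encode_record(values: Sequence[Any]) -> List[Tuple[int, Any]]:
    """Emit m_i data fields followed by an end-of-record field carrying the check value."""
    fields: List[Tuple[int, Any]] = [
        (position, value) for position, value in enumerate(values, start=1)
    ]
    fields.append((END_OF_RECORD, record_check_value(values)))
    return fields
```

A reader can recompute `record_check_value` over the values it decoded and compare the result with the value carried by the end-of-record field.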

14. The apparatus according to claim 13, wherein the specific domain comprises a first specific domain and/or a second specific domain;

15. The apparatus according to claim 12, wherein the first specific identifier and the second specific identifier are both values greater than a preset threshold, and the preset threshold is determined according to a maximum available range of the domain identifier of the tag information.

16. A deserializing apparatus, comprising:

17. The apparatus of claim 16, further comprising:

a check value obtaining unit, configured to obtain check values corresponding to the n records from the serialized data, where the check value corresponding to the ith record is obtained from the value information of the (m_i + 1)th field of the ith record;

18. The apparatus of claim 16, wherein if the specific domain after the n records comprises a first specific domain, the apparatus further comprises: a second checking unit, configured to obtain a total check value from the value information of the first specific domain and to check the serialized data according to the check values respectively corresponding to the n records and the total check value;
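By way of non-limiting illustration, checking the serialized data against a total check value might be sketched as follows. The claim does not state how the total check value is combined from the per-record check values; computing a CRC-32 over the per-record values packed as 32-bit little-endian integers is an assumption of this example, as are the function names.

```python
import struct
import zlib
from typing import Sequence


def total_check_value(record_checks: Sequence[int]) -> int:
    """Combine the per-record check values into one total (assumed scheme: CRC-32)."""
    crc = 0
    for check in record_checks:
        # Each per-record check value is assumed to fit in an unsigned 32-bit integer.
        crc = zlib.crc32(struct.pack("<I", check), crc)
    return crc


def verify_total(record_checks: Sequence[int], stored_total: int) -> bool:
    """Return True if the stored total matches the recomputed total."""
    return total_check_value(record_checks) == stored_total
```

Under this scheme, a record that is dropped, reordered, or altered changes the sequence of per-record check values and is expected to cause the comparison to fail.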

19. The apparatus according to claim 16, wherein the first specific identifier and the second specific identifier are both values greater than a preset threshold, and the preset threshold is determined according to a maximum available range of the domain identifier of the tag information.

20. A server, comprising:

a creating unit configured to create a session;

a receiving unit, configured to receive multiple data blocks sent by the client in a distributed manner, where each data block is associated with the session identifier, and each data block includes one or more sets of serialized data, and receive a data block save list sent by the client, where the data block save list is used to identify all data blocks to be saved; any one of the one or more sets of serialized data comprises n records, wherein the ith record comprises m_i + 1 fields, 1 ≤ i ≤ n, n ≥ 1, m_i ≥ 1, the value information of the first m_i fields stores the data set of the ith record, and the tag information of the (m_i + 1)th field stores a first specific identifier for identifying the end of the ith record; the set of serialized data further includes a specific field after the n records, and the tag information of the specific field stores a second specific identifier for identifying the end of the set of serialized data;

21. The server according to claim 20,

the sending unit is further configured to send a data block missing list to the client if the plurality of data blocks do not match the data block save list, where the data block missing list is used to identify data blocks that belong to the data block save list and do not belong to the plurality of data blocks;
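By way of non-limiting illustration, the data block missing list might be derived as the blocks named in the list of blocks to be saved that are absent from the received blocks. Identifying blocks by string identifiers and the function name are assumptions of this example.

```python
from typing import Iterable, List, Sequence


def missing_blocks(save_list: Sequence[str], received_ids: Iterable[str]) -> List[str]:
    """Return identifiers in the save list that were not received, in save-list order."""
    received = set(received_ids)
    return [block_id for block_id in save_list if block_id not in received]
```

An empty result in this sketch means the received blocks cover the list and no missing list needs to be sent; extra received blocks that the list does not name are ignored.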

22. A server, comprising: a sending unit and a receiving unit;

the sending unit is used for sending the total number of the stored serialized data to the client; any one set of the stored serialized data includes n records, wherein the ith record includes m_i + 1 fields, 1 ≤ i ≤ n, n ≥ 1, m_i ≥ 1, the value information of the first m_i fields stores the data set of the ith record, and the tag information of the (m_i + 1)th field stores a first specific identifier for identifying the end of the ith record; the set of serialized data further includes a specific field after the n records, and the tag information of the specific field stores a second specific identifier for identifying the end of the set of serialized data;