CN115422911A

CN115422911A - Method and device for processing Json text

Info

Publication number: CN115422911A
Application number: CN202211000956.2A
Authority: CN
Inventors: 张易
Original assignee: Beijing Oceanbase Technology Co Ltd
Current assignee: Beijing Oceanbase Technology Co Ltd
Priority date: 2022-08-19
Filing date: 2022-08-19
Publication date: 2022-12-02

Abstract

The disclosure provides a method and a device for processing Json texts. The method comprises the following steps: analyzing a Json text to obtain a key value pair set in the Json text, wherein the key value pair set comprises a plurality of keys and a plurality of values which correspond to the plurality of keys one by one; in the process of analyzing the Json text, the values are sequentially added to the Json binary field according to the analyzing sequence of the values; after the Json text is analyzed, sequencing the plurality of keys to obtain positioning data, wherein the positioning data is used for positioning the positions of the plurality of values in the Json binary field according to the sequenced keys; adding the plurality of keys and the positioning data to the Json binary field.

Description

Method and device for processing Json text

Technical Field

The disclosure relates to the technical field of data processing, in particular to a method and a device for processing Json texts.

Background

A data exchange format makes it possible to exchange data between different programs on a computer or between different programming languages. JavaScript Object Notation (Json) is a lightweight data exchange format that records data in a text format derived from JavaScript.

In some cases, for example when storing Json text, the Json text needs to be converted to Json Binary fields (Json Binary). In the related technology, the problem of large memory occupation exists when the Json text is converted into the Json binary field.

Disclosure of Invention

In view of this, embodiments of the present disclosure provide a method and an apparatus for processing a Json text, so as to solve the problem of large memory usage in the Json text conversion process.

In a first aspect, a method for processing Json text is provided, which includes: analyzing a Json text to obtain a key value pair set in the Json text, wherein the key value pair set comprises a plurality of keys and a plurality of values which correspond to the plurality of keys one by one; in the process of analyzing the Json text, the values are sequentially added to the Json binary field according to the analyzing sequence of the values; after the Json text is analyzed, sequencing the plurality of keys to obtain positioning data, wherein the positioning data is used for positioning the positions of the plurality of values in the Json binary field according to the sequenced keys; adding the plurality of keys and the positioning data to the Json binary field.

Optionally, the method further comprises: storing the plurality of keys in a memory using a radix tree structure.

Optionally, the Json binary field contains a first field, and after the adding the plurality of keys and the positioning data to the Json binary field, the method further comprises: determining a position of the plurality of keys in the Json binary field; updating the first field in-place such that the first field records the location of the plurality of keys in the Json binary field.

Optionally, the positioning data includes a first array and a second array, where the first array is used to record the offset of the sorted key, and the second array is used to record the offset of the value corresponding to the sorted key.

Optionally, the adding the plurality of keys to the Json binary field comprises: adding the plurality of keys to the Json binary field in the order of the plurality of keys' parsing.

Optionally, the parsing the Json text further includes: checking the validity of the Json text.

In a second aspect, an apparatus for processing Json text is provided, comprising: a parsing unit configured to parse the Json text to obtain a set of key-value pairs in the Json text, the set of key-value pairs including a plurality of keys and a plurality of values in one-to-one correspondence with the plurality of keys; a first adding unit configured to add the plurality of values to the Json binary field in sequence according to the parsing order of the plurality of values in the process of parsing the Json text; a sorting unit configured to sort the plurality of keys after the Json text is parsed, so as to obtain positioning data, the positioning data being used for locating the positions of the plurality of values in the Json binary field according to the sorted keys; a second adding unit configured to add the plurality of keys and the positioning data to the Json binary field.

Optionally, the apparatus further comprises: a storage unit configured to store the plurality of keys in a memory using a radix tree structure.

Optionally, the Json binary field contains a first field, the apparatus further comprising: a determining unit configured to determine positions of the plurality of keys in the Json binary field after the adding of the plurality of keys and the positioning data to the Json binary field; an update unit configured to update the first field in-place such that the first field records the positions of the plurality of keys in the Json binary field.

Optionally, the second adding unit is further configured to: add the plurality of keys to the Json binary field in the order in which the plurality of keys are parsed.

Optionally, the parsing unit is further configured to: check the validity of the Json text.

In a third aspect, an apparatus for processing Json text is provided, which includes a memory and a processor, wherein the memory stores executable code, and the processor is configured to execute the executable code to implement the method according to the first aspect.

In a fourth aspect, there is provided a computer program product comprising: computer program code which, when run on a computer, causes the computer to perform the method of the first aspect described above.

In a fifth aspect, a computer-readable medium is provided, having program code stored thereon, which, when run on a computer, causes the computer to perform the method of the first aspect described above.

The scheme for processing Json text encodes values sequentially in the order in which the Json text is parsed, without waiting until all values have been parsed. That is, the values in the Json text are encoded independently of one another: each parsed value can be encoded directly and does not need to be cached in memory, and only the keys are cached in memory. This avoids the memory consumed by parsed values and solves the problem of large memory usage when converting Json text into a Json binary field.

Drawings

Fig. 1 is a schematic diagram of a Json binary field according to an embodiment of the disclosure.

Fig. 2 is a schematic flowchart of processing a Json text according to an embodiment of the present disclosure.

Fig. 3 is a diagram of another Json binary field provided by an embodiment of the disclosure.

Fig. 4 is a schematic flowchart of a method for processing a Json text according to an embodiment of the present disclosure.

Fig. 5 is a schematic diagram of a Json binary field corresponding to the method shown in fig. 4 according to an embodiment of the disclosure.

Fig. 6 is a schematic diagram of another Json binary field provided by an embodiment of the disclosure.

Fig. 7 is a diagram of another Json binary field provided by an embodiment of the disclosure.

Fig. 8 is a schematic flow chart of another process for Json text according to the embodiment of the present disclosure.

Fig. 9 is a schematic structural diagram of an apparatus for processing a Json text according to an embodiment of the present disclosure.

Fig. 10 is a schematic structural diagram of another apparatus for processing Json text according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments.

A data exchange format describes data in a specific format that allows the data to be exchanged between different programming languages, so that modern computer languages can support the data. A data exchange format may organize the data into a tree structure to describe the relationships between data.

The data exchange format may be, for example, JavaScript Object Notation (Json), a lightweight data exchange format that records data in a text format derived from JavaScript.

Json text may include objects, which are collections of key-value pairs, and arrays, which are collections of values. A Json object begins with a left brace "{" and ends with a right brace "}"; each key is followed by a colon, and key-value pairs are separated by commas. As an example, the Json text may be { "b":3, "a": "xyz" }, which includes two key-value pairs, b = 3 and a = xyz. A Json array begins with a left square bracket "[" and ends with a right square bracket "]", with commas separating the values. As an example, the Json text may be { "b": [ "3", "a", "xyz" ] }, in which the key b corresponds to an array of three values, "3", "a", and "xyz". The data type of a value in a key-value pair may be, for example, a number, a string, an object, an array, or a boolean.
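As an illustration only, the example text above can be parsed with Python's standard `json` module. The sketch below assumes nothing about the disclosed implementation; it only shows that a parser can hand back the key-value pairs in the order in which they appear in the text.

```python
import json

text = '{ "b":3, "a": "xyz" }'

# object_pairs_hook receives each object's (key, value) pairs
# in the order they appear in the text.
pairs = json.loads(text, object_pairs_hook=list)

for key, value in pairs:
    print(key, "=", value)
```

The pairs arrive as b first and a second, which is the parsing order referred to throughout the description.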

The data in a Json object can itself be arrays or objects, so Json data has nesting relationships; that is, Json is a composite data type. This nesting makes Json tree-structured data: when a value in the Json text is of a basic type, the value is a leaf node in the tree; when a value is an object, the value is an intermediate node. Json carries its own data types, requires no special structures, and is easy to write, read, and parse. Because Json is a composite data type, Json data is typically very large.

As mentioned, Json is a tree-structured composite data structure, and access to Json is typically path-based. However, Json carries no metadata describing its tree organization. To access a key-value pair, the Json text must be loaded into memory and parsed, and the data is accessed only after the tree-structured data has been constructed. As an example, when accessing data in Json text, the Json string needs to be parsed into JavaScript objects. Parsing may be accomplished using a Json parser, for example Gson or Jackson, or using a parsing function, for example a Json parse function.

During parsing, the Json text is loaded into memory and parsed according to the Json data rules to obtain the set of key-value pairs in the Json text. Querying data in the Json text likewise requires loading the Json text into memory and expanding it into tree-structured data, for example a Json tree; that is, the Json data is parsed into a Json tree in memory, and the query runs against that Json tree.

It can be seen that access to unparsed Json text is very inefficient: any access to data in the Json text requires traversing and parsing the entire text. Moreover, when the Json text is accessed repeatedly, it must be parsed again for each access. To make the Json text queryable and avoid parsing it on every access, the parsed Json text may be serialized into a binary field, for example a Json binary field (Json Binary). Serialization may also be referred to as encoding, i.e., encoding the key-value pairs in the parsed Json text as a Json binary field.

The related art provides two methods for processing the Json text, and the two methods for processing the Json text provided in the related art are described below with reference to fig. 1 to 3. In the related art, the Json text may be '{ "b":3, "a": "xyz" }', and the Json text includes two key value pairs, b =3 and a = xyz, where b and a are keys, and 3 and xyz are values corresponding to the keys one to one.

Fig. 1 shows a schematic diagram of a Json binary field. The binary field shown in fig. 1 includes a length (object length) of a Json object, a value type (value type), a key (key), a value (value), and a length (value length) of the value. From the Json binary field shown in fig. 1, it can be seen that fig. 1 is encoded in the data order of the Json text, that is, fig. 1 is encoded in the parsing order of the data in the Json text. The key-value pair b =3 is analyzed first, and then the key-value pair a = xyz is analyzed, and fig. 1 sequentially and independently encodes key-value pair data according to the analysis order of the key-value pairs in the Json text. The encoding method shown in fig. 1 does not consider sorting key value pairs in Json, and when data query is performed, all elements in the entire encoded data need to be traversed, which is poor in query performance.
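The query cost of an order-preserving encoding can be sketched as follows. This is a minimal illustration, assuming the encoded entries are modeled as a Python list of pairs kept in parsing order; the function name `lookup_linear` is hypothetical.

```python
def lookup_linear(entries, wanted):
    """Scan entries in stored order; return (value, entries visited)."""
    visited = 0
    for key, value in entries:
        visited += 1
        if key == wanted:
            return value, visited
    return None, visited

# Entries kept in parsing order, as in the encoding of fig. 1.
stored = [("b", 3), ("a", "xyz")]
value, visited = lookup_linear(stored, "a")
print(value, visited)
```

Because nothing is sorted, a lookup for a missing key or for the last key has to visit every entry.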

In order to improve the queryability of the Json text, metadata describing the tree organization can be added to the Json binary field, so that data can be queried directly from the Json binary field without repeated parsing. For efficient queries, the parsed key-value pairs are sorted when the Json text is encoded, e.g., the parsed keys may be sorted in string order, and encoding is based on the sorted data.

Fig. 2 shows a schematic flow diagram for processing Json text. As shown in fig. 2, after the Json text is acquired, it may be processed at the server. The Json text is first parsed; the purpose of parsing is to convert the data in the Json text into a JavaScript object, that is, to extract the set of key-value pairs in the Json text. Tree-structured data (a Json tree) is then constructed in memory from the parsed key-value pairs. After the complete tree structure is constructed, the data is sorted based on the complete tree, and encoding is completed according to the sorted data.

Fig. 3 is a diagram of a Json binary field corresponding to fig. 2. The Json binary field shown in fig. 3 may include a Json text type (type), a number of key-value pairs (number count), a Json object length (object length), a key data length (key length), a key offset (key offset), a key (key), a value (value), and a value offset (value offset). It can be seen that when the Json text is encoded according to the binary field shown in fig. 3, the keys are sorted only after all key-value pairs in the Json text have been parsed, and the data is encoded based on the sorted keys. The purpose is that a later lookup can perform a binary search on the keys to speed up the query. As shown in fig. 3, the order of the value offsets is kept consistent with the order of the key offsets at encoding time. In this way, once a key is found during a query, the value offset can be found directly from the key's array index, and the value data can be found through the value offset.
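The lookup enabled by sorted keys can be sketched with a binary search. This is an illustration only: the two lists below stand in for the sorted keys and for the values reached through the value offsets of fig. 3, and the name `lookup_sorted` is hypothetical.

```python
from bisect import bisect_left

sorted_keys = ["a", "b"]          # keys sorted in string order
values_in_key_order = ["xyz", 3]  # stand-in for values reached via value offsets

def lookup_sorted(key):
    """Binary-search the sorted keys; the index found also selects the value."""
    i = bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return values_in_key_order[i]
    return None

print(lookup_sorted("a"), lookup_sorted("b"), lookup_sorted("c"))
```

The same array index serves both arrays, which is why the value offsets must be stored in the same order as the key offsets.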

It can be seen that the methods provided by the related art either do not sort the key-value pairs in the Json text and therefore suffer from low search performance, as with the Json binary field shown in fig. 1; or sort the key-value pairs to enable efficient search but require all key-value pairs in the Json text to be parsed first and encoded only after sorting, as in the processing flow illustrated in fig. 2. As described above, Json is tree-structured data with a natural nesting relationship. Parsing all key-value pairs requires constructing tree-structured data in memory, which occupies a large amount of memory over a short period. In addition, because encoding happens only after all key-value pairs are parsed and sorted, the parsed key-value pairs can only be cached in memory until parsing completes, so memory stays occupied because the data cannot be written to disk in time. Furthermore, before parsing completes, the Json binary field corresponding to the Json text cannot be fully encoded and written to disk, so the Json tree and the Json binary field exist in memory at the same time. In a database system in particular, heavy memory use by a tenant affects access to other data types within the tenant. When multiple connections in the tenant access data concurrently, for example when they store different Json texts at the same time, a large amount of memory is occupied and database performance degrades.

Based on the above, the present disclosure provides a method and an apparatus for processing a Json text, so as to solve the problem of large memory usage when the Json text is converted into a Json binary field.

Fig. 4 is a schematic flowchart of a method for processing a Json text according to an embodiment of the present disclosure. The method and apparatus are applicable to relational databases that support Json and to NoSQL database products.

In step S410, the Json text is parsed to obtain a set of key-value pairs in the Json text.

The Json text may be JavaScript object data recorded in a text format, and in some embodiments, the Json data in the Json text may be string data recorded as text. The Json text comprises a set of key-value pairs, and the set comprises a plurality of keys and a plurality of values in one-to-one correspondence with the keys. As an example, the Json text may be '{ "b":3, "a": "xyz" }', comprising two key-value pairs, b = 3 and a = xyz, where b and a are keys and 3 and xyz are the values corresponding to the keys one to one.

Parsing the Json text can be understood as converting the Json data into data that can be stored or accessed in a Json format. Specifically, the set of key-value pairs in the Json text can be obtained by parsing the Json text. Taking the data to be processed as '{ "b":3, "a": "xyz" }' as an example, the key-value pairs b = 3 and a = xyz can be obtained by parsing the Json text.

In some embodiments, the parsing of the Json text may be implemented using a Json parser, for example Gson or Jackson. In some embodiments, the parsing may be performed using a parsing function, for example a Json parse function.

In step S420, in the process of parsing the Json text, a plurality of values are sequentially added to the Json binary field according to the parsing order of the plurality of values.

In the process of analyzing the Json text, the values can be encoded according to the sequence of value analysis, and the encoded data of the values are sequentially added to the Json binary field. In some embodiments, encoding the data may also be referred to as serializing the data.

It should be appreciated that the method provided by the present disclosure may encode each value as soon as it has been parsed. That is, the values may be encoded synchronously, in parsing order. In some cases, values may be encoded once a certain number of values have been parsed, for example after every five parsed values. In other words, the encoding of value data in the present disclosure proceeds in step with the parsing of the Json text.

Taking the Json text as '{ "b":3, "a": "xyz" }' as an example, if the key value pair analyzed first is b =3, the value 3 in the key value pair is obtained and encoded, and if the key value pair analyzed later is a = xyz, the value xyz in the key value pair is obtained and encoded. That is to say, values in the key value pairs can be sequentially encoded according to the order of key value pairs in the Json text to be analyzed.

After the plurality of values are encoded, the encoded data of the values is obtained and may be added to the Json binary field in the order in which the values are parsed. The encoded data of a value may include the data type of the value (value type), the value (value), and the length of the value. The data type of the value can be, for example, integer, numeric, string, etc.

Illustratively, the Json binary field may be, for example, 0x03 52 32 0x02 3xyz 0x01 3. Here 0x03 is the type of the Json data in the Json text, which may be, for example, an object, and 52 is the length of the Json data. 32 is positioning data used to record the positions of the plurality of keys in the Json binary field; at this point it may be left empty or filled with arbitrary data, and it is updated after parsing is completed. 0x02 3xyz is the encoded data of the value xyz, where 0x02 is the data type of the value, for example a character string (string), 3 is the length of the value, and xyz is the value data. 0x01 3 is the encoded data of the value 3, where 0x01 is the data type of the value, for example an integer (int), and 3 is the value data; the length of the value 3 is omitted here. It can be seen that the order of the plurality of values in the Json binary field is consistent with the order of the plurality of values in the Json text, i.e., with the parsing order of the values.
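Step S420 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed byte layout: the 9-byte header (one type byte plus two 4-byte placeholders), the 8-byte little-endian integer encoding, the 1-byte string length, and the names `encode_value` and `append_values` are all hypothetical. Only the type tags 0x03, 0x01, and 0x02 are taken from the example above.

```python
import struct

TYPE_OBJECT, TYPE_INT, TYPE_STRING = 0x03, 0x01, 0x02  # tags from the example

def encode_value(value):
    """Encode one value as: type tag, then payload (assumed layout)."""
    if isinstance(value, int):
        return bytes([TYPE_INT]) + struct.pack("<q", value)
    if isinstance(value, str):
        raw = value.encode("utf-8")  # sketch: strings shorter than 256 bytes
        return bytes([TYPE_STRING, len(raw)]) + raw
    raise TypeError("type not covered by this sketch")

def append_values(pairs):
    """Append each value as soon as it is parsed; keep only keys and offsets."""
    buf = bytearray([TYPE_OBJECT]) + bytes(8)  # type + two placeholder fields
    recorded = []                              # (key, value offset) per pair
    for key, value in pairs:
        recorded.append((key, len(buf)))
        buf += encode_value(value)             # the value itself is not cached
    return buf, recorded

buf, recorded = append_values([("b", 3), ("a", "xyz")])
print(recorded, len(buf))
```

After the loop, memory holds only the keys and their value offsets; the encoded values already sit in the output buffer, which in a real system could be flushed to disk as it grows.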

In step S430, after the Json text is parsed, a plurality of keys are sorted to obtain the positioning data.

After the Json text is analyzed, all key value pair sets of the Json text are obtained, and in step S420, values in the key value pairs are already encoded, so in step S430, a plurality of keys in the Json text are sorted to obtain positioning data.

The positioning data may, for example, comprise a second array that records the positions, in the Json binary field, of the values corresponding to the sorted keys; that is, the order of the values' positioning data is kept identical to the order of the sorted keys. The positioning data is used to locate the values in the Json binary field according to the sorted keys. The positioning data can be offsets, so that the position in the binary field of the value corresponding to each sorted key can be located through its offset.

Continuing with the example Json binary field 0x03 52 32 0x02 3xyz 0x01 3: the offset of the value xyz in the binary field may be, for example, 9, that is, offsetting 9 bytes from the start of the binary field reaches the type of the value xyz, and the value is obtained by reading onward according to the encoding rule; the offset of the value 3 may be, for example, 17, i.e., offsetting 17 bytes from the start of the binary field reaches the type of the value 3, and the value is obtained by reading onward according to the encoding rule. In some embodiments, the offset of the value xyz may instead be, for example, 14, i.e., offsetting 14 bytes from the start of the binary field reaches the value directly; the offset of the value 3 may be, for example, 21, i.e., offsetting 21 bytes from the start of the binary field reaches the value directly.

When sorting the plurality of keys, the keys may be sorted, for example, by size, such as in ascending order of key value; or in character order, such as from A to Z.

After the plurality of keys are sorted, the positioning data can be obtained. The following takes offsets as the positioning data. As mentioned above, encoding a value yields the offset of the value in the Json binary field, e.g., the value xyz may be at offset 9 and the value 3 at offset 17. After the keys are sorted by character, the key a precedes the key b, so the offset of the value corresponding to the key a precedes the offset of the value corresponding to the key b. That is, the positioning data for the values is the array [9,17]. The value corresponding to a sorted key can be found in the Json binary field through this positioning array.
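The derivation of the array [9,17] can be sketched in a few lines. The sketch assumes that a (key, value offset) pair was noted for each key during parsing; the offsets are the example figures from the text, and the variable names are hypothetical.

```python
# (key, value offset) noted while parsing, in parsing order;
# the offsets are the example figures from the description.
recorded = [("b", 17), ("a", 9)]

# Sort by key only; the value offsets travel with their keys.
ordered = sorted(recorded, key=lambda entry: entry[0])

sorted_keys = [key for key, _ in ordered]
value_offsets = [offset for _, offset in ordered]
print(sorted_keys, value_offsets)
```

Only the small (key, offset) records are sorted; the encoded values themselves are never moved.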

In step S440, a plurality of keys and positioning data are added to the Json binary field.

After the location data is obtained, a plurality of keys and the location data may be added to the Json binary field including the value data to obtain a corresponding Json binary field of the Json text.

A plurality of keys may be encoded first and the encoded data for the plurality of keys may be added to the binary field. The encoded data of a key may include, for example, the length of the key (key length), the value of the key (key value), and the number of keys (key count).

When the plurality of keys are added to the Json binary field, the keys may be added sequentially in sorted order, or sequentially in the order in which they were parsed.

Continuing with the example Json binary field 0x03 52 32 0x02 3xyz 0x01 3: the sorted keys have the order [a, b], and the keys may be added to the binary field in that order. The encoded data of the keys may be, for example, 1a 1b 2, where 1a indicates the length of key a and its key data, 1b indicates the length of key b and its key data, and 2 indicates the number of keys in the Json text. The encoded data of the keys may be added to the end of the binary field to obtain the Json binary field 0x03 52 32 0x02 3xyz 0x01 3 1a 1b 2. The value positioning data [9,17] described above may then be added to the end of the binary field to obtain the Json binary field 0x03 52 32 0x02 3xyz 0x01 3 1a 1b 2 9 17. When the plurality of keys are encoded in key order, positioning data for the keys can also be obtained, for example the offsets of the plurality of keys, and these offsets can be added to the Json binary field to complete the serialization of the Json text.
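Step S440 can be sketched as an append to the tail of the buffer that already holds the values. The layout here is an assumption made for illustration (1-byte key length, 4-byte little-endian key count and offsets), and the name `append_keys_and_positions` is hypothetical.

```python
import struct

def append_keys_and_positions(buf, recorded):
    """Append sorted keys, the key count, then value offsets in key order.

    recorded: list of (key, value offset) in parsing order.
    Returns the offset at which the key section starts.
    """
    ordered = sorted(recorded, key=lambda entry: entry[0])
    keys_start = len(buf)
    for key, _ in ordered:
        raw = key.encode("utf-8")
        buf += bytes([len(raw)]) + raw            # key length, then key data
    buf += struct.pack("<I", len(ordered))        # number of keys
    for _, value_offset in ordered:
        buf += struct.pack("<I", value_offset)    # value positioning data
    return keys_start

buf = bytearray(23)  # stand-in for a buffer that already holds the values
keys_start = append_keys_and_positions(buf, [("b", 9), ("a", 18)])
print(keys_start, bytes(buf[keys_start:]))
```

Everything is written after the values, so the bytes produced during parsing are never rewritten.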

The offsets of the plurality of keys indicate the positions of the keys in the Json binary field. The offset of the key a may be 26, that is, offsetting 26 bytes from the start of the binary field reaches the length of the key a, and the key data is read onward from there; the offset of the key b may be 31, i.e., offsetting 31 bytes from the start of the binary field reaches the length of the key b, and the key data is read onward from there. Alternatively, the offset of the key a may be 27, i.e., offsetting 27 bytes from the start of the binary field reaches the key data directly; the offset of the key b may be 32, i.e., offsetting 32 bytes from the start of the binary field reaches the key data directly.

For example, the positioning data of the plurality of keys may be added after the value positioning data, or after the encoded data of the plurality of keys. Illustratively, the binary field of the Json text may be, for example, 0x03 52 32 0x02 3xyz 0x01 3 1a 1b 2 26 31 9 17.

It can be seen that the method provided by the embodiment of the present disclosure encodes values sequentially in the parsing order of the Json text, without waiting until all values have been parsed. That is, the values in the Json text are encoded independently: each parsed value can be encoded directly and does not need to be cached in memory, and only the keys are cached in memory. This avoids the memory consumed by parsed values and solves the problem of large memory usage when converting the Json text into the Json binary field.

In some embodiments, an append interface of Lob may be called to write the Json binary field to disk sequentially in parsing order, or an append function may be used to add the plurality of keys and the positioning data to the tail of the binary field to obtain the binary field of the Json text.

Since keys in Json text are typically string-type data and their length is typically limited (for example, in the MySQL database a key may be 64K at maximum), the parsed keys can be stored in a radix tree structure, which compresses the memory occupied by the keys and frees further memory space. That is, the plurality of keys parsed out of the Json text are stored in memory using a radix tree (Radix Tree) structure.
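How a radix tree shares common prefixes between keys can be sketched as follows. This is a generic, minimal radix tree written for illustration; the class and function names are hypothetical, and the disclosure does not specify how its radix tree is implemented.

```python
class RadixNode:
    def __init__(self):
        self.children = {}   # edge label (str) -> RadixNode
        self.is_key = False  # True if a stored key ends at this node

def insert(root, key):
    """Insert key, splitting an edge where the key diverges from its label."""
    node = root
    while key:
        for label, child in list(node.children.items()):
            n = 0  # length of the common prefix of label and key
            while n < min(len(label), len(key)) and label[n] == key[n]:
                n += 1
            if n == 0:
                continue
            if n < len(label):
                # Split: shared prefix -> middle node -> rest of old edge.
                middle = RadixNode()
                middle.children[label[n:]] = child
                del node.children[label]
                node.children[label[:n]] = middle
                child = middle
            node, key = child, key[n:]
            break
        else:
            # No edge shares a prefix: store the remainder on a new edge.
            leaf = RadixNode()
            node.children[key] = leaf
            node, key = leaf, ""
    node.is_key = True

def iter_keys(node, prefix=""):
    """Yield stored keys in sorted order (sibling labels differ in first char)."""
    if node.is_key:
        yield prefix
    for label in sorted(node.children):
        yield from iter_keys(node.children[label], prefix + label)

root = RadixNode()
for k in ["name", "nation", "na", "b"]:
    insert(root, k)
print(sorted(root.children), list(iter_keys(root)))
```

The prefix "na" shared by three of the keys is stored once, and walking the tree yields the keys in sorted order, which is convenient for the sorting in step S430.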

As previously mentioned, the Json binary field may contain a first field that may be used to record the location of a plurality of keys in the Json binary field. In some embodiments, the first field is also used to record the location of a number of keys in a Json binary field from which the number of keys in the binary field can be quickly located. The number of keys may for example precede the location data for the first key of the plurality of keys in the binary field, so that the number of keys in the Json binary field may be quickly located according to the first field and the location data for the required key is read back to find the key data quickly. The positioning data may be, for example, sort-key offset.

An exemplary description will be given below with a Json binary field of 0x03 52 32 0x02 3xyz0x01 3 1a 1b 2 17. The first field may be, for example, 32 of the above-mentioned fields, i.e., the number of keys in the Json text may be obtained after being offset by 32 bytes from the head of the Json binary field. In the Json binary field, 32 bytes are deviated from the head of the Json binary field to obtain that the Json text has two keys, and key data can be obtained by backward reading, so that efficient query of data is realized.

It can be seen that the first field needs to be obtained after the Json text is completely parsed and a plurality of keys and positioning data are added to the Json binary field. That is, after the Json parsing is completed and the encoding of the plurality of keys and the positioning data of the plurality of values is completed, the data of the first field can be obtained after the encoding is added to the Json binary field including the plurality of values. In order to enable the Json binary field to be landed in time, any data can be added into the first field, so that the Json binary field is landed in an additional mode, after the plurality of keys and the positioning data are added into the Json binary field, the positions of the plurality of keys in the Json binary field are determined, and the data in the first field is updated in an in-place updating mode.

The first field may be backfilled, for example, using an in-place update (In-place update) instruction. An in-place update applies to fields whose size is the same before and after the change: the new value directly overwrites the old value without moving other data or allocating new space for the write, which makes it a very efficient kind of update.
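The placeholder-then-backfill idea can be illustrated with a minimal Python sketch. The layout here is an assumption made only for illustration (a 4-byte little-endian first field at the head of the buffer), and the names FIRST_FIELD and encode_with_backfill are hypothetical; the disclosure does not fix the width or encoding of the first field.

```python
import io
import struct

# Hypothetical layout: a fixed-width 4-byte little-endian "first field"
# at the head of the buffer, followed by appended data.
FIRST_FIELD = struct.Struct("<I")


def encode_with_backfill(values: list[bytes], key_section: bytes) -> bytes:
    buf = io.BytesIO()
    # Reserve the first field with arbitrary placeholder data so that
    # the values can be appended immediately.
    buf.write(FIRST_FIELD.pack(0))
    for encoded_value in values:
        buf.write(encoded_value)      # values are appended in parsing order
    key_section_pos = buf.tell()      # known only after all values are written
    buf.write(key_section)            # keys and positioning data go at the tail
    # In-place update: the field width is unchanged, so the placeholder is
    # overwritten without moving any other data.
    buf.seek(0)
    buf.write(FIRST_FIELD.pack(key_section_pos))
    return buf.getvalue()


encoded = encode_with_backfill([b"xyz", b"3"], b"KEYS")
```

Because the first field has a fixed width, the overwrite touches only those four bytes; everything appended earlier stays where it is.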

In step S430, after the Json text is parsed, the plurality of keys are sorted to obtain positioning data. The positioning data may include a first array and a second array. The first array holds key offsets and records the offsets of the sorted keys. The second array holds value offsets and records the offsets of the values in an order that corresponds one-to-one with the sorted order of the keys. Therefore, when data is queried and a key has been found, the offset of its value can be read directly from the second array at the same index that the key has in the first array, and the value can be located quickly through that offset, which improves query efficiency.

When the plurality of keys are added to the Json binary field, they do not need to be added in sorted order, which further improves encoding efficiency.

After the first array and the second array are obtained, in step S440, the plurality of keys, the first array and the second array are added to the Json binary field to obtain the binary field corresponding to the Json text. The plurality of keys can therefore be added to the Json binary field in the order in which they were parsed, which further improves the processing efficiency of the Json text.

An exemplary description continues with the example in which the Json binary field is 0x03 52 32 0x02 3xyz0x01 3. The sorted keys are [ a, b ], and the second array [9,17], whose order is consistent with the sorted keys, is obtained accordingly. From the parsing order and the sorted order of the plurality of keys, the first array [31,26] of sorted key offsets is calculated. That is, in the Json binary field, in sorted key order, the offset of key a is 31, the offset of key b is 26, the offset of the value xyz corresponding to key a is 9, and the offset of the value 3 corresponding to key b is 17. The plurality of keys are encoded, and the encoded keys, the first array and the second array are appended to the tail of the binary field to obtain the binary field 0x03 52 32 0x02 3xyz0x01 3 1b 1a 2 31 26 9 17 corresponding to the Json text. FIG. 5 shows this binary field, which is described below.

The fields in the binary field shown in fig. 5 have been described above and are not repeated here. As shown in fig. 5, the plurality of values are added in the order in which they are parsed from the Json text, and the plurality of keys are likewise added in parsing order. The order of the key offsets is consistent with the sorted key order; that is, the order of the offsets in the first array is consistent with the sorted key order. The first offset in the first array indicates the position of the first sorted key, and the second offset in the first array indicates the position of the second sorted key. In fig. 5, the offset 31 indicates the position of key a, and the offset 26 indicates the position of key b. The order of the value offsets is also consistent with the sorted key order; that is, the order of the offsets in the second array is consistent with the sorted key order. The first offset in the second array indicates the position of the value corresponding to the first sorted key, and the second offset in the second array indicates the position of the value corresponding to the second sorted key. In fig. 5, the offset 9 indicates the position of the value xyz corresponding to key a, and the offset 17 indicates the position of the value 3 corresponding to key b.

In some embodiments, the validity of the Json text may also be checked while the Json text is parsed.

A method for processing Json text provided by embodiments of the present disclosure is described below with reference to the Json binary field shown in fig. 6. The Json text shown in FIG. 6 is '{"d": 4, "h": "x", "a": "zxc", "p": 5}'.

When the Json text is parsed, the values are first encoded in the order in which they are parsed, and the encoded data of the values is added to the binary field in that order. The encoded data of the plurality of values is 0x01 4 0x02 x 0x02 3zxc 0x01 5. It may also be preceded by fields such as the Json data type, the data length and the position of the values, which have been described above and are not shown in fig. 6. In this encoded data, 0x01 4 indicates the data type and value of the value 4, 0x02 x indicates the data type and value of the value x, 0x02 3zxc indicates the data type, data length and value of the value zxc, and 0x01 5 indicates the data type and value of the value 5.

After the Json text has been parsed, the plurality of values in the Json text have been encoded, and the plurality of keys are cached in memory. If the offsets of the values in the Json binary field are represented as an array, then before the keys are sorted the value offset array may be, for example, [1,6,11,19], where 1 is the offset of value 4, 6 is the offset of value x, 11 is the offset of value zxc, and 19 is the offset of value 5. The plurality of keys are sorted in Z-to-A character order, giving the sorted keys [ p, h, d, a ], and the positioning data [19,6,1,11] is obtained from the sorted keys. The first entry in the positioning data is the offset in the binary field of the value corresponding to the first sorted key, that is, 19 indicates the position of the value 5 corresponding to key p. The second entry is the offset in the binary field of the value corresponding to the second sorted key, that is, 6 indicates the position of the value x corresponding to key h. Because key h is arranged after key p, the offset of the value corresponding to key h is arranged after the offset of the value corresponding to key p. In other words, the order of the data in the value offset array stays the same as the order of the sorted keys. The plurality of keys and the above positioning data are added to the Json binary field, resulting in the following binary field: 0x01 4 0x02 x 0x02 3zxc 0x01 5 d h a p 19 6 1 11.

The positioning data of the plurality of keys is then added to the binary field. This positioning data may be an array of key offsets, for example [27,25,24,26]. The first entry in the array indicates the offset of the first sorted key, that is, 27 indicates the offset of key p in the binary field; the second entry indicates the offset of the second sorted key, that is, 25 indicates the offset of key h in the binary field. Because key h is arranged after key p, the key offset of key h is arranged after the key offset of key p. In other words, the order of the data in the key offset array is consistent with the order of the sorted keys. The key offset array is added before the value offset array, resulting in the final binary field: 0x01 4 0x02 x 0x02 3zxc 0x01 5 d h a p 27 25 24 26 19 6 1 11.

With reference to the above Json binary field, an example of fast key-value lookup is given below. Suppose the key-value pair p = 5 needs to be looked up, and a binary search is used. First, the middle entry of the key offset array is taken, for example the 2nd offset in the array. The 2nd offset is 25, from which key h is found. Key h is compared with key p, and h is smaller than p. Because the keys are sorted from Z to A, the search continues to the left in the key offset array; the 1st offset, 27, is found, and key p is found from that offset. To find the value corresponding to key p, the index is reused directly, because the order of the data in the key offset array is consistent with the order in the value offset array. The offset of key p is the first entry in the key offset array, so the offset of the corresponding value is the first entry in the value offset array; the value offset 19 is thus found, and the value 5 is read at that offset.
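The lookup just described can be sketched in Python using the offsets of the fig. 6 example. The function name lookup and the two dictionaries standing in for "read the key or value stored at this offset" are assumptions made for illustration; a real decoder would read the bytes of the binary field instead.

```python
def lookup(key, key_offsets, value_offsets, read_key, read_value, descending=False):
    """Binary search over sorted key offsets; returns the value or None."""
    lo, hi = 0, len(key_offsets) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        candidate = read_key(key_offsets[mid])
        if candidate == key:
            # Same index in both arrays: no second search is needed.
            return read_value(value_offsets[mid])
        # Move right when the target sorts after the candidate
        # in the stored order (ascending or descending).
        if (candidate < key) != descending:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


# Stand-ins for reading the binary field of the fig. 6 example.
keys_at = {24: "d", 25: "h", 26: "a", 27: "p"}
values_at = {1: 4, 6: "x", 11: "zxc", 19: 5}
key_offsets = [27, 25, 24, 26]     # sorted Z to A: p, h, d, a
value_offsets = [19, 6, 1, 11]     # same order as key_offsets

found = lookup("p", key_offsets, value_offsets,
               keys_at.__getitem__, values_at.__getitem__, descending=True)
```

With these inputs the search visits key h first and then key p, as in the walkthrough above, and the value is fetched through index 0 of the value offset array.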

A method for processing Json text provided by embodiments of the present disclosure is described below with reference to another Json binary field shown in fig. 7. The Json text shown in FIG. 7 is '{"zt": "pq", "wer": 25, "gt": "bf"}'.

When the Json text is parsed, the plurality of values are encoded into the Json binary string in the order in which they are parsed. The encoded data of the plurality of values may be, for example, 0x02 pq 0x01 25 0x02 bf, in which the values appear in parsing order. After parsing is complete, the plurality of values in the Json text have been encoded, and the plurality of keys are cached in memory. Before the keys are sorted, the value offset array may be, for example, [1,7,13], where 1 is the offset of the value pq, 7 is the offset of the value 25, and 13 is the offset of the value bf. The plurality of keys are sorted in A-to-Z character order, giving the sorted keys [ gt, wer, zt ]. Based on this order, the positioning array [13,7,1] is obtained. The first entry in the positioning array indicates the position of the value corresponding to the first sorted key, that is, 13 indicates the position in the binary field of the value bf of key gt; the second entry indicates the position of the value corresponding to the second sorted key, that is, 7 indicates the position of the value 25 of key wer; the third entry indicates the position of the value corresponding to the third sorted key, that is, 1 indicates the position of the value pq of key zt. It can be seen that the order of the data in the positioning array is consistent with the order of the sorted keys. Key zt is arranged after key wer, so offset 1 of the value of key zt is arranged after offset 7 of the value of key wer; key gt is arranged before key wer, so offset 13 of the value of key gt is arranged before offset 7 of the value of key wer. The plurality of keys and the above positioning data are added to the Json binary field, resulting in the following binary field: 0x02 pq 0x01 25 0x02 bf zt wer gt 13 7 1.

When the key offsets are added to the Json binary field, they may form, for example, the key offset array [24,21,19]. In the key offset array, key zt is arranged after key wer, so offset 19 of key zt comes after offset 21 of key wer; key gt is arranged before key wer, so offset 24 of key gt comes before offset 21 of key wer. The key offset array is added before the value offset array, giving the final binary field: 0x02 pq 0x01 25 0x02 bf zt wer gt 24 21 19 13 7 1.
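How the two arrays follow from the parsing-order data can be sketched in Python as below. The function name build_positioning_data is hypothetical, and sorting by Python string comparison is an assumption standing in for the character-order sort of the description; the inputs are the keys and offsets of the fig. 7 and fig. 6 examples.

```python
def build_positioning_data(keys, key_offsets, value_offsets, descending=False):
    """All three inputs are in parsing order; outputs follow sorted-key order."""
    # Sort indices rather than keys, so the same permutation can be
    # applied to both offset lists.
    order = sorted(range(len(keys)), key=keys.__getitem__, reverse=descending)
    first_array = [key_offsets[i] for i in order]      # offsets of sorted keys
    second_array = [value_offsets[i] for i in order]   # offsets of their values
    return first_array, second_array


# fig. 7 example: keys parsed as zt, wer, gt and sorted A to Z.
fig7 = build_positioning_data(["zt", "wer", "gt"], [19, 21, 24], [1, 7, 13])

# fig. 6 example: keys parsed as d, h, a, p and sorted Z to A.
fig6 = build_positioning_data(["d", "h", "a", "p"], [24, 25, 26, 27],
                              [1, 6, 11, 19], descending=True)
```

Only the small offset lists are permuted; the keys and values already written to the binary field stay in parsing order.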

When a key-value pair is searched for, suppose the pair zt = pq needs to be found and a binary search is used. First, the middle entry of the key offset array is taken, which is key offset 21, and key wer is found from that offset. Key wer is compared with key zt; key zt is larger than key wer, so the search continues to the right. Key offset 19 is found, and key zt is found from that offset. To find the value of key zt, the index is reused: because the order of the data in the key offset array is consistent with the order of the data in the value offset array, the entry with the same index can be read directly from the value offset array. Key offset 19 of key zt is the third entry in the key offset array, so the third entry in the value offset array is the offset of the corresponding value. That is, 1 is the offset of the value corresponding to key zt, from which the value pq can be found.

The method for processing the Json text provided by the present disclosure is exemplarily described below with reference to fig. 8, and fig. 8 shows a schematic flow chart of processing the Json text.

As shown in fig. 8, the obtained Json text may be processed. The Json text may be Json data in text format, for example character string data, and may include one or more key-value pairs. The Json text is parsed, for example using a Json parse function, to obtain a plurality of key-value pairs, which comprise a plurality of keys and values in one-to-one correspondence with the keys. During parsing, the plurality of values are encoded in parsing order and added to the Json binary field. According to the Json tree structure rule, tree-structured key data is constructed in memory; this data may be a key sort tree corresponding to the sorted keys. The plurality of keys are stored in memory using a radix tree structure, which compresses the data and thus saves memory. The plurality of keys, the key positioning data and the value positioning data are then added to the Json binary field to obtain the Json binary field corresponding to the Json text.
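A radix tree stores a shared prefix only once, which is where the memory saving for the cached keys comes from. The following Python sketch is one simple way to build such a tree and read the keys back in sorted order; the names RadixNode, insert and sorted_keys are hypothetical, and the disclosure does not specify the tree layout actually used.

```python
class RadixNode:
    def __init__(self):
        self.children = {}     # edge label (string) -> RadixNode
        self.terminal = False  # True if a key ends at this node


def insert(root, key):
    node = root
    while True:
        for label, child in list(node.children.items()):
            n = 0  # length of the common prefix of label and key
            while n < len(label) and n < len(key) and label[n] == key[n]:
                n += 1
            if n == 0:
                continue
            if n < len(label):
                # Split the edge so the shared prefix is stored once.
                mid = RadixNode()
                mid.children[label[n:]] = child
                del node.children[label]
                node.children[label[:n]] = mid
                child = mid
            key, node = key[n:], child
            break
        else:
            # No edge shares a prefix with the remaining key.
            if key:
                leaf = RadixNode()
                leaf.terminal = True
                node.children[key] = leaf
            else:
                node.terminal = True
            return
        if not key:
            node.terminal = True
            return


def sorted_keys(node, prefix=""):
    """Yield the stored keys in ascending character order."""
    if node.terminal:
        yield prefix
    for label in sorted(node.children):
        yield from sorted_keys(node.children[label], prefix + label)


root = RadixNode()
for k in ["zt", "wer", "gt"]:      # keys of the fig. 7 example, parsing order
    insert(root, k)
ordered = list(sorted_keys(root))
```

Edges leaving the same node always start with different characters, so visiting the edge labels in sorted order yields the keys in sorted order without a separate sort.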

As shown in fig. 8, an append interface of Lob may be called to write the Json binary field to disk sequentially in parsing order, or an append function may be used to add the plurality of keys and the positioning data to the tail of the binary field to obtain the binary field of the Json text. The encoded value data can therefore be written to the disk or hard disk at any time, which further reduces memory usage. In some embodiments, composite data such as pictures, large binary objects and Json/GIS data is too large to be held in memory, and can be written to disk on demand through Lob.
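The effect of appending values as they are parsed can be sketched in Python with an ordinary file opened in append mode. This is only an illustration under stated assumptions: the function stream_encode, the 4-byte offset encoding and the file name doc.bin are hypothetical, and a plain file stands in for the Lob append interface, whose actual API is not described here.

```python
import os
import struct
import tempfile


def stream_encode(pairs, path):
    """pairs: iterable of (key, encoded_value_bytes) in parsing order."""
    cached = []   # only (key, value_offset) stays in memory, never the values
    offset = 0
    with open(path, "ab") as out:
        for key, encoded_value in pairs:
            cached.append((key, offset))
            out.write(encoded_value)          # appended as soon as it is parsed
            offset += len(encoded_value)
        for key, _ in cached:                 # keys follow, in parsing order
            out.write(key.encode("utf-8"))
        # Value offsets are written in sorted-key order (A to Z here).
        value_offsets = [off for _, off in sorted(cached)]
        for off in value_offsets:
            out.write(struct.pack("<I", off))
    return value_offsets


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "doc.bin")
    offsets = stream_encode([("zt", b"pq"), ("wer", b"25"), ("gt", b"bf")], path)
    with open(path, "rb") as f:
        data = f.read()
```

Memory use in this sketch grows with the number and length of the keys only, since each encoded value is handed to the file as soon as it is produced.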

As shown in FIG. 8, fields such as the first field and the data length field in the Json binary field shown in FIG. 5 may be backfilled using an in-place update (In-place update) function. The values can therefore be stored on disk promptly, without waiting until the data has been fully parsed or the key data has been encoded; once the corresponding processing is finished, only the data length field and the first key-value-pair position field need to be updated using the in-place update function.

Therefore, the method for processing Json text provided by the embodiments of the present disclosure does not need to hold the values temporarily in memory during serialization of the Json text. In other words, no memory of unknown size has to be reserved to store the parsed key-value pairs. This removes the uncertainty in memory usage during serialization and solves the problem of heavy dependence on memory when encoding Json text.

Method embodiments of the present disclosure are described in detail above in conjunction with fig. 1-8, and apparatus embodiments of the present disclosure are described in detail below in conjunction with fig. 9-10. It is to be understood that the description of the method embodiments corresponds to the description of the apparatus embodiments, and therefore reference may be made to the method embodiments above for parts which are not described in detail.

Fig. 9 is a schematic structural diagram of an apparatus for processing Json text according to an embodiment of the present disclosure. The data serialization apparatus 900 shown in fig. 9 includes a parsing unit 910, a first adding unit 920, a sorting unit 930, and a second adding unit 940, and each unit is exemplarily described below.

A parsing unit 910 configured to parse the Json text to obtain a set of key-value pairs in the Json text, the set of key-value pairs including a plurality of keys and a plurality of values in one-to-one correspondence with the plurality of keys;

a first adding unit 920, configured to sequentially add the multiple values to a Json binary field according to the parsing order of the multiple values in the process of parsing the Json text;

a sorting unit 930 configured to, after the Json text is parsed, sort the plurality of keys to obtain location data, where the location data is used to locate positions of the plurality of values in the Json binary field according to the sorted keys;

a second adding unit 940 configured to add the plurality of keys and the positioning data to the Json binary field.

Optionally, the apparatus further comprises: a storage unit 950 configured to store the plurality of keys in a memory using a radix tree structure.

Optionally, the Json binary field contains a first field, the apparatus further comprising: a determining unit 960 configured to determine the positions of the plurality of keys in the Json binary field after the adding of the plurality of keys and the positioning data to the Json binary field; an update unit 970 configured to update the first field in place such that the first field records the location of the plurality of keys in the Json binary field.

Fig. 10 is a schematic structural diagram of another apparatus for processing Json text according to an embodiment of the present disclosure. The apparatus 1000 shown in fig. 10 may include a memory 1010 and a processor 1020. The memory 1010 may be used to store executable code. The processor 1020 may be configured to execute the executable code stored in the memory 1010 to implement the steps of the various methods described above. In some embodiments, the apparatus 1000 may further include a network interface 1030, and data exchange between the processor 1020 and an external device may be implemented through the network interface 1030.

It should be understood that in the embodiments of the present disclosure, "B corresponding to A" means that B is associated with A, and B can be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.

It should be understood that the term "and/or" herein merely describes an association relationship between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.

It should be understood that, in various embodiments of the present disclosure, the sequence numbers of the above-mentioned processes do not imply an order of execution, and the order of execution of the processes should be determined by their functions and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present disclosure.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the disclosure are, in whole or in part, generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer readable storage medium may be any available medium that can be read by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Versatile Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.

The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A method of processing Json text comprising:

analyzing a Json text to obtain a key value pair set in the Json text, wherein the key value pair set comprises a plurality of keys and a plurality of values which correspond to the plurality of keys one by one;

in the process of analyzing the Json text, the values are sequentially added to the Json binary field according to the analyzing sequence of the values;

after the Json text is analyzed, sequencing the plurality of keys to obtain positioning data, wherein the positioning data is used for positioning the positions of the plurality of values in the Json binary field according to the sequenced keys;

adding the plurality of keys and the positioning data to the Json binary field.

2. The method of claim 1, further comprising:

storing the plurality of keys in a memory using a radix tree structure.

3. The method of claim 1, the Json binary field including a first field, and after the adding the plurality of keys and the positioning data to the Json binary field, the method further comprising:

determining a position of the plurality of keys in the Json binary field;

updating the first field in-place such that the first field records the location of the plurality of keys in the Json binary field.

4. The method of claim 1, wherein the positioning data comprises a first array for recording offsets of the sorted keys and a second array for recording offsets of values corresponding to the sorted keys.

5. The method of claim 1, the adding the plurality of keys to the Json binary field, comprising:

adding the plurality of keys to the Json binary field in the order of the plurality of keys' parsing.

6. The method of claim 1, the parsing the Json text further comprising:

checking the validity of the Json text.

7. An apparatus for processing Json text, comprising:

a parsing unit configured to parse the Json text to obtain a set of key-value pairs in the Json text, the set of key-value pairs including a plurality of keys and a plurality of values in one-to-one correspondence with the plurality of keys;

a first adding unit, configured to add the multiple values to the Json binary field in sequence according to the parsing order of the multiple values in the process of parsing the Json text;

a sorting unit configured to sort the plurality of keys after the Json text is parsed to obtain positioning data, the positioning data being used to locate the positions of the plurality of values in the Json binary field according to the sorted keys;

a second adding unit configured to add the plurality of keys and the positioning data to the Json binary field.

8. The apparatus of claim 7, further comprising:

a storage unit configured to store the plurality of keys in a memory using a radix tree structure.

9. The apparatus of claim 7, the Json binary field including a first field, the apparatus further comprising:

a determining unit configured to determine positions of the plurality of keys in the Json binary field after the adding of the plurality of keys and the positioning data to the Json binary field;

an update unit configured to update the first field in-place such that the first field records the locations of the plurality of keys in the Json binary field.

10. The apparatus of claim 7, the positioning data comprising a first array to record offsets of sorted keys and a second array to record offsets of values corresponding to sorted keys.

11. The apparatus of claim 7, the second adding unit further configured to:

add the plurality of keys to the Json binary field in the parsing order of the plurality of keys.

12. The apparatus of claim 7, the parsing unit further configured to:

check the validity of the Json text.

13. An apparatus for processing Json text, comprising a memory having executable code stored therein and a processor configured to execute the executable code to implement the method of any of claims 1-6.