CN117973365A - JSON file parsing method, electronic device, storage medium and program product - Google Patents

Publication number
CN117973365A
Authority
CN
China
Prior art keywords
node
json
block
node block
tree
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410294744.2A
Other languages
Chinese (zh)
Inventor
请求不公布姓名
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bi Ren Technology Co ltd
Original Assignee
Shanghai Bi Ren Technology Co ltd
Application filed by Shanghai Bi Ren Technology Co ltd
Priority to CN202410294744.2A
Publication of CN117973365A
Legal status: Pending


Abstract

The embodiment of the invention provides a JSON file parsing method, an electronic device, a storage medium and a program product. The method comprises: receiving a node block parsing request, wherein the node block parsing request comprises a node block identifier; determining the corresponding node block in a node block tree according to the node block identifier, wherein the node block tree is constructed based on a node tree, the node tree is constructed according to the structured data of the JSON nodes in the JSON file, the node block comprises at least one JSON node of the same level in the node tree, and the size of the node block is smaller than a preset number of bytes; determining a node block subtree according to the node block, wherein the node block subtree comprises the node block and the child node blocks at every layer below it; and, in order of node block level in the node block subtree from low to high, acquiring the original text content of each node block according to its text description information and parsing the node block according to the original text content, wherein the parsing of a parent node block calls the parsing results of its child node blocks. The embodiment of the invention greatly reduces the memory peak of JSON file parsing.

Description

JSON file parsing method, electronic device, storage medium and program product
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to the technical field of artificial intelligence, and specifically relates to a JSON file analysis method, electronic equipment, a storage medium and a program product.
Background
JSON (JavaScript Object Notation) is a lightweight data interchange format. It is based on a subset of ECMAScript (the JavaScript specification standardized by Ecma International), has a concise and clear hierarchical structure, and stores and represents data in a text format that is completely independent of any programming language.
As JSON is a widely used general-purpose file format, the first step in analyzing and processing a JSON file is to load it into memory and parse it, so as to facilitate further processing. The existing approach loads the entire JSON file into memory and then parses it in a single pass. This whole-file, single-pass scheme must allocate a large amount of memory at once; when the available system memory is insufficient, parsing may fail or the program may even crash, which limits the maximum size of JSON file that can be parsed.
Disclosure of Invention
To address the defect that the existing whole-file, single-pass parsing of a JSON file requires a large amount of memory to be allocated at once and thereby limits the maximum size of JSON file that can be parsed, the embodiment of the invention provides a JSON file parsing method, an electronic device, a storage medium and a program product.
The embodiment of the invention provides a JSON file analysis method, which comprises the following steps: receiving a node block analysis request; the node block analysis request comprises a node block identifier; determining corresponding node blocks in a node block tree according to the node block identifiers; the node block tree is constructed based on a node tree, the node tree is constructed according to the structural data of the JSON nodes in the JSON file, the node block comprises at least one adjacent JSON node of the same level in the node tree, and the size of the node block is smaller than a preset byte; determining a node block subtree according to the node block; wherein the node block subtree comprises the node blocks and each layer of sub-node blocks of the node blocks, and the node block subtree is a subset of the node block tree; according to the sequence from low to high of the levels of the node blocks in the node block subtrees, acquiring original text contents of the node blocks according to the text description information of the node blocks, and analyzing the node blocks according to the original text contents; the hierarchy of the father node block is higher than that of the child node block, and the analysis process of the father node block calls the analysis result of the child node block.
The JSON file parsing method provided by the embodiment of the invention further comprises: for node blocks at the same layer, executing in parallel, according to a preset number of threads, the actions of acquiring the original text content of the node blocks according to their text description information and parsing the node blocks according to the original text content.
According to the JSON file analysis method provided by the embodiment of the invention, the method further comprises the following steps: acquiring the structured data of the JSON file; wherein the structured data comprises a recursive nested relationship of JSON nodes; constructing the node tree according to the structured data; and for the JSON nodes at the same layer in the node tree, merging adjacent JSON nodes into one node block according to the preset bytes to obtain the node block tree.
According to the JSON file parsing method provided by the embodiment of the present invention, before the structured data of the JSON file is obtained, the method further includes: asynchronous loading is carried out on the JSON file after blocking by utilizing multithreading, and the text description information of JSON nodes in the JSON file is recorded; the text description information comprises position information and text content length information of the JSON node.
According to the method for parsing the JSON file provided by the embodiment of the present invention, for the JSON nodes at the same layer in the node tree, adjacent JSON nodes are merged into one node block according to the preset byte, so as to obtain the node block tree, which includes: constructing a collector of nodes at the same layer, and traversing JSON nodes in the node tree layer by layer; wherein, the same-layer JSON nodes in the node tree are traversed, and the following processing is carried out: emptying the collector of the same-layer node; in response to the sum of text content lengths corresponding to existing JSON nodes in the collector plus text content length corresponding to the currently traversed JSON node being greater than the preset byte, merging the existing JSON nodes in the collector into a node block, emptying the collector, and adding the currently traversed JSON node into the collector; adding the JSON node into the collector in response to the sum of the text content lengths corresponding to the existing JSON nodes in the collector and the sum of the text content lengths corresponding to the currently traversed JSON nodes being smaller than or equal to the preset bytes; and merging JSON nodes existing in the collector into a node block in response to the completion of the traversal of the JSON nodes of the current layer.
According to the JSON file parsing method provided by the embodiment of the present invention, the obtaining the structured data of the JSON file includes: and acquiring the structured data of the JSON file preset level.
According to the JSON file parsing method provided by the embodiment of the present invention, before parsing the node block according to the original text content, the method further includes: and correcting the original text content in response to the format problem of the original text content so as to enable the original text content to conform to the JSON format specification.
The embodiment of the invention also provides a JSON file analysis device, which comprises: a receiving module for: receiving a node block analysis request; the node block analysis request comprises a node block identifier; a first determining module, configured to: determining corresponding node blocks in a node block tree according to the node block identifiers; the node block tree is constructed based on a node tree, the node tree is constructed according to the structural data of the JSON nodes in the JSON file, the node block comprises at least one adjacent JSON node of the same level in the node tree, and the size of the node block is smaller than a preset byte; a second determining module, configured to: determining a node block subtree according to the node block; wherein the node block subtree comprises the node blocks and each layer of sub-node blocks of the node blocks, and the node block subtree is a subset of the node block tree; the analysis module is used for: according to the sequence from low to high of the levels of the node blocks in the node block subtrees, acquiring original text contents of the node blocks according to the text description information of the node blocks, and analyzing the node blocks according to the original text contents; the hierarchy of the father node block is higher than that of the child node block, and the analysis process of the father node block calls the analysis result of the child node block.
The embodiment of the invention also provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of any one of the JSON file parsing methods when executing the program.
The embodiment of the invention also provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of any of the JSON file parsing methods described above.
The embodiment of the invention also provides a computer program product, which comprises a computer program, wherein the computer program realizes the steps of any one of the JSON file parsing methods when being executed by a processor.
According to the JSON file parsing method, electronic device, storage medium and program product provided by the embodiment of the invention, a node block parsing request is received, and the corresponding node block in the node block tree is determined according to the node block identifier in the request, the size of the node block being smaller than a preset number of bytes. A node block subtree is determined according to the node block, and, in order of node block level in the subtree from low to high, the original text content of each node block is obtained according to its text description information and the node block is parsed according to that content. Because the size of each node block is limited, and only the original text content of the node blocks in the node block subtree is parsed after a request is received, the memory peak of JSON file parsing is greatly reduced compared with loading the whole JSON file and parsing it in a single pass, so that JSON files of larger size can be parsed.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a JSON file parsing method according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a JSON file parsing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram showing the effects of a JSON file parsing method according to an embodiment of the present invention;
FIG. 4 is a second schematic diagram illustrating the effect of the JSON file parsing method according to the embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a JSON file parsing device according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The JSON file parsing method provided by the embodiment of the invention is intended to improve the overall efficiency of loading and parsing JSON files, so that larger JSON files can be loaded with less memory and at higher speed. The method can be used for loading large JSON files, which are widely used in various business scenarios, such as performance analysis of large deep learning models.
Fig. 1 is a schematic flow chart of a JSON file parsing method according to an embodiment of the present invention. As shown in fig. 1, the method includes:
Step S1, receiving a node block parsing request; wherein the node block parsing request includes a node block identifier.
The JSON file analysis method provided by the embodiment of the invention can be applied to a JSON file analysis device. The JSON file analysis device can provide an inquiry interface to the outside, and receives the node block analysis request sent by the outside through the inquiry interface. The node block resolution request includes a node block identification.
Step S2, corresponding node blocks in the node block tree are determined according to the node block identifiers; the node block tree is constructed based on a node tree, the node tree is constructed according to the structural data of the JSON nodes in the JSON file, the node block comprises at least one adjacent JSON node of the same level in the node tree, and the size of the node block is smaller than a preset byte.
The node block tree is a tree structure composed of node blocks. The node block tree is constructed based on a node tree, the node tree is a tree structure formed by JSON nodes and constructed according to the structural data of the JSON nodes in the JSON file, the structural data of the JSON nodes comprises a recursion nested relation of the JSON nodes, and the tree structure formed by the JSON nodes is constructed according to the recursion nested relation of the JSON nodes.
The node block tree is a result of simplifying the node tree, wherein node blocks in the node block tree comprise at least one adjacent JSON node in the same level in the node tree, and the size of the node blocks is smaller than a preset byte.
The node block tree maintains name information for the node blocks, so a node block with a specified name can be found by comparing names one by one. The node block identifier may be the node block name. The corresponding node block in the node block tree is determined through the node block identifier.
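The lookup by name described above can be sketched as follows. This is a minimal illustration only: the `NodeBlock` class, the `find_block` function and the example block names are hypothetical and do not appear in the original text, and a depth-first traversal is assumed as one way of comparing names one by one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeBlock:
    # Hypothetical structure: the identifier is assumed to be the block name.
    name: str
    children: List["NodeBlock"] = field(default_factory=list)


def find_block(root: NodeBlock, name: str) -> Optional[NodeBlock]:
    """Walk the node block tree depth-first, comparing names one by one."""
    stack = [root]
    while stack:
        block = stack.pop()
        if block.name == name:
            return block
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(block.children))
    return None


# Illustrative tree; the names are made up for the example.
tree = NodeBlock("root", [
    NodeBlock("traceEvents", [NodeBlock("traceEvents/0")]),
    NodeBlock("metadata"),
])
found = find_block(tree, "traceEvents/0")
```

The block returned by such a lookup would then serve as the root of the node block subtree determined in step S3.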
Step S3, determining a node block subtree according to the node block; the node block subtree comprises the node blocks and all layers of sub-node blocks of the node blocks, and the node block subtree is a subset of the node block tree.
And determining corresponding node blocks according to the node block identifiers, and determining node block subtrees according to the corresponding node blocks, wherein the node block subtrees are subsets of a node block tree. The node block subtree takes the node block corresponding to the node block identifier as a root node and comprises the node block and each layer of sub-node blocks of the node block.
Step S4, in order of the levels of the node blocks in the node block subtree from low to high, acquiring the original text content of each node block according to the text description information of the node block, and parsing the node block according to the original text content; wherein the level of a parent node block is higher than that of its child node blocks, and the parsing of the parent node block calls the parsing results of the child node blocks.
The analysis process of the father node block calls the analysis result of the child node block, so that when the node blocks in the node block subtree are analyzed, the analysis is performed according to the order of the node block levels in the node block subtree from low to high. Specifically, according to the order of node block levels in the node block subtrees from low to high, acquiring original text contents of the node blocks according to text description information of the node blocks, and analyzing the node blocks according to the original text contents. After the node blocks in the node block subtrees are analyzed from low to high in the hierarchy, the analysis result of the node block corresponding to the node block identifier can be obtained, and the analysis result can be returned to the inquirer. The text description information of the node block can be acquired based on the text description information of the JSON node in the node block.
Parsing a node block in the node block subtree means parsing its original text content, which can be done with a conventional JSON parsing tool.
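The bottom-up order of step S4 can be illustrated with a simplified sketch. Several assumptions are made that the original text does not specify: the `Block` class and `parse_subtree` function are hypothetical, leaf blocks are assumed to cover comma-separated elements of an array, a parent block is assumed simply to concatenate the results of its children, and Python's standard `json` module stands in for the conventional parser. Wrapping the raw text in brackets before parsing is also an assumption of this sketch.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Block:
    name: str
    depth: int                  # 0 = subtree root; deeper blocks are "lower level"
    offset: int = 0             # text description: start position in the source
    length: int = 0             # text description: text content length
    children: List["Block"] = field(default_factory=list)


def collect(root: Block) -> List[Block]:
    """Gather every block of the subtree."""
    out, stack = [], [root]
    while stack:
        block = stack.pop()
        out.append(block)
        stack.extend(block.children)
    return out


def parse_subtree(source: str, root: Block) -> Any:
    results: Dict[str, Any] = {}
    # Deepest blocks first, so child results exist before the parent needs them.
    for block in sorted(collect(root), key=lambda b: b.depth, reverse=True):
        if block.children:
            merged: List[Any] = []
            for child in block.children:
                merged.extend(results[child.name])   # parent reuses child results
            results[block.name] = merged
        else:
            text = source[block.offset:block.offset + block.length]
            # Assumption: wrap the raw elements so the text is valid JSON.
            results[block.name] = json.loads("[" + text + "]")
    return results[root.name]


source = '[1, 2, {"a": 3}, 4]'
first, second = '1, 2', '{"a": 3}, 4'
root = Block("root", depth=0, children=[
    Block("root/0", 1, source.index(first), len(first)),
    Block("root/1", 1, source.index(second), len(second)),
])
parsed = parse_subtree(source, root)
```

In this sketch only the text of one leaf block is handed to the parser at a time, which is the property that keeps the peak memory bounded by the block size rather than the file size.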
According to the JSON file parsing method provided by the embodiment of the invention, a node block parsing request is received, and the corresponding node block in the node block tree is determined according to the node block identifier in the request, the size of the node block being smaller than a preset number of bytes. A node block subtree is determined according to the node block, and, in order of node block level in the subtree from low to high, the original text content of each node block is obtained according to its text description information and the node block is parsed according to that content. Because the size of each node block is limited, and only the original text content of the node blocks in the node block subtree is parsed after a request is received, the memory peak of JSON file parsing is greatly reduced compared with loading the whole JSON file and parsing it in a single pass.
The JSON file parsing method provided by the embodiment of the invention further comprises: for node blocks at the same layer, executing in parallel, according to a preset number of threads, the actions of acquiring the original text content of the node blocks according to their text description information and parsing the node blocks according to the original text content.
If the JSON file is regarded as a data stream, streaming parsing can avoid the large memory peak caused by single-pass parsing. Although streaming parsing solves the memory peak problem, by its nature it cannot parse and process data in parallel, so its parsing efficiency is low.
In the JSON file parsing method provided by the embodiment of the invention, node blocks at the same layer of the node block tree support parallel parsing. Therefore, when the original text content of the node blocks is obtained and parsed in order of node block level from low to high, the node blocks within one layer can be parsed in parallel using multiple threads. For example, according to a preset number of threads, the actions of acquiring the original text content of a node block according to its text description information and parsing the node block according to the original text content can be executed in parallel, which improves parsing efficiency.
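Parallel parsing of the node blocks in one layer might look like the sketch below. The `parse_layer` function, the `spans` mapping of block names to (offset, length) pairs and the example document are all assumptions made for illustration; the standard `concurrent.futures.ThreadPoolExecutor` is used as the thread pool, and each block is assumed to cover one complete JSON value.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Tuple


def parse_layer(source: str,
                spans: Dict[str, Tuple[int, int]],
                max_workers: int = 4) -> Dict[str, Any]:
    """Parse every block of one layer with a preset number of threads."""

    def parse_one(item):
        name, (offset, length) = item
        # Fetch the block's raw text from its description, then parse it.
        return name, json.loads(source[offset:offset + length])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(parse_one, spans.items()))


source = '{"a": [1, 2], "b": {"c": 3}}'
a_text, b_text = '[1, 2]', '{"c": 3}'
spans = {
    "a": (source.index(a_text), len(a_text)),
    "b": (source.index(b_text), len(b_text)),
}
layer_result = parse_layer(source, spans, max_workers=2)
```

Note that in standard CPython the global interpreter lock limits the speedup threads can give to CPU-bound parsing, so this sketch shows the control flow rather than the performance a native implementation could reach.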
The original text content of each node block in the node block tree is smaller than the estimated size (the preset number of bytes), and the memory peak of parallel JSON parsing is given in general by the following formula:
peak memory = total node tree memory + node block tree memory + thread count × (maximum node block size + parser memory consumption)
The memory consumed by the parser is roughly proportional to the maximum node block size, with a ratio of about 2.5 to 5.
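As a worked example of the formula, the following sketch evaluates it for one set of inputs. All numeric values (a 200 MiB node tree, a 10 MiB node block tree, 8 threads, 4 MiB blocks and a parser ratio of 5) are hypothetical and chosen only to make the arithmetic concrete; the function name `peak_memory` is likewise an assumption.

```python
MIB = 1024 * 1024


def peak_memory(node_tree_bytes: int,
                block_tree_bytes: int,
                threads: int,
                max_block_bytes: int,
                parser_ratio: int) -> int:
    """Evaluate the peak-memory formula, with parser memory = ratio * block size."""
    parser_bytes = parser_ratio * max_block_bytes
    return (node_tree_bytes + block_tree_bytes
            + threads * (max_block_bytes + parser_bytes))


# Hypothetical inputs: 200 + 10 + 8 * (4 + 5 * 4) = 402 MiB.
estimate = peak_memory(200 * MIB, 10 * MIB, threads=8,
                       max_block_bytes=4 * MIB, parser_ratio=5)
```

Under these assumed inputs the per-thread term contributes 192 MiB, so the thread count and the block size are the two parameters that trade parsing speed against peak memory.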
According to the JSON file parsing method provided by the embodiment of the invention, for node blocks at the same layer, the actions of acquiring the original text content of the node blocks according to their text description information and parsing the node blocks according to the original text content are executed in parallel according to a preset number of threads, which greatly improves parsing efficiency.
According to the JSON file analysis method provided by the embodiment of the invention, the method further comprises the following steps: acquiring the structured data of the JSON file; wherein the structured data comprises a recursive nested relationship of JSON nodes; constructing the node tree according to the structured data; and for the JSON nodes at the same layer in the node tree, merging adjacent JSON nodes into one node block according to the preset bytes to obtain the node block tree.
Before receiving the node block parsing request, it is necessary to construct a node tree in advance and construct a node block tree based on the node tree.
When the node tree is constructed, the structured data of the JSON file is obtained, wherein the structured data comprises the recursive nesting relationship of JSON nodes, and the node tree is constructed according to the structured data. The original text content of the loaded JSON file is analyzed character by character; irrelevant characters are ignored, and the characters that delimit JSON nodes, such as [ ] { } and ", are identified and extracted. The node type at a position delimited by [ ] is an array, and the node type at a position delimited by { } is a dictionary (also called an object). Arrays and dictionaries may contain other arrays and dictionaries as well as ordinary value content. A node is a data structure used during parsing to represent one section of the JSON file content: the nodes corresponding to arrays and dictionaries are intermediate nodes, and the nodes corresponding to value content are leaf nodes, so that arrays, dictionaries and values together form the node tree. The node tree is thus created from the recursive nesting relationship of the JSON nodes.
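A character-by-character structural scan of this kind can be sketched as follows. The `Node` class and `scan_structure` function are hypothetical names; for brevity the sketch records only array and dictionary nodes (leaf value nodes are omitted), and it assumes well-formed input. Quote characters are tracked so that brackets inside strings are not mistaken for structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                   # "array" or "object"
    offset: int                 # start position of the node's text
    length: int = 0             # filled in when the closing bracket is seen
    children: List["Node"] = field(default_factory=list)


def scan_structure(text: str) -> Optional[Node]:
    """Build a tree of container nodes from the structural characters only."""
    root: Optional[Node] = None
    stack: List[Node] = []
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "[{":
            node = Node("array" if ch == "[" else "object", i)
            if stack:
                stack[-1].children.append(node)   # nested inside the open node
            else:
                root = node
            stack.append(node)
        elif ch in "]}":
            node = stack.pop()
            node.length = i - node.offset + 1
    return root


doc = '{"a": [1, "]"], "b": {"c": []}}'
node_tree = scan_structure(doc)
```

Because each node stores its offset and length, its raw text can later be recovered as `doc[node.offset:node.offset + node.length]` without keeping parsed values in memory.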
When a node block tree is constructed according to the node tree, for the JSON nodes at the same layer in the node tree, merging adjacent JSON nodes into a node block according to preset bytes, namely the size of the node block is required to be smaller than or equal to the preset bytes, and merging the JSON nodes adjacent at the same layer in the node tree according to the preset bytes of the node block to obtain the node block tree.
According to the JSON file analysis method provided by the embodiment of the invention, the node tree is constructed according to the structured data by acquiring the structured data of the JSON file, and for the JSON nodes on the same layer in the node tree, adjacent JSON nodes are combined into one node block according to the preset bytes, so that the node block tree is obtained, and the construction of the node tree and the node block tree is realized.
According to the JSON file parsing method provided by the embodiment of the present invention, before the structured data of the JSON file is obtained, the method further includes: asynchronous loading is carried out on the JSON file after blocking by utilizing multithreading, and the text description information of JSON nodes in the JSON file is recorded; the text description information comprises position information and text content length information of the JSON node.
The JSON file can be split into chunks, and the chunks loaded asynchronously using multiple threads. A random-position reader for large text files can be constructed so that multiple threads can simultaneously read the text content of the original JSON file at different positions of a given file. Using the random-position reader, the raw text data blocks of the JSON file are read ahead one after another in multiple threads, ensuring that consecutive raw text data blocks are contiguous and do not overlap.
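One possible shape for such a random-position reader is sketched below. The function names `chunk_ranges`, `read_at` and `load_chunks`, the chunk size of 64 bytes and the temporary test file are assumptions for illustration. Each read opens its own file handle, which is one simple way to let threads read different positions without sharing a file cursor.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def chunk_ranges(total: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Contiguous, non-overlapping (offset, length) ranges covering the file."""
    return [(start, min(chunk_size, total - start))
            for start in range(0, total, chunk_size)]


def read_at(path: str, offset: int, length: int) -> bytes:
    """Random-position read using a private file handle per call."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


def load_chunks(path: str, chunk_size: int, threads: int = 4) -> List[bytes]:
    """Read all chunks with a thread pool; results come back in file order."""
    ranges = chunk_ranges(os.path.getsize(path), chunk_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda r: read_at(path, r[0], r[1]), ranges))


# Demonstration on a small temporary file (illustrative content only).
payload = (b'{"traceEvents": ['
           + b", ".join(b'{"id": %d}' % i for i in range(50))
           + b"]}")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
chunks = load_chunks(tmp.name, chunk_size=64)
os.remove(tmp.name)
```

This demonstration keeps every chunk in a list only so that the result can be checked; a loader aimed at limiting peak memory would hold just a few chunks at a time.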
After receiving a prefetch instruction from the main thread, the threads in the thread pool asynchronously load the chunked JSON file. Whenever the main thread needs a raw text data block, a new prefetch instruction is triggered immediately. Chunked asynchronous loading avoids the memory peak and the long extra waiting time caused by loading the whole file into memory, and allows subsequent file parsing to proceed in parallel with file reading. File reading is also referred to as file loading.
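The prefetch behaviour can be approximated with a generator that always keeps one read in flight. This is a sketch under assumptions: the names `read_at` and `prefetching_chunks` are hypothetical, a single background worker is used, and only one chunk is prefetched ahead of the consumer, whereas the original text does not state how many chunks are read ahead.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


def read_at(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset`."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


def prefetching_chunks(path: str, chunk_size: int) -> Iterator[bytes]:
    """Yield chunks in order while the next chunk loads in the background."""
    size = os.path.getsize(path)
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_at, path, 0, chunk_size) if size > 0 else None
        next_offset = chunk_size
        while pending is not None:
            data = pending.result()          # main thread needs this chunk now
            if next_offset < size:
                # Trigger the next prefetch before handing the chunk over.
                pending = pool.submit(read_at, path, next_offset, chunk_size)
                next_offset += chunk_size
            else:
                pending = None
            yield data


payload = b"[" + b",".join(str(i).encode() for i in range(100)) + b"]"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
received = list(prefetching_chunks(tmp.name, 32))
os.remove(tmp.name)
```

With this arrangement the consumer can process one chunk while the next is being read, and at most two chunks are held at once.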
For simplicity, the JSON file content can be split sequentially into chunks of equal size. A parser performs a preliminary pass over the loaded content of each chunk in turn, and obtains and records the text description information of the JSON nodes in the JSON file; the text description information comprises the position information and the text content length information of each JSON node. Each node records the starting position of its JSON content in the file and the length of that content, so that the content of different nodes can later be read in parallel according to position and length.
According to the JSON file analysis method provided by the embodiment of the invention, the partitioned JSON file is asynchronously loaded by utilizing the multithreading, and the text description information of the JSON nodes in the JSON file is recorded, wherein the text description information comprises the position information and the text content length information of the JSON nodes, so that the analysis efficiency of the JSON file is further improved, and the acquisition of the text description information of the JSON nodes is realized.
According to the method for parsing the JSON file provided by the embodiment of the present invention, for the JSON nodes at the same layer in the node tree, adjacent JSON nodes are merged into one node block according to the preset byte, so as to obtain the node block tree, which includes: constructing a collector of nodes at the same layer, and traversing JSON nodes in the node tree layer by layer; wherein, the same-layer JSON nodes in the node tree are traversed, and the following processing is carried out: emptying the collector of the same-layer node; in response to the sum of text content lengths corresponding to existing JSON nodes in the collector plus text content length corresponding to the currently traversed JSON node being greater than the preset byte, merging the existing JSON nodes in the collector into a node block, emptying the collector, and adding the currently traversed JSON node into the collector; adding the JSON node into the collector in response to the sum of the text content lengths corresponding to the existing JSON nodes in the collector and the sum of the text content lengths corresponding to the currently traversed JSON nodes being smaller than or equal to the preset bytes; and merging JSON nodes existing in the collector into a node block in response to the completion of the traversal of the JSON nodes of the current layer.
Multiple consecutive adjacent JSON nodes of the same level (same layer) in the node tree are merged into one node block; each JSON node corresponds to exactly one node block, and the total size of a node block is smaller than the estimated size (the preset number of bytes). The node tree is simplified according to the given estimated node block size, the node block tree is created, and the name information is updated, which completes the segmentation of the JSON file along JSON node boundaries.
The node block tree is based on the purpose of parallel analysis, the node tree is simplified, and a plurality of adjacent JSON nodes in the same layer in the node tree are combined into one node block, so that independent parallel analysis can be carried out on each node block in a plurality of threads through a plurality of traditional JSON resolvers. The node blocks are divided into an intermediate node block and a leaf node block, and the hierarchical relationship of the node blocks is derived from the hierarchical relationship of the corresponding nodes in the node tree.
To support multi-threaded parallelism, multiple nodes in the same layer are merged into a node block, so that each thread can subsequently parse a node block independently with a conventional JSON parser. If the estimated size is too small, the number of node blocks becomes excessive; if it is too large, each independent parse consumes too much memory. In practice an estimated size of only 4 Mb suffices. Of course, this is only a recommended value; the actual estimated size is determined according to system resources and data processing requirements.
A collector of same-layer nodes is constructed, and the JSON nodes in the node tree are traversed layer by layer; the JSON nodes of the current layer are processed as follows:
1) Empty the collector.
2) Traverse each JSON node of the current layer:
If the sum of the text content lengths of the JSON nodes already in the collector plus the text content length of the currently traversed JSON node is greater than the preset number of bytes, build a node block from the JSON nodes in the collector, empty the collector, and add the currently traversed JSON node to the collector;
Otherwise, append the currently traversed JSON node directly to the end of the collector.
3) Finish traversing the JSON nodes of the current layer.
4) Build a node block from the JSON nodes remaining in the collector.
And merging adjacent JSON nodes of the same layer into node blocks according to preset bytes to obtain a node block tree.
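The collector procedure for one layer can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names JsonNode, NodeBlock, make_block and merge_layer are hypothetical, each node is assumed to carry only the position and text length recorded during scanning, and the guard against flushing an empty collector (needed when a single node alone exceeds the preset size) is an added assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JsonNode:
    offset: int  # position of the node's text in the file (hypothetical field)
    length: int  # length of the node's text content


@dataclass
class NodeBlock:
    offset: int            # position of the first merged node
    length: int            # span from the first node's start to the last node's end
    nodes: List[JsonNode]


def make_block(nodes: List[JsonNode]) -> NodeBlock:
    """Merge adjacent same-layer nodes into one block covering their whole span."""
    first, last = nodes[0], nodes[-1]
    # The span includes any separators (commas, whitespace) between the nodes.
    return NodeBlock(first.offset, last.offset + last.length - first.offset, list(nodes))


def merge_layer(layer: List[JsonNode], preset_bytes: int) -> List[NodeBlock]:
    """Apply the collector procedure to the JSON nodes of one layer."""
    blocks: List[NodeBlock] = []
    collector: List[JsonNode] = []   # 1) start with an empty collector
    collected = 0                    # sum of text lengths currently in the collector
    for node in layer:               # 2) traverse each node of the layer
        # Assumption: never flush an empty collector, so an oversized single
        # node ends up in a block of its own instead of producing an empty block.
        if collector and collected + node.length > preset_bytes:
            blocks.append(make_block(collector))
            collector = [node]
            collected = node.length
        else:
            collector.append(node)
            collected += node.length
    if collector:                    # 4) flush what remains after the traversal
        blocks.append(make_block(collector))
    return blocks
```

Running merge_layer once per layer of the node tree, and linking each resulting block to the block that contains its parent node, would yield the node block tree; that linking step is omitted here.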
According to the JSON file parsing method provided by the embodiment of the present invention, the same-layer node collector is emptied; when the sum of the text content lengths of the JSON nodes already in the collector plus the text content length of the currently traversed JSON node is greater than the preset byte size, a node block is constructed from the JSON nodes already in the collector, the collector is emptied, and the currently traversed JSON node is added to the collector; when that sum is less than or equal to the preset byte size, the currently traversed JSON node is added to the collector; and when the traversal of the JSON nodes of the current layer is completed, a node block is constructed from the JSON nodes remaining in the collector. With the aid of the collector, the node block tree is constructed quickly.
According to the JSON file parsing method provided by the embodiment of the present invention, obtaining the structured data of the JSON file includes: obtaining the structured data of a preset number of levels of the JSON file.
When the node tree is constructed, the structured data of the JSON file is obtained, and the node tree is constructed according to the structured data.
The node tree is ultimately simplified into a node block tree, and because node blocks are smaller than the estimated size (the preset byte size), information about deep nodes does not help with node block segmentation. Nodes at deep levels can therefore be omitted when the node tree is constructed, which further reduces the number of nodes in the node tree and improves the efficiency of its construction.
The structured data of the JSON file includes the recursive nesting relationship of the JSON nodes, which reflects the hierarchy of the JSON nodes. Therefore, when the structured data of the JSON file is obtained, only the structured data of a preset number of levels, for example three levels, may be obtained.
According to the JSON file parsing method provided by the embodiment of the present invention, processing efficiency is further improved by obtaining the structured data of a preset number of levels of the JSON file.
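A depth-limited structural scan of this kind might look like the following Python sketch. It is an illustration under assumptions rather than the patent's scanner: the names Node and scan_structure are hypothetical, only containers ([ ] and { }) are recorded as nodes, offsets are character indices into an in-memory string rather than byte positions in a file, and the input is assumed to be well-formed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                 # "[" for an array, "{" for an object
    offset: int               # index of the opening symbol
    length: int = 0           # filled in when the closing symbol is seen
    children: List["Node"] = field(default_factory=list)


def scan_structure(text: str, max_depth: int) -> Optional[Node]:
    """Build a node tree from bracket structure, ignoring levels below max_depth."""
    root: Optional[Node] = None
    stack: List[Node] = []    # open containers that are recorded in the tree
    depth = 0                 # current nesting depth, including unrecorded levels
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            # Brackets inside string literals are not structure.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            if depth <= max_depth:
                node = Node(ch, i)
                if stack:
                    stack[-1].children.append(node)
                elif root is None:
                    root = node
                stack.append(node)
        elif ch in "]}":
            # The depth at a closing symbol equals the depth at its opening symbol.
            if depth <= max_depth and stack:
                node = stack.pop()
                node.length = i + 1 - node.offset
            depth -= 1
    return root
```

With max_depth set to 3, for example, containers nested four or more levels deep are skipped entirely, which keeps the node tree small while still recording the position and length of every node that could become a block boundary.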
According to the JSON file parsing method provided by the embodiment of the present invention, before the node block is parsed according to the original text content, the method further includes: in response to the original text content having a format problem, correcting the original text content so that it conforms to the JSON format specification.
The corresponding original text content is obtained for each node block, and the node block is parsed according to that content. Because the original text content corresponding to a node block is only part of the content of the JSON file, it may not conform to the JSON format specification. If the original text content has a format problem, it is therefore corrected before the node block is parsed, so that it conforms to the JSON format specification. For example, according to the type of the JSON nodes in the original text content corresponding to the node block, [ ] or { } is added at the head and tail to ensure that the JSON format specification is met.
After a node block analysis request is received, the node block with the specified name can be found by comparing names one by one. All child node blocks of that node block are then obtained, and a node block subtree is constructed. In multiple threads of a thread pool, the corresponding original text content is read in order of level from low to high, according to the position information and text content length information of the node blocks in the node block subtree; the content is corrected as appropriate (for example, [ ] is added at the head and tail according to the type of the nodes in the node block, to ensure that the JSON format specification is met); and each node block is then parsed with a conventional JSON parser.
According to the JSON file parsing method provided by the embodiment of the present invention, correcting the original text content in response to a format problem, so that it conforms to the JSON format specification, improves the reliability of JSON file parsing.
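The head-and-tail correction can be illustrated with a short Python sketch. The function names and the "array"/"object" labels are hypothetical, Python's standard json module stands in for the conventional JSON parser, and stripping a leading or trailing comma is an added assumption about what a fragment cut between sibling nodes may contain.

```python
import json


def correct_fragment(raw: str, parent_kind: str) -> str:
    """Wrap a run of sibling nodes so that it is a complete JSON document.

    parent_kind is "array" if the siblings are array elements and "object"
    if they are key/value members (labels assumed for this sketch).
    """
    # Assumption: a fragment may carry a separator comma at either end.
    body = raw.strip().strip(",").strip()
    if parent_kind == "array":
        return "[" + body + "]"
    if parent_kind == "object":
        return "{" + body + "}"
    raise ValueError("unknown parent kind: " + parent_kind)


def parse_block(raw: str, parent_kind: str):
    """Correct the fragment, then hand it to a conventional JSON parser."""
    return json.loads(correct_fragment(raw, parent_kind))
```

For instance, parse_block('{"x": 1}, {"x": 2},', "array") yields a list of two objects, although the raw fragment on its own is not valid JSON.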
FIG. 2 is a second flowchart of a JSON file parsing method according to an embodiment of the present invention. As shown in fig. 2, the method includes:
In step ①, the large JSON file is divided into a plurality of content blocks, and a random-access reader is created so that the divided JSON file can be loaded by multiple threads; the content blocks are the result of sequentially dividing the content of the JSON file into blocks of equal size;
In step ②, the divided JSON file is loaded asynchronously, block by block, using multiple threads;
In step ③, a node tree is constructed by parsing the structured data in the JSON file;
In step ④, for the JSON nodes of the same layer in the node tree, adjacent JSON nodes are merged into node blocks according to the preset byte size to obtain a node block tree;
In step ⑤, after a node block analysis request is received, the requested node block is obtained, and the original text content corresponding to the node block is parsed. To parse it, a node block subtree can be constructed from the node block and its child node blocks; then, in order of node block level from low to high within the node block subtree, the original text content of each node block is obtained according to its text description information, and the node block is parsed according to the original text content.
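Steps ① and ② can be sketched in Python as follows: the file is divided into equal-size ranges, and each range is read through its own file handle in a worker thread. The function names and the use of a thread pool are assumptions made for illustration. The sketch also returns all chunks at once, whereas a low-memory implementation would presumably pass each chunk to the structural scanner and release it.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_ranges(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Divide [0, size) sequentially into (offset, length) ranges of equal size."""
    return [(off, min(chunk_size, size - off)) for off in range(0, size, chunk_size)]


def read_range(path: str, offset: int, length: int) -> bytes:
    """Random-access read of one content block through a private file handle."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def load_chunks(path: str, chunk_size: int, threads: int = 4) -> List[bytes]:
    """Load all content blocks concurrently; results are returned in file order."""
    ranges = split_ranges(os.path.getsize(path), chunk_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda r: read_range(path, *r), ranges))
```

Because these content blocks are cut at fixed byte offsets, they generally do not align with JSON node boundaries; only the node blocks built later from the node tree do.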
The embodiment of the present invention first scans the whole JSON file block by block and parses the structured data of the JSON file, such as the symbols [ ] { }, character by character; it builds a node tree from the structured data; it then cuts the JSON file into node blocks along node boundaries based on the node tree and builds a node block tree from the node blocks; finally, it uses conventional JSON parsers to parse node blocks of the same layer independently and in parallel. This achieves a low memory peak and parallel parsing of large JSON files.
A conventional parser can only either load the entire JSON content and parse it in one pass (which consumes too much memory) or treat the JSON file as a data stream and parse it progressively (which cannot be parallelized). The high-memory approach of a conventional parser is suitable for small JSON data, such as a single node block.
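The final stage, parsing a node block subtree level by level from the lowest level upward with same-layer blocks handled in parallel, could be sketched as below. This is a deliberately simplified model and not the patent's data structure: Block, levels, parse_one and parse_subtree are hypothetical names, leaf blocks hold the text of a run of sibling nodes, and a parent block is assumed to obtain its result simply by concatenating (for arrays) or merging (for objects) the results of its child blocks.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Block:
    name: str                      # node block identifier
    kind: str                      # "array" or "object": type of the enclosing container
    text: str = ""                 # original text content (leaf blocks only)
    children: List["Block"] = field(default_factory=list)


def levels(root: Block) -> List[List[Block]]:
    """Group the blocks of the subtree by depth, root layer first."""
    result, current = [], [root]
    while current:
        result.append(current)
        current = [c for b in current for c in b.children]
    return result


def parse_one(block: Block, results: Dict[str, Any]) -> Any:
    if not block.children:
        # Leaf block: correct head and tail, then use a conventional parser.
        opener, closer = ("[", "]") if block.kind == "array" else ("{", "}")
        return json.loads(opener + block.text + closer)
    # Parent block: reuse the already finished results of its child blocks.
    if block.kind == "array":
        merged_list: List[Any] = []
        for child in block.children:
            merged_list.extend(results[child.name])
        return merged_list
    merged_dict: Dict[str, Any] = {}
    for child in block.children:
        merged_dict.update(results[child.name])
    return merged_dict


def parse_subtree(root: Block, threads: int = 4) -> Any:
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Deepest layer first, so child results exist before parents need them.
        for layer in reversed(levels(root)):
            parsed = list(pool.map(lambda b: parse_one(b, results), layer))
            for block, value in zip(layer, parsed):
                results[block.name] = value
    return results[root.name]
```

Results are written to the shared dictionary only between layers, so worker threads read it without contention while a layer is being parsed.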
Fig. 3 is a schematic diagram of the effect of the JSON file parsing method according to an embodiment of the present invention. As shown in fig. 3, the first two results were obtained by loading a 2.2 GB chrome trace JSON file with simdjson and rapidjson respectively, for which the measured memory peaks are 11000 MB and 5900 MB. The last four results are the memory peaks measured at four optimization stages of a JSON parser built with the JSON file parsing method provided by the embodiment of the present invention; in the final stage, a depth filter is used to further reduce the memory peak.
Wherein:
simdjson: an open-source JSON parsing library that uses SIMD instructions to accelerate JSON processing.
rapidjson: an open-source JSON parsing library for fast JSON parsing.
chrome trace: a performance viewing tool built into the Chrome browser.
FIG. 4 is a second schematic diagram of the effect of the JSON file parsing method according to the embodiment of the present invention. The method is applied in a performance analysis tool. As shown in fig. 4, several large chrome trace JSON files are loaded and processed, and the loading time, memory peak, and session memory are each compared with those of trace_processor (a component of perfetto that parses and loads trace files for cross-process access). The comparison shows that the JSON file parsing method provided by the embodiment of the present invention performs well when parsing large JSON files. Here, perfetto is a performance analysis tool.
The JSON file analysis method provided by the embodiment of the invention has the following advantages:
1. Low memory peak: block-wise parsing based on node cutting is used, so each parse needs only as much memory as one block, and the whole large JSON file does not have to be loaded at once. The memory peak is greatly reduced, so larger JSON files can be supported in the same environment;
2. Reusable: the cutting information can be stored separately and reused, or prepared in advance, which makes processing faster;
3. Parallel parsing: each data block cut along node boundaries can be loaded and parsed independently and in parallel, without dependencies;
4. Good fault tolerance: the chrome trace JSON format requires that a file can still be parsed even if closing symbols such as "]}" are missing at the end of the file. This fault tolerance requirement makes parsing large JSON files difficult: common parsers either cannot handle the problem or need a great deal of time to do so. With block-wise parsing, the head and tail of each data block are corrected at low cost, so fault tolerance is better;
5. High parsing speed: with asynchronous block loading and parallel parsing of node blocks, large JSON files are parsed much faster than with the load-everything, single-threaded approach of a conventional JSON parser.
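The fault tolerance point can be illustrated with a small Python sketch that appends whatever closing symbols are missing at the end of a truncated text. The function names are hypothetical, and the sketch rescans the whole text for clarity; in the scheme described above the open containers would already be known from the node tree, so only the tail of the last block would need correcting. Truncation in the middle of a string or a number is not handled.

```python
import json


def missing_closers(text: str) -> str:
    """Return the closing symbols needed to balance all still-open containers."""
    stack = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Brackets inside string literals do not open or close containers.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "[":
            stack.append("]")
        elif ch == "{":
            stack.append("}")
        elif ch in "]}" and stack:
            stack.pop()
    # The innermost open container must be closed first.
    return "".join(reversed(stack))


def repair_tail(text: str) -> str:
    """Drop a dangling trailing comma and append the missing closing symbols."""
    body = text.rstrip().rstrip(",")
    return body + missing_closers(body)
```

After repair, a truncated trace such as '{"traceEvents": [{"ph": "X"}, {"ph": "B"},' can be passed to json.loads like any complete document.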
The preferred embodiments of the present embodiment may be freely combined on the premise that the logic or structure does not conflict with each other, and the present invention is not limited to this.
The JSON file analysis device provided by the embodiment of the present invention is described below, and the JSON file analysis device described below and the JSON file analysis method described above can be referred to correspondingly.
Fig. 5 is a schematic structural diagram of a JSON file parsing device according to an embodiment of the present invention. As shown in fig. 5, the device includes a receiving module 10, a first determining module 20, a second determining module 30, and a parsing module 40, where: the receiving module 10 is configured to receive a node block analysis request, the node block analysis request including a node block identifier; the first determining module 20 is configured to determine the corresponding node block in a node block tree according to the node block identifier, where the node block tree is constructed based on a node tree, the node tree is constructed according to the structured data of the JSON nodes in the JSON file, the node block includes at least one adjacent JSON node of the same level in the node tree, and the size of the node block is smaller than a preset byte size; the second determining module 30 is configured to determine a node block subtree according to the node block, where the node block subtree includes the node block and the child node blocks at every level below it, and the node block subtree is a subset of the node block tree; the parsing module 40 is configured to, in order of node block level from low to high within the node block subtree, obtain the original text content of each node block according to its text description information and parse the node block according to the original text content, where the level of a parent node block is higher than that of its child node blocks, and the parsing process of a parent node block calls the parsing results of its child node blocks.
According to the JSON file parsing device provided by the embodiment of the present invention, a node block analysis request is received; the corresponding node block in the node block tree is determined according to the node block identifier in the request, the size of the node block being smaller than a preset byte size; a node block subtree is determined according to the node block; and, in order of node block level from low to high within the node block subtree, the original text content of each node block is obtained according to its text description information and the node block is parsed according to the original text content. Because the size of the node blocks is limited, and the node block subtree is constructed and the original text content of its node blocks parsed only after a node block analysis request is received, the memory peak of JSON file parsing is greatly reduced.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device may include a processor 610, a communication interface (Communications Interface) 620, a memory 630, and a communication bus 640, where the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may call logic instructions in the memory 630 to perform a JSON file parsing method that includes: receiving a node block analysis request, the node block analysis request including a node block identifier; determining the corresponding node block in a node block tree according to the node block identifier, where the node block tree is constructed based on a node tree, the node tree is constructed according to the structured data of the JSON nodes in the JSON file, the node block includes at least one adjacent JSON node of the same level in the node tree, and the size of the node block is smaller than a preset byte size; determining a node block subtree according to the node block, where the node block subtree includes the node block and the child node blocks at every level below it, and the node block subtree is a subset of the node block tree; and, in order of node block level from low to high within the node block subtree, obtaining the original text content of each node block according to its text description information and parsing the node block according to the original text content, where the level of a parent node block is higher than that of its child node blocks, and the parsing process of a parent node block calls the parsing results of its child node blocks.
Further, when the logic instructions in the memory 630 are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product. The computer program product includes a computer program, which may be stored on a non-transitory computer readable storage medium; when the computer program is executed by a processor, the computer is capable of executing the JSON file parsing method provided by the above method embodiments, the method including: receiving a node block analysis request, the node block analysis request including a node block identifier; determining the corresponding node block in a node block tree according to the node block identifier, where the node block tree is constructed based on a node tree, the node tree is constructed according to the structured data of the JSON nodes in the JSON file, the node block includes at least one adjacent JSON node of the same level in the node tree, and the size of the node block is smaller than a preset byte size; determining a node block subtree according to the node block, where the node block subtree includes the node block and the child node blocks at every level below it, and the node block subtree is a subset of the node block tree; and, in order of node block level from low to high within the node block subtree, obtaining the original text content of each node block according to its text description information and parsing the node block according to the original text content, where the level of a parent node block is higher than that of its child node blocks, and the parsing process of a parent node block calls the parsing results of its child node blocks.
In still another aspect, an embodiment of the present invention further provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the JSON file parsing method provided by the above method embodiments, the method including: receiving a node block analysis request, the node block analysis request including a node block identifier; determining the corresponding node block in a node block tree according to the node block identifier, where the node block tree is constructed based on a node tree, the node tree is constructed according to the structured data of the JSON nodes in the JSON file, the node block includes at least one adjacent JSON node of the same level in the node tree, and the size of the node block is smaller than a preset byte size; determining a node block subtree according to the node block, where the node block subtree includes the node block and the child node blocks at every level below it, and the node block subtree is a subset of the node block tree; and, in order of node block level from low to high within the node block subtree, obtaining the original text content of each node block according to its text description information and parsing the node block according to the original text content, where the level of a parent node block is higher than that of its child node blocks, and the parsing process of a parent node block calls the parsing results of its child node blocks.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A JSON file parsing method, characterized by comprising the following steps:
receiving a node block analysis request; the node block analysis request comprises a node block identifier;
determining the corresponding node block in a node block tree according to the node block identifier; the node block tree is constructed based on a node tree, the node tree is constructed according to the structured data of the JSON nodes in the JSON file, the node block comprises at least one adjacent JSON node of the same level in the node tree, and the size of the node block is smaller than a preset byte;
determining a node block subtree according to the node block; wherein the node block subtree comprises the node block and the child node blocks at every level below the node block, and the node block subtree is a subset of the node block tree;
in order of node block level from low to high within the node block subtree, acquiring the original text content of each node block according to the text description information of the node block, and parsing the node block according to the original text content; wherein the level of a parent node block is higher than that of its child node blocks, and the parsing process of the parent node block calls the parsing results of the child node blocks.
2. The JSON file parsing method of claim 1, further comprising:
for the node blocks of the same layer, performing in parallel, with a preset number of threads, the actions of acquiring the original text content of the node blocks according to the text description information of the node blocks and parsing the node blocks according to the original text content.
3. The JSON file parsing method of claim 1, further comprising:
Acquiring the structured data of the JSON file; wherein the structured data comprises a recursive nested relationship of JSON nodes;
constructing the node tree according to the structured data;
and for the JSON nodes at the same layer in the node tree, merging adjacent JSON nodes into one node block according to the preset bytes to obtain the node block tree.
4. A JSON file parsing method as claimed in claim 3, in which prior to the retrieval of the structured data of the JSON file, the method further comprises:
asynchronously loading the JSON file in blocks using multiple threads, and recording the text description information of the JSON nodes in the JSON file; the text description information comprises position information and text content length information of the JSON nodes.
5. The JSON file parsing method according to claim 4, wherein for the JSON nodes at the same layer in the node tree, merging adjacent JSON nodes into one node block according to the preset byte to obtain the node block tree, including:
constructing a same-layer node collector, and traversing the JSON nodes in the node tree layer by layer; wherein the following processing is performed while traversing the JSON nodes of the same layer in the node tree:
emptying the same-layer node collector;
in response to the sum of the text content lengths corresponding to the JSON nodes already in the collector plus the text content length corresponding to the currently traversed JSON node being greater than the preset byte, merging the JSON nodes already in the collector into a node block, emptying the collector, and adding the currently traversed JSON node to the collector;
in response to the sum of the text content lengths corresponding to the JSON nodes already in the collector plus the text content length corresponding to the currently traversed JSON node being less than or equal to the preset byte, adding the currently traversed JSON node to the collector;
in response to the traversal of the JSON nodes of the current layer being completed, merging the JSON nodes remaining in the collector into a node block.
6. A JSON file parsing method as claimed in claim 3, in which the obtaining the structured data of the JSON file comprises:
acquiring the structured data of a preset number of levels of the JSON file.
7. The JSON file parsing method of claim 1, further comprising, prior to said parsing the node blocks from the original text content:
in response to the original text content having a format problem, correcting the original text content so that the original text content conforms to the JSON format specification.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor performs the steps of the JSON file parsing method of any one of claims 1 to 7 when the program is executed.
9. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the JSON file parsing method of any one of claims 1 to 7.
10. A computer program product comprising a computer program which when executed by a processor implements the steps of the JSON file parsing method of any one of claims 1 to 7.
CN202410294744.2A 2024-03-14 2024-03-14 JSON file parsing method, electronic device, storage medium and program product Pending CN117973365A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410294744.2A CN117973365A (en) 2024-03-14 2024-03-14 JSON file parsing method, electronic device, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410294744.2A CN117973365A (en) 2024-03-14 2024-03-14 JSON file parsing method, electronic device, storage medium and program product

Publications (1)

Publication Number Publication Date
CN117973365A true CN117973365A (en) 2024-05-03

Family

ID=90848074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410294744.2A Pending CN117973365A (en) 2024-03-14 2024-03-14 JSON file parsing method, electronic device, storage medium and program product

Country Status (1)

Country Link
CN (1) CN117973365A (en)

Legal Events

Date Code Title Description
PB01 Publication