WO2019148671A1 - Xml file parsing method, device, computer apparatus, and storage medium - Google Patents


Info

Publication number
WO2019148671A1
Authority
WO
WIPO (PCT)
Prior art keywords
xml file
child node
node object
target
text
Prior art date
Application number
PCT/CN2018/084227
Other languages
French (fr)
Chinese (zh)
Inventor
杨启正
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing

Definitions

  • The present application relates to the field of computer technology, and in particular to an XML file parsing method, apparatus, computer device, and storage medium.
  • In a Linux environment, Linux commands are generally used to parse XML files in full. However, parsing an XML file with Linux commands occupies a large amount of memory and takes a long time. In particular, when multiple XML files need to be parsed, parsing them in parallel easily causes memory overflow errors, while parsing them serially takes even longer and yields low parsing efficiency.
  • The embodiments of the present application provide an XML file parsing method, apparatus, computer device, and storage medium, so as to reduce the memory occupied by the parsing process and shorten the time taken for parsing.
  • An XML file parsing method is provided, including:
  • if the child node object is not the last child node object in the target XML file, loading the next child node object into memory and returning to the step of parsing the child node object in label-text pair form, until the row record data corresponding to all child node objects in the target XML file has been parsed.
  • An XML file parsing apparatus is provided, including:
  • a character replacement unit, configured to perform illegal character replacement on an XML file by using a preset text compiler to generate a target XML file;
  • a loading unit, configured to load the child node objects in the target XML file into memory one by one according to a preset loading rule;
  • a parsing unit, configured to parse the child node object in label-text pair form to generate row record data corresponding to the child node object;
  • an object judging unit, configured to determine whether the child node object is the last child node object in the target XML file;
  • the loading unit being further configured to: if the child node object is not the last child node object in the target XML file, load the next child node object into memory, so that the parsing unit parses that child node object in label-text pair form, until the row record data corresponding to all child node objects in the target XML file has been parsed.
  • The embodiments of the present application further provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the XML file parsing method according to any one of the embodiments of the present application.
  • The embodiments of the present application further provide a storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, cause the processor to perform the XML file parsing method according to any one of the embodiments of the present application.
  • The embodiments of the present application thus provide an XML file parsing method, apparatus, computer device, and storage medium.
  • The method keeps the memory occupied by the parsing process small and does not affect other processes running on the terminal; for terminals with a low memory configuration in particular, this avoids memory overflow or insufficient memory.
  • The method can also parse several or dozens of XML files in parallel, improving parsing efficiency and reducing the time taken for parsing.
  • FIG. 1 is a schematic flowchart of an XML file parsing method according to an embodiment of the present application;
  • FIG. 2 is another schematic flowchart of an XML file parsing method according to an embodiment of the present application;
  • FIG. 3 is a specific schematic flowchart of the XML file parsing method shown in FIG. 1;
  • FIG. 4 is a schematic block diagram of an XML file parsing apparatus according to an embodiment of the present application;
  • FIG. 5 is another schematic block diagram of an XML file parsing apparatus according to an embodiment of the present application;
  • FIG. 6 is a specific schematic block diagram of the XML file parsing apparatus shown in FIG. 4;
  • FIG. 7 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of an XML file parsing method according to an embodiment of the present application.
  • The XML file parsing method is applied to terminals such as desktop computers, laptop computers, and tablet computers.
  • The XML file parsing method includes steps S101 to S105.
  • The preset text compiler may be the sed command of the shell.
  • the sed command format can be: sed-i's ⁇ ( ⁇ userid>.* ⁇ ) ⁇ (.* ⁇ .*> ⁇ ) ⁇ 1 ⁇ 2/g'filename.
  • The preset text compiler may also be the tr command of the shell.
  • The tr command format can be: tr -d "\046" < input_file > output_file, which deletes every "&" character (octal 046) from input_file and writes the result to output_file.
  • The preset text compiler may also be another command capable of replacing illegal characters in the XML file; this is not specifically limited herein.
  • If the encoding format of an XML file is not compatible with the text font in the XML file, an error is easily reported during parsing.
  • For example, suppose the XML file is in the Latin-1 encoding format and its text contains both English and Chinese fonts. Since Chinese fonts are not compatible with the Latin-1 encoding format, an error will be reported during subsequent parsing unless the encoding format is converted.
  • FIG. 2 is another schematic flowchart of an XML file parsing method according to an embodiment of the present application. In this embodiment, steps S106 to S109 are further included before step S101.
  • The encoding format may be the Latin-1 encoding format or another encoding format such as UTF-8, which is not specifically limited herein.
  • Determining whether the text font of the XML file is compatible with the encoding format specifically includes: acquiring the type of the text font in the XML file, and determining whether that type is compatible with the encoding format.
  • The types of text font include Chinese font, English font, and the like.
  • If the text font of the XML file is compatible with the encoding format, no encoding conversion is required, and step S101, that is, the illegal character replacement step, may be performed.
  • If the text font of the XML file is incompatible with the encoding format, step S108 is performed.
  • That is, the encoding of the XML file is converted by a preset character set encoding conversion command.
  • The preset character set encoding conversion command may be the piconv command of Perl.
  • The piconv command converts the encoding of the XML file as a byte stream.
  • For example, the piconv command can convert an XML file from the Latin-1 encoding format to the UTF-8 encoding format, thereby eliminating the garbled-character errors that occur when the XML file contains Chinese fonts.
  • Because the conversion works on a byte stream, the entire XML file is not placed in memory at one time. This avoids the memory overflow that loading the whole file at once would cause, relieves memory pressure, and saves memory.
  • The XML file after encoding format conversion is then set as the XML file, and step S101 is performed.
  • In this embodiment, the encoding format compatibility judgment and the encoding format conversion are performed first, and the illegal character replacement is performed afterwards.
  • Alternatively, the illegal character replacement may be performed first, followed by the encoding format compatibility judgment and the encoding format conversion of the XML file after the replacement. In this case, the XML file after encoding format conversion is used as the target XML file, and step S102 is then performed. The order of the encoding format conversion and the illegal character replacement is not restricted.
  • Specifically, loading the child node objects in the target XML file into memory one by one according to the preset loading rule includes: when the start tag of the root node object in the target XML file is identified, loading the child node objects in the target XML file into memory one by one, using each next-level child node object of the root node object as the loading unit, where the root node object includes at least one child node object.
  • For example, the content of the target XML file is:
  • The target XML file includes a root node object with <bookstore> as its start tag and </bookstore> as its end tag.
  • The root node object includes four child node objects, each with <book> as its start tag and </book> as its end tag.
  • These four child node objects are the next-level child node objects of the root node object.
  • Each child node object includes a start tag <book>, an end tag </book>, and several elements between them. For example, one of the four elements of the first child node object is <title>Harry Potter</title>.
  • A child node object may also include grandchild node objects; by the same rule, node objects may be nested in multiple layers.
  • Even so, the next-level child node object of the root node object remains the loading unit.
  • After reading the start tag <bookstore> of the root node object, the terminal loads the next-level child node objects into memory one at a time; that is, each load brings the content from a start tag <book> through the matching end tag </book> into memory. Each time a child node object is loaded, step S103 and the subsequent steps are performed to complete the parsing of that child node object; then the next child node object is loaded, and so on.
  • The child node object may be parsed in label-text pair form by the iterparse() method to generate the row record data corresponding to the child node object.
  • The iterparse() method is provided by ElementTree (the xml.etree.ElementTree module) in the Python language.
  • For example, the four label-text pairs of a child node object form one tuple, and each tuple is the record data for one row.
  • After the row record data is generated, the method further includes: storing the row record data in a row record data file; and releasing the memory space occupied by the child node object. That is, once the child node object in memory has been parsed, its row record data is stored in the row record data file and the memory space it occupied is released, freeing space for the next child node object and avoiding memory shortage or memory overflow.
  • FIG. 3 is a specific schematic flowchart of the XML file parsing method shown in FIG. 1.
  • Step S104 includes steps S1041 to S1044.
  • S1041: Read the string located after the current child node object in the target XML file. For example, if the child node object currently in memory is not the last one, the terminal reads "<book>", the start tag of the next child node object. If the child node object currently in memory is the fourth child node object of the root node object, the terminal reads "</bookstore>", the end tag of the root node object.
  • S1042: Determine whether the string is the end tag of the root node object.
  • Specifically, it is determined whether the string read in step S1041 is the end tag </bookstore> of the root node object. If it is not, the child node object currently in memory is not the last child node object in the target XML file, and step S1043 is performed. If it is, the child node object currently in memory is the last child node object in the target XML file, and step S1044 is performed.
  • The manner of determining whether the child node object currently in memory is the last child node object in the target XML file is not limited to that shown in FIG. 3; other manners may be used, which are not specifically limited herein.
  • When the child node object currently in memory is determined to be the last child node object in the target XML file, the target XML file has been fully parsed, and the parsing process may end.
  • When the child node object currently in memory is not the last child node object in the target XML file, the target XML file has not yet been fully parsed, and step S105 is performed.
  • S105: If the child node object is not the last child node object in the target XML file, load the next child node object into memory and return to the step of parsing the child node object in label-text pair form to generate its row record data, until the row record data corresponding to all child node objects in the target XML file has been parsed.
  • The loop ends when step S104 determines that the child node object in memory is the last child node object in the target XML file, which completes the parsing of the entire target XML file.
  • Because the child node objects are loaded into memory and parsed one by one, the parsing process occupies little memory, and memory overflow or insufficient memory is avoided.
  • The method can also parse several or dozens of XML files in parallel, greatly shortening the time taken for parsing and improving parsing efficiency.
  • FIG. 4 is a schematic block diagram of an XML file parsing apparatus according to an embodiment of the present application.
  • The XML file parsing apparatus 300 can be installed in a terminal such as a desktop computer, a tablet computer, or a laptop computer.
  • The XML file parsing apparatus 300 includes a character replacement unit 301, a loading unit 302, a parsing unit 303, and an object judging unit 304.
  • The character replacement unit 301 is configured to perform illegal character replacement on the XML file by using a preset text compiler to generate a target XML file.
  • FIG. 5 is another schematic block diagram of an XML file parsing apparatus according to an embodiment of the present application.
  • The XML file parsing apparatus 300 further includes a format acquisition unit 305, a format determining unit 306, and an encoding conversion unit 307.
  • The format acquisition unit 305 is configured to acquire the encoding format of the XML file.
  • The format determining unit 306 is configured to determine whether the text font of the XML file is compatible with the encoding format.
  • Specifically, the format determining unit 306 acquires the type of the text font in the XML file and determines whether that type is compatible with the encoding format. If the text font of the XML file is compatible with the encoding format, no encoding format conversion is required; the format determining unit 306 sends a first signal to the character replacement unit 301, and after receiving the first signal, the character replacement unit 301 performs illegal character replacement on the XML file by using the preset text compiler to generate the target XML file.
  • The encoding conversion unit 307 is configured to: if the text font of the XML file is incompatible with the encoding format, convert the encoding format of the XML file by using a preset character set encoding conversion command, and set the converted XML file as the XML file.
  • The encoding conversion unit 307 can convert the encoding format of the XML file as a byte stream by the piconv command.
  • After the encoding format conversion, the encoding conversion unit 307 replaces the original XML file with the converted XML file and then sends a third signal to the character replacement unit 301; after receiving the third signal, the character replacement unit 301 performs illegal character replacement on the XML file by using the preset text compiler to generate the target XML file.
  • The loading unit 302 is configured to load the child node objects in the target XML file into memory one by one according to a preset loading rule.
  • Specifically, the loading unit 302 is configured to: when the start tag of the root node object in the target XML file is identified, load the child node objects in the target XML file into memory one by one, using each next-level child node object of the root node object as the loading unit, where the root node object includes at least one child node object.
  • The parsing unit 303 is configured to parse the child node object in label-text pair form to generate row record data corresponding to the child node object.
  • The parsing unit 303 may parse the child node object in label-text pair form by the iterparse() method to generate the corresponding row record data.
  • The iterparse() method is provided by ElementTree in the Python language.
  • The XML file parsing apparatus 300 further includes a storage unit 308 and a memory release unit 309.
  • The storage unit 308 is configured to store the row record data in a row record data file.
  • The memory release unit 309 is configured to release the memory space occupied by the child node object.
  • The object judging unit 304 is configured to determine whether the child node object is the last child node object in the target XML file.
  • FIG. 6 is a specific schematic block diagram of the XML file parsing apparatus shown in FIG. 4.
  • The object judging unit 304 includes a reading subunit 3041 and a judging subunit 3042.
  • The reading subunit 3041 is configured to read the string in the target XML file that is located after the current child node object.
  • The judging subunit 3042 is configured to determine whether the string is the end tag of the root node object. If the string is not the end tag of the root node object, the judging subunit 3042 determines that the current child node object is not the last child node object in the target XML file; if it is, the judging subunit 3042 determines that the current child node object is the last child node object in the target XML file.
  • The XML file parsing apparatus 300 in this embodiment loads child node objects into memory one by one and parses them, so that the parsing process occupies little memory and memory overflow or insufficient memory is avoided. The XML file parsing apparatus 300 can also parse several or dozens of XML files in parallel, greatly shortening the time taken for parsing and improving parsing efficiency.
  • FIG. 7 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • The computer device 400 can be a terminal.
  • The terminal can be an electronic device such as a tablet computer, a notebook computer, a desktop computer, or a personal digital assistant.
  • The computer device 400 includes a processor 402, a memory, and a network interface 405 connected by a system bus 401, where the memory can include a non-volatile storage medium 403 and an internal memory 404.
  • The non-volatile storage medium 403 can store an operating system 4031 and a computer program 4032.
  • The computer program 4032 includes program instructions that, when executed, cause the processor 402 to perform an XML file parsing method.
  • The processor 402 provides computing and control capabilities to support the operation of the entire computer device 400.
  • The internal memory 404 provides an environment for running the computer program 4032 stored in the non-volatile storage medium 403; when executed by the processor 402, the computer program 4032 causes the processor 402 to perform an XML file parsing method.
  • The network interface 405 is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in FIG. 7 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device 400 to which the solution is applied; a specific computer device 400 may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • The processor 402 is configured to run the computer program 4032 stored in the memory to implement the following functions: performing illegal character replacement on the XML file by using a preset text compiler to generate a target XML file; loading the child node objects in the target XML file into memory one by one according to a preset loading rule; parsing the child node object in label-text pair form to generate row record data corresponding to the child node object; determining whether the child node object is the last child node object in the target XML file; and, if it is not, loading the next child node object into memory and returning to the step of parsing the child node object in label-text pair form, until the row record data corresponding to all child node objects in the target XML file has been parsed.
  • Before performing illegal character replacement on the XML file by using the preset text compiler to generate the target XML file, the processor 402 further performs the following: acquiring the encoding format of the XML file; determining whether the text font of the XML file is compatible with the encoding format; if it is compatible, performing the step of illegal character replacement on the XML file by using the preset text compiler to generate the target XML file; if it is incompatible, converting the encoding format of the XML file by using a preset character set encoding conversion command, setting the converted XML file as the XML file, and then performing the step of illegal character replacement on the XML file by using the preset text compiler to generate the target XML file.
  • When loading the child node objects in the target XML file into memory one by one according to the preset loading rule, the processor 402 specifically performs the following: when the start tag of the root node object in the target XML file is identified, loading the child node objects in the target XML file into memory one by one, using each next-level child node object of the root node object as the loading unit, where the root node object includes at least one child node object.
  • When determining whether the child node object is the last child node object in the target XML file, the processor 402 specifically performs the following: reading the string in the target XML file that is located after the current child node object; determining whether the string is the end tag of the root node object; and if it is not, determining that the current child node object is not the last child node object in the target XML file.
  • After parsing the child node object in label-text pair form to generate the row record data corresponding to the child node object, the processor 402 further performs the following: storing the row record data in a row record data file; and releasing the memory space occupied by the child node object.
  • When parsing the child node object in label-text pair form to generate the row record data corresponding to the child node object, the processor 402 specifically performs the following: parsing the child node object in label-text pair form by the iterparse() method to generate the corresponding row record data.
  • The processor 402 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • In another embodiment, a storage medium is provided, which may be a computer-readable storage medium.
  • The storage medium stores a computer program, where the computer program includes program instructions.
  • The program instructions, when executed by a processor, cause the processor to perform the XML file parsing method of the present application.
  • The storage medium may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a flash memory, a magnetic disk, or an optical disk.
  • The functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it can be stored in a storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.

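The embodiment gives sed and tr as examples of the preset text compiler; the tr form deletes every "&" outright. The Python sketch below illustrates illegal character replacement under its own assumptions, not the patent's: it treats as illegal the C0 control characters that XML 1.0 forbids (which it drops) and any "&" that does not begin an entity or character reference (which it escapes to "&amp;" rather than deleting, so existing references survive). The file is processed line by line. The names replace_illegal_chars and make_target_xml are hypothetical.

```python
import re

# Assumption: XML 1.0 forbids C0 control characters other than tab, LF, and CR.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")
# An '&' that does not start a named, decimal, or hexadecimal reference.
_BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def replace_illegal_chars(line: str) -> str:
    """Drop forbidden control characters, then escape stray ampersands."""
    line = _INVALID_XML_CHARS.sub("", line)
    return _BARE_AMPERSAND.sub("&amp;", line)


def make_target_xml(src_path: str, dst_path: str, encoding: str = "utf-8") -> None:
    """Write a cleaned copy of src_path (the target XML file) to dst_path.

    The file is read one line at a time, so memory use does not grow with
    file size (assuming individual lines are reasonably short).
    """
    with open(src_path, encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            dst.write(replace_illegal_chars(line))
```

Escaping instead of deleting is a design choice: deleting every "&" as tr -d does would also destroy valid references such as "&amp;".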
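The encoding compatibility judgment described in the embodiment (acquire the type of text font, then check it against the encoding format) can be sketched in Python as follows. This is one possible reading, not the patent's code: the "type of text font" is approximated by the script of each character (only the CJK Unified Ideographs block, U+4E00 to U+9FFF, counts as Chinese here), and "compatible" means every character can be encoded in the given encoding. The text is assumed to be already decoded into a Python string. font_types and is_compatible are hypothetical names.

```python
from __future__ import annotations


def font_types(text: str) -> set[str]:
    """Classify the characters of text by script, as a stand-in for 'font type'."""
    kinds: set[str] = set()
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block only
            kinds.add("chinese")
        elif ch.isascii() and ch.isalpha():
            kinds.add("english")
    return kinds


def is_compatible(text: str, encoding: str) -> bool:
    """True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    sample = "Harry Potter 哈利波特"
    print(font_types(sample), is_compatible(sample, "latin-1"), is_compatible(sample, "utf-8"))
```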
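The embodiment converts the XML file with piconv as a byte stream. A comparable streaming conversion in Python can use the incremental codecs of the standard codecs module, so that only one chunk is in memory at a time and a multi-byte character split across two chunks is still decoded correctly. This is a sketch, not how piconv is implemented; transcode_chunks and transcode_file are hypothetical names, and the defaults assume a Latin-1 source and UTF-8 output as in the example.

```python
import codecs
from typing import Iterable, Iterator


def transcode_chunks(chunks: Iterable[bytes], src_enc: str = "latin-1",
                     dst_enc: str = "utf-8") -> Iterator[bytes]:
    """Re-encode a stream of byte chunks from src_enc to dst_enc.

    Incremental codecs buffer any partial multi-byte sequence that straddles
    a chunk boundary until the next chunk arrives.
    """
    decoder = codecs.getincrementaldecoder(src_enc)()
    encoder = codecs.getincrementalencoder(dst_enc)()
    for chunk in chunks:
        yield encoder.encode(decoder.decode(chunk))
    # Flush whatever the decoder and encoder are still buffering.
    yield encoder.encode(decoder.decode(b"", final=True), final=True)


def transcode_file(src_path: str, dst_path: str, src_enc: str = "latin-1",
                   dst_enc: str = "utf-8", chunk_size: int = 64 * 1024) -> None:
    """Convert src_path into dst_path one chunk at a time (never the whole file)."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        chunks = iter(lambda: src.read(chunk_size), b"")
        for piece in transcode_chunks(chunks, src_enc, dst_enc):
            dst.write(piece)
```

If the file begins with an XML declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>, its encoding attribute must also be updated after conversion, otherwise a parser will still decode the UTF-8 bytes as Latin-1; the sketch does not do this.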
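Loading one next-level child node object at a time, as described for the <bookstore> example, can be sketched with xml.etree.ElementTree.iterparse. The generator below treats the first start event as the root's start tag, tracks nesting depth, and yields each direct child of the root once its end tag has been read; when the caller asks for the next child, the previous one is cleared and detached from the root. The sample document and the name iter_child_nodes are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Union

# Hypothetical sample input (the patent's own example file is not reproduced).
SAMPLE_XML = b"""<bookstore>
  <book><title>Harry Potter</title><id>1</id></book>
  <book><title>Title B</title><id>2</id></book>
</bookstore>"""


def iter_child_nodes(source: Union[str, BinaryIO]) -> Iterator[ET.Element]:
    """Yield each next-level child of the root once its end tag has been read.

    After the caller has handled a yielded child, it is cleared and detached
    from the root, so the parsed tree never holds more than one child.
    """
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # start tag of the root node object recognized
            depth += 1
        else:
            depth -= 1
            if depth == 1:  # a direct child of the root has just closed
                yield elem
                elem.clear()       # release the child's content ...
                root.remove(elem)  # ... and drop the root's reference to it


if __name__ == "__main__":
    for book in iter_child_nodes(io.BytesIO(SAMPLE_XML)):
        print(book.findtext("title"), book.findtext("id"))
```

Strictly speaking, iterparse reads its input in fixed-size blocks rather than tag by tag, so what stays bounded is the parsed tree (the root plus the child currently being built). Because each child is cleared once the loop resumes, callers should extract what they need inside the loop rather than collect the yielded elements into a list.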
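One way to read "parsing in label-text pair form" is to map each element inside a child node object to a (tag, text) pair and collect the pairs of one child into a tuple, one tuple per row, as in the four-pair example. The sketch below assumes that reading; it also flattens grandchild node objects by keeping only leaf elements, which the text does not specify. to_row_record is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def to_row_record(child: ET.Element) -> Tuple[Tuple[str, str], ...]:
    """Return one row record: (tag, text) pairs for the child's leaf elements.

    Leaf elements are visited in document order, so grandchild node objects
    nested inside the child are flattened into the same row.
    """
    return tuple(
        (elem.tag, (elem.text or "").strip())
        for elem in child.iter()
        if elem is not child and len(elem) == 0
    )
```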
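The store-then-release step described in the embodiment can be sketched as follows. The patent does not specify the format of the row record data file; CSV through the standard csv module is an assumption here, as is the hypothetical name store_and_release. The row is written before the element is cleared because clear() discards the child's text and descendants.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Any, Sequence


def store_and_release(row: Sequence[Any], writer: Any,
                      parent: ET.Element, child: ET.Element) -> None:
    """Persist one row record, then free the child node object it came from."""
    writer.writerow(row)  # 1. store the row in the row record data file
    child.clear()         # 2. drop the child's text, attributes, and descendants
    parent.remove(child)  # 3. detach it so the parsed tree does not keep growing


if __name__ == "__main__":
    root = ET.fromstring("<bookstore><book><title>Harry Potter</title></book></bookstore>")
    out = io.StringIO()  # for a real file, open it with newline="" as csv expects
    store_and_release(["Harry Potter"], csv.writer(out), root, root[0])
    print(repr(out.getvalue()), len(root))
```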
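Steps S1041 and S1042 compare the string that follows the current child node object with the root's end tag. The sketch below does this on an in-memory string for illustration only. It assumes there are no comments, CDATA sections, or processing instructions between children and that the root end tag is written as </name> with no extra whitespace. next_tag and is_last_child are hypothetical names.

```python
def next_tag(xml_text: str, pos: int) -> str:
    """Return the first tag at or after pos, e.g. '<book>' or '</bookstore>'."""
    start = xml_text.find("<", pos)
    if start == -1:
        return ""
    end = xml_text.find(">", start)
    return xml_text[start:end + 1] if end != -1 else ""


def is_last_child(xml_text: str, child_end: int, root_tag: str) -> bool:
    """S1041/S1042: the child is the last one if the next tag closes the root."""
    return next_tag(xml_text, child_end) == f"</{root_tag}>"
```

When iterparse is used for loading, the same condition is signaled by the end event of the root element, so a separate look-ahead is not needed.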
Abstract

Embodiments of the present application disclose an XML file parsing method, a device, a computer apparatus, and a storage medium. The method comprises: replacing invalid characters in an XML file by means of a preset text compiler, so as to generate a target XML file; loading, one by one according to a preset loading rule, the child node objects in the target XML file into memory; and parsing the child node objects to generate corresponding row record data, until the row record data corresponding to all child node objects in the target XML file has been obtained.

Description

XML File Parsing Method, Apparatus, Computer Device, and Storage Medium
This application claims priority to Chinese Patent Application No. 201810091360.5, filed with the Chinese Patent Office on January 30, 2018 and entitled "XML File Parsing Method, Apparatus, Computer Device, and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to an XML file parsing method, apparatus, computer device, and storage medium.
Background
In a Linux operating environment, Linux commands are generally used to parse XML files in full. However, parsing an XML file with Linux commands occupies a large amount of memory and takes a long time. In particular, when multiple XML files need to be parsed, parsing them in parallel easily causes memory overflow errors, while parsing them serially takes even longer and yields low parsing efficiency.
Summary of the Invention
The embodiments of the present application provide an XML file parsing method, apparatus, computer device, and storage medium, so as to reduce the memory occupied by the parsing process and shorten the time taken for parsing.
In a first aspect, an embodiment of the present application provides an XML file parsing method, including:
performing illegal character replacement on an XML file by using a preset text compiler to generate a target XML file;
loading the child node objects in the target XML file into memory one by one according to a preset loading rule;
parsing the child node object in label-text pair form to generate row record data corresponding to the child node object;
determining whether the child node object is the last child node object in the target XML file; and
if the child node object is not the last child node object in the target XML file, loading the next child node object into memory and returning to the step of parsing the child node object in label-text pair form, until the row record data corresponding to all child node objects in the target XML file has been parsed.
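The five steps of the first aspect can be chained in a short Python sketch. It illustrates the flow under stated assumptions and is not the claimed implementation: step 1 escapes bare ampersands only, steps 2 to 5 use xml.etree.ElementTree.iterparse, and the explicit check for the root's end tag is replaced by iterparse reaching the root's end event. The names replace_illegal_chars and parse_rows, and the sample document, are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, TextIO, Tuple

Row = Tuple[Tuple[str, str], ...]

# Assumption: the only "illegal character" handled is an '&' that does not
# begin an entity or character reference; it is escaped rather than deleted.
_BARE_AMPERSAND = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def replace_illegal_chars(src: TextIO, dst: TextIO) -> None:
    """Step 1: write a cleaned copy of src (the target XML file) to dst, line by line."""
    for line in src:
        dst.write(_BARE_AMPERSAND.sub("&amp;", line))


def parse_rows(target: BinaryIO) -> Iterator[Row]:
    """Steps 2-5: load, parse, and release one next-level child at a time."""
    depth, root = 0, None
    for event, elem in ET.iterparse(target, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # root start tag recognized
            depth += 1
            continue
        depth -= 1
        if depth != 1:
            continue  # still inside a child, or at the root's own end tag
        # Step 3: one (tag, text) pair per leaf element forms one row record.
        row = tuple((e.tag, (e.text or "").strip())
                    for e in elem.iter() if e is not elem and len(e) == 0)
        elem.clear()  # release the child before the next one is loaded
        root.remove(elem)
        yield row
    # Steps 4-5: the loop ends at the root's end tag, i.e. after the last child.


if __name__ == "__main__":
    raw = io.StringIO("<bookstore><book><title>Tom & Jerry</title></book>"
                      "<book><title>Harry Potter</title></book></bookstore>")
    cleaned = io.StringIO()
    replace_illegal_chars(raw, cleaned)
    for row in parse_rows(io.BytesIO(cleaned.getvalue().encode("utf-8"))):
        print(row)
```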
In a second aspect, an embodiment of the present application provides an XML file parsing apparatus, including:
a character replacement unit, configured to perform illegal character replacement on an XML file by using a preset text compiler to generate a target XML file;
a loading unit, configured to load the child node objects in the target XML file into memory one by one according to a preset loading rule;
a parsing unit, configured to parse the child node object in label-text pair form to generate row record data corresponding to the child node object;
an object judging unit, configured to determine whether the child node object is the last child node object in the target XML file;
the loading unit being further configured to: if the child node object is not the last child node object in the target XML file, load the next child node object into memory, so that the parsing unit parses that child node object in label-text pair form, until the row record data corresponding to all child node objects in the target XML file has been parsed.
In a third aspect, an embodiment of the present application further provides a computer device. The device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements any XML file parsing method provided by the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides a storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, cause the processor to perform any XML file parsing method provided by the embodiments of the present application.
The embodiments of the present application provide an XML file parsing method, apparatus, computer device, and storage medium. The method keeps the memory occupied by the parsing process small, so parsing does not interfere with other processes running on the terminal. This is especially useful on terminals with little memory, where it avoids memory overflow or running out of memory. The method can also parse several to dozens of XML files in parallel, which improves parsing efficiency and shortens the time parsing takes.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application. A person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of an XML file parsing method according to an embodiment of the present application;
FIG. 2 is another schematic flowchart of an XML file parsing method according to an embodiment of the present application;
FIG. 3 is a detailed schematic flowchart of the XML file parsing method shown in FIG. 1;
FIG. 4 is a schematic block diagram of an XML file parsing apparatus according to an embodiment of the present application;
FIG. 5 is another schematic block diagram of an XML file parsing apparatus according to an embodiment of the present application;
FIG. 6 is a detailed schematic block diagram of the XML file parsing apparatus shown in FIG. 4; and
FIG. 7 is a schematic block diagram of a computer device according to an embodiment of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of an XML file parsing method according to an embodiment of the present application. The method is applied in terminals such as desktop computers, laptop computers, and tablet computers. As shown in FIG. 1, the method includes steps S101 to S105.
S101. Perform illegal character replacement on the XML file by using a preset text compiler to generate a target XML file.
In an embodiment, the preset text compiler may be the shell command sed. For example, the sed command may take the form: sed -i 's/\(<userid>.*\)<\(.*<\/.*>\)/\1\&lt;\2/g' filename. This command escapes a stray "<" inside a <userid> element as the entity &lt;. In other embodiments, the preset text compiler may instead be the shell command tr, for example: tr -d "\46" < input_file > output_file. Here \46 is the octal code of "&", so every "&" character is deleted. The preset text compiler may also be any other command capable of replacing illegal characters in the XML file; no specific limitation is imposed herein.
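The two shell commands above can be mirrored in Python for illustration. This is a sketch under assumptions, not the application's own code.

- `escape_stray_lt` reproduces the sed pattern: it escapes a stray "<" inside a <userid> line as &lt;.
- `delete_ampersands` reproduces `tr -d "\46"`.

Both function names are hypothetical. The two commands are alternatives. Running the tr step after the sed step would strip the "&" of the newly inserted &lt; entity. Deleting every "&" also destroys legitimate entities such as &amp;.

```python
import re

# Same pattern as the sed example: group 1 is greedy, so it runs up to the
# last "<" that is still followed by a closing tag of the form "</...>".
_STRAY_LT = re.compile(r"(<userid>.*)<(.*</.*>)")


def escape_stray_lt(line: str) -> str:
    """Escape a stray '<' inside <userid> content as '&lt;' (sed equivalent)."""
    # In a Python replacement string "&" is literal, unlike sed, where it
    # must be written as "\&".
    return _STRAY_LT.sub(r"\1&lt;\2", line)


def delete_ampersands(text: str) -> str:
    """Delete every '&' (octal 46), as the tr example does."""
    return text.replace("&", "")
```

A well-formed line such as `<userid>abc</userid>` does not match the pattern and is returned unchanged.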
In general, when the encoding format of an XML file is incompatible with the text fonts it contains, errors easily occur during parsing. For example, suppose an XML file uses Latin-1 encoding and its text contains both English and Chinese characters. Chinese characters cannot be represented in Latin-1, so parsing will report an error unless the encoding format is converted first.
To avoid parsing errors caused by incompatible encoding formats, in an embodiment, steps S106 to S109 are further included before step S101, as shown in FIG. 2. FIG. 2 is another schematic flowchart of the XML file parsing method according to an embodiment of the present application.
S106. Obtain the encoding format of the XML file.
In an embodiment, the encoding format may be Latin-1, UTF-8, or another encoding format; no specific limitation is imposed herein.
S107. Determine whether the text font of the XML file is compatible with the encoding format.
In an embodiment, this determination specifically includes: obtaining the types of text font in the XML file, and determining whether those types are compatible with the encoding format. The types of text font include Chinese, English, and the like.
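A minimal Python sketch of S106 and S107 follows. It rests on two assumptions, neither taken from the application:

- The encoding is read from the XML declaration at the very start of the file, with no byte-order mark.
- "The text font is compatible" means that every character of the text can be represented in that encoding.

The function names are hypothetical.

```python
import re

# Matches the encoding attribute of an XML declaration at the start of the data.
_DECL = re.compile(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(head: bytes, default: str = "utf-8") -> str:
    """S106 sketch: return the encoding named in the XML declaration, if any."""
    m = _DECL.match(head)  # match(): the declaration must come first
    return m.group(1).decode("ascii") if m else default


def is_compatible(text: str, encoding: str) -> bool:
    """S107 sketch: can every character of the text be encoded in `encoding`?"""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True
```

For example, English text passes a Latin-1 check, while Chinese characters fail it and would trigger the conversion in S108.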
If the text font of the XML file is compatible with the encoding format, no encoding conversion is needed, and step S101 (the illegal character replacement step) may be performed directly.
If the text font of the XML file is incompatible with the encoding format, step S108 is performed to avoid errors during parsing.
S108. If the text font of the XML file is incompatible with the encoding format, convert the encoding format of the XML file by using a preset character set encoding conversion command.
The preset character set encoding conversion command may be the piconv command provided with Perl. Specifically, piconv converts the encoding format of the XML file as a byte stream. For example, piconv can convert an XML file from Latin-1 to UTF-8, which solves the problem of garbled text when the XML file contains Chinese characters.
Because piconv converts the encoding format as a byte stream, the whole XML file is never loaded into memory at once. This avoids the memory overflow that loading the entire file at once could cause, relieves memory pressure, and saves memory.
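The byte-stream conversion that piconv performs can be sketched in Python with incremental codecs. piconv is typically invoked as `piconv -f iso-8859-1 -t utf-8 input.xml > output.xml`. The sketch is an illustrative stand-in, not piconv itself, and the function name `transcode_stream` and the 64 KiB chunk size are assumptions. Only one chunk is held in memory at a time. The incremental decoder carries a multi-byte sequence that is split across two chunks over to the next read.

```python
import codecs
from typing import BinaryIO


def transcode_stream(src: BinaryIO, dst: BinaryIO, from_enc: str, to_enc: str,
                     chunk_size: int = 64 * 1024) -> None:
    """Re-encode src into dst chunk by chunk, never holding the whole file."""
    decoder = codecs.getincrementaldecoder(from_enc)()
    encoder = codecs.getincrementalencoder(to_enc)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Incomplete multi-byte sequences stay buffered inside the decoder.
        dst.write(encoder.encode(decoder.decode(chunk)))
    # Flush buffered state; final=True raises if the input ended mid-sequence.
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))
```

If the XML declaration names the old encoding, it also has to be updated after conversion. Otherwise a parser will still decode the file with the old encoding. That step is not shown.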
S109. Take the XML file after encoding conversion as the XML file, and return to step S101.
After the encoding conversion, the converted XML file replaces the original XML file, and step S101 is then performed.
It should be noted that in the embodiment shown in FIG. 2, the encoding compatibility check and the encoding conversion are performed first, followed by illegal character replacement. In other embodiments, illegal character replacement may be performed first, followed by the compatibility check and encoding conversion of the resulting XML file. In that case, the XML file after encoding conversion serves as the target XML file, and step S102 is then performed. The order of encoding conversion and illegal character replacement is not limited herein.
S102. Load the child node objects in the target XML file into memory one by one according to a preset loading rule.
Specifically, in an embodiment, this loading includes the following. When the start tag of the root node object in the target XML file is recognized, the child node objects are loaded into memory one by one. Each child node object at the level directly below the root node object is one loading unit. The root node object includes at least one child node object.
For example, the content of the target XML file is as follows:
[Figures PCTCN2018084227-appb-000001 and PCTCN2018084227-appb-000002: content of the example target XML file, a <bookstore> root node containing four <book> child nodes; image not reproduced]
In the above code example, the target XML file includes a root node object whose start tag is <bookstore> and whose end tag is </bookstore>. The root node object includes four child node objects, each with <book> as its start tag and </book> as its end tag. These four child node objects are at the level directly below the root node object. Each child node object consists of the start tag <book>, the end tag </book>, and several elements between them. For example, one of the four elements of the first child node object is <title>Harry Potter</title>.
In other code examples, a child node object may further include grandchild node objects, and node objects may be nested through multiple levels in this way. When the target XML file contains three or more levels of node objects, the child node objects at the level directly below the root node object are still used as the loading unit.
After reading the start tag <bookstore> of the root node object, the terminal loads the child node objects into memory one by one, with each child node object at the next level as the loading unit. That is, each load brings into memory the content from a start tag <book> through the matching end tag </book>. Each time a child node object is loaded, step S103 and the subsequent steps are performed to complete the parsing of that child node object. The next child node object is then loaded, and so on.
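One way to realize this loading rule with Python's standard library is sketched below. It assumes `xml.etree.ElementTree.iterparse` as the streaming parser. Depth counting identifies the root's start tag and each end tag at the level directly below the root. The finished child is yielded and then detached from the root, so only the current child (plus the root shell) stays in memory. The generator name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Union


def iter_root_children(source: Union[str, IO[bytes]]) -> Iterator[ET.Element]:
    """Yield each child directly below the root once its end tag has been read."""
    depth = 0
    root: Optional[ET.Element] = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # start tag of the root node object, e.g. <bookstore>
            depth += 1
        else:
            depth -= 1
            if depth == 1:  # end tag of a child of the root, e.g. </book>
                yield elem  # the caller parses this child before the next is loaded
                root.remove(elem)  # detach the finished child to keep memory bounded
```

Grandchild nodes stay inside their child and are never yielded on their own, which matches the rule that the level directly below the root is the loading unit.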
S103. Parse the child node object in the form of tag-text pairs to generate row record data corresponding to the child node object.
Specifically, in an embodiment, the child node object may be parsed in the form of tag-text pairs by the iterparse() function to generate the corresponding row record data. The iterparse() function belongs to the ElementTree module of Python.
For example, the child node object has four elements between <book> and </book>, one of which is <title>Harry Potter</title>. "Harry Potter" is the text content of that element, and "title" is its tag. When iterparse() parses the element, reading the start tag (which ends with ">") triggers one event, and reading the end tag (which begins with "</") triggers another. The text between these two events, "Harry Potter", is obtained as the element's text content, and its tag is parsed as "title". The tag "title" and the text content "Harry Potter" form a tag-text pair. The other three tag-text pairs are obtained in the same way. The four tag-text pairs form one tuple, and one tuple is one row of record data, for example: Record={'title': 'Harry Potter', 'author': 'J K. Rowling', 'year': '2005', 'price': '29.99'}.
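The tag-text pairing can be sketched as a small Python function that turns one fully parsed child node object into a row record. It returns a dict to match the Record example above, although the text calls the record a tuple. It does not handle attributes, grandchild nodes, or repeated tags (a later tag overwrites an earlier one). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def to_row_record(child: ET.Element) -> Dict[str, str]:
    """Pair each sub-element's tag with its text, e.g. 'title' -> 'Harry Potter'."""
    # Iterating an Element visits its direct sub-elements in document order.
    return {elem.tag: (elem.text or "").strip() for elem in child}
```

An empty element such as <note/> yields an empty string rather than None, which keeps every value in the record a string.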
In an embodiment, to avoid memory overflow, the method further includes two steps after the child node object has been parsed into its row record data: storing the row record data in a row record data file, and releasing the memory space occupied by the child node object. That is, once a child node object in memory has been parsed, its row record data is written to the row record data file and the memory it occupies is released. This frees space for subsequent child node objects and prevents insufficient memory or memory overflow.
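A sketch of the store-then-release step follows. The text does not specify the format of the row record data file, so JSON Lines (one JSON object per line) is assumed here. The function name and its parameters are hypothetical. Clearing the child and detaching it from the root lets Python reclaim its memory once no other references remain.

```python
import json
import xml.etree.ElementTree as ET
from typing import IO


def store_and_release(child: ET.Element, root: ET.Element, out: IO[str]) -> None:
    """Write the child's row record as one JSON line, then free the child."""
    record = {elem.tag: (elem.text or "").strip() for elem in child}
    out.write(json.dumps(record, ensure_ascii=False) + "\n")  # one record per line
    child.clear()  # drop the child's sub-elements, text, and attributes
    root.remove(child)  # detach it from the root so it can be garbage-collected
```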
S104. Determine whether the child node object is the last child node object in the target XML file.
After the current child node object in memory has been parsed, it is determined whether that child node object is the last child node object in the target XML file.
Specifically, in an embodiment, step S104 includes steps S1041 to S1044, as shown in FIG. 3. FIG. 3 is a detailed schematic flowchart of the XML file parsing method shown in FIG. 1.
S1041. Read the character string in the target XML file that follows the current child node object.
For example, in the foregoing code example, suppose the child node object currently in memory is the first child node object of the root node object. The string that follows it in the target XML file is then "<book>", the start tag of the next child node object. As another example, if the child node object currently in memory is the fourth child node object, the string that follows it is "</bookstore>", the end tag of the root node object.
S1042. Determine whether the character string is the end tag of the root node object.
For example, it is determined whether the string read in step S1041 is the end tag </bookstore> of the root node object. If it is not, the child node object currently in memory is not the last child node object in the target XML file, and step S1043 is performed. If it is, the child node object currently in memory is the last child node object in the target XML file, and step S1044 is performed.
S1043. If the character string is not the end tag of the root node object, determine that the current child node object is not the last child node object in the target XML file.
S1044. If the character string is the end tag of the root node object, determine that the current child node object is the last child node object in the target XML file.
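S1041 to S1044 can be illustrated on raw text with the Python sketch below. It makes two assumptions:

- The caller knows the index just past the current child's end tag.
- Only whitespace separates sibling nodes, with no comments or processing instructions in between.

`is_last_child` and its parameters are hypothetical names.

```python
def is_last_child(xml_text: str, child_end: int, root_tag: str = "bookstore") -> bool:
    """Return True if the root's end tag directly follows the current child.

    child_end is the index just past the current child's end tag, e.g. '</book>'.
    """
    # S1041: read the string after the current child, skipping indentation/newlines.
    following = xml_text[child_end:].lstrip()
    # S1042-S1044: the child is the last one if the root's end tag comes next.
    return following.startswith("</" + root_tag + ">")
```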
It should be noted that the way of determining whether the child node object currently in memory is the last child node object in the target XML file is not limited to that shown in FIG. 3. Other ways may be used, and no specific limitation is imposed herein.
In this embodiment, if the child node object currently in memory is determined to be the last child node object in the target XML file, the target XML file has been fully parsed and the parsing process can end. If it is determined not to be the last child node object, the target XML file has not yet been fully parsed, and step S105 is performed.
S105. If the child node object is not the last child node object in the target XML file, load the next child node object into the memory, and return to the step of parsing the child node object in the form of tag-text pairs to generate the corresponding row record data, until row record data corresponding to all child node objects in the target XML file has been parsed.
If the current child node object is not the last child node object in the target XML file, the target XML file has not been fully parsed. The next child node object is therefore read into memory, and steps S103 to S105 are repeated until step S104 determines that the child node object in memory is the last one in the target XML file. This completes the parsing of the entire target XML file.
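The loop formed by S102 to S105 can be sketched end to end in Python. The sketch again assumes `ElementTree.iterparse` and a JSON Lines row record file, and the function name is hypothetical. With a streaming parser, the "last child" check of S104 is implicit. Once the root's end tag is read, no further child can follow, so the loop ends on its own.

```python
import json
import xml.etree.ElementTree as ET
from typing import IO, Optional, Union


def parse_xml_to_records(source: Union[str, IO[bytes]], out: IO[str]) -> int:
    """Load, parse, store, and free one root child at a time; return the record count."""
    depth, count = 0, 0
    root: Optional[ET.Element] = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # root start tag recognized (S102)
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # one child of the root is now fully in memory
            record = {sub.tag: (sub.text or "").strip() for sub in elem}  # S103
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            root.remove(elem)  # release it before the next child is loaded (S105)
            count += 1
    # Iteration ends after the root's end tag, i.e. right after the last child (S104).
    return count
```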
The XML file parsing method in this embodiment loads child node objects into memory and parses them one at a time. The parsing process therefore occupies little memory, and memory overflow or shortage is avoided. The method can also parse several to dozens of XML files in parallel, which greatly shortens parsing time and improves parsing efficiency.
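The text does not say how the parallel parsing is scheduled. The sketch below assumes a thread pool from `concurrent.futures`. Each worker streams one file and holds only one child node object at a time. Peak memory therefore grows roughly with the number of workers times the size of one child, rather than with file size. Python threads share the interpreter lock, so `ProcessPoolExecutor` could be substituted for CPU-bound workloads. The function names are hypothetical, and counting children stands in for real row-record handling.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def count_root_children(path: str) -> int:
    """Stream one file, freeing each root child after it is seen; return how many."""
    depth, count = 0, 0
    root: Optional[ET.Element] = None
    for event, elem in ET.iterparse(path, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem
            depth += 1
        else:
            depth -= 1
            if depth == 1:
                count += 1  # real code would build and store the row record here
                root.remove(elem)
    return count


def parse_many(paths: List[str], workers: int = 8) -> Dict[str, int]:
    """Parse several XML files concurrently, one streaming parse per worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(count_root_children, paths)))
```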
An embodiment of the present application further provides an XML file parsing apparatus configured to perform any of the foregoing XML file parsing methods. Specifically, referring to FIG. 4, FIG. 4 is a schematic block diagram of an XML file parsing apparatus according to an embodiment of the present application. The XML file parsing apparatus 300 may be installed in terminals such as desktop computers, tablet computers, and laptop computers.
As shown in FIG. 4, the XML file parsing apparatus 300 includes a character replacement unit 301, a loading unit 302, a parsing unit 303, and an object determining unit 304.
The character replacement unit 301 is configured to perform illegal character replacement on the XML file by using a preset text compiler to generate a target XML file.
In an embodiment, to avoid parsing errors caused by incompatible encoding formats, the XML file parsing apparatus 300 further includes a format obtaining unit 305, a format determining unit 306, and an encoding conversion unit 307, as shown in FIG. 5. FIG. 5 is another schematic block diagram of an XML file parsing apparatus according to an embodiment of the present application.
The format obtaining unit 305 is configured to obtain the encoding format of the XML file.
The format determining unit 306 is configured to determine whether the text font of the XML file is compatible with the encoding format.
In an embodiment, the format determining unit 306 is specifically configured to obtain the types of text font in the XML file and determine whether those types are compatible with the encoding format. If the format determining unit 306 determines that they are compatible, no encoding conversion is needed. The format determining unit 306 then sends a first signal to the character replacement unit 301. Upon receiving the first signal, the character replacement unit 301 performs illegal character replacement on the XML file by using the preset text compiler to generate the target XML file.
The encoding conversion unit 307 is configured to: if the text font of the XML file is incompatible with the encoding format, convert the encoding format of the XML file by using a preset character set encoding conversion command; and take the XML file after encoding conversion as the XML file.
The encoding conversion unit 307 may use the piconv command to convert the encoding format of the XML file as a byte stream.
After converting the encoding format of the XML file, the encoding conversion unit 307 replaces the original XML file with the converted XML file. It then sends a third signal to the character replacement unit 301. Upon receiving the third signal, the character replacement unit 301 performs illegal character replacement on the XML file by using the preset text compiler to generate the target XML file.
The loading unit 302 is configured to load the child node objects in the target XML file into memory one by one according to a preset loading rule.
In an embodiment, the loading unit 302 is specifically configured to start loading when the start tag of the root node object in the target XML file is recognized. It then loads the child node objects into memory one by one, with each child node object at the level directly below the root node object as one loading unit. The root node object includes at least one child node object.
The parsing unit 303 is configured to parse the child node object in the form of tag-text pairs to generate row record data corresponding to the child node object.
Specifically, in an embodiment, the parsing unit 303 may use the iterparse() function to parse the child node object in the form of tag-text pairs and generate the corresponding row record data. The iterparse() function belongs to the ElementTree module of Python.
In an embodiment, to avoid memory overflow, the XML file parsing apparatus 300 further includes a storage unit 308 and a memory release unit 309, as shown in FIG. 5. The storage unit 308 is configured to store the row record data in a row record data file. The memory release unit 309 is configured to release the memory space occupied by the child node object.
The object determining unit 304 is configured to determine whether the child node object is the last child node object in the target XML file.
Specifically, in an embodiment, the object determining unit 304 includes a reading subunit 3041 and a determining subunit 3042, as shown in FIG. 6. FIG. 6 is a detailed schematic block diagram of the XML file parsing apparatus shown in FIG. 4. The reading subunit 3041 is configured to read the character string in the target XML file that follows the current child node object. The determining subunit 3042 is configured to determine whether the character string is the end tag of the root node object. If the determining subunit 3042 determines that the string is not the end tag of the root node object, the current child node object is determined not to be the last child node object in the target XML file. If it determines that the string is the end tag of the root node object, the current child node object is determined to be the last child node object in the target XML file.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the XML file parsing apparatus 300 and its units are the same as the corresponding processes in the foregoing method embodiments. Details are not repeated herein.
The XML file parsing apparatus 300 in this embodiment loads child node objects into memory and parses them one at a time. The parsing process therefore occupies little memory, and memory overflow or shortage is avoided. The apparatus 300 can also parse several to dozens of XML files in parallel, which greatly shortens parsing time and improves parsing efficiency.
上述XML文件解析装置可以实现为一种计算机程序的形式,该计算机程序可以在如图7所示的计算机设备上运行。请参阅图7,图7是本申请实施例提供的一种计算机设备的示意性框图。该计算机设备400可以是终端。该终端可以是平板电脑、笔记本电脑、台式电脑、个人数字助理等电子设备。The above XML file parsing apparatus can be implemented in the form of a computer program which can be run on a computer device as shown in FIG. Please refer to FIG. 7. FIG. 7 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 400 can be a terminal. The terminal can be an electronic device such as a tablet computer, a notebook computer, a desktop computer, or a personal digital assistant.
该计算机设备400包括通过系统总线401连接的处理器402、存储器和网络接口405,其中,存储器可以包括非易失性存储介质403和内存储器404。The computer device 400 includes a processor 402, a memory, and a network interface 405 that are coupled by a system bus 401, where the memory can include a non-volatile storage medium 403 and an internal memory 404.
该非易失性存储介质403可存储操作系统4031和计算机程序4032。该计算机程序4032包括程序指令,该程序指令被执行时,可使得处理器402执行一种XML文件解析方法。The non-volatile storage medium 403 can store an operating system 4031 and a computer program 4032. The computer program 4032 includes program instructions that, when executed, cause the processor 402 to perform an XML file parsing method.
该处理器402用于提供计算和控制能力,支撑整个计算机设备400的运行。The processor 402 is used to provide computing and control capabilities to support the operation of the entire computer device 400.
该内存储器404为非易失性存储介质403中的计算机程序4032的运行提供环境,该计算机程序4032被处理器402执行时,可使得处理器402执行一种XML文件解析方法。The internal memory 404 provides an environment for the operation of the computer program 4032 in the non-volatile storage medium 403, which when executed by the processor 402, causes the processor 402 to perform an XML file parsing method.
该网络接口405用于进行网络通信,如发送分配的任务等。本领域技术人员可以理解,图7中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备400的限定,具体的计算机设备400可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。The network interface 405 is used for network communication, such as sending assigned tasks and the like. It will be understood by those skilled in the art that the structure shown in FIG. 7 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation of the computer device 400 to which the solution of the present application is applied, and a specific computer device. 400 may include more or fewer components than shown in the figures, or some components may be combined, or have different component arrangements.
The processor 402 is configured to run the computer program 4032 stored in the memory to implement the following functions: performing illegal character replacement on an XML file with a preset text compiler to generate a target XML file; loading the child node objects in the target XML file into memory one at a time according to a preset loading rule; parsing the child node object as tag-text pairs to generate the row record data corresponding to that child node object; determining whether the child node object is the last child node object in the target XML file; and, if it is not, loading the next child node object into memory and returning to the step of parsing the child node object as tag-text pairs, until the row record data corresponding to all child node objects in the target XML file has been parsed.
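The first step above, illegal character replacement, happens before any XML parsing, so it can run as a plain text filter. The sketch below is not the applicant's implementation. It assumes that "illegal characters" means the code points XML 1.0 forbids (C0 control characters other than tab, LF and CR, plus U+FFFE and U+FFFF), and that the "preset text compiler" behaves like a line-by-line stream editor such as sed. The function name `replace_illegal_chars` and the choice of a space as the replacement are illustrative.

```python
import io
import re

# Assumption: "illegal characters" are the code points XML 1.0 forbids:
# C0 controls other than tab, LF and CR, plus the non-characters U+FFFE/U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def replace_illegal_chars(src, dst, replacement: str = " ") -> int:
    """Copy text from src to dst line by line, replacing illegal characters.

    Like a stream editor, only one line is held in memory at a time.
    Returns the number of characters that were replaced.
    """
    replaced = 0
    for line in src:
        clean, count = _ILLEGAL_XML_CHARS.subn(replacement, line)
        dst.write(clean)
        replaced += count
    return replaced


if __name__ == "__main__":
    source = io.StringIO("<rows>\n<row><name>a\x00b\x1fc</name></row>\n</rows>\n")
    target = io.StringIO()
    count = replace_illegal_chars(source, target)
    print(count, repr(target.getvalue()))
```

Replacing rather than deleting keeps the text length of each line unchanged, which can help when tracing errors back to the original file.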
In an embodiment, before performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file, the processor 402 further performs the following: obtaining the encoding format of the XML file; determining whether the text font of the XML file is compatible with the encoding format; if it is compatible, proceeding to the step of performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file; if it is not compatible, converting the encoding format of the XML file with a preset character set encoding conversion command, taking the converted XML file as the XML file, and then proceeding to the step of performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file.
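A minimal sketch of the encoding check, under several assumptions not stated in the application: the encoding format is read from the XML declaration (defaulting to UTF-8), "compatible" is taken to mean that the file's bytes decode cleanly in that encoding, and the source encoding used for conversion is a hypothetical fallback (`gb18030`). Conversion runs chunk by chunk, in the byte-stream spirit of the piconv command named in claim 7, but it uses Python's incremental codecs rather than piconv itself. All function names are illustrative.

```python
import codecs
import io
import re
import shutil

# Matches the encoding pseudo-attribute of an XML declaration, e.g. encoding="GBK".
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(head: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or `default` if absent."""
    match = _DECL.match(head.lstrip())
    return match.group(1).decode("ascii").lower() if match else default


def is_compatible(src, encoding: str, chunk_size: int = 64 * 1024) -> bool:
    """True if every byte of the stream decodes cleanly in `encoding`."""
    decoder = codecs.getincrementaldecoder(encoding)()  # errors="strict"
    try:
        while chunk := src.read(chunk_size):
            decoder.decode(chunk)
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True


def transcode(src, dst, from_enc: str, to_enc: str, chunk_size: int = 64 * 1024) -> None:
    """Convert a byte stream chunk by chunk without loading the whole file."""
    decoder = codecs.getincrementaldecoder(from_enc)()
    encoder = codecs.getincrementalencoder(to_enc)()
    while chunk := src.read(chunk_size):
        dst.write(encoder.encode(decoder.decode(chunk)))
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))


def ensure_encoding(src, dst, fallback_enc: str = "gb18030") -> bool:
    """Copy src to dst, converting from a hypothetical fallback encoding when the
    bytes do not fit the declared encoding. Returns True if a conversion happened."""
    target_enc = declared_encoding(src.read(256))
    src.seek(0)
    if is_compatible(src, target_enc):
        src.seek(0)
        shutil.copyfileobj(src, dst)  # compatible: pass the file through unchanged
        return False
    src.seek(0)
    transcode(src, dst, fallback_enc, target_enc)
    return True


if __name__ == "__main__":
    gbk_file = io.BytesIO('<?xml version="1.0" encoding="UTF-8"?><r>中文</r>'.encode("gbk"))
    out = io.BytesIO()
    print(ensure_encoding(gbk_file, out), out.getvalue().decode("utf-8"))
```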
In an embodiment, when loading the child node objects in the target XML file into memory one at a time according to the preset loading rule, the processor 402 specifically performs the following: once the start tag of the root node object in the target XML file is recognized, the child node objects in the target XML file are loaded into memory one at a time, using each child node object at the level directly below the root node object as the loading unit, where the root node object includes at least one child node object.
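The loading rule can be approximated with the standard library's `xml.etree.ElementTree.XMLPullParser`, which accepts the file in arbitrary chunks. In this sketch (the function names are hypothetical), loading begins when the root's start tag is seen, each direct child of the root is yielded as one complete loading unit, and the child is detached from the root afterwards so that only the current child stays in memory.

```python
import xml.etree.ElementTree as ET


def _events(parser, chunks):
    """Feed the file to the pull parser chunk by chunk and yield its events."""
    for chunk in chunks:
        parser.feed(chunk)
        yield from parser.read_events()
    parser.close()
    yield from parser.read_events()  # events produced while closing, if any


def iter_root_children(chunks):
    """Yield each direct child of the root element, one complete child at a time.

    `chunks` is any iterable of byte strings read from the target XML file.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    depth = 0
    for event, elem in _events(parser, chunks):
        if event == "start":
            if root is None:
                root = elem  # root start tag recognised: loading begins
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a child one level below the root is complete
            yield elem
            root.remove(elem)  # keep only the current loading unit in memory


if __name__ == "__main__":
    pieces = [b"<rows><row><id>1</id></ro", b"w><row><id>2</id></row></rows>"]
    for child in iter_root_children(pieces):
        print(child.tag, child.find("id").text)
```

Because the chunks may split a tag anywhere (as in the demo), the parser, not the caller, decides when a child is complete.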
In an embodiment, when determining whether the child node object is the last child node object in the target XML file, the processor 402 specifically performs the following: reading the character string that follows the current child node object in the target XML file; determining whether that string is the end tag of the root node object; and, if it is not, determining that the current child node object is not the last child node object in the target XML file.
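The last-child test works on raw text: look at what follows the current child and check whether it is the root's end tag. The helper below is a hypothetical illustration that operates on an in-memory string and a known end offset. A real streaming reader would peek into its read buffer instead, and this simple version does not skip comments or processing instructions between elements.

```python
import re


def is_last_child(xml_text: str, child_end: int, root_tag: str) -> bool:
    """Return True if the text after the current child is the root's end tag.

    `child_end` is the offset just past the current child's end tag; any
    whitespace between elements is skipped before the comparison.
    """
    end_tag = re.compile(rf"\s*</{re.escape(root_tag)}\s*>")
    return end_tag.match(xml_text, child_end) is not None


if __name__ == "__main__":
    doc = "<rows>\n  <row>1</row>\n  <row>2</row>\n</rows>\n"
    first_end = doc.index("</row>") + len("</row>")
    last_end = doc.rindex("</row>") + len("</row>")
    print(is_last_child(doc, first_end, "rows"), is_last_child(doc, last_end, "rows"))
    # prints: False True
```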
In an embodiment, after parsing the child node object as tag-text pairs to generate the corresponding row record data, the processor 402 further performs the following: storing the row record data in a row record data file, and releasing the memory space occupied by the child node object.
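Writing each row record out before releasing the child is what keeps memory usage flat. The sketch assumes the row record data file is a CSV file with known column names (`fields` is a hypothetical parameter). It uses `Element.clear()` together with `Element.remove()` to free the finished child. Other output formats would work the same way.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_csv(xml_source, out, fields) -> int:
    """Write one CSV row per direct child of the root and free each child afterwards.

    `fields` lists the expected sub-element tags (hypothetical column names).
    Returns the number of row records written.
    """
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    root = None
    depth = 0
    written = 0
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a direct child of the root has been fully parsed
            record = {sub.tag: (sub.text or "").strip() for sub in elem}
            writer.writerow(record)  # 1) persist the row record first
            written += 1
            elem.clear()             # 2) then release the child's subtree
            root.remove(elem)        #    and the root's reference to it
    return written


if __name__ == "__main__":
    data = io.BytesIO(b"<rows><row><id>1</id><name>a</name></row>"
                      b"<row><id>2</id><name>b</name></row></rows>")
    buffer = io.StringIO()
    print(rows_to_csv(data, buffer, ["id", "name"]))
    print(buffer.getvalue())
```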
In an embodiment, when parsing the child node object as tag-text pairs to generate the corresponding row record data, the processor 402 specifically uses the iterparse() method to perform this parsing.
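`iterparse()` is available in Python's `xml.etree.ElementTree` module (and in lxml). This sketch assumes one plausible reading of "parsing as tag-text pairs": each direct sub-element of a child node is taken as a (tag, text) pair, and the pairs are joined into one delimited line. The `tag=text` format and the `|` separator are illustrative choices, not details taken from the application.

```python
import io
import xml.etree.ElementTree as ET


def iter_row_records(source, sep: str = "|"):
    """Yield one row-record line per direct child of the root.

    Each sub-element of the child becomes a `tag=text` pair (an assumed format).
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root's start tag
    depth = 1
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a direct child of the root is complete
            pairs = [(sub.tag, (sub.text or "").strip()) for sub in elem]
            yield sep.join(f"{tag}={text}" for tag, text in pairs)
            root.clear()  # discard the finished child before parsing the next one


if __name__ == "__main__":
    data = (b"<orders><order><no>A1</no><amt>9.5</amt></order>"
            b"<order><no>A2</no><amt> 3 </amt></order></orders>")
    for line in iter_row_records(io.BytesIO(data)):
        print(line)
```

Clearing the root after each record is the usual `iterparse` idiom for keeping memory bounded on large files.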
It should be understood that, in the embodiments of the present application, the processor 402 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
Another embodiment of the present application provides a storage medium, which may be a computer-readable storage medium. The storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the XML file parsing method of the present application. The storage medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a flash memory, a magnetic disk, or an optical disc.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. The steps of the methods in the embodiments of the present application may be reordered, merged, or removed according to actual needs, and the units of the apparatus in the embodiments may be merged, divided, or removed according to actual needs. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. This computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a terminal, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
The foregoing describes only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any equivalent modification or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (20)

  1. An XML file parsing method, comprising:
    performing illegal character replacement on an XML file with a preset text compiler to generate a target XML file;
    loading the child node objects in the target XML file into memory one at a time according to a preset loading rule;
    parsing the child node object as tag-text pairs to generate row record data corresponding to the child node object;
    determining whether the child node object is the last child node object in the target XML file; and
    if the child node object is not the last child node object in the target XML file, loading the next child node object into the memory and returning to the step of parsing the child node object as tag-text pairs, until the row record data corresponding to all child node objects in the target XML file has been parsed.
  2. The XML file parsing method according to claim 1, further comprising, before performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file:
    obtaining an encoding format of the XML file;
    determining whether a text font of the XML file is compatible with the encoding format;
    if the text font of the XML file is compatible with the encoding format, proceeding to the step of performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file;
    if the text font of the XML file is incompatible with the encoding format, converting the encoding format of the XML file with a preset character set encoding conversion command; and
    taking the XML file whose encoding format has been converted as the XML file, and proceeding to the step of performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file.
  3. The XML file parsing method according to claim 1, wherein loading the child node objects in the target XML file into memory one at a time according to the preset loading rule comprises:
    when the start tag of a root node object in the target XML file is recognized, loading the child node objects in the target XML file into memory one at a time, using each child node object at the level directly below the root node object as the loading unit, wherein the root node object includes at least one child node object.
  4. The XML file parsing method according to claim 1, wherein determining whether the child node object is the last child node object in the target XML file comprises:
    reading the character string that follows the current child node object in the target XML file;
    determining whether the character string is the end tag of a root node object; and
    if the character string is not the end tag of the root node object, determining that the current child node object is not the last child node object in the target XML file.
  5. The XML file parsing method according to claim 1, further comprising, after parsing the child node object as tag-text pairs to generate the row record data corresponding to the child node object:
    storing the row record data in a row record data file; and
    releasing the memory space occupied by the child node object.
  6. The XML file parsing method according to claim 1, wherein parsing the child node object as tag-text pairs to generate the row record data corresponding to the child node object comprises: parsing the child node object as tag-text pairs by means of the iterparse() method to generate the row record data corresponding to the child node object.
  7. The XML file parsing method according to claim 2, wherein converting the encoding format of the XML file with the preset character set encoding conversion command comprises: converting the encoding format of the XML file as a byte stream by means of the piconv command.
  8. An XML file parsing apparatus, comprising:
    a character replacement unit, configured to perform illegal character replacement on an XML file with a preset text compiler to generate a target XML file;
    a loading unit, configured to load the child node objects in the target XML file into memory one at a time according to a preset loading rule;
    a parsing unit, configured to parse the child node object as tag-text pairs to generate row record data corresponding to the child node object; and
    an object determining unit, configured to determine whether the child node object is the last child node object in the target XML file;
    wherein the loading unit is further configured to, if the child node object is not the last child node object in the target XML file, load the next child node object into the memory so that the parsing unit parses it as tag-text pairs, until the row record data corresponding to all child node objects in the target XML file has been parsed.
  9. The XML file parsing apparatus according to claim 8, further comprising:
    a format obtaining unit, configured to obtain an encoding format of the XML file;
    a format determining unit, configured to determine whether a text font of the XML file is compatible with the encoding format; and
    an encoding conversion unit, configured to, if the text font of the XML file is incompatible with the encoding format, convert the encoding format of the XML file with a preset character set encoding conversion command, and take the converted XML file as the XML file;
    wherein the character replacement unit is specifically configured to, after the encoding conversion unit takes the converted XML file as the XML file, perform illegal character replacement on the XML file with the preset text compiler to generate the target XML file; and
    the character replacement unit is further specifically configured to, if the text font of the XML file is compatible with the encoding format, perform illegal character replacement on the XML file with the preset text compiler to generate the target XML file.
  10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements: performing illegal character replacement on an XML file with a preset text compiler to generate a target XML file; loading the child node objects in the target XML file into memory one at a time according to a preset loading rule; parsing the child node object as tag-text pairs to generate row record data corresponding to the child node object; determining whether the child node object is the last child node object in the target XML file; and, if the child node object is not the last child node object in the target XML file, loading the next child node object into the memory and returning to the step of parsing the child node object as tag-text pairs, until the row record data corresponding to all child node objects in the target XML file has been parsed.
  11. The computer device according to claim 10, wherein, before performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file, the processor further implements: obtaining an encoding format of the XML file; determining whether a text font of the XML file is compatible with the encoding format; if the text font of the XML file is compatible with the encoding format, proceeding to the step of performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file; if the text font of the XML file is incompatible with the encoding format, converting the encoding format of the XML file with a preset character set encoding conversion command; and taking the converted XML file as the XML file and proceeding to the step of performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file.
  12. The computer device according to claim 10, wherein, when loading the child node objects in the target XML file into memory one at a time according to the preset loading rule, the processor specifically implements: when the start tag of a root node object in the target XML file is recognized, loading the child node objects in the target XML file into memory one at a time, using each child node object at the level directly below the root node object as the loading unit, wherein the root node object includes at least one child node object.
  13. The computer device according to claim 10, wherein, when determining whether the child node object is the last child node object in the target XML file, the processor specifically implements: reading the character string that follows the current child node object in the target XML file; determining whether the character string is the end tag of a root node object; and, if the character string is not the end tag of the root node object, determining that the current child node object is not the last child node object in the target XML file.
  14. The computer device according to claim 10, wherein, after parsing the child node object as tag-text pairs to generate the row record data corresponding to the child node object, the processor further implements: storing the row record data in a row record data file; and releasing the memory space occupied by the child node object.
  15. The computer device according to claim 10, wherein, when parsing the child node object as tag-text pairs to generate the row record data corresponding to the child node object, the processor specifically implements: parsing the child node object as tag-text pairs by means of the iterparse() method to generate the row record data corresponding to the child node object.
  16. A storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform: illegal character replacement on an XML file with a preset text compiler to generate a target XML file; loading the child node objects in the target XML file into memory one at a time according to a preset loading rule; parsing the child node object as tag-text pairs to generate row record data corresponding to the child node object; determining whether the child node object is the last child node object in the target XML file; and, if the child node object is not the last child node object in the target XML file, loading the next child node object into the memory and returning to the step of parsing the child node object as tag-text pairs, until the row record data corresponding to all child node objects in the target XML file has been parsed.
  17. The storage medium according to claim 16, wherein the program instructions, when executed by the processor, further cause the processor, before performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file, to perform: obtaining an encoding format of the XML file; determining whether a text font of the XML file is compatible with the encoding format; if the text font of the XML file is compatible with the encoding format, proceeding to the step of performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file; if the text font of the XML file is incompatible with the encoding format, converting the encoding format of the XML file with a preset character set encoding conversion command; and taking the converted XML file as the XML file and proceeding to the step of performing illegal character replacement on the XML file with the preset text compiler to generate the target XML file.
  18. The storage medium according to claim 16, wherein the program instructions, when executed by the processor to load the child node objects in the target XML file into memory one at a time according to the preset loading rule, cause the processor to perform: when the start tag of a root node object in the target XML file is recognized, loading the child node objects in the target XML file into memory one at a time, using each child node object at the level directly below the root node object as the loading unit, wherein the root node object includes at least one child node object.
  19. The storage medium according to claim 16, wherein the program instructions, when executed by the processor to determine whether the child node object is the last child node object in the target XML file, cause the processor to perform: reading the character string that follows the current child node object in the target XML file; determining whether the character string is the end tag of a root node object; and, if the character string is not the end tag of the root node object, determining that the current child node object is not the last child node object in the target XML file.
  20. The storage medium according to claim 16, wherein the program instructions, after being executed by the processor to parse the child node object as tag-text pairs to generate the row record data corresponding to the child node object, further cause the processor to perform: storing the row record data in a row record data file; and releasing the memory space occupied by the child node object.
PCT/CN2018/084227 2018-01-30 2018-04-24 Xml file parsing method, device, computer apparatus, and storage medium WO2019148671A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810091360.5A CN108255494A (en) 2018-01-30 2018-01-30 A kind of XML file analytic method, device, computer equipment and storage medium
CN201810091360.5 2018-01-30

Publications (1)

Publication Number Publication Date
WO2019148671A1 true WO2019148671A1 (en) 2019-08-08

Family

ID=62743469

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/084227 WO2019148671A1 (en) 2018-01-30 2018-04-24 Xml file parsing method, device, computer apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN108255494A (en)
WO (1) WO2019148671A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111124479A (en) * 2019-12-18 2020-05-08 北京像素软件科技股份有限公司 Configuration file analysis method and system and electronic equipment
CN112149391A (en) * 2020-09-28 2020-12-29 平安证券股份有限公司 Information processing method, information processing apparatus, terminal device, and storage medium
CN113761283A (en) * 2020-06-01 2021-12-07 中移(苏州)软件技术有限公司 Method, device, equipment and storage medium for reading XML file

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109561209A (en) * 2018-11-22 2019-04-02 努比亚技术有限公司 A kind of font switching method, terminal and computer readable storage medium
CN109710808B (en) * 2018-12-28 2023-04-28 深圳市元征科技股份有限公司 XML file analysis method, system, device and readable storage medium
CN111597389B (en) * 2019-02-21 2024-02-06 上海微电子装备(集团)股份有限公司 Data processing method, device, equipment and storage medium
CN110674093A (en) * 2019-08-28 2020-01-10 金蝶汽车网络科技有限公司 File data processing method and device, computer equipment and storage medium
CN110737409B (en) * 2019-10-21 2023-09-26 网易(杭州)网络有限公司 Data loading method and device and terminal equipment
CN111339370B (en) * 2019-12-11 2023-08-11 山东航空股份有限公司 Quick decoding method for airplane QAR data
CN111427899A (en) * 2020-03-17 2020-07-17 中国建设银行股份有限公司 Method, device, equipment and computer readable medium for storing file
CN112540958B (en) * 2020-12-08 2023-08-29 北京百度网讯科技有限公司 File processing method, device, equipment and computer storage medium
CN113033150A (en) * 2021-03-18 2021-06-25 深圳市元征科技股份有限公司 Method and device for coding program text and storage medium
CN113051333B (en) * 2021-04-21 2023-10-13 中国平安财产保险股份有限公司 Data processing method and device, electronic equipment and storage medium
CN113949749B (en) * 2021-10-15 2024-04-02 中国农业银行股份有限公司 XML message processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7499915B2 (en) * 2004-04-09 2009-03-03 Oracle International Corporation Index for accessing XML data
CN104112015A (en) * 2014-07-19 2014-10-22 国家电网公司 DOM (document object model) and XML (extensible markup language) path language based intelligent substation SCD (substation configuration description) file parsing method
CN106469137A (en) * 2015-08-19 2017-03-01 互联网域名系统北京市工程研究中心有限公司 XML document analysis method and device
CN107577460A (en) * 2017-08-29 2018-01-12 苏州优圣美智能系统有限公司 A kind of method from unstructured data extraction structural data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291673A (en) * 2017-05-19 2017-10-24 广州视源电子科技股份有限公司 A kind of processing method of document, system, readable storage medium storing program for executing and computer equipment

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111124479A (en) * 2019-12-18 2020-05-08 北京像素软件科技股份有限公司 Configuration file analysis method and system and electronic equipment
CN111124479B (en) * 2019-12-18 2024-03-22 北京像素软件科技股份有限公司 Method and system for analyzing configuration file and electronic equipment
CN113761283A (en) * 2020-06-01 2021-12-07 中移(苏州)软件技术有限公司 Method, device, equipment and storage medium for reading XML file
CN113761283B (en) * 2020-06-01 2023-09-05 中移(苏州)软件技术有限公司 Method and device for reading XML file, equipment and storage medium
CN112149391A (en) * 2020-09-28 2020-12-29 平安证券股份有限公司 Information processing method, information processing apparatus, terminal device, and storage medium
CN112149391B (en) * 2020-09-28 2023-06-09 平安证券股份有限公司 Information processing method, information processing apparatus, terminal device, and storage medium

Also Published As

Publication number Publication date
CN108255494A (en) 2018-07-06

Similar Documents

Publication Publication Date Title
WO2019148671A1 (en) Xml file parsing method, device, computer apparatus, and storage medium
US10755093B2 (en) Hierarchical information extraction using document segmentation and optical character recognition correction
US11734364B2 (en) Method and system for document similarity analysis
US11036491B1 (en) Identifying and resolving firmware component dependencies
US9104797B1 (en) Efficient cloud-based annotation of crash reports
CN109359283B (en) Summarizing method of form data, terminal equipment and medium
CN106557470B (en) Data extraction method and device
CN108334609B (en) Method, device, equipment and storage medium for realizing JSON format data access in Oracle
WO2019042349A1 (en) Translation method, mobile terminal and storage device of operating system framework
CN109857389B (en) Model data generation method and device, computer equipment and storage medium
GB2498724A (en) Automatically determining File Transfer Mode
CN110780860A (en) Generation method and device of table building script, computer equipment and storage medium
CN114610957A (en) Data processing method, device, equipment and computer storage medium
US20160147474A1 (en) Writable clone data structure
CN114546432A (en) Multi-application deployment method, device, equipment and readable storage medium
US10540151B1 (en) Graphical customization of a firmware-provided user interface (UI)
CN113468118A (en) File increment storage method and device, computer equipment and storage medium
US9009172B2 (en) Methods, systems and computer readable media for comparing XML documents
CN110795920A (en) Document generation method and device
CN108089973A (en) A kind of information processing method and equipment
US8701119B1 (en) Parsing XML in software on CPU with multiple execution units
CN113868249A (en) Data storage method and device, computer equipment and storage medium
CN112364580A (en) Method and device for automatically inserting specific code into register transmission level design file
US11586583B2 (en) Systems and methods for ingesting data files using multi-threaded processing
CN108388424B (en) Method and device for calling interface data and electronic equipment

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18904347

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: Public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11.11.2020)

122 EP: PCT application non-entry in European phase

Ref document number: 18904347

Country of ref document: EP

Kind code of ref document: A1