WO2005101210A1

WO2005101210A1 - Data analysis device, data analysis method, data analysis program, and recording medium containing the data analysis program

Info

Publication number: WO2005101210A1
Application number: PCT/JP2005/002201
Authority: WO
Inventors: Hiroaki Zaima; Kentarou Sakakura
Original assignee: Sharp Kabushiki Kaisha
Priority date: 2004-04-09
Filing date: 2005-02-15
Publication date: 2005-10-27
Also published as: JPWO2005101210A1

Abstract

In a data analysis device, intermediate data of the data to be analyzed is expanded little by little into a data expansion area in memory (S201). From the intermediate data that has been expanded, a data block of an analyzable unit is acquired (S203), and the data block is subjected to data analysis (S207). When the data expansion area is smaller than the data block of the analyzable unit, the data expansion area is extended (S217) and the data is then expanded.

Description

Specification

Data analysis apparatus, data analysis method, data analysis program, and recording medium recording data analysis program

Technical field

The present invention relates to a data analysis apparatus, a data analysis method, a data analysis program, and a recording medium recording the data analysis program, and in particular to a data analysis apparatus, a data analysis method, a data analysis program, and a recording medium recording the data analysis program that can reduce the usage of a storage device at the time of data analysis.

Background art

[0002] In recent years, portable information terminals such as mobile phones have become increasingly sophisticated and multifunctional, and are now required to process complex, large-volume data of the kind conventionally handled by personal computers. When such data is analyzed and processed, a portable information terminal such as a mobile phone tends to run out of storage capacity.

[0003] In a conventional portable information terminal such as a mobile phone serving as a data analysis device, a data expansion area large enough to expand all of the compressed data is provided in the storage area, and all of the compressed data is expanded there and analyzed. In other words, if the data size after expansion is 1 Kbyte, a data expansion area of 1 Kbyte must be prepared in the storage unit. Therefore, when data that is large after expansion is handled in the portable information terminal, the capacity of the storage device of the portable information terminal may be insufficient.

Note that specific examples of the data structure of data to be analyzed whose data size is specified are the data structures shown in FIG. 16 and FIG. 17. That is, as shown in FIG. 16, the data to be analyzed includes headers 1101, 1103, and 1105 for the respective data blocks 1102, 1104, and 1106. The headers 1101, 1103, and 1105 contain information on the size, or the size after expansion, of the subsequent data blocks 1102, 1104, and 1106, respectively. Alternatively, as shown in FIG. 17, the data to be analyzed includes a single header 1201 for the data as a whole. The header 1201 contains information on the length, or the length after expansion, of all the subsequent data blocks 1202, 1203, and 1204.

[0005] In order to solve such problems, for example, Patent Document 1 discloses a multimedia data processing system capable of transmitting a large amount of data continuously. The system controls data streaming and provides continuous streaming in real time.

Patent Document 1: Japanese Patent Application Laid-Open No. 5-265781

Disclosure of the invention

Problem that invention tries to solve

[0006] Data used in streaming as disclosed in Patent Document 1 has a data structure in which the data blocks to be analyzed are completely independent, so that the analyzed part of the data can be deleted and playback can be started without receiving all the data.

However, when analyzing data whose data structure does not have such completely independent data blocks, none of the data to be analyzed can be deleted until analysis of all the data is complete. Therefore, if the volume of data to be analyzed is large, there is a problem that the storage area runs out quickly.

Also, if the data to be analyzed is obtained by compressing or encoding data whose data structure does not have completely independent data blocks, there is a further problem: as described above, in order to decompress or decode the data completely before analysis, a data expansion area that is very large relative to the volume of the data to be analyzed must be prepared.

The present invention has been made to solve the above-mentioned problems. Its object is to provide a data analysis device, a data analysis method, a data analysis program, and a recording medium recording the data analysis program that can reduce the amount of storage used at the time of data analysis, either by releasing the analyzed part of the area in which the analysis target data is stored so that it can be used as a work area, or by performing analysis with a data expansion area for compressed or encoded analysis target data that is smaller than the data volume after expansion.

Means to solve the problem

A data analysis apparatus according to an aspect of the present invention includes a data expansion unit that generates intermediate data of analysis target data and expands it into a predetermined data expansion area, and a data analysis unit that analyzes the intermediate data. The data expansion unit generates the intermediate data sequentially, using a part of the analysis target data at a time, and the data analysis unit analyzes the generated intermediate data sequentially.

According to this aspect of the present invention, even if the data to be analyzed has a data structure in which the length of a data block is not specified, the part for which analysis has been completed, within the area storing the data to be analyzed or the intermediate data obtained by expanding it, is released and can be used as a work area. Therefore, even if the data to be analyzed is obtained by compressing or encoding data of a data structure in which the length of a data block is not specified, analysis can be performed with a data expansion area smaller than the data volume after expansion, and the amount of storage used at the time of data analysis can be reduced.

Further, in the data analysis device according to the present invention, the data expansion unit preferably decodes encoded analysis target data to generate the intermediate data.

Further, in the data analysis apparatus according to the present invention, the encoded data to be analyzed is preferably compressed data.

Further, in the data analysis device according to the present invention, the data expansion unit preferably secures the data expansion area by specifying its size on the storage unit, and generates the intermediate data by decoding a part of the analysis target data according to the size of the data expansion area.

Further, in the data analysis apparatus according to the present invention, the data expansion unit preferably stores the data position up to which expansion of the analysis target data has been completed, and generates the intermediate data by decoding the analysis target data from the stored data position.
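The size-bounded, resumable expansion described in the preceding paragraphs can be illustrated with Python's standard zlib module. This is only a minimal sketch under stated assumptions: the gzip container, the class name IncrementalExpander, and the 64-byte area size are illustrative choices and do not come from the specification.

```python
import gzip
import zlib


class IncrementalExpander:
    """Expands gzip data piece by piece (hypothetical helper, not from the source)."""

    def __init__(self, compressed: bytes):
        self.compressed = compressed
        self.position = 0  # stored position up to which expansion is complete
        # MAX_WBITS | 16 makes zlib expect a gzip header and trailer.
        self._inflater = zlib.decompressobj(zlib.MAX_WBITS | 16)

    @property
    def finished(self) -> bool:
        return self._inflater.eof

    def expand_next(self, area_size: int) -> bytes:
        """Return at most area_size bytes of intermediate data."""
        # A memoryview avoids copying the not-yet-expanded input.
        pending = memoryview(self.compressed)[self.position:]
        chunk = self._inflater.decompress(pending, area_size)
        # Input that zlib did not need for this chunk is left in unconsumed_tail.
        self.position += len(pending) - len(self._inflater.unconsumed_tail)
        return chunk


original = b"<svg><rect width='10' height='20'/></svg>" * 50
expander = IncrementalExpander(gzip.compress(original))
restored = b""
while not expander.finished:
    piece = expander.expand_next(64)  # never more than 64 bytes at a time
    restored += piece
```

In this sketch the expanded pieces are concatenated only to show that nothing is lost; a device following the text above would instead analyze each piece before requesting the next one.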

Further, in the data analysis device according to the present invention, it is preferable that the data expansion unit determines whether it is necessary to extend the data expansion area, and extends the data expansion area when it determines that this is necessary.

Further, in the data analysis device according to the present invention, it is preferable that the data expansion unit uses an area adjacent to the data expansion area, in preference to other areas, as the area into which the data expansion area is extended. Further, in the data analysis device according to the present invention, it is preferable that the data expansion unit uses an area adjacent to the rear of the data expansion area, in preference to other areas, as the area into which the data expansion area is extended.

In addition, the data analysis device according to the present invention preferably further includes a data block extraction unit that extracts data blocks from the intermediate data, and the data analysis unit preferably analyzes the data blocks extracted by the data block extraction unit.

Further, in the data analysis device according to the present invention, the data block extraction unit preferably stores the position of the data that remains in the intermediate data after the data block has been extracted.

Further, in the data analysis device according to the present invention, it is preferable that the data expansion unit moves the remaining data to the beginning of the data expansion area according to the stored position of the remaining data.

In the data analysis device according to the present invention, preferably, the data expansion unit generates the next intermediate data following the current intermediate data and expands it after the remaining data in the data expansion area.
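Moving the remaining data to the beginning of the area and expanding the next intermediate data behind it can be sketched as follows. The function name compact_and_refill and its parameters are hypothetical, and a fixed-size bytearray stands in for the data expansion area; this is an illustration rather than the implementation of the embodiment.

```python
def compact_and_refill(area: bytearray, remaining_start: int, filled: int,
                       next_data: bytes) -> tuple:
    """Move the unextracted tail to the front and append new intermediate data.

    area            -- fixed-size buffer standing in for the data expansion area
    remaining_start -- stored position of the remaining (unextracted) data
    filled          -- number of valid bytes currently in the area
    next_data       -- intermediate data that follows what is already in the area
    Returns (new_filled, bytes_taken_from_next_data).
    """
    remaining = filled - remaining_start
    # Equal-length slice assignment moves bytes without resizing the buffer.
    area[0:remaining] = area[remaining_start:filled]
    free = len(area) - remaining
    take = min(free, len(next_data))
    area[remaining:remaining + take] = next_data[:take]
    return remaining + take, take


area = bytearray(b"<a/><b att")            # "<a/>" has already been extracted
new_filled, taken = compact_and_refill(area, 4, 10, b"r='1'/><c/>")
```

After the call the area holds the unfinished tag followed by as much new data as fits, so extraction can resume from offset 0 without the area having grown.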

Further, in the data analysis device according to the present invention, the data block extraction unit preferably extracts the next data block of the data block from the start position of the stored remaining data.

Further, in the data analysis device according to the present invention, the data block extraction unit preferably stores the extraction status of the data block.

Further, in the data analysis device according to the present invention, the data analysis unit preferably determines a data block analysis method according to the extraction status of the stored data block.

In the data analysis device according to the present invention, the data analysis unit preferably stores the analysis status of the data block.

Further, in the data analysis device according to the present invention, the data block extraction unit preferably determines a data block extraction method according to the analysis status of the stored data block.

Further, in the data analysis device according to the present invention, the analysis target data is data in which character string data of a character code is encoded, and the data expansion unit preferably detects the character code from the intermediate data, converts the detected character code into character string data, and expands the converted character string data into the data expansion area.

Further, in the data analysis apparatus according to the present invention, the analysis target data is data in which character string data of a fixed-length character code is encoded, and the data expansion unit preferably secures the data expansion area by specifying a size on the storage unit with the fixed length of the detected character code as the minimum unit.

Further, in the data analysis device according to the present invention, the analysis target data is data in which character string data of a variable-length character code is encoded, and the data expansion unit preferably converts the detected character code into character string data of a fixed-length character code.

Further, in the data analysis device according to the present invention, it is preferable that the fixed-length character code converted by the data expansion unit is a fixed-length Unicode.
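As an illustration of converting a variable-length character code into a fixed-length one while the data arrives in pieces, the following sketch uses Python's incremental decoders. The choice of UTF-8 as the source code and of UTF-32 (4 bytes per character) as the fixed-length Unicode form is an assumption made for the example, as is the function name expand_as_fixed_width; the specification does not name a particular encoding.

```python
import codecs


def expand_as_fixed_width(chunks, source_encoding="utf-8"):
    """Yield fixed-width (UTF-32, little-endian, no BOM) bytes for each input chunk.

    The incremental decoder keeps any multi-byte sequence that is split across
    two chunks until the rest of the sequence arrives.
    """
    decoder = codecs.getincrementaldecoder(source_encoding)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text.encode("utf-32-le")
    # final=True raises an error if the input ended inside a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode("utf-32-le")


source_text = "図形 SVG"
encoded = source_text.encode("utf-8")
# Split inside the first 3-byte character to mimic a chunk boundary.
pieces = [encoded[:2], encoded[2:]]
fixed = b"".join(expand_as_fixed_width(pieces))
```

Because every character occupies the same number of bytes after conversion, a data expansion area sized as a multiple of that width never ends in the middle of a character.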

A data analysis method according to an aspect of the present invention includes a data expansion step of generating intermediate data of analysis target data and expanding it into a predetermined data expansion area, and a data analysis step of analyzing the intermediate data. The data expansion step generates the intermediate data sequentially, using a part of the analysis target data at a time, and the data analysis step analyzes the generated intermediate data sequentially.

[0033] A data analysis program according to an aspect of the present invention is a program that causes a computer to execute analysis of data. It causes the computer to execute a data expansion step of generating intermediate data of analysis target data and expanding it into a predetermined data expansion area, and a data analysis step of analyzing the intermediate data. The data expansion step generates the intermediate data sequentially, using a part of the analysis target data at a time, and the data analysis step analyzes the generated intermediate data sequentially.

[0034] A computer readable recording medium according to one aspect of the present invention stores a data analysis program as described above.

[0035] A data analysis device according to another aspect of the present invention is a data analysis device that processes data described in a compressed markup language, and comprises: a storage unit including first and second buffers used as buffer memory; a data expansion unit that expands the data described in the compressed markup language into the first buffer; a first data analysis unit that extracts data blocks from the data expanded in the first buffer; and a second data analysis unit that generates an abstract syntax tree from the data blocks extracted by the first data analysis unit. The first data analysis unit stores the extracted data block in one continuous storage area of the second buffer, and the second data analysis unit generates the abstract syntax tree using data stored in the first buffer or the second buffer.

[0036] According to this other aspect of the present invention, analysis is performed using a first buffer of fixed capacity and a second buffer that forms one continuous storage area and whose capacity is changed as necessary. As a result, the capacity of the first buffer can be made smaller than in the prior art. In addition, since the second buffer is used only when a data block that does not fit within the capacity of the first buffer appears, the storage area of the storage unit can be used efficiently.

Further, in the data analysis device according to the present invention, the second data analysis unit preferably generates the abstract syntax tree using the result of the analysis of the extracted data block by the first data analysis unit.

Further, in the data analysis device according to the present invention, the data expansion unit preferably expands the data described in the compressed markup language into the first buffer in units no larger than the capacity of the first buffer.

Further, in the data analysis device according to the present invention, the first data analysis unit preferably extracts data blocks from the data expanded in the first buffer by performing lexical analysis on the data expanded in the first buffer.

In the data analysis device according to the present invention, the first data analysis unit preferably performs lexical analysis in accordance with the XML grammar.

Further, in the data analysis device according to the present invention, the first data analysis unit preferably determines the capacity of the second buffer according to the XML grammar.

Further, in the data analysis device according to the present invention, the second data analysis unit preferably analyzes the data according to the SVG grammar.

Further, in the data analysis device according to the present invention, the second data analysis unit preferably analyzes the expanded data while it remains expanded in the first buffer.

Further, in the data analysis device according to the present invention, the first data analysis unit preferably includes a schema language change unit that changes the schema language used for lexical analysis.

Further, in the data analysis device according to the present invention, it is preferable that the second data analysis unit semantically analyze the data expanded in the first buffer.

Further, in the data analysis device according to the present invention, the second data analysis unit preferably causes the first data analysis unit to change how it extracts the data block, based on the analysis result of the first data analysis unit.

In the data analysis device according to the present invention, the first data analysis unit preferably reserves the second buffer in the storage unit based on the size of the part of a data block contained in the data expanded in the first buffer.

Further, in the data analysis device according to the present invention, the first data analysis unit preferably stores a part of the data block in the first buffer in the second buffer.

Further, in the data analysis device according to the present invention, the first data analysis unit preferably combines a part of the data block in the first buffer with a part of the same data block in the second buffer and stores the result in the second buffer.

Further, in the data analysis device according to the present invention, the first data analysis unit preferably combines a part of the data block in the first buffer with a part of the same data block in the second buffer and stores the result in a third buffer.

Further, in the data analysis device according to the present invention, the first data analysis unit preferably reserves the third buffer in the storage unit based on the size of the part of a data block contained in the data expanded in the first buffer and on the capacity of the second buffer.
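The interplay between a fixed first buffer and a second buffer that exists only while a data block is cut off can be sketched as below. The class name BlockAssembler, the use of ">" as the block delimiter, and the growable bytearray standing in for the second buffer are all assumptions for illustration; the embodiment may reserve and combine its buffers differently.

```python
class BlockAssembler:
    """Joins block fragments that straddle refills of a fixed first buffer."""

    def __init__(self):
        self.second = None  # second buffer: reserved only when a block is cut off

    def feed(self, first_buffer: bytes, delimiter: bytes = b">") -> list:
        """Return the complete blocks available after this refill."""
        blocks = []
        pos = 0
        while True:
            end = first_buffer.find(delimiter, pos)
            if end == -1:
                break
            piece = first_buffer[pos:end + 1]
            if self.second is not None:
                # Combine the stored fragment with its continuation.
                self.second += piece
                blocks.append(bytes(self.second))
                self.second = None  # release the second buffer
            else:
                blocks.append(piece)  # block lay entirely in the first buffer
            pos = end + 1
        tail = first_buffer[pos:]
        if tail:
            if self.second is None:
                self.second = bytearray()  # reserve only when needed
            self.second += tail
        return blocks


assembler = BlockAssembler()
first = assembler.feed(b"<svg><re")
second = assembler.feed(b"ct x='1'")
third = assembler.feed(b"/><g>")
```

Blocks that fit entirely in the first buffer are returned without touching the second buffer, which mirrors the point that the second buffer is used only when it is needed.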

A data analysis method according to another aspect of the present invention is a data analysis method in an apparatus that processes data described in a compressed markup language, and includes the steps of: expanding the data described in the compressed markup language into a first buffer; extracting a data block from the data expanded in the first buffer; storing the extracted data block in one continuous storage area of a second buffer; and generating an abstract syntax tree using the data in the first or second buffer.

A data analysis program according to another aspect of the present invention is a data analysis program for causing a device that processes data described in a compressed markup language to analyze the data, and causes a computer to execute the steps of: expanding the data described in the compressed markup language into a first buffer; extracting a data block from the data expanded in the first buffer; storing the extracted data block in one continuous storage area of a second buffer; and generating an abstract syntax tree using the data in the first or second buffer.

[0054] A computer readable recording medium according to another aspect of the present invention stores the data analysis program described above.

The above and other objects, features, aspects and advantages of the present invention will be apparent from the following detailed description of the present invention which is understood in conjunction with the accompanying drawings.

Brief description of the drawings

FIG. 1 is a view showing a specific example of the configuration of a data analysis apparatus according to a first embodiment of the present invention.

FIG. 2 is a flowchart specifically showing data analysis processing in the data analysis device of FIG. 1.

FIG. 3 is a view showing a specific example in the case where intermediate data of analysis target data in the data analysis device of FIG. 1 is data described in SVG (Scalable Vector Graphics) language.

FIG. 4 is a diagram showing a specific example of status information generated by the data block extraction unit of FIG. 1.

FIG. 5 is a diagram showing changes in the state of data expansion processing of the data expansion area in the storage unit of FIG. 1.

FIG. 6 is a view showing a specific example of a DOM (Document Object Model) tree generated when the data analysis unit of FIG. 1 executes data analysis.

FIG. 7 is a view for explaining a method of securing a data expansion area in the storage unit of FIG. 1.

FIG. 8 is a flowchart specifically showing the contents of data block extraction processing and status information update executed in the data block extraction unit of FIG. 1.

FIG. 9 is a flowchart specifically showing the contents of data block extraction processing and status information update in the data block extraction unit of FIG. 1.

FIG. 10 is a diagram showing a specific example of intermediate data of analysis target data when the intermediate data of analysis target data in the data analysis device of FIG. 1 is data described in SVG language.

FIG. 11 is a view showing a specific example of intermediate data of analysis target data when the intermediate data of analysis target data in the data analysis device of FIG. 1 is binary.

FIG. 12 is a diagram showing a specific example of the binary data structure of FIG.

FIG. 13 is a diagram showing a specific example of the binary data structure of FIG.

FIG. 14 is a diagram showing a specific example of the correspondence between instruction types and numbers of data bytes regarding the binary in FIG.

FIG. 15 is a view for explaining the case where, in the data analysis processing executed in the data analysis device of FIG. 1, data whose format becomes binary as shown in FIG. 12 when expanded is expanded.

FIG. 16 is a diagram specifically showing an example of the data structure of data to be analyzed whose data size is specified.

FIG. 17 is a diagram specifically showing another example of the data structure of data to be analyzed whose data size is specified.

FIG. 18 is a flowchart of data analysis processing executed in the data analysis apparatus according to the second embodiment of this invention.

FIG. 19 is a diagram showing a relationship between data to be expanded and the data blocks included in the expansion buffer area in the data analysis device according to the second embodiment of the present invention.

FIG. 20 is a view schematically showing a storage state of data in the continuous buffer area for coupling of the data analysis device according to the second embodiment of the present invention.

FIG. 21 is a diagram schematically showing a storage state of data in the continuous buffer area for coupling of the data analysis apparatus according to the second embodiment of the present invention.

FIG. 22 is a view schematically showing a storage state of data in the continuous buffer area for coupling of the data analysis device according to the second embodiment of the present invention.

FIG. 23 is a view for explaining the overall content of data analysis processing in the data analysis apparatus of the second embodiment of the present invention.

FIG. 24 is a diagram showing a modification of the configuration of the storage unit shown in FIG. 23.

Explanation of reference signs

1 data analysis device, 101 control unit, 102 input unit, 103 storage unit, 103A expansion buffer area, 103B continuous buffer area for connection, 103C continuous buffer area, 104 data expansion unit, 105 data block extraction unit, 106 data analysis unit, 107 output unit, 1001-1009 memory area, 1101, 1103, 1105, 1201 header, 1102, 1104, 1106, 1202-1204 data block.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the following description, the same parts and components are denoted by the same reference numerals. Their names and functions are also the same. Therefore, their detailed description is not repeated.

First Embodiment

First, with reference to FIG. 1, the configuration of a data analysis apparatus according to the first embodiment of the present invention will be specifically described.

The data analysis device 1 includes: a control unit 101 that is configured by a CPU (Central Processing Unit) or the like and controls the entire device; an input unit 102 that communicates with other devices to receive data, or reads data recorded on a recording medium such as a CD-ROM (Compact Disc-Read Only Memory), and inputs the data; a storage unit 103 that is composed of a ROM (Read Only Memory), a RAM (Random Access Memory), and the like and stores the program to be executed in the control unit 101, data to be analyzed, and so on; a data expansion unit 104 that expands data input via the input unit 102 into the storage unit 103; a data block extraction unit 105 that performs lexical analysis of the data expanded in the storage unit 103; a data analysis unit 106 that performs semantic analysis based on the result of the lexical analysis in the data block extraction unit 105; and an output unit 107 such as a display device (for example, a display), a transmission unit that communicates with other devices and transmits data, or an audio output device such as a speaker.

The above-described data expansion unit 104, data block extraction unit 105, and data analysis unit 106 may each be configured by dedicated hardware such as a dedicated circuit. Alternatively, in a data analysis apparatus comprising a storage device or a device for reading and writing information recorded on a recording medium, together with an information processing device, they may be realized by the information processing device executing a specific program recorded in the storage device or on the recording medium. In addition to the data input through the input unit 102, the data expansion unit 104 can also expand data stored in the storage unit 103 or the like.

The input unit 102 inputs data to be analyzed, and passes the input data to the data expansion unit 104.

The data to be analyzed by the data analysis apparatus 1 according to the present embodiment, which is input from the input unit 102, is data composed of several blocks in which each block is not necessarily completely independent, such as an HTML (Hypertext Markup Language) document or an XML (Extensible Markup Language) document tagged in a markup language, or is data obtained by compressing such data by a compression method such as GZIP or LZH or by converting it with an encoding method such as Base64.

The data expansion unit 104 secures a predetermined amount of data expansion area in the storage unit 103, and expands the data expansion area as necessary.

Data expansion unit 104 expands a part or all of the data to be analyzed input from input unit 102 into the data expansion area secured in storage unit 103 in an analyzable form, thereby obtaining intermediate data. At this time, if the input data is compressed, it is decompressed and expanded, and if it is encoded, it is decoded and expanded. If it is raw data that is neither compressed nor encoded and can be analyzed as it is, it may be copied and expanded as is.

Then, the data block extraction unit 105 extracts, from the intermediate data that is the expanded data, a block of data that is easy to analyze (hereinafter referred to as a data block) and passes it to the data analysis unit 106. The data block extraction unit 105 also generates status information indicating the extraction status of the data block and stores it in the storage unit 103. The data block extraction unit 105 proceeds with data block extraction while referring to the stored status information.

In the data analysis device 1 according to the present embodiment, the data input from the input unit 102 may be passed directly to the data expansion unit 104 for the subsequent analysis processing, or it may first be stored in the storage unit 103 and then passed from the storage unit 103 to the data expansion unit 104 for the subsequent analysis processing.

Further, when raw data is already stored in storage unit 103, data analysis unit 106 may analyze the data as it is, without the processing in data expansion unit 104 or data block extraction unit 105 being executed. Alternatively, after a data block has been extracted by the data block extraction unit 105, the part of the area of the storage unit 103 holding the input data that corresponds to the portion already analyzed by the data analysis unit 106 may be released sequentially.

The data analysis unit 106 analyzes the data block passed from the data block extraction unit 105 with reference to the status information stored in the storage unit 103, and passes the analysis result to the output unit 107. The analysis here is, for example, to generate a DOM tree when the intermediate data is data in HTML format. Then, the output unit 107 outputs the analysis result received from the data analysis unit 106.

When the data to be analyzed input from the input unit 102 is an XML document or data obtained by compressing or encoding the data, the data analysis result output from the output unit 107 is a tree of the content of the document. It is data constructed as a DOM tree expressed by a structure. The output unit 107 may either store the analysis result received from the data analysis unit 106 in the storage unit 103 or may output it directly to the outside.
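Building a document tree from markup that arrives in small pieces can be tried out with the pull parser in Python's standard library. This is a hedged illustration only: it produces an ElementTree structure rather than a W3C DOM tree, and the function name build_tree and the 7-byte chunk size are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def build_tree(chunks):
    """Feed XML to the parser piece by piece and return the root element."""
    parser = ET.XMLPullParser(events=("start",))
    root = None
    for chunk in chunks:
        parser.feed(chunk)
        for _event, element in parser.read_events():
            if root is None:
                root = element  # the first start event belongs to the root
    parser.close()  # raises ParseError if the document is incomplete
    return root


document = b"<svg><g><rect width='10'/></g><circle r='5'/></svg>"
# Deliver the document 7 bytes at a time, so tags are split across chunks.
chunks = [document[i:i + 7] for i in range(0, len(document), 7)]
root = build_tree(chunks)
```

The parser keeps whatever partial tag it has seen between calls to feed, which plays a role comparable to the stored status information described for the data block extraction unit 105.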

Based on FIG. 2, data analysis processing in the data analysis device 1 according to the present embodiment will be described. The process shown in the flowchart of FIG. 2 is realized by the control unit 101 reading out the program stored in the storage unit 103 and controlling each unit shown in FIG. 1.

Referring to FIG. 2, first, data expansion unit 104 expands a part or all of the data to be analyzed input from input unit 102 into the data expansion area secured in storage unit 103 (S201). When only part of the data to be analyzed is expanded in step S201, how much of the input data the data expansion unit 104 expands from the beginning may be determined depending on the size of the prepared data expansion area; if the volume of the data after expansion can be obtained from header information or the like, it may be determined depending on that volume; or it may be determined depending on the volume of the input data itself. The size of the data expansion area may also be determined in advance without depending on the input data.

Next, the data block extraction unit 105 extracts a data block of a size suitable for analysis from the data expanded in the data expansion area (S203). At this time, status information indicating the extraction status of the data block by the data block extraction unit 105 is generated and stored in the storage unit 103, or the status information already stored in the storage unit 103 is updated. The data block extraction unit 105 executes the extraction of the data block in step S203 while referring to the status information. The process of extracting data blocks in step S203 will be described later using a specific example.

At this time, the data block extraction unit 105 preferably stores in the storage unit 103 the position in the expanded data up to which data blocks have been extracted. By doing so, when the process of step S203 is next repeated, a data block can be extracted from the next position, that is, the position following the one reached in the previous pass. The size suitable for analysis of the data block extracted in step S203 is determined depending on the unit analyzed by the data analysis unit 106. For example, when the data to be analyzed is an XML document, the units analyzed by the data analysis unit 106 are tags such as start tags, end tags, and empty-element tags and the parts enclosed by tags, and, as finer blocks, units such as the tag names, attribute names, and attribute values that appear within tags. In step S203, it is preferable to ignore and discard tags, tag names, and the like that the data analysis unit 106 does not analyze, and to extract data blocks only for tags, tag names, and the like that require analysis.
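A resumable extraction of tag-sized data blocks with a stored position might look like the following sketch. The function name extract_block is hypothetical, and the scanner is deliberately naive: it treats every ">" as the end of a tag and therefore does not handle comments, CDATA sections, or ">" inside attribute values.

```python
def extract_block(area: bytes, start: int):
    """Extract one data block beginning at the stored position.

    Returns (block, next_position). When the block is cut off because the
    rest of it has not been expanded yet, returns (None, start) so that the
    caller can retry from the same position after expanding more data.
    """
    if start >= len(area):
        return None, start                 # reached the end of the expanded data
    if area[start:start + 1] == b"<":
        end = area.find(b">", start)
        if end == -1:
            return None, start             # tag is cut off partway
        return area[start:end + 1], end + 1
    end = area.find(b"<", start)
    if end == -1:
        return None, start                 # text may continue in unexpanded data
    return area[start:end], end


area = b"<svg><rect x='1'/>text</svg><ci"
blocks = []
position = 0
while True:
    block, position = extract_block(area, position)
    if block is None:
        break
    blocks.append(block)
```

When the loop stops, position still points at the start of the unfinished tag, which is the information needed to resume extraction once more data has been expanded.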

Next, the data block extraction unit 105 determines whether the data block extraction has succeeded (S205). If it is determined that the data block extraction has succeeded (YES in S205), the data analysis unit 106 analyzes the data of the data block extracted in step S203 (S207). The process then returns to step S203, where a data block is extracted from the position following the one at which a data block was extracted the previous time.

If it is determined in step S205 that the data block extraction has failed (NO in S205), it is determined whether the data to be analyzed contains an unexpanded portion (S209). In step S205, extraction is judged to have failed not only when an error occurs because the expanded data itself is invalid, but also when extraction has been completed up to the end of the expanded data, or when it is detected that a data block is cut off partway because the data to be analyzed still contains an unexpanded portion. If it is detected in step S205 that the data itself is invalid, the process may be ended immediately.

[0077] If the determination in step S209 finds no unexpanded data in the data to be analyzed (NO in S209), that is, if the extraction was judged to have failed in step S205 because data blocks have been extracted and analyzed up to the end of the data to be analyzed, this processing ends.

If the determination in step S209 finds unexpanded data in the data to be analyzed (YES in S209), the extraction was judged to have failed in step S205 because a data block was cut off partway by the unexpanded portion, so further data must be expanded and analyzed. The data block extraction unit 105 therefore next determines whether the data expansion area contains data that cannot be discarded (S211).

[0079] If the determination in step S211 finds no data that cannot be discarded in the data expansion area (NO in S211), that is, if all the data expanded in the data expansion area can be discarded, the data expansion unit 104 discards all the data in the data expansion area and the process returns to step S201. In step S201, the data following the data expanded in the previous step S201 is expanded into the emptied data expansion area.

[0080] If the determination in step S211 finds data that cannot be discarded in the data expansion area (YES in S211), the data block extraction unit 105 further determines whether all of the data in the data expansion area cannot be discarded or only part of it cannot be discarded (S213). If it is determined in step S213 that only part of the data expanded in the data expansion area cannot be discarded, that is, if part of the expanded data is cut off partway and cannot be extracted as a data block (NO in S213), the data expansion unit 104 discards everything except the part that could not be extracted as a data block and expands the subsequent data after that remaining part. The data that was cut off partway can then be extracted as a complete data block. In this case, therefore, the data expansion unit 104 moves the part that cannot be discarded to the top of the data expansion area (S215). This is preprocessing for expanding data behind the data that cannot be discarded.

If it is determined in step S213 that none of the data expanded in the data expansion area can be discarded (YES in S213), the current data expansion area is presumably smaller than the data block, so the data expansion unit 104 extends the size of the data expansion area (S217).

In step S213, the determination is whether none of the data can be discarded; alternatively, it may be determined whether a predetermined condition is satisfied, for example, whether the data that cannot be discarded exceeds 80% of the size of the data expansion area.

Next, the data expansion unit 104 expands the data following the previously expanded data behind the data determined in step S213 to be non-discardable (S219). Because step S219 places the data contiguously in the data expansion area, the data block extraction unit 105 can extract it as a single data block, which is then passed to the data analysis unit 106.

After that, the process returns to step S203, and the above-described process is repeated.

Hereinafter, the above-mentioned data analysis processing will be specifically described by way of specific examples of data to be analyzed.

FIG. 3 shows a specific example of intermediate data obtained by expanding data to be analyzed in the data analysis device 1 according to the present embodiment. The data to be analyzed in this specific example is assumed to be the intermediate data shown in FIG. 3. In the following, the data to be analyzed, or its intermediate data, is described as being written in the SVG language, which is a subset of the XML language. In practice, the data may consist of several such blocks, the pieces of intermediate data may not be completely independent of one another, or the intermediate data may be encoded or compressed data. For example, not only data written as text as shown in FIG. 3, but also data obtained by compressing such intermediate data with LZH or GZIP may be input to the data analysis device 1.

Here, basic syntax rules in the description in the SVG language will be described.

[0088] In data described in the SVG language, a start tag is expressed by a tag name sandwiched between the character "<" and the character ">". If the start tag contains attributes, one or more attributes are written between the tag name and the character ">", each in the following order: blank (space, tab, or newline), attribute name, the character "=", and attribute value. There may also be blanks between the attribute name and the character "=", and between the character "=" and the attribute value.

The end tag is expressed by a tag name sandwiched between the character string "</" and the character ">". An empty tag is expressed by a tag name sandwiched between the character "<" and the character string "/>". If the empty tag contains attributes, they are written between the tag name and "/>" in the same way as for the start tag.

In this specific example, for convenience, the intermediate data handled is assumed to contain no XML declaration enclosed by the character strings "<?" and "?>", no DOCTYPE declaration enclosed by the character string "<!DOCTYPE" and the character ">", no comment enclosed by the character strings "<!--" and "-->", and no CDATA section enclosed by the character strings "<![CDATA[" and "]]>". Even when intermediate data including these tags is handled, it can be dealt with by adding statuses that appropriately represent those situations to the status information and enabling the data block extraction unit 105 and the data analysis unit 106 to perform processing according to that status information.

[0091] When such data is handled, the data block extraction unit 105 extracts each "tag name", each "attribute name and attribute value" pair, and each "content" (the characters between a start tag and an end tag) as one data block.

When extracting a data block, the data block extraction unit 105 generates status information indicating the extraction status and stores it in the storage unit 103. FIG. 4 shows a specific example of the status information generated by the data block extraction unit 105 and stored in the storage unit 103 when the intermediate data shown in FIG. 3 is analyzed as the data to be analyzed. As shown in FIG. 4, the data block extraction unit 105 generates status information represented by, for example, Status 1 to Status 12 as status information representing the extraction status of the data block, and stores it in the storage unit 103.

The status information detailed in FIG. 4 may be passed to the data analysis unit 106 together with the extracted data block. The extraction status of the data block referred to here represents the state of the data block extraction unit 105: for example, whether the extracted data block is a start tag name, an end tag name, an empty tag, content, or an attribute name and attribute value; which kind of data block was being extracted when extraction failed; or whether the start position of a data block is being searched for.

Further, in this specific example, it is assumed that an area of 20 bytes, enough for 10 characters, is prepared in advance as the data expansion area (each character requires 2 bytes, so 10 characters require 20 bytes). This data expansion area is extended by 10 bytes, that is, 5 characters, at a time as needed.

The amount by which the data expansion unit 104 extends the data expansion area may be a fixed size for each data analysis device, or it may be changed according to the size of the data to be analyzed, the data size after expansion, or the data size after decoding. For example, it may be set to 1/10 of the size of the data to be analyzed, or the ratio may be changed depending on the compression method or the coding method. The amount may also be determined from the size and state of the current data expansion area, the size of the unexpanded portion of the data to be analyzed, and the like. For example, the area may be extended by the same amount as the current data expansion area, or by 10% of the size of the unexpanded data. Also, when the data block being extracted is an "attribute", the area may be extended by a larger amount than when it is a "tag".
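The sizing options listed above can be compared with a small sketch. The function name `extension_size`, the policy names, and the default fixed size are illustrative assumptions and not part of the embodiment; only the 1/10, same-size, and 10% ratios come from the text.

```python
def extension_size(policy, area_size, total_size, unexpanded_size, fixed_size=10):
    """Return how many bytes to add to the data expansion area.

    All names are hypothetical; the policies mirror the options in the text.
    """
    if policy == "fixed":
        return fixed_size                      # fixed size per device
    if policy == "tenth_of_total":
        return max(1, total_size // 10)        # 1/10 of the data to be analyzed
    if policy == "same_as_current":
        return area_size                       # same amount as the current area
    if policy == "tenth_of_unexpanded":
        return max(1, unexpanded_size // 10)   # 10% of the unexpanded data
    raise ValueError(f"unknown policy: {policy}")
```

A ratio-based policy grows the area faster for large inputs, while the fixed policy keeps memory use predictable on a small terminal.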

The size of the data expansion area prepared first may be similarly changed.

The data size after expansion or decoding can be used in this way only when it is stored in a header. The data analysis device according to the present embodiment, however, is not limited to that case: even when the data size after expansion is unknown, it prepares a data expansion area whose size is a predetermined size or is determined from the input data size, and expands the data in that area little by little. The data analysis device according to the present embodiment can therefore use the storage area of the storage unit effectively.

Here, with reference to FIG. 5, the change in the state of the data expansion area when data is expanded will be described.

In FIG. 5, state (A) shows the state in which, in step S201 described above, the data expansion unit 104 has expanded the first 10 characters of the data into the 10-character data expansion area prepared in advance in the storage unit 103.

At this time, if the data block extraction unit 105 succeeds in steps S203 to S207 in extracting the character string "svg" from the leading character string "<svg" as the data block of the start tag name, it passes the data block to the data analysis unit 106.

The data analysis unit 106 analyzes the data block "svg" passed from the data block extraction unit 105 as a start tag name and adds an object for the svg element to the DOM tree. The data block extraction unit 105 generates status information indicating the current situation and stores it in the storage unit 103; for example, the character string following the character "<" is a start tag name, the character string following the character string "</" is an end tag name, and what follows a tag name until the character ">" or the character string "/>" appears is an "attribute name and attribute value". The data block extraction unit 105 then determines the type of the data block to be extracted using the status information and transmits the determination result to the data analysis unit 106.

Further, the status information stored in the storage unit 103 may be passed from the data block extraction unit 105 to the data analysis unit 106 together with the extracted data block or at a predetermined timing. For example, the <rect> tag on the second line of FIG. 3 is an empty tag with no content. When the character string "<rect" is extracted, status information indicating that the extracted data block is a start tag is passed from the data block extraction unit 105 to the data analysis unit 106 via the storage unit 103, and the tag is determined to be an empty tag only when the character string "/>" is extracted. In this case, the data block extraction unit 105 passes the character string "/>" to the data analysis unit 106 as a data block so that the data analysis unit 106 treats the element as an empty tag. Alternatively, the data block extraction unit 105 may pass the data analysis unit 106 status information indicating that the preceding data block belongs to an empty tag.

Also, after the data block that is the start tag name is passed from the data block extraction unit 105 to the data analysis unit 106, the analysis result of the data analysis unit 106 may be fed back to the data block extraction unit 105 so that the status information managed by the data block extraction unit 105 is changed. For example, the method by which the data block extraction unit 105 extracts attribute data blocks may be changed based on the status information, in particular for the <rect> tag and the <text> tag.

As shown as state (A) in FIG. 5, after the data is expanded and the data block "svg", which is a start tag name, is passed from the data block extraction unit 105 to the data analysis unit 106, the data block extraction unit 105 enters the search-start state for the "attribute name and attribute value" that follows the start tag name. In steps S203 to S207 described above, the search passes over the blank following the character string "<svg", and the next character is the character "w". The data block extraction unit 105 therefore enters the searching state for an "attribute name and attribute value". Had the character "/" or the character ">" appeared as the next character instead, the data analysis unit 106 would process this svg element as the end of an empty tag or the end of the start tag, respectively.

When the data block extraction unit 105 advances the search further, the data expansion area ends just after the character string "width". Because the data block extraction unit 105 is in the searching state for an "attribute name and attribute value" and has reached the end of the data expansion area without completing extraction of the data block, it is determined in step S205 that the data block extraction has failed (NO in S205).

It is then determined that unexpanded data still remains and that the data expansion area contains the character string "width", which is part of a data block not yet passed to the data analysis unit 106 and therefore cannot be discarded (YES in S209, YES in S211, and NO in S213). Accordingly, in step S215, the character string "width" is moved (copied) to the beginning of the data expansion area, as shown as state (B) in FIG. 5. Then, in step S219, the subsequent data, the character string “="240”, is expanded behind it, as shown as state (C) in FIG. 5.

Because step S219 expands the data to be analyzed from partway through, the data expansion unit 104 preferably stores in the storage unit 103 how far the data has been expanded. Also, so that the data that cannot be discarded can be moved (copied) in step S215, the data block extraction unit 105 preferably stores in the storage unit 103 the position and length of the data that cannot be discarded, for example by recording its start position, or by counting the length from its start position to the location currently being searched. Furthermore, after the data that cannot be discarded has been moved (copied) in step S215, the start position of the non-discardable data that the data block extraction unit 105 stored in the storage unit 103 is preferably updated by the data expansion unit 104.

When the data block extraction unit 105 proceeds with the search, the data expansion area ends just after the character string “width="240”. Because the data block extraction unit 105 is still searching for an "attribute name and attribute value" and has reached the end of the data expansion area without completing extraction of the data block, it is determined in step S205 that extraction of the data block has failed (NO in S205).

At this time, it is determined that unexpanded data still remains and that none of the data in the data expansion area can be discarded (YES in S209, S211, and S213). In step S217, the data expansion unit 104 therefore extends the data expansion area by five characters, as shown as state (D) in FIG. 5. Then, in step S219, the data expansion unit 104 expands data into the extended part of the data expansion area, as shown as state (E) in FIG. 5. In steps S203 to S207, the data block “width="240"” extracted by the data block extraction unit 105 is analyzed by the data analysis unit 106 as an "attribute name and attribute value", and the analysis result is reflected in the DOM tree. In the same way, the data is analyzed block by block until no unexpanded data remains in the data to be analyzed, that is, until the end of the data to be analyzed is reached.
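The walk-through of states (A) to (E) can be reproduced with a minimal sketch of the loop over steps S201 to S219. It is a simplification under stated assumptions, not the embodiment itself: sizes are counted in characters rather than bytes, a "data block" is simply a blank-delimited token instead of a tag name or attribute, the analysis step is replaced by collecting the blocks in a list, and the function name `analyze_incrementally` and the sample input (including its height attribute) are hypothetical.

```python
def analyze_incrementally(source, area_size=10, grow_by=5):
    """Extract blank-delimited blocks through a small, growable area.

    Returns the extracted blocks and the final size of the area.
    """
    area = ""        # current contents of the data expansion area
    expanded = 0     # how far `source` has been expanded so far
    blocks = []      # blocks handed to the (omitted) analysis step
    while True:
        # S201 / S219: expand the following data into the free space.
        chunk = source[expanded:expanded + area_size - len(area)]
        area += chunk
        expanded += len(chunk)
        # S203 to S207: extract every complete block. The consumed part is
        # discarded and the remainder stays at the front of the area (S215).
        while " " in area:
            block, area = area.split(" ", 1)
            if block:
                blocks.append(block)
        # S205 NO: no complete block is left in the area.
        if expanded == len(source):      # S209 NO: nothing left to expand
            if area:
                blocks.append(area)      # the final block ends with the data
            return blocks, area_size
        if len(area) == area_size:       # S213 YES: nothing can be discarded
            area_size += grow_by         # S217: extend the area


blocks, final_size = analyze_incrementally('<svg width="240" height="320">')
```

With the 10-character area and 5-character extension of this specific example, the area is extended once, when the partial block `width="240` fills it completely, which corresponds to state (D).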

FIG. 6 shows an example of the DOM tree generated through analysis by the data analysis unit 106 in this way. Specifically, FIG. 6 shows the DOM tree generated when the intermediate data shown in FIG. 3 is analyzed.

[0111] Referring to FIG. 6, the DOM tree contains an svg element, a rect element, and a text element, and shows that the svg element is the parent of both the rect element and the text element. It also shows that the text element has content. In addition, the attributes of each element are represented: the width, height, and viewBox attributes of the svg element; the width, height, and style attributes of the rect element; and the x and y attributes of the text element. The output unit 107 outputs, as the analysis result from the data analysis device 1, data from which the parent-child and sibling relationships among such elements and content, the attributes of each element, and their attribute values can easily be extracted.

In the above description, the data expansion area is extended in step S217 on the condition that step S213 determines that none of the data expanded in the data expansion area can be discarded. However, as described above, the condition may instead be, for example, that step S213 determines that 80% or more of the data expanded in the data expansion area cannot be discarded. The proportion of non-discardable data at which the data expansion area is extended may also be changed depending on whether the data block currently being extracted is a tag name or an attribute. More generally, the condition may be changed depending on whether a tag name, an attribute, or content is currently being extracted.
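As a sketch of such a condition, the check below extends the area either when nothing can be discarded or when the non-discardable share reaches a threshold chosen by block type. The function name `should_extend` and the per-type percentages are hypothetical; only the 80% figure appears in the text.

```python
# Hypothetical thresholds in percent; only 80% is taken from the text.
THRESHOLD_PERCENT = {"tag_name": 80, "attribute": 60, "content": 90}


def should_extend(non_discardable, area_size, block_type=None):
    """Decide whether step S217 should extend the data expansion area."""
    if block_type is None:
        # Default condition: none of the expanded data can be discarded.
        return non_discardable >= area_size
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return non_discardable * 100 >= THRESHOLD_PERCENT[block_type] * area_size
```

Extending the area before it is completely full trades some extra memory for fewer move operations in step S215.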

If, in step S217, the data expansion unit 104 prepares the extended data expansion area in a part of the storage unit 103 entirely separate from the current data expansion area, the processing time increases because the data that cannot be discarded must be moved (copied) to the new data expansion area in step S215. Moreover, even if the data expansion area currently in use is released once the non-discardable data has been copied to the new data expansion area, the usage of the storage unit 103 is temporarily doubled during the copy. Therefore, when the data expansion area is extended in step S217, it is preferable to extend it into the area immediately behind the current data expansion area.

[0114] Alternatively, if the area immediately behind the current data expansion area cannot be used, the data expansion area is preferably extended into the area immediately in front of it. In this case, part of the storage unit 103 is still used for copy processing, but the amount of the storage unit 103 used can be kept smaller than when the extended data expansion area is prepared in an area entirely separate from the current one.

Here, with reference to FIG. 7, the method by which the data expansion unit 104 secures the data expansion area in the storage unit 103 will be described in detail. In FIG. 7, one kind of block represents memory areas in use and the other kind represents unused memory areas.

When the data block extraction unit 105 extracts a data block or the data analysis unit 106 analyzes data, what is usually passed is the first memory address together with either the length or the last memory address. A contiguous memory area must therefore be secured in the storage unit 103 as the data expansion area.

Suppose that memory area 1006 in the storage unit 103 is secured as the data expansion area and that the data expansion area is to be extended by the same amount again in step S217. The data expansion unit 104 searches the memory areas behind and adjacent to memory area 1006. If enough unused memory adjacent to memory area 1006 can be secured for the extension, the data expansion area is extended behind memory area 1006 in step S217. That is, memory area 1007 is examined; if it is unused, its size is checked, and if that size is sufficient for the extension, memory area 1007 is secured as the extended part of the data expansion area. If it is not sufficient, memory area 1008 is examined next, and if it is also unused and the combined size of memory areas 1007 and 1008 is sufficient for the extension, those memory areas are secured as the extended part of the data expansion area. In this way, the sizes of the unused memory areas are examined in order, up to the next memory area in use, and compared with the size required for the extension.

Furthermore, if the free memory behind the memory area currently secured as the data expansion area is not enough for the extension, the memory areas in front of it are examined in order in the same way, up to the nearest memory area in use, and their size is compared with the size required for the extension. For example, when memory area 1003 is secured as the data expansion area and the data expansion area is to be extended in step S217, memory area 1004 behind memory area 1003 is in use, so memory area 1002 is examined. If its size is sufficient for the extension, memory area 1002 is secured as the extended part of the data expansion area.

In addition, if the free memory on either side alone, in front of or behind the memory area secured as the data expansion area, is insufficient for the extension, but the combined size of the free memory areas in front and behind is sufficient, the memory areas on both sides are secured as the extended part of the data expansion area.

When the data expansion area is extended as described above, it is preferable that a contiguous memory area be secured as the extended data expansion area. As described above, this keeps the usage of the storage unit 103 lower than when the extended data expansion area is prepared in an area entirely separate from the current one. For example, suppose the current data expansion area is 400 bytes and is to be extended by 200 bytes. If it is extended into a memory area adjacent to the data expansion area, it suffices to secure 600 bytes in total, including the current data expansion area. If the extended data expansion area is secured elsewhere, however, a total of 1000 bytes of memory is needed until the non-discardable data has been copied to the newly secured memory area: the newly secured 600 bytes plus the 400 bytes currently in use.

Also, when none of the data expanded in the current data expansion area can be discarded, extending the data expansion area backward and expanding the subsequent data there makes it unnecessary to shift the non-discardable data to the beginning of the data expansion area. If the data expansion area is extended forward, or if the extended data expansion area is secured in an entirely separate memory area, the non-discardable data must be copied to the beginning of the new data expansion area, so the processing time is longer than in the former case.
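The search order described above (behind first, then in front, then both sides, and only then a separate area) and the 600-byte versus 1000-byte comparison can be sketched as follows. The memory map is modeled as a list of `(size, in_use)` pairs ordered by address, with "behind" meaning higher addresses and "ahead" meaning lower addresses; this representation and the names `free_run`, `plan_extension`, and `peak_usage` are assumptions made for illustration.

```python
def free_run(areas, indices):
    """Total size of consecutive unused areas, stopping at the first one in use."""
    total = 0
    for i in indices:
        size, in_use = areas[i]
        if in_use:
            break
        total += size
    return total


def plan_extension(areas, current, needed):
    """Choose where to extend the data expansion area at index `current`."""
    behind = free_run(areas, range(current + 1, len(areas)))
    ahead = free_run(areas, range(current - 1, -1, -1))
    if behind >= needed:
        return "behind"        # no copy of non-discardable data is needed
    if ahead >= needed:
        return "ahead"         # contiguous, but the data must be shifted
    if behind + ahead >= needed:
        return "both"
    return "elsewhere"         # a separate area: copy plus temporary double use


def peak_usage(current_size, extra, contiguous):
    """Peak bytes held while extending, as in the 400 + 200 byte example."""
    new_size = current_size + extra
    return new_size if contiguous else new_size + current_size
```

For the figures in the text, `peak_usage(400, 200, True)` gives 600 and `peak_usage(400, 200, False)` gives 1000.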

Furthermore, with reference to FIGS. 8 and 9, the process by which the data block extraction unit 105 extracts the data block that is the tag name of an element, and the updating of the status information, will be described specifically.

The process shown in the flowchart of FIG. 8 is the process that the data block extraction unit 105, referring to the status information, performs when its current data block extraction status is "Status 1" shown in FIG. 4; it continues until the character "<" that starts an element is extracted. As described above, the extraction status represented by "Status 1" is the one set initially when the data to be analyzed (or its intermediate data) is an SVG document that contains no XML declaration, DOCTYPE declaration, or comment. The status information is also set to "Status 1" during processing in the extraction statuses "Status 7" and "Status 8", for example when the character "<" appears while the data block of element content is being extracted.

Referring to FIG. 8, the data block extraction unit 105 first examines the character at the reading start position. If that character is neither the character "<" nor a blank (space) (NO in S401 and NO in S403), this processing ends and an error is returned as the extraction result.

If the character examined is a blank, the next character is examined (YES in S403, S405). If it is the character "<" (YES in S401), the next character is examined (S407). If the character following the character "<" is not the character "/" (NO in S409), the extraction status shifts to "Status 2", in which the start and end positions of the data block of the start tag name are detected, and the status information is updated. The data block extraction process for that extraction status then continues.

On the other hand, if the character following the character "<" is the character "/" (YES in S409), the extraction status shifts to "Status 12", in which the start and end positions of the data block of the end tag name are detected, and the status information is updated. The data block extraction process for that extraction status then continues.

If, when the next character is to be examined in step S405 or S407, the end of the data expanded in the data expansion area is passed, interrupt processing returns control to the main processing shown in FIG. 2, and it is determined in step S205 that the extraction of the data block has failed. Then, in step S211, it is determined that the data expansion area contains no data that cannot be discarded, that is, that all the data expanded in the data expansion area can be discarded.

Next, the processing executed when the extraction status of the data block in the data block extraction unit 105 becomes "Status 2" as described above will be described with reference to FIG. 9.

Referring to FIG. 9, first, the first position of the tag name (the start position of the data block) is stored (S601). While each following character is one that can be used in a tag name, the next character is examined (YES in S603, S605). When a character that cannot be used in a tag name is reached (NO in S603), the position is stored as the last position of the tag name (the end position of the data block) (S607).

If the character at that position is a blank (space) (YES in S609), the extraction status shifts to "Status 3", in which an attribute name is detected next, and the status information is updated. The data block extraction process for that extraction status then continues. If the character at that position is the character ">" (NO in S609, YES in S611), the extraction status shifts to "Status 7", in which the content of the element is detected next, and the status information is updated. The data block extraction process for that extraction status then continues. If the character at that position is the character "/" (NO in S609, NO in S611, and YES in S613), the extraction status shifts to "Status 11", in which the end of the tag is detected, and the status information is updated. The data block extraction process for that extraction status then continues. If the character at that position is anything else (NO in S609 to S613), this processing ends and an error is returned as the extraction result.

As described above, when the status information is updated and the extraction status in the data block extraction unit 105 changes from "Status 2" to "Status 3" or from "Status 2" to "Status 7", it is preferable that information indicating the first and last positions of the tag name, or information indicating the length of the tag name, be passed from the data block extraction unit 105 to the data analysis unit 106. This allows the data analysis unit 106 to analyze the data block containing the tag name.

If, when the next character is to be examined in step S605, the end of the data expanded in the data expansion area is passed, interrupt processing returns control to the processing shown in FIG. 2, which is the main routine, and it is determined in step S205 that the data block extraction has failed. Then, in step S211, it is determined that the data expansion area contains data that cannot be discarded, that is, data that has been expanded in the data expansion area but not yet extracted as a data block.
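The two flowcharts can be approximated by a pair of small functions, shown here as a sketch rather than the embodiment's implementation. The names `Status`, `NeedMoreData`, `search_element_start`, and `read_tag_name` are hypothetical, the set of characters allowed in a tag name is a simplifying assumption, and running past the end of the expanded data is modeled as an exception instead of interrupt processing. The end position returned for the tag name is exclusive.

```python
from enum import Enum


class Status(Enum):
    SEARCH_ELEMENT_START = 1   # "Status 1"
    START_TAG_NAME = 2         # "Status 2"
    ATTRIBUTE_NAME = 3         # "Status 3"
    CONTENT = 7                # "Status 7"
    TAG_END = 11               # "Status 11"
    END_TAG_NAME = 12          # "Status 12"


class NeedMoreData(Exception):
    """The end of the expanded data was passed (leads to NO in S205)."""


def search_element_start(area, pos):
    """FIG. 8: skip blanks, expect "<", then choose Status 2 or Status 12."""
    while True:
        if pos >= len(area):
            raise NeedMoreData
        ch = area[pos]
        if ch == "<":                          # S401: YES
            break
        if not ch.isspace():                   # S403: NO
            raise ValueError("unexpected character before element start")
        pos += 1                               # S405
    pos += 1                                   # S407
    if pos >= len(area):
        raise NeedMoreData
    if area[pos] == "/":                       # S409: YES
        return Status.END_TAG_NAME, pos + 1
    return Status.START_TAG_NAME, pos          # S409: NO


def read_tag_name(area, pos):
    """FIG. 9: read a tag name and choose the next extraction status."""
    start = pos                                # S601
    while pos < len(area) and (area[pos].isalnum() or area[pos] in "_-:."):
        pos += 1                               # S603: YES, S605
    if pos >= len(area):
        raise NeedMoreData
    end = pos                                  # S607 (exclusive end position)
    ch = area[pos]
    if ch.isspace():                           # S609: YES
        status = Status.ATTRIBUTE_NAME
    elif ch == ">":                            # S611: YES
        status = Status.CONTENT
    elif ch == "/":                            # S613: YES
        status = Status.TAG_END
    else:
        raise ValueError("unexpected character after tag name")
    return area[start:end], status
```

A caller would catch `NeedMoreData`, expand further data as in steps S209 to S219, and then retry from the stored position.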

In this manner, the data block extraction unit 105 updates the status information from moment to moment according to the extraction status of the data block, advances the extraction of data blocks from the intermediate data expanded in the data expansion area based on that status information, and delivers each extracted data block to the data analysis unit 106. Information on the data block is also passed from the data block extraction unit 105 to the data analysis unit 106, either together with the data block or at a different timing.

When the current extraction status is "Status 7" and extraction of the data block for the content ("content") on the third line of FIG. 3 starts, the content contains a character string. In the extraction processing for extraction status "Status 8" (the data block of "content" is being extracted), the data block extraction unit 105 need not extract everything up to the end of the content as one data block; it may instead extract data blocks one character at a time and pass them to the data analysis unit 106. Also, in the extraction processing for "Status 8", if the end of the data expanded in the data expansion area is passed when the next character is to be examined, the data block extraction unit 105 may pass the data extracted so far to the data analysis unit 106 and, after the subsequent intermediate data is expanded, continue extracting and pass the newly extracted data block as a continuation of the data block previously passed to the data analysis unit 106.

Furthermore, as another specific example of intermediate data described in the SVG language, which is a subset of the XML language, the case where the data analysis device 1 extracts data blocks from and analyzes the data shown in FIG. 10 will be described.

In the intermediate data shown in FIG. 10, an image element is included in the second line and the fourth line, and image data encoded with base64 is described as the attribute value of the xlink:href attribute. In such a case, the data analysis unit 106 first analyzes the attribute name of the xlink:href attribute of the image element. Next, the data analysis unit 106 may instruct the data block extraction unit 105 to pass the data block of the attribute value in units of, for example, 20 characters, so that behavior such as how data blocks are passed can be changed. The base64 encoding method allows conversion to start without all of the encoded data being available. Therefore, if the data analysis unit 106 can analyze the character string “data:;base64,” at the beginning of the data block and detect that the encoding method of the image data is base64, it can receive data from the data block extraction unit 105 in such small block units and execute analysis.
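The property relied on above, that base64 can be decoded before all of the encoded data has arrived, can be illustrated with a minimal Python sketch. The function name `decode_base64_in_chunks` is hypothetical and is not part of the described device; the sketch assumes the encoded text contains no white space and that padding appears only at the very end.

```python
import base64


def decode_base64_in_chunks(chunks):
    """Decode base64 text that arrives in small pieces.

    Every group of 4 base64 characters decodes independently to up to
    3 bytes, so decoding can start as soon as 4 characters are available.
    """
    pending = ""  # characters received but not yet forming a full group of 4
    for chunk in chunks:
        pending += chunk
        usable = len(pending) - len(pending) % 4
        if usable:
            yield base64.b64decode(pending[:usable])
            pending = pending[usable:]
    if pending:
        raise ValueError("base64 input ended in the middle of a 4-character group")
```

With 20-character blocks, as in the example above, each block is a whole number of 4-character groups and decodes to 15 bytes without any carry-over.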

The specific examples described above are examples in which the data to be analyzed by data analysis device 1 is data obtained by encoding character string data expressed in a character code. Examples of the character code include UCS (Universal Multiple-Octet Coded Character Set)-2 and UCS-4, the international character sets defined in ISO/IEC 10646-1, and character codes converted by the conversion method UTF (UCS Transformation Format)-8.

[0138] UCS-2 and UCS-4 are character codes in which one character is represented by a fixed-length byte string of 2 bytes and 4 bytes, respectively. When the intermediate data consists of a character code represented by such a fixed-length byte string, the data expansion unit 104, when expanding the data to be analyzed into intermediate data, detects the character codes from the intermediate data and converts each detected character code into a byte string of a predetermined fixed length.

On the other hand, a character code converted by UTF-8 is represented by a variable-length byte string of 1 to 6 bytes. If the data block extraction unit 105 and the data analysis unit 106 handle data in a variable-length character code such as UTF-8, processing becomes difficult because the number of bytes differs from character to character, even when only a single character is to be read. Therefore, when the data expansion unit 104 expands the data to be analyzed into intermediate data, if the intermediate data consists of such a variable-length character code, it is preferable that the data expansion unit 104 likewise detect the character codes from the intermediate data and convert each detected character code into a fixed-length byte string.

Furthermore, at this time, it is preferable that the data expansion unit 104 set the minimum size of the data expansion area to the fixed length of the byte string into which character codes are converted. Also, if an unexpanded portion remains in the data to be analyzed and the data expansion unit 104 detects that the intermediate data contains a character code amounting to less than one character of data, it is preferable to move that partial character code to the beginning of the data expansion area, expand the continuation of the data to be analyzed, and convert the resulting character code into a fixed-length byte string as one character of data. Also, one character of data may be regarded as one data block. In this case, the data analysis device analyzes data in units of one character and changes the current analysis state.
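The conversion of a variable-length character code into fixed-length units, including the carry-over of a character that is cut off at the end of an expanded portion, can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be UTF-8, the fixed length is taken to be 4 bytes (UCS-4, written here as big-endian UTF-32), and the function name `to_fixed_width` is hypothetical. Python's incremental decoder plays the role of holding the partial character until its continuation arrives.

```python
import codecs


def to_fixed_width(chunks):
    """Yield one 4-byte unit per character from UTF-8 bytes given in pieces."""
    # The incremental decoder keeps the bytes of an incomplete character
    # internally until the next piece supplies the rest of it.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        for ch in decoder.decode(chunk):
            yield ch.encode("utf-32-be")  # always exactly 4 bytes, no BOM
    # Raises UnicodeDecodeError if the input ended inside a character.
    decoder.decode(b"", final=True)
```

Because every unit has the same length, a later stage can treat each unit as one data block without inspecting lead bytes.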

In the specific examples described above, as shown in FIG. 3 and FIG. 10, encoded data whose intermediate data is described in the SVG language, which is a subset of the XML language, is used as the data to be analyzed, but the intermediate data may be binary as shown in FIG. Even in this case, the data analysis device according to the present embodiment can perform the analysis processing in the same way as when the data to be analyzed is text data. Therefore, the case in which the data analysis device according to the present embodiment analyzes data whose intermediate data is binary, as shown in FIG. 11, will be specifically described below.

Here, specifically, it is assumed that the binary intermediate data consists of instructions having the data structure shown in FIG. 12, or instructions having the data structure shown in FIG. 13 and FIG.

In an instruction having the data structure shown in FIG. 12, the first byte is the instruction itself, the next byte stores the number of bytes of data attached to the instruction, and the attached data follows.

[0144] An instruction having the data structure shown in FIG. 13 consists of a first byte that is the instruction itself, followed by the data attached to the instruction, but the number of bytes of attached data is predetermined for each instruction as shown in FIG. For example, it is determined in advance that 5 bytes of data are always attached to the “0x12” instruction, and that 6 bytes of data are always attached to the “0x23” instruction.
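How the length of one data block would be determined under each of the two instruction formats can be sketched in Python as below. The function names and the table `FIXED_PAYLOAD` are hypothetical; the table holds only the two instructions given as examples above, and a return value of `None` stands for "not enough data has been expanded yet to tell".

```python
# Attached-data sizes for the table-driven format; only the two example
# instructions from the text are listed (an illustrative assumption).
FIXED_PAYLOAD = {0x12: 5, 0x23: 6}


def block_length_prefixed(buf, start=0):
    """Format with a length byte: instruction, byte count, attached data."""
    if len(buf) - start < 2:
        return None  # the length byte has not been expanded yet
    return 2 + buf[start + 1]


def block_length_table(buf, start=0):
    """Format with predetermined sizes: instruction, then a fixed amount of data."""
    if len(buf) - start < 1:
        return None  # the instruction byte has not been expanded yet
    return 1 + FIXED_PAYLOAD[buf[start]]
```

In both cases a data block can be extracted once at least that many bytes, counted from `start`, are present in the data expansion area.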

First, the case of analyzing data that will be expanded into the form of FIG. 12 will be described with reference to FIG.

FIG. 15 shows, as state (A), a state in which the first 5 bytes of the data have been expanded by the data expansion unit 104 in step S201 described above.

At this time, since the beginning of the data is “0x12 0x04”, the data block extraction unit 105 determines that 4 bytes of attached data follow. However, only 3 bytes of data following “0x12 0x04” are currently expanded in the data expansion area, so the data block extraction unit 105 requests the data expansion unit 104 to expand further data in steps S203 to S207.

It should be noted that, since the data block currently to be extracted by data block extraction unit 105 starts at the first byte of the data, it is determined that nothing can be discarded from the first byte onward in the data expansion area (YES in S209 through S213). Therefore, in step S217, as shown as state (B) in FIG. 15, the data expansion unit 104 extends the data expansion area by 5 bytes.

In step S219, as shown as state (C) in FIG. 15, the data expansion unit 104 expands the subsequent data after the already expanded data in the extended data expansion area. Then, in step S205, the data block extraction unit 105 succeeds in extracting the 6 bytes from the head as a data block, and the data block is passed to the data analysis unit 106 in step S207.

Next, in step S203, the data block extraction unit 105 extracts a data block from the seventh byte onward. Since the seventh and eighth bytes of the data are “0x23 0x05”, the data block extraction unit 105 determines that 5 bytes of attached data follow. However, since the data is cut off after only 2 more bytes, it is determined that the seventh and subsequent bytes in the data expansion area are data that cannot be discarded (YES in S209, YES in S211, NO in S213). Therefore, in step S215, as shown as state (D) in FIG. 15, the data of the 7th to 10th bytes is moved to the beginning of the data expansion area, and in step S219 the subsequent data is expanded as shown as state (E).

In step S203, the data block extraction unit 105 extracts 7 bytes from the top of the data expansion area as a data block, and the data block is passed to the data analysis unit 106 in step S207. In step S207, the data analysis unit 106 analyzes the instruction and the attached data from the data block received from the data block extraction unit 105, and constructs the contents as output data.
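The sequence of states (A) to (E) can be reproduced with a short Python simulation. It is a simplified sketch, not the device itself: the "expansion" is a plain copy from `source`, the instruction format is the one with an explicit length byte, and the names `analyze_stream`, `area_size`, and `grow_by` are hypothetical. Step numbers in the comments indicate which part of the flow each line is meant to mirror.

```python
def analyze_stream(source, area_size=5, grow_by=5):
    """Extract length-prefixed blocks through a small data expansion area."""
    area = bytearray()    # data currently expanded in the data expansion area
    capacity = area_size  # current size of the data expansion area
    pos = 0               # next unexpanded byte of the source data
    blocks = []
    while True:
        # S201 / S219: expand as much further data as the area can hold.
        take = min(capacity - len(area), len(source) - pos)
        area += source[pos:pos + take]
        pos += take
        # S203 to S207: extract every block that is completely present.
        start = 0
        while len(area) - start >= 2 and len(area) - start >= 2 + area[start + 1]:
            end = start + 2 + area[start + 1]
            blocks.append(bytes(area[start:end]))
            start = end
        if pos == len(source):
            if start != len(area):
                raise ValueError("data ends in the middle of a data block")
            return blocks, capacity
        if start > 0:
            del area[:start]     # S215: move undiscardable data to the front
        else:
            capacity += grow_by  # S217: nothing can be discarded, extend the area
```

For the 13 bytes of the example (a 6-byte block followed by a 7-byte block), the area grows once from 5 to 10 bytes and both blocks are extracted without ever holding all 13 bytes at once.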

Next, the case of analyzing data which is expanded into the form of FIGS. 13 and 14 will be described with reference to FIG.

Similarly, FIG. 15 shows, as state (A), a state in which the first 5 bytes of the data have been expanded by the data expansion unit 104 in the above-mentioned step S201. In state (A), since the beginning of the data is “0x12”, the data block extraction unit 105 determines, based on the rules shown in FIG., that 5 bytes of attached data follow. However, fewer than 5 bytes of data following “0x12” are currently expanded in the data expansion area, so the data block extraction unit 105 requests the data expansion unit 104 to expand further data in steps S203 to S207.

It should be noted that, since the data block currently being extracted by data block extraction unit 105 starts at the first byte of the data, it is determined that nothing can be discarded from the first byte onward in the data expansion area (YES in S209 through S213). Therefore, in step S217, the data expansion area is extended by 5 bytes as shown as state (B).

[0155] In step S219, as shown as state (C) in FIG. 15, the data expansion unit 104 expands the subsequent data after the already expanded data in the extended data expansion area. Then, in step S205, the data block extraction unit 105 succeeds in extracting the 6 bytes from the head as a data block, and the data block is passed to the data analysis unit 106 in step S207.

Next, in step S203, the data block extraction unit 105 extracts a data block from the seventh byte onward. Since the first byte of that data is “0x23”, it is determined, based on the rules shown in FIG. 14, that 6 bytes of attached data follow. However, since the data is cut off after only 3 more bytes, it is determined that the seventh and subsequent bytes in the data expansion area are data that cannot be discarded (YES in S209, YES in S211, NO in S213). Therefore, in step S215, as shown as state (D) in FIG. 15, the data of the 7th to 10th bytes is moved to the top of the data expansion area, and in step S219 the subsequent data is expanded as shown as state (E) in FIG. 15.

In step S203, the data block extraction unit 105 extracts 7 bytes from the top of the data expansion area as a data block, and the data block is passed to the data analysis unit 106 in step S207.

By executing, in the data analysis device 1 according to the present embodiment, the data analysis process described mainly with reference to FIG. 15, even when the data to be analyzed has a data structure in which the length of a data block is not explicitly indicated, the portion for which analysis has been completed in the area storing the data to be analyzed, or the intermediate data obtained by expanding it, can be released and used as a work area. Therefore, even when the data to be analyzed has a data structure in which the length of a data block is explicitly specified, and that data is compressed or encoded, analysis can be performed with a data expansion area smaller than the size of the expanded data, and the amount of storage used at the time of data analysis can be reduced.

Second Embodiment

The basic configuration of the data analysis device of the present embodiment can be the same as the configuration of the data analysis device described with reference to FIG. 1 and the like in the first embodiment. Therefore, hereinafter, data analysis device 1 shown in FIG. 1 will be described as a data analysis device of the present embodiment.

[0160] The data analysis device of the present embodiment is characterized by the manner of the processing (data analysis processing) in which it decompresses compressed markup language data (for example, compressed SVG data (SVGZ data)) and analyzes the internal data structure of the decompressed data.

FIG. 18 is a flowchart of data analysis processing performed in the data analysis device of the present embodiment. The operation of the data analysis device of the present embodiment will be described with reference to FIG. 18. The process shown in the flowchart of FIG. 18 is realized, for example, by the control unit 101 reading and executing a program stored in the storage unit 103 and controlling each unit shown in FIG.

In the data analysis process according to the present embodiment, first, SVGZ data is input via input unit 102. The SVGZ data is expanded by the data expansion unit 104 in the data expansion area (the expansion buffer area 103A in FIG. 23, which will be described later) secured as a continuous buffer in the storage unit 103 (S2001). Note that in the process of S2001, the data expansion unit 104 expands an amount of SVGZ data that corresponds to the size of the above-mentioned expansion buffer area. Specifically, for example, in the case where 100 Kbytes of SVG data is obtained by completely expanding 10 Kbytes of SVGZ data, the 10 Kbytes of SVGZ data is expanded at once on condition that 100 Kbytes of capacity is secured for the expansion buffer area. In the same case, if only 50 Kbytes can be secured as the capacity of the expansion buffer area, the data expansion unit 104 sets the amount of SVGZ data to be expanded at one time to 5 Kbytes.
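Expanding compressed data a bounded amount at a time can be sketched with Python's standard `zlib` module. The sketch assumes that the SVGZ data is a gzip stream and bounds the size of each expanded piece rather than the amount of compressed input consumed, which is a simplification of the behaviour described above. The function name `expand_in_pieces` is hypothetical.

```python
import zlib


def expand_in_pieces(compressed, area_size):
    """Yield decompressed pieces of at most area_size bytes each."""
    inflater = zlib.decompressobj(31)  # wbits=31 selects the gzip container
    pending = compressed
    while not inflater.eof:
        # max_length caps the output; input that could not be consumed yet
        # is kept in unconsumed_tail and passed back in on the next call.
        piece = inflater.decompress(pending, area_size)
        pending = inflater.unconsumed_tail
        if piece:
            yield piece
        elif not pending:
            raise ValueError("compressed data is truncated")
```

Each yielded piece corresponds to one filling of the expansion buffer area, so the peak memory for expanded data is bounded by `area_size` regardless of the total expanded size.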

Next, the data block extraction unit 105 performs lexical analysis of the expanded data and detects a data block determined according to the XML grammar (S2003). The SVG language follows the XML grammar because it is a subset of the XML language. Here, “detection of a data block according to the XML grammar” in S2003 refers to, for example, a method of detecting the beginning or end of a data block by the appearance of white space consisting of one or more space characters, carriage returns, line feeds, or tabs, or by the appearance of a markup delimiter. Furthermore, the beginning or the end of a block may be detected by the occurrence of an equal sign, a single quotation mark, or a double quotation mark.
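A much simplified version of this kind of boundary detection is sketched below in Python. It is not a conforming XML tokenizer: it only splits on white space, the delimiters `<`, `>`, `/`, `=`, and quotation marks, and it treats element content the same way as names. The function names are hypothetical. The third element of each result indicates whether the end of the block was actually seen inside the given text, which corresponds to the case where a block is interrupted at the end of the expanded data.

```python
WHITESPACE = " \r\n\t"
DELIMITERS = "<>/="
QUOTES = "'\""


def next_block(text, pos):
    """Return (block, new_pos, complete) for the block starting at or after pos."""
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    if pos == len(text):
        return None, pos, True  # nothing left to extract
    ch = text[pos]
    if ch in DELIMITERS:
        return ch, pos + 1, True
    if ch in QUOTES:
        end = text.find(ch, pos + 1)
        if end == -1:
            return text[pos:], len(text), False  # closing quote not yet expanded
        return text[pos:end + 1], end + 1, True
    end = pos
    while end < len(text) and text[end] not in WHITESPACE + DELIMITERS + QUOTES:
        end += 1
    # A name that runs to the end of the text may continue in later data.
    return text[pos:end], end, end < len(text)


def blocks_in(text):
    """Collect (block, complete) pairs for everything found in text."""
    pos, found = 0, []
    while True:
        block, pos, complete = next_block(text, pos)
        if block is None:
            return found
        found.append((block, complete))
```

For example, `blocks_in('<tex')` reports the block `tex` as incomplete, because no delimiter ending the name has been seen yet.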

If, in the processing of S2003, lexical analysis reaches the end of the expansion buffer area in the middle of a data block, the data block extraction unit 105 sets a flag or the like to indicate that the end of the data block does not exist in the expansion buffer area at that time. Then, the subsequent data of that data block is expanded into the expansion buffer area by the data expansion unit 104 in the next execution of S2001, and in the next execution of S2003 the data block extraction unit 105 detects whether the expansion buffer area contains the data constituting the end of the data block that was interrupted last time.

Also, the data block extraction unit 105 may be configured to analyze data blocks in accordance with grammar rules described in a schema language such as XML Schema or RELAX. For example, if there is a restriction that the part enclosed by element A can contain only two elements B, the number of elements B present in the part enclosed by element A is fed back based on the analysis result from the data analysis unit 106, and data block extraction is not performed for the attribute names, attribute values, and content of a third element B.

Further, the data block extraction unit 105 can analyze data blocks according to a plurality of schema languages, and the schema language used for analysis of a data block can be changed based on the type of the data block, a user setting, or the like.

Then, the data block extraction unit 105 notifies the data analysis unit 106 of the type of data block (S2004). For example, the information obtained by XML analysis is notified, such as the content of the detected data block is the start tag name or attribute name. Specific examples of the process actually performed as the notification in this case include setting a flag that causes the start of the process for semantic analysis in the data analysis unit 106, and the like. Further, in the notification here, the status information is used as described in the first embodiment using FIG.

Next, the data block extraction unit 105 notifies the control unit 101 whether the extracted data block is in the above-described expansion buffer area from the beginning to the end (S2005).

Then, when it is determined in S2005 that the detected data block is in the expansion buffer area from its beginning to its end, the data analysis unit 106 performs semantic analysis of the data block in the expansion buffer area (S2007). Note that the semantic analysis of S2007 is performed on the data block as SVG, for example, whereby the DOM tree is generated and the current analysis state transitions.

Next, based on the data analysis result in S2007, the data analysis unit 106 determines the type and unit (size) of data block to be used when the data block extraction unit 105 detects the next data block (S2008). For example, if comments are not to be extracted as data blocks, then once the comment start “<!--” appears, data can be read and discarded until the comment end “-->” appears. Conversely, when a comment is extracted as a data block, the character string enclosed by “<!--” and “-->” may be treated as one data block, or it may be treated as data blocks in units of one character.

[0171] In the determination of the data block type and unit in S2008, for example, when “<!--”, the start of a comment, appears, data is read and discarded until “-->”, the data block that ends the comment, is detected. In addition, since the content enclosed by the start tag and the end tag of a text element contains characters to be displayed as text information, it is desirable to use one character as the data block unit. If the result of data analysis of the previous data block indicates the attribute name xlink:href of an image element, the attribute value corresponding to that attribute name, between the quotation mark representing the beginning of the attribute value and the quotation mark representing its end, may be a file name, or may be data encoded with base64 or the like and embedded as text. In such a case, the data analysis unit 106 performs semantic analysis on data of a size sufficient to determine whether the value is a file name or data. Specifically, for example, only the first 100 bytes may be semantically analyzed at first, and only when the value is encoded in base64 may it then be analyzed and decoded in units of one character.

Next, it is checked whether the data block extraction unit 105 has performed lexical analysis to the end of the expansion buffer area in order to detect data blocks (S2009). If lexical analysis has not yet been performed to the end, the processing returns to S2003. On the other hand, if lexical analysis has been performed to the end, it is checked whether there is unexpanded data among the data input from the input unit 102 (S2017), and if there is unexpanded data, the processing returns to S2001. If there is no unexpanded data, the data analysis process ends.

On the other hand, if it is determined in S2005 that only a part of the detected data block is present in the expansion buffer area, data block extraction unit 105 copies the data of that data block currently expanded in the expansion buffer area to the continuous buffer area for coupling (the continuous buffer area for coupling 103B shown in FIG. 23, which will be described later) (S2011). The continuous buffer area for coupling is a storage area defined in the storage unit 103, and is a continuous storage area defined separately from the above-mentioned expansion buffer area.

The capacity of the continuous buffer area for coupling, which is used for extracting the next data block from the start position of the stored remaining data, is determined, for example, according to the lexical analysis mode of the data block extraction unit 105. That is, for example, when the data block extraction unit 105 performs lexical analysis according to the XML grammar, the capacity is determined based on that grammar. For example, the XML grammar has a rule that an element name begins immediately after “<” and continues until a blank (including a line feed), “>”, or “/” appears, as in “<text ”, “<text>”, and “<text/>”. Accordingly, the capacity of the continuous buffer area for coupling may be set to 50 bytes if the data block is an element name, to 100 bytes if it is an attribute name, and so on. If that capacity proves insufficient, the capacity of the continuous buffer area for coupling may be increased as needed.

When only a part of the detected data block is in the expansion buffer area, the data block is interrupted in the expansion buffer area at its first part (state 1), at its last part (state 2), or at both its first and last parts (state 3).

[0176] In the case of state 2, that is, if the first part of the data block exists but the last part is interrupted in the expansion buffer area, the data block extraction unit 105 simply copies the part of the data block that exists in the expansion buffer area to the continuous buffer area for coupling.

[0177] In the case of state 3, that is, when the data block is interrupted at both its beginning and its end, the data preceding the interruption has already been copied to the continuous buffer area for coupling by the processing performed so far. The data block extraction unit 105 therefore joins the interrupted portions by copying the data currently in the expansion buffer area after the data already copied.

[0178] The data block extraction unit 105 determines whether the capacity of the continuous buffer area for coupling is sufficient for joining data as described above. If it determines that the capacity is too small, a continuous buffer area of sufficient size for joining the data (shown as continuous buffer area 103C in FIG.) may be separately secured in the storage unit 103. In this case, the data block extraction unit 105 joins the data in the newly secured continuous buffer area for coupling 103C. In this case, the newly secured continuous buffer area 103C constitutes the third buffer of the present invention.

[0179] Further, the copying of data in S2011 may be performed in block units when the data block extraction unit 105 detects the end of the data block or the end of the expansion buffer area in S2003, or it may be performed sequentially in smaller units, such as one character or one byte at a time, until the end of the data block is detected.

Then, after the process of S2011, the data block extraction unit 105 checks whether the data of the data block exists from beginning to end in the continuous buffer area for coupling (S2013). If only a part of the data block exists there, the process proceeds to S2009. On the other hand, if all of it exists, the data analysis unit 106 semantically analyzes the data block present in the continuous buffer area for coupling (S2015), and the process proceeds to S2008.

If it is determined that only part of the data of the data block is present in the continuous buffer area for coupling, the process proceeds to S2009 after detection of data blocks has been carried out to the end of the expansion buffer area. Therefore, when processing then proceeds from S2009 to S2017, it should always be determined that unexpanded data exists. Accordingly, in the data analysis device, if it is determined that only part of the data of the data block is present in the continuous buffer area for coupling (NO in S2013) and it is then determined in S2017 that no unexpanded data exists (NO in S2017), error processing is performed. The error processing is, for example, processing in which the analysis of the data is stopped and a notification that an error has occurred is issued.
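The interplay between the expansion buffer area and the continuous buffer area for coupling, including the error case, can be sketched in Python under strong simplifying assumptions: each element of `chunks` stands for one filling of the expansion buffer area, data blocks are simply runs of characters terminated by a single space, and `analyze` is a caller-supplied stand-in for semantic analysis. The function name `analyze_in_chunks` and the variable `joined` are hypothetical; step numbers in the comments show the intended correspondence.

```python
def analyze_in_chunks(chunks, analyze):
    """Analyze space-terminated blocks, joining those split across chunks."""
    joined = ""  # stands in for the continuous buffer area for coupling
    for area in chunks:  # S2001: one filling of the expansion buffer area
        pos = 0
        while pos < len(area):  # S2009: continue to the end of the area
            end = area.find(" ", pos)
            if end == -1:
                # Block is interrupted at the end of the area (NO in S2005):
                # copy the available part after what was copied before (S2011).
                joined += area[pos:]
                break
            if joined:
                # The rest of an interrupted block has arrived (YES in S2013).
                analyze(joined + area[pos:end])  # S2015
                joined = ""
            elif end > pos:
                analyze(area[pos:end])  # S2007: block lies wholly in the area
            pos = end + 1
    if joined:
        # Part of a block remains but no unexpanded data is left (NO in S2017).
        raise ValueError("data ended in the middle of a data block")
```

A block that spans three chunks is interrupted at both ends in the middle chunk (state 3); the sketch handles that case by repeatedly appending to `joined` until the terminator is seen.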

The relationship between the data to be expanded and the data block contained in the buffer area for expansion will be described with reference to FIG.

In FIG. 19, the expansion buffer area is shown as areas 2104, 2105, 2106, and 2107. These may be different buffer areas, or they may be the same storage area used as the expansion buffer area at different timings. That is, for example, after data has been expanded in area 2104 and processing such as analysis of the data expanded in area 2104 has been completed, if data is next expanded in the same area, that same area is regarded as area 2105. Likewise, if area 2106 is the same area as area 2105 and data is overwritten into the area regarded as area 2105, the same area after being overwritten is regarded as area 2106. Similarly, if area 2107 is the same area as area 2106 and data is overwritten into area 2106, the same area after being overwritten is regarded as area 2107.

Further, in FIG. 19, the expanded data 2100 is assumed to include three data blocks, namely data blocks 2101, 2102, and 2103. The data making up each data block is hatched differently for convenience. FIG. 19 shows that data is expanded into area 2104 in the first S2001, into area 2105 in the second S2001, into area 2106 in the third S2001, and into area 2107 in the fourth S2001.

Referring to FIG. 19, in the first S 2001, the entire data block 2101 and a part of the data block 2102 are expanded in the area 2104 which is a buffer area for expansion. Region 2104 contains the beginning of data block 2102.

In the second S2001, a part of the data block 2102 is expanded in the area 2105, which is the expansion buffer area. Area 2105 includes the middle portion of data block 2102.

In the third S2001, the end portion of the data block 2102 and the first half of the data block 2103 are expanded in the area 2106, which is the expansion buffer area.

Then, in the fourth S2001, the second half of the data block 2103 is expanded in the area 2107 which is a buffer area for expansion.

For data block 2101, the data from its beginning to its end is expanded in the expansion buffer area by a single expansion process. For this reason, for data block 2101, processing proceeds from S2005 to S2007, S2008, and S2009. In area 2104 (the expansion buffer area), lexical analysis has been performed to the end of data block 2101 but not to the end of area 2104, so the processing returns to S2003.

[0190] As to data block 2102, only the head of the data block is expanded by the first expansion of data, as shown as area 2104. For this reason, for data block 2102, processing proceeds from S2005 to S2011, and the portion expanded in area 2104 is copied to the continuous buffer area for coupling. FIG. 20 to FIG. 22 schematically show the storage state of data in the continuous buffer area for coupling. In FIGS. 20 to 22, the continuous buffer area for coupling is shown as areas 2300, 2301, and 2302, to which part or all of data block 2102 has been copied. In FIGS. 20 to 22, the buffer area secured at least at that time is represented by a solid line. The areas 2300, 2301, and 2302 may be a buffer area with the same position and capacity, or may be different buffer areas with different positions and capacities.

The data block 2102, of which only the top portion is expanded in the area 2104 serving as the expansion buffer area, is copied, as shown in FIG., to the continuous buffer area for coupling. When the analysis of all the data of data block 2101 in S2007 and the copying to area 2300 in S2011 of the portion of data block 2102 expanded in area 2104 have both been completed, area 2104 may be overwritten with data as area 2105. Note that area 2105 can also be defined in an area of the storage unit 103 that is different from the area set as area 2104. However, if the same storage unit 103 is also used for data analysis and for constructing the DOM tree of the data analysis results, it may in some cases be difficult to secure a continuous buffer in the storage unit 103. It is therefore desirable that the data expansion unit 104 use the same continuous buffer in the storage unit 103 as the expansion buffer area until analysis of all the data input through the input unit 102 is completed.

Then, for data block 2102, a part of which is expanded in area 2105 serving as the expansion buffer area, the data of the portion expanded in area 2105 is copied to area 2300 by the processing of S2011 executed again, as shown in FIG. 21. At this time, in area 2300, the data at the beginning of data block 2102, which was copied first, and the portion expanded in area 2105 are joined.

Then, for data block 2102, the rear end of which is expanded in area 2106 serving as the expansion buffer area, the data of the portion expanded in area 2106 is copied to area 2300 by the processing of S2011 executed again, as shown in FIG. At this time, in area 2300, the data at the head and middle of data block 2102, which was copied previously, and the rear end expanded in area 2106 are joined.

In the present embodiment described above, the data block extraction unit 105 notifies the data analysis unit 106 of the type and the like of the data block in S2004.

Note that, as processing for the notification here, the data block extraction unit 105 can detect, by lexical analysis of data blocks according to the XML grammar, start tags, end tags, empty tags, attribute names, attribute values, content (content sandwiched between tags), comments, and the like. Then, the data block extraction unit 105 notifies the data analysis unit 106 of the data block and of the data block type, such as element name, attribute name, or attribute value, obtained as a result of the lexical analysis.

For example, consider the following data:

<text xml:space="preserve" x="10">a<tspan text-decoration="underline">b</tspan>c   def</text>

In the processing for notification of the data blocks from the start tag of “text” to the end tag of “text”, the data block extraction unit 105 can detect “text” as the element name, “x” as an attribute name, “10” as the attribute value of x, and “abc (blank) (blank) (blank) def” as the content, and can convey to the data analysis unit 106 information such as whether the detected data block is an element name.

The data analysis unit 106 semantically analyzes the content of the data block detected by the data block extraction unit 105 through lexical analysis and, based on the result of the semantic analysis, can notify the data block extraction unit 105 of the data block unit suitable for semantic analysis, that is, the unit in which data blocks obtained by lexical analysis should be passed to the data analysis unit 106.

[0199] For example, the data analysis unit 106 semantically analyzes the element name “text” and gives notice that the content contains a character string. Note that the attribute value of the attribute “xml:space” is either “preserve” or “default”. When the attribute value is “preserve”, blanks in the content are kept; applied to the above example, this is interpreted to mean that there are three blanks between “abc” and “def”. On the other hand, when the attribute value is “default”, consecutive blanks in the content are interpreted as one. As described above, when the attribute values that can be taken are always of a finite number of types, as in the case of the attribute “xml:space”, the size of the data extracted as one data block by the data block extraction unit 105 can be determined.
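The two interpretations of blanks can be illustrated with a minimal Python sketch. It implements only the behaviour stated above (keep blanks for “preserve”, treat consecutive blanks as one for “default”) and does not attempt to reproduce the full white-space rules of SVG; the function name `apply_xml_space` is hypothetical.

```python
import re


def apply_xml_space(content, mode):
    """Interpret blanks in element content according to an xml:space value."""
    if mode == "preserve":
        return content  # every blank is kept as written
    if mode == "default":
        # A run of consecutive blanks is interpreted as a single blank.
        return re.sub(r"[ \t\r\n]+", " ", content)
    raise ValueError("xml:space must be 'preserve' or 'default'")
```

Applied to the content of the example above, “preserve” keeps the three blanks between “abc” and “def”, while “default” leaves one.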

[0200] Further, for example, for data such as that shown below, it is the data analysis unit 106, not the data block extraction unit 105, that can determine whether the data represents a file name or represents an image subjected to processing such as encoding with base64.

[0201] An example of such data is shown below.

[0202] For file name:

For base64 encoded data:

<image xlink:href="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/

rkFR/UOUDP//Z" x="10" y="20" width="100" height="200"/>

The data analysis unit 106 determines whether the attribute value is a file name, based on the element name "image" and its attribute name "xlink:href". To do so, the data analysis unit 106 notifies the data block extraction unit 105 that the first several tens to several hundreds of bytes are to be passed at once (by storing them in the coupling continuous buffer area). Then, because the portion "data:image/jpeg;base64," contained in the base64-encoded data above appears in the data analyzed by the data analysis unit 106, the data being interpreted is determined to be base64-encoded data, and the data block extraction unit 105 is notified that the subsequent attribute value can be semantically analyzed in units of one character. Based on this notification, the data block extraction unit 105 changes the size of the data block to be extracted.
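The decision described here, inspecting only the head of the attribute value and then switching to small blocks, might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function names `classify_href` and `decode_in_small_blocks` are hypothetical, and the sketch decodes four characters at a time (the smallest unit that base64 can decode independently) rather than the one-character unit mentioned in the text.

```python
import base64

DATA_URI_MARK = ";base64,"


def classify_href(head: str) -> str:
    """Classify an xlink:href value from its first few dozen characters."""
    if head.startswith("data:") and DATA_URI_MARK in head:
        return "base64"
    return "file name"


def decode_in_small_blocks(payload: str, chars_per_block: int = 4):
    """Yield decoded bytes block by block.

    chars_per_block must be a multiple of 4, because every 4 base64
    characters decode to (up to) 3 bytes independently of their neighbors.
    """
    for start in range(0, len(payload), chars_per_block):
        yield base64.b64decode(payload[start:start + chars_per_block])


# Hypothetical sample: a data URI carrying the first 8 bytes of a JPEG header.
raw = b"\xff\xd8\xff\xe0JFIF"
href = "data:image/jpeg;base64," + base64.b64encode(raw).decode("ascii")

kind = classify_href(href[:32])              # only the head is inspected
payload = href.split(",", 1)[1]
decoded = b"".join(decode_in_small_blocks(payload))
```

Because each small block is decoded on its own, the full attribute value never has to be held in one contiguous buffer, which is the point of changing the block size after the notification.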

[0203] FIG. 23 is a diagram for explaining the overall content of data analysis in the data analysis apparatus according to the second embodiment of the present invention described above. FIG. 23 shows a state in which the above-mentioned expansion buffer area 103A and coupling continuous buffer area 103B are defined in the storage unit 103.

Referring to FIG. 23, in the data analysis process of the present embodiment, when SVGZ data is input via input unit 102, the data expansion unit 104 expands it into SVG data in the expansion buffer area 103A, one portion at a time, each portion sized according to the capacity of the expansion buffer area 103A. Then, the data block extraction unit 105 performs lexical analysis according to the XML grammar on the expanded data and notifies the data analysis unit 106 of the result. In response to such notification, the data analysis unit 106 sequentially generates a DOM tree based on the SVG data expanded in the expansion buffer area 103A. If, during lexical analysis, the data expanded in the expansion buffer area 103A is interrupted in the middle of a data block, the data block extraction unit 105 copies the interrupted portion to the coupling continuous buffer area 103B. This work is performed sequentially, and the interrupted data blocks are combined in the coupling continuous buffer area 103B and used by the data analysis unit 106 to generate the DOM tree.
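The flow of FIG. 23 can be approximated by the following minimal sketch. It assumes that SVGZ is gzip-compressed SVG and uses Python's standard `zlib` module; the names `expand_pieces`, `split_complete`, and `iter_blocks`, the tiny buffer sizes, and the sample data are all hypothetical. The tokenizer is deliberately simplified (it does not handle ">" inside attribute values, comments, or CDATA), and DOM tree generation is left out.

```python
import gzip
import zlib


def expand_pieces(svgz: bytes, capacity: int, input_chunk: int):
    """Expand compressed input in pieces of at most `capacity` bytes (cf. 103A)."""
    inflater = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip format
    for start in range(0, len(svgz), input_chunk):
        pending = svgz[start:start + input_chunk]
        while pending:
            piece = inflater.decompress(pending, capacity)
            pending = inflater.unconsumed_tail
            if piece:
                yield piece
    # Simplification: any output still buffered is flushed, then sliced.
    rest = inflater.flush()
    for start in range(0, len(rest), capacity):
        yield rest[start:start + capacity]


def split_complete(data: bytes):
    """Split data into complete tags/text runs plus an interrupted remainder."""
    blocks, pos = [], 0
    while pos < len(data):
        if data[pos:pos + 1] == b"<":
            end = data.find(b">", pos)
            if end < 0:
                break                    # tag interrupted mid-way
            end += 1
        else:
            end = data.find(b"<", pos)
            if end < 0:
                break                    # text run may continue in next piece
        blocks.append(data[pos:end])
        pos = end
    return blocks, data[pos:]


def iter_blocks(svgz: bytes, capacity: int = 16, input_chunk: int = 8):
    """Yield data blocks, carrying interrupted portions over (cf. 103B)."""
    coupling = b""
    for piece in expand_pieces(svgz, capacity, input_chunk):
        blocks, coupling = split_complete(coupling + piece)
        yield from blocks
    if coupling:
        yield coupling


svg = b'<svg><text x="10">abc   def</text><rect width="100" height="200"/></svg>'
blocks = list(iter_blocks(gzip.compress(svg)))
```

Here `coupling` grows only when a tag or text run straddles two pieces, which mirrors the role of the coupling continuous buffer area: the fixed-size piece is the normal working area, and the variable-size carry-over is used only for interrupted blocks.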

[0205] In the present embodiment, data can be analyzed by using the expansion buffer area 103A, which has a fixed capacity, and the coupling continuous buffer area 103B, which is a continuous buffer whose capacity is changed as necessary. In addition, because each data block is notified to the data analysis unit 106 together with status information as soon as it is extracted, the data can be expanded sequentially without expanding the entire amount of data input from the input unit 102 at once. As a result, the capacity of the expansion buffer area 103A can be made smaller than in the prior art. Furthermore, because the coupling continuous buffer area 103B is used only when a data block that does not fit within the expansion buffer area 103A is extracted, the storage area of the storage unit 103 can be used efficiently.

Note that the DOM tree generated in the present embodiment is an example of an abstract syntax tree for SVG data, and the abstract syntax tree in the present invention is not limited to this.

The data analysis methods performed by the data analysis devices of the first and second embodiments described above can also be provided as a program. Such a program may be recorded on a computer-readable recording medium attached to the computer, such as a flexible disk, a CD-ROM, a ROM, a RAM, or a memory card, and provided as a program product. The program can also be provided by recording it on a recording medium such as a hard disk built into the computer. The program can also be provided by downloading via a network.

The provided program product is installed in a program storage unit such as a hard disk and executed. The program product includes the program itself and a recording medium in which the program is recorded.

[0209] Although the present invention has been described and illustrated in detail, it is clearly understood that this is by way of illustration only and is not to be taken by way of limitation, the spirit and scope of the invention being limited only by the appended claims.

Claims

The scope of the claims

[1] A data analysis device (1) comprising:

a data expansion unit (104) that generates intermediate data of analysis target data and expands it in a predetermined data expansion area; and

a data analysis unit (106) that analyzes the intermediate data,

wherein the data expansion unit (104) sequentially generates the intermediate data using a part of the analysis target data, and

the data analysis unit (106) sequentially analyzes the generated intermediate data.

[2] The data analysis device (1) according to claim 1, wherein the data expansion unit (104) decodes the encoded analysis target data to generate the intermediate data.

[3] The data analysis device (1) according to claim 2, wherein the encoded data to be analyzed is compressed data.

[4] The data analysis device (1) according to claim 1, wherein the data expansion unit (104) secures the data expansion area by designating a size on a storage unit, and generates the intermediate data by decoding a part of the analysis target data according to the size of the data expansion area.

[5] The data analysis device (1) according to claim 4, wherein the data expansion unit (104) stores the data position of the analysis target data that has been expanded, and generates the intermediate data by decoding the data of the analysis target data at the stored data position.

[6] The data analysis device (1) according to claim 4, wherein the data expansion unit (104) determines whether or not the data expansion area needs to be extended and, when it determines that extension is necessary, extends the data expansion area.

[7] The data analysis device (1) according to claim 6, wherein the data expansion unit (104) uses the area adjacent to the data expansion area, in preference to other areas, as the area by which the data expansion area is extended.

[8] The data analysis device (1) according to claim 6, wherein the data expansion unit (104) uses the area adjacent to the rear of the data expansion area, in preference to other areas, as the area by which the data expansion area is extended.

[9] The data analysis device (1) according to claim 1, further comprising a data block extraction unit (105) that extracts a data block from the intermediate data,

wherein the data analysis unit (106) analyzes the data block extracted by the data block extraction unit (105).

[10] The data analysis device (1) according to claim 9, wherein the data block extraction unit (105) stores the position, within the intermediate data, of the remaining data left after the data block has been extracted.

[11] The data analysis device (1) according to claim 10, wherein the data expansion unit (104) moves the remaining data to the top of the data expansion area according to the stored position of the remaining data.

[12] The data analysis device (1) according to claim 10, wherein the data expansion unit (104) generates the next intermediate data following the intermediate data and expands it in the data expansion area after the remaining data.

[13] The data analysis device (1) according to claim 10, wherein the data block extraction unit (105) extracts the data block following the data block, starting from the stored start position of the remaining data.

[14] The data analysis device (1) according to claim 9, wherein the data block extraction unit (105) stores the extraction status of the data block.

[15] The data analysis device (1) according to claim 14, wherein the data analysis unit (106) determines an analysis method of the data block according to the stored extraction status of the data block.

[16] The data analysis device (1) according to claim 9, wherein the data analysis unit (106) stores an analysis status of the data block.

[17] The data analysis device (1) according to claim 16, wherein the data block extraction unit (105) determines an extraction method of the data block according to the stored analysis status of the data block.

[18] The data analysis device (1) according to claim 1, wherein the analysis target data is data obtained by encoding character string data of a character code, and

the data expansion unit (104) detects the character code from the intermediate data, converts the detected character code into the character string data, and expands the converted character string data in the data expansion area.

[19] The data analysis device (1) according to claim 18, wherein the analysis target data is data obtained by encoding character string data of a fixed-length character code, and

the data expansion unit (104) secures the data expansion area by designating a size on a storage unit with the fixed length of the detected character code as the minimum unit.

[20] The data analysis device (1) according to claim 18, wherein the analysis target data is data obtained by encoding character string data of a variable-length character code, and

the data expansion unit (104) converts the detected character code into character string data of a fixed-length character code.

[21] The data analysis device (1) according to claim 20, wherein the fixed-length character code converted by the data expansion unit (104) is fixed-length Unicode.

[22] A data analysis method comprising:

a data expansion step of generating intermediate data of analysis target data and expanding it in a predetermined data expansion area; and

a data analysis step of analyzing the intermediate data,

wherein the data expansion step sequentially generates the intermediate data using a part of the analysis target data, and

the data analysis step sequentially analyzes the generated intermediate data.

[23] A data analysis program that causes a computer to analyze data,

the program causing the computer to execute:

a data expansion step of generating intermediate data of analysis target data and expanding the intermediate data in a predetermined data expansion area; and

a data analysis step of analyzing the intermediate data,

wherein the data analysis step sequentially analyzes the generated intermediate data.

[24] A computer-readable recording medium on which the data analysis program according to claim 23 is recorded.

[25] A data analysis device (1) for processing data described in a compressed markup language, comprising:

a storage unit (103),

the storage unit (103) including a first buffer (103A) and a second buffer (103B) used as buffer memories;

a data expansion unit (104) that expands the data described in the compressed markup language into the first buffer (103A);

a first data analysis unit (105) that extracts data blocks from the data expanded in the first buffer (103A); and

a second data analysis unit (106) that generates an abstract syntax tree from the data blocks extracted by the first data analysis unit (105),

wherein the first data analysis unit (105) stores an extracted data block in one continuous storage area of the second buffer (103B), and

the second data analysis unit (106) generates the abstract syntax tree using data stored in the first buffer (103A) or the second buffer (103B).

[26] The data analysis device (1) according to claim 25, wherein the second data analysis unit (106) generates the abstract syntax tree using the result of the analysis of the extracted data block by the first data analysis unit (105).

[27] The data analysis device (1) according to claim 25, wherein the data expansion unit (104) expands the data described in the compressed markup language into the first buffer (103A) in portions each no larger than the capacity of the first buffer (103A).

[28] The data analysis device (1) according to claim 25, wherein the first data analysis unit (105) extracts data blocks from the data expanded in the first buffer (103A) by performing lexical analysis on that data.

[29] The data analysis device (1) according to claim 28, wherein the first data analysis unit (105) performs the lexical analysis in accordance with the XML grammar.

[30] The data analysis device (1) according to claim 29, wherein the first data analysis unit (105) determines the capacity of the second buffer (103B) according to the XML grammar.

[31] The data analysis device (1) according to claim 25, wherein the second data analysis unit (106) analyzes data in accordance with the SVG grammar.

[32] The data analysis device (1) according to claim 25, wherein the second data analysis unit (106) analyzes the data expanded in the first buffer (103A) while that data remains expanded in the first buffer (103A).

[33] The data analysis device (1) according to claim 25, wherein the first data analysis unit (105) includes a schema language change unit that changes a schema language used for character analysis.

[34] The data analysis device (1) according to claim 25, wherein the second data analysis unit (106) semantically analyzes the data expanded in the first buffer (103A).

[35] The data analysis device (1) according to claim 25, wherein the second data analysis unit (106) changes, based on the analysis result of the first data analysis unit (105), the data block that it causes the first data analysis unit (105) to extract.

[36] The data analysis device (1) according to claim 25, wherein the first data analysis unit (105) secures the second buffer (103B) in the storage unit (103) based on the capacity of the part of a certain data block contained in the data expanded in the first buffer (103A) at a certain time.

[37] The data analysis device (1) according to claim 25, wherein the first data analysis unit (105) stores a part of a data block in the first buffer (103A) into the second buffer (103B).

[38] The data analysis device (1) according to claim 37, wherein the first data analysis unit (105) combines a part of a data block in the first buffer (103A) with a part of the same data block in the second buffer (103B) and stores the combined data in the second buffer (103B).

[39] The data analysis device (1) according to claim 37, wherein the first data analysis unit (105) combines a part of a data block in the first buffer (103A) with a part of the same data block in the second buffer (103B) and stores the combined data in a third buffer (103C).

[40] The data analysis device (1) according to claim 39, wherein the first data analysis unit (105) secures the third buffer (103C) in the storage unit based on the capacity of the part of a certain data block contained in the data expanded in the first buffer (103A) at a certain time and on the capacity of the second buffer (103B).

[41] A data analysis method in an apparatus for processing data described in a compressed markup language, comprising:

expanding the data described in the compressed markup language into a first buffer;

extracting a data block from the data expanded in the first buffer; storing the extracted data block in one continuous storage area of a second buffer; and

generating an abstract syntax tree using data of the first or second buffer.

[42] A data analysis program for causing a device that processes data described in a compressed markup language to analyze data,

the program causing a computer to execute:

generating an abstract syntax tree using the data of the first or second buffer.

[43] A computer-readable recording medium on which the data analysis program according to claim 42 is recorded.