CN111026399A

CN111026399A - Method for processing binary data file through SQL statement

Info

Publication number: CN111026399A
Application number: CN201911048643.2A
Authority: CN
Inventors: 连接力
Original assignee: Sichuan Winshare Education Science & Technology Co ltd
Current assignee: Sichuan Winshare Education Science & Technology Co ltd
Priority date: 2019-10-30
Filing date: 2019-10-30
Publication date: 2020-04-17

Abstract

The invention discloses a method for processing a binary data file through an SQL statement, which comprises the following steps: compiling SQL sentences for query or update according to the business scene; analyzing keywords in the SQL sentence, including SQL request keywords, table names, field names in the conditional clause and corresponding field values; translating the keywords into high-level language codes, and executing according to conditions: if the query statement is the query statement, reading the mapped file according to a typical data reading mode to obtain a result; if the updated statement is the updated statement, the target data entry is found according to the first query method, and the condition that the length of the updated data field is smaller than, equal to or larger than the length of the original field is processed according to the condition. The invention makes the encoding personnel who do not care about the bottom data structure easily realize the reading of the data file by using the SQL sentence mode, has low learning cost for the development personnel, and is suitable for the application of selecting the binary data file instead of the database with small data volume.

Description

Method for processing binary data file through SQL statement

Technical Field

The invention relates to a method for processing a binary data file, in particular to a method for processing a binary data file through an SQL statement, and belongs to the technical field of IT digitization.

Background

Software often needs to access data, and a database is generally used for accessing data of a system with a large data volume. For systems with small data size, especially systems with read-only data, the use of databases is heavy, and alternative solutions include the use of text files for data access, such as the common xml, json, ini, etc. files. In addition, the use of custom binary data files as carriers for data is also a common practice. The format of binary data file is generally a private definition, and the program for reading and writing the data file carries out specific code writing according to the structure of the data file. The method has the advantages that the data file is directly operated, the read-write efficiency is generally higher, and particularly, the higher read efficiency can be obtained by matching with the file mapping technology of an operating system and other technologies under the condition that the data is read only. The method has the disadvantages that when a specific file is read and written, codes completely depend on the structure of the data file, developers need to know the details of the file structure clearly, and the files need to be operated, mapped and other Api functions deeply.

The format of the binary data file is not standardized and there are companies/entities that typically develop software that are customized for the business. A typical binary data file consists of several parts: a header, a data index table, a data block area, such as a bitmap file, and the like, belong to this category. The file header records the version, mark, checksum, data index table, initial address of data block, etc. of the file; the file index table (which may not) records the offset distance of data entries in the data block from the data block, usually to facilitate fast jumps or to save data space; the data block area records the actual data value.

Without loss of generality, a binary data file structure according to the above description is as follows:

typical data reading methods are:

1. loading a data file:

the method can be read by using a standard file reading mode, and particularly can be finished by using a memory mapping series function under a windows system, such as CreateFileMapping/MapViewOfFile and the like;

2. reading a file header, verifying data, and acquiring the initial position of each data area;

3. determining, based on the characteristics of the data to be queried, the index and data tables to which the data belongs, e.g.

For example, unsigned long pdatefststab ═ pMapFile + pheader- > pe.dwindexpos;

LPBYTE pDataBlock＝pMapFile+pheader-＞pe.dwDataPos；

4. the offset of the data in the data area recorded in the index table is read. The piece of data is read according to the offset, for example:

Index＝dateID-IDBEGIN；

dwDataIndex＝(DWORD*)pFileMap-＞pDatOfstTab+Index；

pDataItem＝(DataItem*)(pDataBlock+*(DWORD*)((BYTE*)pDatOfstTab+dwCandIndex))。

similarly, the data is updated in a similar manner, and the implementation becomes to update the fields in the piece of data in step 4. It should be noted that, when a certain piece of data is updated, if a certain field is lengthened and the new length is smaller than the original length, the fields after the certain piece of data need to be moved forward by the difference; when the new length is larger than the original length, all data after the piece of data needs to be moved forward by the difference value, and the offset of the affected data in the response index table, the checksum in the file header, the update date, the data version and other fields are updated.

It can be seen that, in the prior art, a binary data file is usually read by a direct code, typically, a jump of a data pointer in c/c + +, so that a person who codes an upper layer service needs to know details of the data file, and learning cost is increased for the syntax of Api familiar with files, file memory mapping, data byte access, and the like.

Disclosure of Invention

The present invention is directed to solving the above-mentioned problems by providing a method for processing a binary data file by SQL statements, so that a coding person who does not care about the underlying data structure can easily read the data file.

The invention realizes the purpose through the following technical scheme:

a method of processing a binary data file through an SQL statement, comprising the steps of:

step 1, compiling SQL sentences for query or update according to a service scene; for example, a query statement, stringSQL ═ select [ field name ] from [ table name ] where [ field name 1] - [ field name 1] "[ and | or ] [ field name 2] - [ field name 2 ]"; generally, the reading processing of binary data files involves a few complex multi-table query conditions, and the invention only processes the query conditions that a single table and a where clause(s) are (are) equal/not equal/or contain; if multi-table association query needs to be processed, for example, multiple index tables, wherein the query result of the index table 1 is the index of an entry in the index table 2, the association query can be split, the index table 1 is queried by using a single table to obtain a result, and then the single table query is called again to query the index table 2;

step 2, resolving keywords in the SQL sentence, including the SQL request keywords, the table name, the field name in the conditional clause and the corresponding field value;

step 3, translating the keywords into high-level language codes, and executing according to conditions:

firstly, if the query statement is a query statement, reading the mapped file according to a typical data reading mode to obtain a result;

secondly, if the statement is an update statement, firstly finding a target data entry according to the first query method, and then carrying out situation processing on the situation that the length of the updated data field is smaller than, equal to or larger than the length of the original field.

Preferably, in step 2, the SQL statement is parsed by using a regular expression or other third-party SQL parsing library.

Preferably, in step 3, if the first method is adopted, the method comprises the following steps: if the data can be directly positioned, the index entry is found first, the data entry offset recorded by the index entry is read, then the data entry in the data block is directly positioned by the offset, and the data to be inquired is read; if the data can not be directly positioned, directly starting to inquire from the first piece of data of the corresponding data block until the data meeting the condition is found or returning a null value when the data is not found; if the second method is adopted, the method comprises the following steps: if the length of the updated data field is smaller than the length of the original field, updating the field appointed in the data entry, and advancing the subsequent fields in sequence, wherein the movement amount is the difference between the lengths of the front field and the rear field, and the subsequent data items are not affected; if the length of the updated data field is equal to the length of the original field, directly updating the field appointed in the data entry; if the length of the updated data field is larger than the length of the original field, updating the field appointed in the data entry, and shifting all the subsequent fields and all the subsequent data which do not belong to the data of the data area backwards, wherein the shift amount is the difference between the lengths of the front field and the rear field, and meanwhile, updating the positions of the affected data block and the data block in the file header, wherein the difference between the lengths of the front field and the rear field is added to the position of the data block; after the update is completed, relevant fields including but not limited to update time, checksum or update personnel in the header of the rewrite file are included.

The invention has the beneficial effects that:

the invention makes the encoding personnel who do not care about the bottom data structure easily realize the reading of the data file by using the SQL sentence mode, has low learning cost for the development personnel, and is suitable for the application of selecting the binary data file instead of the database with small data volume.

Drawings

FIG. 1 is a flow chart of a method of processing a binary data file through an SQL statement in accordance with the present invention.

Detailed Description

The invention will be further described with reference to the accompanying drawings in which:

as shown in fig. 1, the method for processing binary data files by SQL statements of the present invention includes the following steps:

step 1, compiling SQL sentences for query or update according to a service scene; for example, a query statement, stringSQL ═ select [ field name ] from [ table name ] where [ field name 1] - [ field name 1] "[ and | or ] [ field name 2] - [ field name 2 ]"; generally, the reading processing of binary data files involves a few complex multi-table query conditions, and the invention only processes the query conditions that a single table and a where clause(s) are (are) equal/not equal/or contain; if a multi-table association query needs to be processed, for example, under the condition that the query result of the index table 1 is the index of an entry in the index table 2, the association query can be split, the index table 1 is queried by using a single table to obtain a result, and then the single table query is called again to query the index table 2.

Step 2, resolving keywords in the SQL sentence, including the SQL request keywords, the table name, the field name in the conditional clause and the corresponding field value; SQL statement parsing is performed using regular expressions or other third party SQL parsing libraries.

firstly, if the query statement is a query statement, reading the mapped file according to a typical data reading mode to obtain a result; the specific method comprises the following steps: if the data can be directly positioned, the index entry is found first, the data entry offset recorded by the index entry is read, then the data entry in the data block is directly positioned by the offset, and the data to be inquired is read; if the data can not be directly positioned, directly starting to inquire from the first piece of data of the corresponding data block until the data meeting the condition is found or returning a null value when the data is not found;

secondly, if the statement is an update statement, firstly finding a target data entry according to the first query method, and then carrying out situation processing on the situation that the length of the updated data field is smaller than, equal to or larger than the length of the original field; the specific method comprises the following steps: if the length of the updated data field is smaller than the length of the original field, updating the field appointed in the data entry, and advancing the subsequent fields in sequence, wherein the movement amount is the difference between the lengths of the front field and the rear field, and the subsequent data items are not affected; if the length of the updated data field is equal to the length of the original field, directly updating the field appointed in the data entry; if the length of the updated data field is larger than the length of the original field, updating the field appointed in the data entry, and shifting all the subsequent fields and all the subsequent data which do not belong to the data of the data area backwards, wherein the shift amount is the difference between the lengths of the front field and the rear field, and meanwhile, updating the positions of the affected data block and the data block in the file header, wherein the difference between the lengths of the front field and the rear field is added to the position of the data block; after the update is completed, relevant fields including but not limited to update time, checksum or update personnel in the header of the rewrite file are included.

Description of the drawings: the steps in fig. 1 are not exactly the same as those described above, but correspond to each other, so as to facilitate simple extraction into a flow chart and understanding.

In order to illustrate the method of the present invention in more detail, several practical examples are described in more detail below.

Example 1:

the present embodiment explains a method for compiling an SQL statement to query and update resource data by operating a specific resource data file.

The format and main values of the digital resource binary data file are shown in the following table:

the data file contains two tables: the resource table mainly comprises a resource index table and a resource data table; the configuration table mainly comprises a configuration data table. The resource ID is sequentially numbered from 100, and for example, the resource with ID 199 is the 199 th resource, 100+1, or 100. The data of each resource is variable length data (because the title and description fields in each resource are variable length fields).

If only query operation is carried out on the data file, the data file can be mapped in a read-only mode by using an OpenFileMapping/MapViewOfFile function under a windows system, and better reading efficiency can be obtained. If the data file is possibly written, especially the length of the written data is possibly longer than that of the original file, the file is opened by using a common file in a reading and writing mode, and the data is read into the memory better.

Taking the query of the resource data with the resource ID 199 as an example, the query method is as follows:

step 1, compiling an SQL (structured query language) statement according to a service scene:

writing an SQL statement: string SQL, select from resource, resource id, 199';

step 2, resolving keywords in the SQL statement:

the SQLParser component (translated into c + + code) in the Ali open source framework reduce is used for parsing, so that a query keyword select, a table name resource and a query field resource id/199 can be obtained;

finding that the query statement is a query statement by querying the keyword select, finding that the current query is a query to a resource table by a table name resource, and finding that the index table corresponding to the resource table in the file header is 84 bytes (from the initial position of the file header) and the data area is 4912 bytes (from the initial position of the file header); the query field is used as resource id to know that the position of the resource in the resource index table is 199 + 100+1, which is 100 pieces of data;

therefore, the resource index table can jump to the position of the 100 th index directly, and find that the offset (from the starting position of the resource data area) of the resource entry is 28512 bytes; directly positioning to the data entry in the data block according to the offset, reading the resource data, and reading the following data:

example 2:

taking the resource size of the query resource title beginning with the fifth generation and the resource type being the image as an example, the query method comprises the following steps:

writing an SQL statement: string SQL size from resource name title like 'five generation% s' and mediatypeName 'image';

step 2, resolving keywords in the SQL statement:

similarly, an SQLParser component is used for analysis, so that a query keyword select, a table name resource, a query field title/like/'five-generation% s' can be obtained; and; mediatypeName/═ image';

finding that the query statement is a query statement by querying the keyword select, finding that the current query is a query to a resource table by a table name resource, and finding that the index table corresponding to the resource table in the file header is 84 bytes (from the initial position of the file header) and the data area is 4912 bytes (from the initial position of the file header);

knowing that the position of the resource in the resource index table cannot be directly calculated by using the query fields as title and mediatypeName;

therefore, the program directly jumps to the data area with the position of 4912, and reads from the 1 st resource to match the title and the mediatynmeme fields of each resource; when the 100 th resource is read, matching is successful, and the size data 540000 of the resource is read back.

Example 3:

to update the resource data with resource ID 199 and change its resource description field (description) to "partition limit: and (4) classification: the examples of the tropical zone, the northern temperate zone, the northern frigid zone, the southern temperate zone and the southern frigid zone are as follows:

step 1, compiling an update SQL statement according to a service scene:

writing an SQL statement: string SQL "update resource set description" partition limit: and (4) classification: tropical zone, northern temperate zone, northern frigide zone, southern temperate zone, southern frigide zone 'where resource aid ═ 199';

step 2, resolving keywords in the SQL statement:

the SQL is analyzed by the SQLParser component in the durid, so that the partition limit of the keyword update, the table name resource, the field description/' can be obtained: and (4) classification: tropical zone, northern temperate zone, northern frigide zone, southern temperate zone, southern frigide zone', and condition resourceId/═ 199;

the update statement is obtained by inquiring the keyword update, the processing of the resource table is obtained by the table name resource, the index table corresponding to the resource table can be searched in the file header, the index table is 84 bytes (from the initial position of the file header), and the data area is 4912 bytes (from the initial position of the file header); the query field is used as resource id to know that the position of the resource in the resource index table is 199 + 100+1, which is 100 pieces of data;

therefore, the offset (from the starting position of the resource data area) of the resource entry can be found to be 28512 bytes in the resource index table, and the data entry in the data block is directly positioned according to the offset;

then, the length of the new description value is calculated to be 53 bytes (one Chinese character or double-byte punctuation is 2 bytes, and a character string ending character is added), and the length of the old description value 'demonstrating the five-generation region boundary' is 21 bytes; the length difference is 53-21-32 bytes, and the length of the new field is larger than that of the old field, so that data movement is needed;

finding that the offset of the 101 st resource in the index table of resources is 28656 bytes (from the start of the resource data area), therefore

(1) Reading data from the file header 4912+28656 to 33668 bytes (the initial position of the resource data area + the offset of 101 resources) to the tail of the file into a temporary memory;

(2) rewriting the value of the description field in the target resource entry to be a new value, and rewriting the descriptionLen to be 53;

(3) the data in the temporary memory is continuously written to the end of the target resource entry, and the temporary memory is released;

(4) rewriting all index offset values in the resource table from the 101 st entry, and adding 32 bytes;

(5) and rewriting the position, the time stamp, the file length and the checksum of the configuration data block in the file header.

The above embodiments are only preferred embodiments of the present invention, and are not intended to limit the technical solutions of the present invention, so long as the technical solutions can be realized on the basis of the above embodiments without creative efforts, which should be considered to fall within the protection scope of the patent of the present invention.

Claims

1. A method for processing binary data files through SQL statements is characterized in that: the method comprises the following steps:

step 1, compiling SQL sentences for query or update according to a service scene;

2. The method of processing a binary data file by an SQL statement according to claim 1, wherein: in the step 2, the SQL statement is analyzed by using a regular expression or other third-party SQL analysis libraries.

3. The method of processing a binary data file by an SQL statement according to claim 1, wherein: in step 3, if the first method is adopted, the method comprises the following steps: if the data can be directly positioned, the index entry is found first, the data entry offset recorded by the index entry is read, the data entry in the data block is directly positioned according to the offset, and the data to be inquired is read; if the data can not be directly positioned, directly starting to inquire from the first piece of data of the corresponding data block until the data meeting the condition is found or returning a null value when the data is not found; if the second method is adopted, the method comprises the following steps: if the length of the updated data field is smaller than the length of the original field, updating the field appointed in the data entry, and advancing the subsequent fields in sequence, wherein the movement amount is the difference between the lengths of the front field and the rear field, and the subsequent data items are not affected; if the length of the updated data field is equal to the length of the original field, directly updating the field appointed in the data entry; if the length of the updated data field is larger than the length of the original field, updating the field appointed in the data entry, and shifting all the subsequent fields and all the subsequent data which do not belong to the data of the data area backwards, wherein the shift amount is the difference between the lengths of the front field and the rear field, and meanwhile, updating the positions of the affected data block and the data block in the file header, wherein the difference between the lengths of the front field and the rear field is added to the position of the data block; after the update is completed, relevant fields including but not limited to update time, checksum or update personnel in the header of the rewrite file are included.