CN111666327A

CN111666327A - Text-based structured data description method and system

Info

Publication number: CN111666327A
Application number: CN202010521372.4A
Authority: CN
Inventors: 赵启杰
Original assignee: Shandong Huimao Electronic Port Co Ltd
Current assignee: Shandong Huimao Electronic Port Co Ltd
Priority date: 2020-06-10
Filing date: 2020-06-10
Publication date: 2020-09-15

Abstract

The invention discloses a text-based structured data description method and system, belonging to the technical field of information system data exchange. The invention can describe one or more associated structured data sets in plain text and thereby serialize the structured data into a text sequence, which facilitates data exchange among systems. The text data format defined by the method is readable by both humans and machines, which makes it easier for technical personnel to troubleshoot anomalies in the data exchange process. Compared with the currently popular JSON format, the data format saves about half of the space, lends itself to stream processing by programs, and can improve application throughput.

Description

Text-based structured data description method and system

Technical Field

The invention relates to a text-based structured data description method and system, in the technical field of information system data exchange.

Background

Software development technology is advancing rapidly, and a variety of development platforms and technology stacks coexist. Integrating heterogeneous application systems is therefore a common scenario in the construction and maintenance of information systems.

The standards and de facto standards for remote information exchange have evolved from XML, the markup language that led the SOA era, to JSON (JavaScript Object Notation), which rose to prominence in the mobile internet era, and then to Google's protobuf. Because XML is general-purpose and business-neutral, its markup must be extended considerably to meet the needs of a given business domain, and it is a highly redundant language. With the popularity of JavaScript, JSON has also become a widely used exchange format, but it is suited to exchanging simple, small objects rather than large volumes of structured data: JSON markup still carries many redundant tags, and no corresponding standard exists for structured data exchange, so JSON is not a good choice for exchanging structured data with a complex structure. Protobuf mainly targets remote method invocation rather than data exchange, is less easy to use, and is likewise not well suited to structured data exchange. To solve the above problems, the present invention provides a text-based structured data description method and system.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a text-based structured data description method and a text-based structured data description system, and the adopted technical scheme is as follows:

A text-based structured data description method comprises the following steps:

S1: combining the field metadata information into a field list;

S2: generating a header for the text description;

S3: describing the definition of each field in the field list with one line of text in a fixed format;

S4: checking and outputting the index information of the original structured data table;

S5: outputting the field information in the field list in the form of field names;

S6: starting from the first row of the structured data, reading the values of the fields in the order given in S5 and converting them into character strings;

S7: moving to the next row of the structured data, taking it as the current row, and repeating step S6.

The metadata name of the text description header in S2 is generated from a character string carrying the data format version.

The specific steps of S4, checking and outputting the index information of the original structured data table, are as follows:

S401: checking the index information of the original structured data table;

S411: if a single piece of index information exists, outputting it as a character string;

S412: if multiple pieces of index information exist, outputting them in sequence;

S402: outputting an index end identifier.

The specific steps of S6, reading the values of the fields from the first row of the structured data in the order given in S5 and converting them into character strings, are as follows:

S601: positioning at the first row of the structured data and reading the values in the field order of S5;

S611: escaping values of the character string type that contain quotation marks, in the manner of step S3;

S612: escaping values that contain line feed characters;

S613: encoding values of the binary type with MIME encoding;

S614: encoding values of the time type in the MM/DD/YYYY/HH/MM/SS/SSS format;

S602: ending the current row with a carriage return and line feed.
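
The binary and time conversions of S613 and S614 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: it assumes that MIME encoding means Base64, and that the time pattern reads as month/day/year/hour/minute/second/millisecond. The function names encode_binary and encode_time are hypothetical.

```python
import base64
from datetime import datetime


def encode_binary(value: bytes) -> str:
    # Assumption: "MIME encoding" is taken to mean Base64, as used in MIME bodies.
    return base64.b64encode(value).decode("ascii")


def encode_time(value: datetime) -> str:
    # Assumption: the last component of the pattern is a 3-digit millisecond field.
    millis = value.microsecond // 1000
    return value.strftime("%m/%d/%Y/%H/%M/%S/") + f"{millis:03d}"


print(encode_binary(b"hi"))
print(encode_time(datetime(2020, 6, 10, 8, 5, 3, 250000)))
```

Both helpers return plain ASCII text, so their results can be placed in a quoted field without further escaping.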

A text-based structured data description system comprises a combination module, a file header processing module, a structure processing module, an index processing module, a title processing module, a line processing module and a line switching module:

the combination module: combining the field metadata information into a field list;

the file header processing module: generating a header for the text description;

the structure processing module: describing the definition of each field in the field list with one line of text in a fixed format;

the index processing module: checking and outputting the index information of the original structured data table;

the title processing module: outputting the field information in the field list in the form of field names;

the line processing module: starting from the first row of the structured data, reading the values of the fields in the order given by the title processing module and converting them into character strings;

the line switching module: moving to the next row of the structured data, taking it as the current row, and repeating the operation of the line processing module.

The data name of the header generated by the file header processing module consists of a character string carrying the data format version.

The index processing module can also handle differing amounts of index information, and comprises an inspection module, a single information processing module, a multi-information processing module and an end module:

the inspection module: checking the index information of the original structured data table;

the single information processing module: if a single piece of index information exists, outputting it as a character string;

the multi-information processing module: if multiple pieces of index information exist, outputting them in sequence;

the end module: outputting an index end identifier.

The line processing module can also escape and encode values of different types, and comprises a reading module, a first translation module, a second translation module, a third translation module, a fourth translation module and a line feed module:

the reading module: positioning at the first row of the structured data and reading the values in the field order given by the title processing module;

the first translation module: escaping values of the character string type that contain quotation marks, in the manner used by the structure processing module;

the second translation module: escaping values that contain line feed characters;

the third translation module: encoding values of the binary type with MIME encoding;

the fourth translation module: encoding values of the time type in the MM/DD/YYYY/HH/MM/SS/SSS format;

the line feed module: ending the current row with a carriage return and line feed.

The invention has the following beneficial effects. The method is a revolutionary modification of the traditional CSV file format (Comma-Separated Values, sometimes also called character-separated values because the separator need not be a comma). A CSV file stores tabular data (numbers and text) in plain text, meaning that the file is a sequence of characters and contains no data that must be interpreted as binary. A CSV file consists of any number of records separated by a line break character; each record consists of fields, and the fields are separated by some other character or string, most commonly a comma or a tab. The invention can describe one or more associated structured data sets in plain text and thereby serialize the structured data into a text sequence, which facilitates data exchange among systems. The text data format defined by the method is readable by both humans and machines, which makes it easier for technical personnel to troubleshoot anomalies in the data exchange process. Compared with the currently popular JSON format, the data format saves about half of the space, lends itself to stream processing by programs, and can improve application throughput.
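
The space comparison with JSON can be explored with a small Python sketch. The data set below is synthetic and purely illustrative, and the delimited text is a simplified stand-in for the described format (a title line plus quoted, comma-separated rows, without the header, structure and index sections). The measured ratio depends entirely on the data and does not by itself confirm the "about half" figure.

```python
import json

# Synthetic, illustrative records; field names and values are assumptions.
fields = ["id", "name", "city"]
rows = [{"id": i, "name": f"user{i}", "city": "Jinan"} for i in range(1000)]

# JSON repeats every field name in every record.
as_json = json.dumps(rows)

# Delimited text states the field names once, on the title line.
title_line = ",".join(f'"{name}"' for name in fields)
data_lines = [",".join(f'"{row[name]}"' for name in fields) for row in rows]
as_text = "\r\n".join([title_line] + data_lines)

ratio = len(as_text) / len(as_json)
print(f"JSON: {len(as_json)} chars, delimited text: {len(as_text)} chars, ratio: {ratio:.2f}")
```

The saving comes from writing each field name once rather than once per record, so it grows with the number of rows and with the length of the field names.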

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flow chart of the method of the present invention; FIG. 2 is a schematic diagram of the system of the present invention; FIG. 3 is a schematic diagram of an implementation of the first embodiment.

Detailed Description

The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.

The first embodiment is as follows:

A text-based structured data description method comprises the following steps:

S1: combining the field metadata information into a field list;

S2: generating a header for the text description;

S7: moving to the next row of the structured data, taking it as the current row, and repeating step S6;

when data is processed according to the method of the invention, the field list is assembled first: according to S1, the field metadata information of the structured data table is traversed, including field names, field types, field lengths, field titles, edit masks, display lengths and default values, to form a field list;

the file header is then processed: according to S2, a header for the text description is generated from the field list; the header contains the version of the data format, in the form "@@FILE VERSION@@","300", where a character string enclosed in pairs of @ characters, such as FILE VERSION, serves as a metadata name of the data format;
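
The header line of S2 can be sketched in Python as follows. This is a minimal illustration: it assumes the header is written as a quoted metadata name wrapped in @@ followed by a quoted version string, and the function name format_header is hypothetical.

```python
def format_header(version: str = "300", name: str = "FILE VERSION") -> str:
    """Build the header line: a metadata name wrapped in @@, then the version."""
    # Assumption: both parts are quoted and separated by a comma.
    return f'"@@{name}@@","{version}"'


print(format_header())
```

A reader can check this first line before parsing anything else and reject a file whose format version it does not support.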

the data structure is then processed: the field list from S1 is traversed, each field is escaped inline according to S3, and a character string is output on one line in the format field name=field type,field length,"field title","edit mask",display length,Data,"default value"; each field, once escaped, occupies a line of its own;
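
The field definition line of S3 can be sketched in Python as follows. This is an illustration under stated assumptions: the FieldDef class and the function names are hypothetical, embedded quotation marks are assumed to be doubled as in conventional CSV (the text does not specify the escape), and the literal token Data is copied from the format string as given, since its meaning is not explained.

```python
from dataclasses import dataclass


@dataclass
class FieldDef:
    # Hypothetical container for the metadata items listed in S1.
    name: str
    type: str
    length: int
    title: str
    edit_mask: str = ""
    display_length: int = 0
    default: str = ""


def quote(text: str) -> str:
    # Assumption: an embedded quotation mark is escaped by doubling it.
    return '"' + text.replace('"', '""') + '"'


def format_field_def(f: FieldDef) -> str:
    """Render one field definition on a single line."""
    return (
        f"{f.name}={f.type},{f.length},{quote(f.title)},{quote(f.edit_mask)},"
        f"{f.display_length},Data,{quote(f.default)}"
    )


print(format_field_def(FieldDef("id", "INT", 4, "ID")))
```

Writing one definition per line lets a reader rebuild the table structure before the first data row arrives.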

the data index is then processed: according to S4, the index information of the original structured data table is checked and output, traversing the field information from S1;

the field titles are then processed: according to S5, the field information in the field list is output in the form of field names "field 1","field 2",…,"field n";

the data of the current row is then processed: according to S6, starting from the first row of the structured data, the values of the fields are read in the order given in S5 and converted into character strings, with the fields separated by commas and the text of each field value enclosed in quotation marks (");

finally, the current row is advanced: according to S7, the conversion of S6 is repeated for each subsequent row until the last row of the structured data has been processed, which completes the description of the whole structured data set;
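
Steps S5 to S7 can be sketched in Python as a generator that emits the title line and then one line per row. This is a minimal illustration, not the patented implementation: the names quote and stream_rows are hypothetical, rows are assumed to be dictionaries keyed by field name, quotation marks are assumed to be escaped by doubling, and type-specific encoding is left out.

```python
from typing import Iterable, Iterator, Sequence


def quote(text: str) -> str:
    # Assumption: an embedded quotation mark is escaped by doubling it.
    return '"' + text.replace('"', '""') + '"'


def stream_rows(field_names: Sequence[str], rows: Iterable[dict]) -> Iterator[str]:
    # S5: output the field names once, as the title line.
    yield ",".join(quote(name) for name in field_names) + "\r\n"
    # S6 and S7: convert each row in field order, then move to the next row.
    for row in rows:
        yield ",".join(quote(str(row[name])) for name in field_names) + "\r\n"


for line in stream_rows(["id", "name"], [{"id": 1, "name": "Ann"}]):
    print(repr(line))
```

Because each row is produced independently, a writer can send lines as they are generated and never needs to hold the whole data set in memory.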

the method of the invention is a revolutionary modification of the traditional CSV file format (Comma-Separated Values, sometimes also called character-separated values because the separator need not be a comma). A CSV file stores tabular data (numbers and text) in plain text, meaning that the file is a sequence of characters and contains no data that must be interpreted as binary. A CSV file consists of any number of records separated by a line break character; each record consists of fields, and the fields are separated by some other character or string, most commonly a comma or a tab. The invention can describe one or more associated structured data sets in plain text and thereby serialize the structured data into a text sequence, which facilitates data exchange among systems. The text data format defined by the method is readable by both humans and machines, which makes it easier for technical personnel to troubleshoot anomalies in the data exchange process. Compared with the currently popular JSON format, the data format saves about half of the space, lends itself to stream processing by programs, and can improve application throughput;

further, the metadata name of the text description header in S2 is generated from a character string carrying the data format version;

further, the specific steps of S4, checking and outputting the index information of the original structured data table, are as follows:

S401: checking the index information of the original structured data table;

S412: if multiple pieces of index information exist, outputting them in sequence;

S402: outputting an index end identifier;

when the data index is processed, the index information of the original structured data table is first checked according to S401. If the table has a single piece of index information, a start identifier "@@INDEX DEF START@@" is output according to S411, followed by the index information in the form "field name","ASC/DESC"; if the check finds multiple pieces of index information, they are output in sequence according to S412; finally, the index end identifier "@@INDEX DEF END@@" is output according to S402;
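
The index section of S4 can be sketched in Python as follows. Several points are assumptions: the identifier spellings "@@INDEX DEF START@@" and "@@INDEX DEF END@@" are a reading of the garbled identifiers in the text, the start identifier is emitted whenever at least one index exists, and the end identifier is emitted even when there is none. The function name format_index_block is hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Assumed spellings of the start and end identifiers.
INDEX_START = '"@@INDEX DEF START@@"'
INDEX_END = '"@@INDEX DEF END@@"'


def format_index_block(indexes: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield the index section: start identifier, one line per index, end identifier."""
    entries = list(indexes)  # S401: check which indexes the table has
    if entries:
        yield INDEX_START
        # S411 and S412: one index or several, written in order.
        for field_name, direction in entries:
            if direction not in ("ASC", "DESC"):
                raise ValueError(f"unknown sort direction: {direction}")
            yield f'"{field_name}","{direction}"'
    yield INDEX_END  # S402: always close the section


for line in format_index_block([("id", "ASC"), ("name", "DESC")]):
    print(line)
```

Treating the single-index and multi-index cases with the same loop keeps the output identical in shape, so a reader only has to scan until the end identifier.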

further, the specific steps of S6, reading the values of the fields from the first row of the structured data in the order given in S5 and converting them into character strings, are as follows:

S612: escaping values that contain line feed characters;

S613: encoding values of the binary type with MIME encoding;

S602: ending the current row with a carriage return and line feed;

when the data of the current row is processed, the first row of the structured data is located according to S601 and the values are read in the field order of S5. If a value is of the character string type and contains quotation marks, it is escaped according to S611 in the manner of step S3; if it contains a line feed, the line feed is escaped as %n according to S612; if it contains a carriage return, the carriage return is escaped as %c; if it contains a percent sign, the percent sign is escaped by doubling it, giving %%; otherwise the character string is encoded in UTF-8 as is. If the value is of the binary type, it is encoded with MIME encoding; if it is of the time type, it is converted to standard time and then encoded in the MM/DD/YYYY/HH/MM/SS/SSS format. After all field values have been converted, the current row is ended with a carriage return and line feed, so that all data of the current row of the current record set is placed in a single line of text.
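
The percent-based escaping of string values can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that a literal percent sign is the character escaped as %%, and the inverse function unescape_text is not described in the text and is included only to show that the mapping is reversible. Quotation mark handling is left out.

```python
def escape_text(value: str) -> str:
    # Escape % first, so the % introduced by %n and %c is not doubled again.
    return (
        value.replace("%", "%%")
        .replace("\n", "%n")
        .replace("\r", "%c")
    )


def unescape_text(text: str) -> str:
    # Inverse mapping (an assumption, not part of the described method).
    mapping = {"%": "%", "n": "\n", "c": "\r"}
    out = []
    i = 0
    while i < len(text):
        if text[i] == "%" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append(mapping.get(nxt, "%" + nxt))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(escape_text("50%\r\nok"))
```

After escaping, a value contains no raw carriage return or line feed, which is what allows every record to occupy exactly one line of text.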

The second embodiment is as follows:

when the system processes data, the field list is assembled first: the combination module traverses the field metadata information of the structured data table, including field names, field types, field lengths, field titles, edit masks, display lengths and default values, to form a field list;

the file header is then processed: the file header processing module generates a header for the text description from the field list; the header contains the version of the data format, in the form "@@FILE VERSION@@","300", where a character string enclosed in pairs of @ characters, such as FILE VERSION, serves as a metadata name of the data format;

the data structure is then processed: the field list from the combination module is traversed, each field is escaped inline by the structure processing module, and a character string is output on one line, in a format beginning field name=field type,field length,"field title";

the data index is then processed: the index processing module checks and outputs the index information of the original structured data table, traversing the field information from the combination module;

the field titles are then processed: the title processing module outputs the field information in the field list in the form of field names "field 1","field 2",…,"field n";

the data of the current row is then processed: starting from the first row of the structured data, the line processing module reads the values of the fields in the order given by the title processing module and converts them into character strings, with the fields separated by commas and the text of each field value enclosed in quotation marks (");

finally, the current row is advanced: the line switching module repeats the conversion of the line processing module for each subsequent row until the last row of the structured data has been converted, which completes the description of the whole structured data set;
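
The way the modules cooperate can be sketched in Python as a chain of small functions, each contributing lines to the output in turn. This is an architectural illustration only: the function names and the table dictionary are hypothetical stand-ins, the structure and index modules are omitted for brevity, and escaping is not applied.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, List

# A module takes the table and returns the lines it contributes.
Module = Callable[[dict], Iterable[str]]


def run_pipeline(table: dict, modules: List[Module]) -> Iterator[str]:
    """Call each module in order and concatenate the lines they emit."""
    return chain.from_iterable(module(table) for module in modules)


def file_header(table: dict) -> Iterable[str]:
    # Stand-in for the file header processing module.
    return ['"@@FILE VERSION@@","300"']


def titles(table: dict) -> Iterable[str]:
    # Stand-in for the title processing module.
    return [",".join(f'"{name}"' for name in table["fields"])]


def data_lines(table: dict) -> Iterable[str]:
    # Stand-in for the line processing and line switching modules.
    for row in table["rows"]:
        yield ",".join(f'"{row[name]}"' for name in table["fields"])


table = {"fields": ["id", "name"], "rows": [{"id": 1, "name": "Ann"}]}
doc = "\r\n".join(run_pipeline(table, [file_header, titles, data_lines]))
print(doc)
```

Keeping each module as an independent callable means that a section such as the index block can be added or left out without touching the others.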

the system of the invention is a revolutionary modification of the traditional CSV file format (Comma-Separated Values, sometimes also called character-separated values because the separator need not be a comma). A CSV file stores tabular data (numbers and text) in plain text, meaning that the file is a sequence of characters and contains no data that must be interpreted as binary. A CSV file consists of any number of records separated by a line break character; each record consists of fields, and the fields are separated by some other character or string, most commonly a comma or a tab. The invention can describe one or more associated structured data sets in plain text and thereby serialize the structured data into a text sequence, which facilitates data exchange among systems. The text data format defined by the system is readable by both humans and machines, which makes it easier for technical personnel to troubleshoot anomalies in the data exchange process. Compared with the currently popular JSON format, the data format saves about half of the space, lends itself to stream processing by programs, and can improve application throughput;

further, the data name of the header generated by the file header processing module consists of a character string carrying the data format version;

furthermore, the index processing module can also handle differing amounts of index information, and comprises an inspection module, a single information processing module, a multi-information processing module and an end module:

the end module: outputting an index end identifier;

when the data index is processed, the inspection module checks the index information of the original structured data table. If the table has a single piece of index information, the single information processing module outputs a start identifier "@@INDEX DEF START@@", followed by the index information in the form "field name","ASC/DESC"; if the check finds multiple pieces of index information, the multi-information processing module outputs them in sequence; finally, the end module outputs the index end identifier "@@INDEX DEF END@@";

furthermore, the line processing module can also escape and encode values of different types, and comprises a reading module, a first translation module, a second translation module, a third translation module, a fourth translation module and a line feed module:

the reading module: positioning at the first row of the structured data and reading the values in the field order given by the title processing module;

the second translation module: escaping values that contain line feed characters;

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A text-based structured data description method, characterized by comprising the following steps:

S1: combining the field metadata information into a field list;

S2: generating a header for the text description;

2. The method according to claim 1, wherein the metadata name of the text description header in S2 is generated from a character string carrying the data format version.

3. The method according to claim 2, wherein the specific steps of S4, checking and outputting the index information of the original structured data table, are as follows:

S401: checking the index information of the original structured data table;

S412: if multiple pieces of index information exist, outputting them in sequence;

S402: outputting an index end identifier.

4. The method according to claim 3, wherein the specific steps of S6, reading the values of the fields from the first row of the structured data in the order given in S5 and converting them into character strings, are as follows:

S612: escaping values that contain line feed characters;

S613: encoding values of the binary type with MIME encoding;

S602: ending the current row with a carriage return and line feed.

5. A text-based structured data description system, characterized in that the system comprises a combination module, a file header processing module, a structure processing module, an index processing module, a title processing module, a line processing module and a line switching module:

the line switching module: moving to the next row of the structured data, taking it as the current row, and repeating the operation of the line processing module.

6. The text-based structured data description system of claim 5, wherein the data name of the header generated by the file header processing module consists of a character string carrying the data format version.

7. The text-based structured data description system of claim 6, wherein the index processing module can also handle differing amounts of index information, the index processing module comprising an inspection module, a single information processing module, a multi-information processing module and an end module:

the end module: outputting an index end identifier.

8. The text-based structured data description system of claim 7, wherein the line processing module can also escape and encode values of different types, the line processing module comprising a reading module, a first translation module, a second translation module, a third translation module, a fourth translation module and a line feed module:

the second translation module: escaping values that contain line feed characters;