CN107341135A - A parsing method and tool for a generic text format - Google Patents

A parsing method and tool for a generic text format

Info

Publication number
CN107341135A
Authority
CN
China
Prior art keywords
field
symbol
record
separator
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710372929.0A
Other languages
Chinese (zh)
Other versions
CN107341135B (en)
Inventor
刘帆
木伟民
张云
王伟平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201710372929.0A
Publication of CN107341135A
Application granted
Publication of CN107341135B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/12: Use of codes for handling textual entities
    • G06F40/151: Transformation

Abstract

The invention discloses a parsing method and tool for a generic text format. The method is: 1) for a piece of data a to be parsed, first import its custom symbols into the parsing tool, then read the data a using the specified file encoding; the custom symbols comprise a row separator, a field enclosure and a field separator; 2) the parsing tool uniformly converts the custom symbols used to parse the data a to the string type; 3) the parsing tool analyzes the characters it reads one by one, and if the string formed by the current character and the n characters that follow it matches the row separator, the data a is split into rows at that row separator; 4) the parsing tool analyzes the resulting rows and parses all records in the row data according to the field enclosure; 5) the parsing tool analyzes each resulting record in turn and parses all fields in each record according to the field separator. The invention greatly improves parsing efficiency.

Description

A parsing method and tool for a generic text format
Technical field
The present invention relates to a parsing tool for a generic text format and belongs to the field of computer software technology.
Background
The generic text format is specified as follows. A generic text consists of any number of records, separated from one another by a custom row separator. Each record consists of fields, separated by a custom field separator. A custom field enclosure may also be defined. A field that contains a row separator must be enclosed in field enclosures; a field that contains a field separator must be enclosed in field enclosures; a field that contains a field enclosure must be enclosed in field enclosures, and an enclosure inside a field is written as two consecutive field enclosures. A parsing tool for the generic text format parses the fields of text that conforms to this specification. Existing text parsing tools mainly target comma-separated values (CSV) files: the row separator is the system default newline, the field separator is a comma or a tab, the field enclosure is a double quotation mark, and the tool parses every record in the file and every field in each record.
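The format rules above can be illustrated with a small writer. The following Python sketch is not part of the patent: the function names `encode_field` and `encode_record` and the sample symbols (`|`, `~`, `\r\n`) are assumptions chosen for illustration. It encloses a field only when the field contains one of the three custom symbols and doubles any enclosure that appears inside the field.

```python
def encode_field(value, row_sep="\n", field_sep=",", quote='"'):
    """Enclose a field if it contains any of the three custom symbols."""
    needs_enclosure = any(sym in value for sym in (row_sep, field_sep, quote))
    if not needs_enclosure:
        return value
    # An enclosure inside the field is written as two consecutive enclosures.
    return quote + value.replace(quote, quote + quote) + quote


def encode_record(fields, row_sep="\n", field_sep=",", quote='"'):
    """Join encoded fields with the custom field separator."""
    return field_sep.join(
        encode_field(f, row_sep, field_sep, quote) for f in fields
    )


# Hypothetical symbols: "|" between fields, "~" as enclosure, "\r\n" between rows.
sample = encode_record(
    ["a|b", "say ~hi~", "plain"], row_sep="\r\n", field_sep="|", quote="~"
)
```

Because every symbol is handled as a string, the same sketch works when a symbol is longer than one character.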
Existing text parsing tools mainly target comma-separated values files. They allow the field separator and field enclosure to be customized, but the row separator is fixed to the system default newline and cannot be customized. Moreover, the field separator and field enclosure can be set only to a single specific character, not to a specific string or byte array.
Summary of the invention
The object of the invention is to provide a parsing method and tool for a generic text format that parse the fields of text conforming to the format specification above. The invention can parse a file or stream in a specified encoding and allows the row separator, field separator and field enclosure to be customized as a specific character, byte, string or byte array. These custom settings are supplied by the user and stored inside the parsing tool.
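Reading a file or stream in a specified encoding might look like the following Python sketch. The helper name `read_text` is an assumption, not something the patent defines. Passing `newline=""` turns off Python's newline translation, which matters here because a custom row separator such as `\r\n` must reach the parser unchanged.

```python
import io


def read_text(source, encoding="utf-8"):
    """Decode a file path or a binary stream using the given encoding."""
    if hasattr(source, "read"):
        # Binary stream: wrap it so bytes are decoded, without newline translation.
        wrapper = io.TextIOWrapper(source, encoding=encoding, newline="")
        try:
            return wrapper.read()
        finally:
            wrapper.detach()  # leave the caller's stream open
    # Otherwise treat the argument as a file path.
    with open(source, "r", encoding=encoding, newline="") as handle:
        return handle.read()
```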
The technical solution of the invention is as follows:
A parsing method for a generic text format, comprising the steps of:
1) for a piece of data a to be parsed, first import its custom symbols into the parsing tool, then read the data a using the specified file encoding; the data a is a file or a data stream, and the custom symbols comprise a row separator, a field enclosure and a field separator;
2) the parsing tool uniformly converts the custom symbols used to parse the data a to the string type;
3) the parsing tool analyzes the characters it reads one by one; if the string formed by the current character and the n characters that follow it matches the row separator, the data a is split into rows at that row separator, where n is the length of the row separator minus one;
4) the parsing tool analyzes the resulting rows and parses all records in the row data according to the field enclosure;
5) the parsing tool analyzes each resulting record in turn and parses all fields in each record according to the field separator.
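Step 3 can be sketched in Python as follows. This is an illustrative reading of the step, not the patented implementation, and the function name `split_rows` is an assumption. At each position the sketch compares the current character plus the next n characters with the row separator, where n is the separator length minus one.

```python
def split_rows(text, row_sep):
    """Split text into rows at every occurrence of a (possibly multi-character) separator."""
    if not row_sep:
        raise ValueError("row separator must not be empty")
    n = len(row_sep) - 1  # number of look-ahead characters
    rows, start, i = [], 0, 0
    while i < len(text):
        # Current character plus the n characters after it.
        if text[i:i + n + 1] == row_sep:
            rows.append(text[start:i])
            i += n + 1
            start = i
        else:
            i += 1
    if start < len(text):
        rows.append(text[start:])  # trailing row without a final separator
    return rows
```

For example, `split_rows("a,b##c,d##e", "##")` yields three rows. Rows produced this way may still be fragments of a record, because a row separator can also occur inside an enclosed field; step 4 deals with that.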
Further, the method of parsing all records in the row data according to the field enclosure is:
21) Set a record end flag and initialize its value to false. In each row of data, two consecutive field enclosures are parsed as one enclosure inside a field, and a single field enclosure is parsed as an enclosure of a field; then scan forward character by character. If the field's enclosure is the last character or string of the current row and the record end flag is true, the record has been fully parsed, and the record end flag is set to false. If the field's enclosure is the last character or string of the current row and the record end flag is false, determine whether the enclosure opens a field in the record; if it does, set the record end flag to true and then analyze the data of the next row. If the field's enclosure is the last character or string of the row, the record end flag is false, and the enclosure does not open a field in the record, throw the line number, offset and error type of the error and set the record end flag to false;
22) If the field's enclosure is not the last character or string of the row and the record end flag is true, throw the line number, offset and error type of the error; otherwise set the record end flag to false. If the field's enclosure is not the last character or string of the row and the record end flag is false, determine whether the enclosure opens a field in the record; if it does, set the record end flag to true and then continue analyzing the characters of the row. If the field's enclosure is not the last character or string of the row, the record end flag is false, and the enclosure does not open a field in the record, throw the line number, offset and error type of the error and set the record end flag to false.
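The idea behind steps 21) and 22) is that a row ends a record only when no enclosed field is still open. The Python sketch below is a deliberately simplified illustration of that idea, not the procedure in the text: it counts enclosures per row instead of scanning character by character, and it reports only one kind of error. The name `join_rows_into_records` and the variable `open_enclosure` (which plays a role similar to the record end flag) are assumptions.

```python
def join_rows_into_records(rows, row_sep, quote):
    """Merge rows that were split inside an enclosed field back into one record."""
    records, pending = [], []
    open_enclosure = False  # True while an enclosed field is still unclosed
    for row in rows:
        # Every enclosure flips the state; a doubled enclosure flips it twice,
        # so only an odd count changes the state at the end of the row.
        if row.count(quote) % 2 == 1:
            open_enclosure = not open_enclosure
        pending.append(row)
        if not open_enclosure:
            # Restore the row separators that belonged to the field content.
            records.append(row_sep.join(pending))
            pending = []
    if pending:
        raise ValueError("enclosed field is not closed before end of input")
    return records
```

Counting occurrences keeps the sketch short, at the cost of not locating the exact offset of a malformed enclosure the way the described scan does.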
Further, the method of parsing all fields in each record according to the field separator is:
31) Set a field end flag and initialize its value to false. If a field enclosure is read, scan forward character by character. If the scan reaches the end of the record and the field end flag is true, the field has been fully parsed and so has the record; otherwise, throw the line number, offset and error type of the error;
32) If a field enclosure is read and the forward scan has not yet reached the end of the record, continue scanning forward:
a) If the scan reaches a field enclosure and the field end flag is true, parse the two consecutive field enclosures as one enclosure inside the field; otherwise, determine whether the enclosure opens a field, and if so set the field end flag to true and continue parsing. If the scan reaches an enclosure while the field end flag is false and the enclosure does not open a field, throw the line number, offset and error type of the error;
b) If the scan reaches a field separator and the field end flag is true, the enclosure is the field's enclosure and the separator is a separator between fields; the field has been fully parsed, and the field end flag is reset to false. Otherwise the separator is a separator inside the field; determine whether the enclosure opens a field, and if so continue parsing, otherwise throw the line number, offset and error type of the error;
c) If the scan reaches a row separator and the field end flag is true, parse it as a row separator inside the field; otherwise the record does not conform to the text format specification, and the line number, offset and error type of the error are thrown. If the scan reaches any character other than a row separator, enclosure or field separator, parse that character and continue parsing;
33) If a row separator is read and the field end flag is true, parse it as a row separator inside the field; otherwise, throw the line number, offset and error type of the error;
34) If a field separator is read and the field end flag is true, the separator is a separator inside the field; otherwise it is a separator between fields, and one field of the record has been parsed;
35) If any character other than the custom row separator, field enclosure and field separator is read, the character is part of the field content.
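A field parser in the spirit of steps 31) to 35) can be sketched as a small state machine. The Python below is one possible reading under stated assumptions, not the patent's exact control flow: `parse_fields` is a hypothetical name, `in_enclosure` stands in for the field end flag, errors carry only an offset, and any text between a closing enclosure and the next field separator is rejected.

```python
def parse_fields(record, field_sep=",", quote='"'):
    """Split one record into fields, honouring enclosures and doubled enclosures."""
    fields, current = [], []
    in_enclosure = False  # True between an opening and a closing enclosure
    i = 0
    while i < len(record):
        if record.startswith(quote, i):
            if in_enclosure and record.startswith(quote, i + len(quote)):
                current.append(quote)  # two enclosures -> one literal enclosure
                i += 2 * len(quote)
            elif in_enclosure:
                in_enclosure = False  # closing enclosure
                i += len(quote)
                if i < len(record) and not record.startswith(field_sep, i):
                    raise ValueError(f"text after closing enclosure at offset {i}")
            elif not current:
                in_enclosure = True  # opening enclosure at the start of a field
                i += len(quote)
            else:
                raise ValueError(f"enclosure does not open a field at offset {i}")
        elif not in_enclosure and record.startswith(field_sep, i):
            fields.append("".join(current))  # separator between fields
            current = []
            i += len(field_sep)
        else:
            # Ordinary character, or a separator inside an enclosed field.
            current.append(record[i])
            i += 1
    if in_enclosure:
        raise ValueError("enclosed field is not closed at end of record")
    fields.append("".join(current))
    return fields
```

With the default symbols, `parse_fields('1,"a,b","say ""hi""",end')` separates four fields and keeps the comma and the quotation marks that belong to field content. Matching with `str.startswith` lets each symbol be longer than one character.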
Further, the row separator is a character, a byte, a string or a byte array.
Further, the field enclosure is an enclosing character, byte, string or byte array.
Further, the field separator is a character, a byte, a string or a byte array.
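Step 2 of the method converts whichever of these forms the user supplies into a string. A minimal Python sketch of such a conversion follows. The name `to_symbol_string`, the use of an `int` to represent a single byte, and the default encoding are assumptions.

```python
def to_symbol_string(symbol, encoding="utf-8"):
    """Normalize a custom symbol (character, byte, string or byte array) to str."""
    if isinstance(symbol, str):
        return symbol  # a character or string is already in the target form
    if isinstance(symbol, int):
        return bytes([symbol]).decode(encoding)  # a single byte value, 0..255
    if isinstance(symbol, (bytes, bytearray)):
        return bytes(symbol).decode(encoding)  # a byte array
    raise TypeError(f"unsupported symbol type: {type(symbol).__name__}")
```

After this normalization the later stages only ever compare strings, which is what allows one scanning routine to serve all four input forms.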
A parsing tool for a generic text format, characterized by comprising a file parser, a row parser, a record parser, a field parser and an exception handler, wherein:
the file parser reads the file at a specified path, or a data stream, according to the specified character encoding;
the row parser analyzes the characters read by the file parser, splits the file at the custom row separator and parses out each row of data in the file;
the record parser analyzes, one by one, the characters in each row of data produced by the row parser and, according to the custom field enclosure and a record end flag, parses out all record data contained in each row;
the field parser scans, one by one, the characters of each record produced by the record parser and, according to the custom field separator and a field end flag, parses out all fields in each record;
the exception handler handles problems that arise while the file is being parsed and records detailed error information.
The invention thus provides a parsing tool for a generic text format, consisting mainly of a file parser, a row parser, a record parser, a field parser and an exception handler. The file parser reads the file at a specified path, or a specified stream, according to the specified character encoding. The row parser analyzes, one by one, the characters read by the file parser, splits the file at the custom row separator and parses out each row of data; the row separator may be defined as a character, a byte, a string or a byte array. The record parser analyzes, one by one, the characters in each row produced by the row parser and, using the custom field enclosure and a record end flag, parses out every record in the row data; the field enclosure may be defined as an enclosing character, byte, string or byte array. The field parser scans, one by one, the characters of each record produced by the record parser and, using the custom field separator and a field end flag, parses out all fields in each record; the field separator may be a character, a byte, a string or a byte array. The exception handler handles problems that arise when parsing a file that does not conform to the text format specification and records detailed error information.
Compared with the prior art, the invention has the following advantages:
1. It can read a file or stream in a specified encoding;
2. The row separator, field enclosure and field separator can each be customized as a character, a byte, a string or a byte array;
3. It parses all fields of every record in a file or stream according to the custom row separator, field enclosure and field separator;
4. It performs exception handling for records that do not conform to the text format specification and records the error location and error type in detail;
5. It ensures thread safety.
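The error reporting named in advantage 4 (line number, offset and error type) can be sketched in Python as an exception type plus a loop that collects errors instead of stopping. All names here (`ErrorType`, `FormatError`, `check_enclosures`, `parse_all`) are assumptions, the record index stands in for the line number, and `check_enclosures` is only a toy validator used to trigger an error.

```python
from enum import Enum


class ErrorType(Enum):
    UNEXPECTED_ENCLOSURE = "enclosure does not open a field"
    UNTERMINATED_FIELD = "enclosed field is never closed"


class FormatError(Exception):
    """Carries the line number, offset and error type of a format violation."""

    def __init__(self, line_number, offset, error_type):
        super().__init__(f"line {line_number}, offset {offset}: {error_type.value}")
        self.line_number = line_number
        self.offset = offset
        self.error_type = error_type


def check_enclosures(record, line_number, quote='"'):
    """Toy validator: reject a record with an odd number of enclosures."""
    if record.count(quote) % 2 == 1:
        raise FormatError(line_number, record.rfind(quote), ErrorType.UNTERMINATED_FIELD)
    return record


def parse_all(records, parse_record=check_enclosures):
    """Apply parse_record to every record and collect errors rather than abort."""
    parsed, errors = [], []
    for line_number, record in enumerate(records, start=1):
        try:
            parsed.append(parse_record(record, line_number))
        except FormatError as exc:
            errors.append(exc)  # keep the details, continue with the next record
    return parsed, errors


parsed, errors = parse_all(['a,"b"', 'c,"d', "e,f"])
```

Collecting errors lets one malformed record be reported without discarding the records around it.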
Brief description of the drawings
Fig. 1 is a flowchart of record parsing;
Fig. 2 is a flowchart of field parsing.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and a specific embodiment, which does not limit the scope of the invention in any way.
Embodiment 1: a parsing method for a generic text format
1) Read the file or stream using the specified file encoding;
2) If the custom symbols, namely the row separator, field enclosure and field separator, are characters, bytes or byte arrays, convert them to strings;
3) Analyze the characters read from the file or stream one by one; if the string formed by the current character and the n characters that follow it is a row separator, split the file into rows at the row separator, where n is the length of the row separator minus one;
4) Analyze the data of each row and parse all records, as shown in Fig. 1.
Initialize the record end flag to false. In each row of data, two consecutive field enclosures are parsed as one enclosure inside a field, and a single field enclosure is parsed as an enclosure of a field. When an enclosure of a field is parsed, scan forward character by character.
If the field's enclosure is the last character (or string) of the row, check the record end flag. If the record end flag is true, the record has been fully parsed, and the record end flag is set to false. If the field's enclosure is the last character or string of the current row and the record end flag is false, determine whether the enclosure opens a field in the record; if it does, the record has not yet been fully parsed, so set the record end flag to true and then analyze the data of the next row. If the field's enclosure is the last character (or string) of the row, the record end flag is false, and the enclosure does not open a field in the record, the record does not conform to the text format specification; throw the line number, offset and error type of the error and set the record end flag to false.
If the field's enclosure is not the last character (or string) of the row, check the record end flag. If the record end flag is true, the record does not conform to the text format specification, and the line number, offset and error type of the error are thrown; otherwise the record end flag is set to false. If the field's enclosure is not the last character or string of the row and the record end flag is false, determine whether the enclosure opens a field in the record; if it does, the record has not yet been fully parsed, so set the record end flag to true and then continue analyzing the characters of the row. If the field's enclosure is not the last character (or string) of the row, the record end flag is false, and the enclosure does not open a field in the record, the record does not conform to the text format specification; throw the line number, offset and error type of the error and set the record end flag to false.
5) Analyze the data of each record and parse all fields of each record, as shown in Fig. 2.
Initialize the field end flag to false. If a field enclosure is read, scan forward character by character. If the scan reaches the end of the record, check the field end flag: if it is true, the field has been fully parsed and so has the record; otherwise the record does not conform to the text format specification, and the line number, offset and error type of the error are thrown.
If a field enclosure is read and the forward scan has not yet reached the end of the record, the following cases apply. If the scan reaches an enclosure, check the field end flag: if it is true, parse the two consecutive enclosures as one enclosure inside the field, as the text format specification requires; otherwise determine whether the enclosure opens a field, and if so set the field end flag to true and continue parsing. If the scan reaches an enclosure while the field end flag is false and the enclosure does not open a field, the record does not conform to the text format specification, and the line number, offset and error type of the error are thrown. If the scan reaches a field separator, check the field end flag: if it is true, the enclosure is the field's enclosure and the separator is a separator between fields; the field has been fully parsed, and the field end flag is reset to false. Otherwise the separator is a separator inside the field; determine whether the enclosure opens a field, and if so continue parsing, otherwise the record does not conform to the text format specification, and the line number, offset and error type of the error are thrown. If the scan reaches a row separator, check the field end flag: if it is true, parse it as a row separator inside the field; otherwise the record does not conform to the text format specification, and the line number, offset and error type of the error are thrown. If the scan reaches any character other than a row separator, enclosure or field separator, parse that character and continue parsing.
If a row separator is read, check the field end flag: if it is true, parse it as a row separator inside the field; otherwise the record does not conform to the text format specification, and the line number, offset and error type of the error are thrown.
If a field separator is read, check the field end flag: if it is true, the separator is a separator inside the field; otherwise it is a separator between fields, and one field of the record has been parsed.
If any character other than the custom row separator, enclosure and field separator is read, the character is part of the field content.

Claims (10)

1. A parsing method for a generic text format, comprising the steps of:
1) for a piece of data a to be parsed, first importing its custom symbols into a parsing tool, then reading the data a using the specified file encoding; wherein the data a is a file or a data stream, and the custom symbols comprise a row separator, a field enclosure and a field separator;
2) the parsing tool uniformly converting the custom symbols used to parse the data a to the string type;
3) the parsing tool analyzing the characters it reads one by one, and, if the string formed by the current character and the n characters that follow it matches the row separator, splitting the data a into rows at that row separator, wherein n is the length of the row separator minus one;
4) the parsing tool analyzing the resulting rows and parsing all records in the row data according to the field enclosure;
5) the parsing tool analyzing each resulting record in turn and parsing all fields in each record according to the field separator.
2. The method according to claim 1, characterized in that the method of parsing all records in the row data according to the field enclosure is:
21) setting a record end flag and initializing its value to false; in each row of data, parsing two consecutive field enclosures as one enclosure inside a field and a single field enclosure as an enclosure of a field, then scanning forward character by character; if the field's enclosure is the last character or string of the current row and the record end flag is true, the record has been fully parsed, and the record end flag is set to false; if the field's enclosure is the last character or string of the current row and the record end flag is false, determining whether the enclosure opens a field in the record, and if it does, setting the record end flag to true and then analyzing the data of the next row; if the field's enclosure is the last character or string of the row, the record end flag is false, and the enclosure does not open a field in the record, throwing the line number, offset and error type of the error and setting the record end flag to false;
22) if the field's enclosure is not the last character or string of the row and the record end flag is true, throwing the line number, offset and error type of the error, and otherwise setting the record end flag to false; if the field's enclosure is not the last character or string of the row and the record end flag is false, determining whether the enclosure opens a field in the record, and if it does, setting the record end flag to true and then continuing to analyze the characters of the row; if the field's enclosure is not the last character or string of the row, the record end flag is false, and the enclosure does not open a field in the record, throwing the line number, offset and error type of the error and setting the record end flag to false.
3. The method according to claim 1, characterized in that the method of parsing all fields in each record according to the field separator is:
31) setting a field end flag and initializing its value to false; if a field enclosure is read, scanning forward character by character; if the scan reaches the end of the record and the field end flag is true, the field has been fully parsed and so has the record; otherwise, throwing the line number, offset and error type of the error;
32) if a field enclosure is read and the forward scan has not yet reached the end of the record, continuing to scan forward:
a) if the scan reaches a field enclosure and the field end flag is true, parsing the two consecutive field enclosures as one enclosure inside the field; otherwise, determining whether the enclosure opens a field, and if so setting the field end flag to true and continuing to parse; if the scan reaches an enclosure while the field end flag is false and the enclosure does not open a field, throwing the line number, offset and error type of the error;
b) if the scan reaches a field separator and the field end flag is true, the enclosure is the field's enclosure and the separator is a separator between fields, the field has been fully parsed, and the field end flag is reset to false; otherwise the separator is a separator inside the field, and it is determined whether the enclosure opens a field; if so, parsing continues, and otherwise the line number, offset and error type of the error are thrown;
c) if the scan reaches a row separator and the field end flag is true, parsing it as a row separator inside the field; otherwise the record does not conform to the text format specification, and the line number, offset and error type of the error are thrown; if the scan reaches any character other than a row separator, enclosure or field separator, parsing that character and continuing to parse;
33) if a row separator is read and the field end flag is true, parsing it as a row separator inside the field; otherwise, throwing the line number, offset and error type of the error;
34) if a field separator is read and the field end flag is true, the separator is a separator inside the field; otherwise it is a separator between fields, and one field of the record has been parsed;
35) if any character other than the custom row separator, field enclosure and field separator is read, the character is part of the field content.
4. The method according to any one of claims 1 to 3, characterized in that the row separator is a character, a byte, a string or a byte array.
5. The method according to any one of claims 1 to 3, characterized in that the field enclosure is an enclosing character, byte, string or byte array.
6. The method according to any one of claims 1 to 3, characterized in that the field separator is a character, a byte, a string or a byte array.
7. A parsing tool for a generic text format, characterized by comprising a file parser, a row parser, a record parser, a field parser and an exception handler, wherein:
the file parser reads the file at a specified path, or a data stream, according to the specified character encoding;
the row parser analyzes the characters read by the file parser, splits the file at the custom row separator and parses out each row of data in the file;
the record parser analyzes, one by one, the characters in each row of data produced by the row parser and, according to the custom field enclosure and a record end flag, parses out all record data contained in each row;
the field parser scans, one by one, the characters of each record produced by the record parser and, according to the custom field separator and a field end flag, parses out all fields in each record;
the exception handler handles problems that arise while the file is being parsed and records detailed error information.
8. The parsing tool according to claim 7, characterized in that the row separator is a character, a byte, a string or a byte array.
9. The parsing tool according to claim 7, characterized in that the field enclosure is an enclosing character, byte, string or byte array.
10. The parsing tool according to claim 7, characterized in that the field separator is a character, a byte, a string or a byte array.
CN201710372929.0A 2017-05-24 2017-05-24 A parsing method and tool for a generic text format Active CN107341135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710372929.0A CN107341135B (en) 2017-05-24 2017-05-24 A parsing method and tool for a generic text format

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710372929.0A CN107341135B (en) 2017-05-24 2017-05-24 A parsing method and tool for a generic text format

Publications (2)

Publication Number Publication Date
CN107341135A true CN107341135A (en) 2017-11-10
CN107341135B CN107341135B (en) 2019-11-05

Family

ID=60219894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710372929.0A Active CN107341135B (en) 2017-05-24 2017-05-24 A parsing method and tool for a generic text format

Country Status (1)

Country Link
CN (1) CN107341135B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010147394A3 (en) * 2009-06-17 2011-03-31 Kim Hoyon Chinese language and chinese character input system and method
CN102855306A (en) * 2012-08-21 2013-01-02 飞天诚信科技股份有限公司 Method and device for parsing source file
CN103164538A (en) * 2013-04-11 2013-06-19 深圳市华力特电气股份有限公司 Method and device for analyzing data
CN103294652A (en) * 2012-02-27 2013-09-11 腾讯科技(深圳)有限公司 Data conversion method and system
CN104023018A (en) * 2014-06-11 2014-09-03 中国联合网络通信集团有限公司 Text protocol reverse resolution method and system
CN106534267A (en) * 2016-10-19 2017-03-22 中国银行股份有限公司 File uploading and resolving method and device

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108595453A (en) * 2017-12-20 2018-09-28 中国联合网络通信集团有限公司 URL identity maps acquisition methods and device
CN108595453B (en) * 2017-12-20 2020-09-01 中国联合网络通信集团有限公司 URL (Uniform resource locator) identifier mapping obtaining method and device
CN108319589B (en) * 2018-03-14 2021-08-10 腾讯科技(深圳)有限公司 Parameter string processing method, device, computer readable storage medium and equipment
CN108319589A (en) * 2018-03-14 2018-07-24 腾讯科技(深圳)有限公司 Parameter string processing method, apparatus, computer readable storage medium and equipment
CN109033410A (en) * 2018-08-03 2018-12-18 韩雪松 A kind of SQL analytic method based on canonical and character string cutting
CN109033410B (en) * 2018-08-03 2021-10-29 韩雪松 SQL (structured query language) analysis method based on regular and character string cutting
CN111177484A (en) * 2019-12-09 2020-05-19 贵阳语玩科技有限公司 System and method for loading and managing different data sources and format character string resource files
CN111143554A (en) * 2019-12-10 2020-05-12 中盈优创资讯科技有限公司 Data sampling method and device based on big data platform
CN111143554B (en) * 2019-12-10 2024-03-12 中盈优创资讯科技有限公司 Data sampling method and device based on big data platform
CN111427899A (en) * 2020-03-17 2020-07-17 中国建设银行股份有限公司 Method, device, equipment and computer readable medium for storing file
CN113761283A (en) * 2020-06-01 2021-12-07 中移(苏州)软件技术有限公司 Method, device, equipment and storage medium for reading XML file
CN113761283B (en) * 2020-06-01 2023-09-05 中移(苏州)软件技术有限公司 Method and device for reading XML file, equipment and storage medium
CN114422498A (en) * 2021-12-14 2022-04-29 杭州安恒信息技术股份有限公司 Big data real-time processing method and system, computer equipment and storage medium

Also Published As

Publication number Publication date
CN107341135B (en) 2019-11-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant