CN103095644B - Data content parsing method and device - Google Patents

Data content parsing method and device

Info

Publication number
CN103095644B
CN103095644B CN201110334808.XA CN201110334808A CN103095644B CN 103095644 B CN103095644 B CN 103095644B CN 201110334808 A CN201110334808 A CN 201110334808A CN 103095644 B CN103095644 B CN 103095644B
Authority
CN
China
Prior art keywords
character
matching value
ascii
value
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110334808.XA
Other languages
Chinese (zh)
Other versions
CN103095644A (en)
Inventor
吴博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd
Priority to CN201110334808.XA
Publication of CN103095644A
Application granted
Publication of CN103095644B
Legal status: Active
Anticipated expiration

Landscapes

  • Processing Or Creating Images (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data content parsing method and device that reduce the time complexity and the time of parsing when data content delivered by a server is parsed. The data content parsing method comprises: when parsing the data content in a data packet body, traversing each character of the data content in turn and determining the ASCII value and ASCII character corresponding to each character; determining the matching value corresponding to each character according to the determined ASCII value, the ASCII character and a preset matching array; determining the start position of the header fields according to the matching value and ASCII character corresponding to each character, and parsing the header fields; and determining the start position and the size of the binary content according to the parsed header fields, and parsing the binary content.

Description

Data content parsing method and device
Technical field
The present invention relates to the technical field of mobile terminal data parsing, and in particular to a data content parsing method and device.
Background art
Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends standard e-mail to support messages in multiple formats, such as non-ASCII characters and binary attachments. The MIME protocol is widely used on the mobile Internet, where many applications use it to transmit static resources such as pictures, audio and text. Its header fields include the content type (Content-Type), the content transfer encoding (Content-Transfer-Encoding) and the content identifier (Content-ID). When multiple pieces of data content are transmitted, Content-Type is usually defined as Content-Type:multipart/mixed;boundary=End (End is a character string defined by the server and used as the separator). A typical data packet body based on the MIME protocol is as follows:
HTTP/1.1 200 OK
X-Powered-By:Servlet/2.5
Server:Sun Java System Application Server 9.1_02
X-DP-next URI:/content/refresh/
Content-Type:multipart/mixed;boundary=End
Content-Length:7479
Date:Tue,26 May 2009 01:57:34 GMT
Connection:Keep-Alive
--End
Content-Type:image/jpeg
Content-Transfer-Encoding:binary
Content-Id:0526090000018182
Content-Length:1764
* * * * * * * * (binary content) * * * * * * * * * * * * *
--End
Content-Type:image/jpeg
Content-Transfer-Encoding:binary
Content-Id:0526090023018276
Content-Length:1521
* * * * * * (binary content) * * * * * * * * * * * * * * * *
--End--
In the above example, two pictures are transmitted at the same time, with IDs 0526090000018182 and 0526090023018276. After receiving this data packet body, the receiving terminal must accurately locate the start position and end position of the binary content of each picture in order to parse it. Taking the picture with ID 0526090000018182 as an example, to parse the first picture the receiving terminal first finds the first "--" + boundary character string ("--End" in this example) and then finds the header fields of the picture. Once the position of "Content-Length:1764\r\n" is determined, the start position of the picture's binary content can be located and the size of the picture obtained. With the start position and length of the binary content known, the corresponding binary content can be read from the data packet body and the picture parsed. Likewise, to parse the second picture, the receiving terminal finds the next "--" + boundary character string and repeats the above steps to parse the content of the next picture.
As described above, accurately locating the start position and the length of the binary content is the key to parsing a picture correctly. The prior art locates them by keyword search: it looks for the character string that immediately precedes the binary content of each picture ("Content-Length:1764\r\n" in the example above) and from it derives the start position of the binary content. For a data packet body containing several pictures, several keywords must be searched for again and again to locate the start of the binary content, to determine where the binary content ends, or to obtain the value of a header field. For example, to parse the first picture, the positions of header fields such as Content-Type, Content-Transfer-Encoding and Content-ID must each be searched for in the data packet body. The receiving terminal therefore has to traverse the data packet body many times before the picture information is parsed successfully, which increases the parsing time.
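For contrast with the method proposed later, the keyword-search approach described above might look roughly like the following Python sketch. The patent does not give code for the prior art; the function name `keyword_search_part`, the `SAMPLE` bytes and the `scans` counter are hypothetical and only illustrate that every header field triggers its own search over the data. The sketch assumes all four header fields are present and that Content-Length is the last header line before the binary content, as in the example above.

```python
# Hypothetical sample: one part with 4 bytes of binary content.
SAMPLE = (b"--End\r\n"
          b"Content-Type:image/jpeg\r\n"
          b"Content-Transfer-Encoding:binary\r\n"
          b"Content-Id:0526090000018182\r\n"
          b"Content-Length:4\r\n"
          b"\xff\xd8\xff\xe0\r\n"
          b"--End--")


def keyword_search_part(body: bytes, boundary: bytes = b"End"):
    """Locate the first part by searching for each keyword separately."""
    scans = 1
    start = body.find(b"--" + boundary)       # search for the start separator
    headers = {}
    content_start = start
    for key in (b"Content-Type", b"Content-Transfer-Encoding",
                b"Content-Id", b"Content-Length"):
        pos = body.find(key + b":", start)    # a fresh search for every field
        line_end = body.find(b"\r\n", pos)    # and another one for its line end
        scans += 2
        value = body[pos + len(key) + 1:line_end]
        headers[key.decode("ascii")] = value.decode("ascii")
        # The binary content begins after the last header line found.
        content_start = max(content_start, line_end + 2)
    size = int(headers["Content-Length"])
    return headers, body[content_start:content_start + size], scans
```

Each call to `bytes.find` restarts from the separator, so the same bytes are examined several times, which is the repeated traversal the background section criticises.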
Summary of the invention
Embodiments of the present invention provide a data content parsing method and device that reduce the parsing time when the data content in a data packet body is parsed.
An embodiment of the present invention provides a data content parsing method, comprising:
when parsing the data content in a data packet body, traversing each character of the data content in turn, and determining the ASCII value and ASCII character corresponding to each character;
determining the matching value corresponding to each character according to the determined ASCII value, the ASCII character and a preset matching array;
determining the start position of the header fields according to the matching value and ASCII character corresponding to each character, and parsing the header fields;
determining the start position and the size of the binary content according to the parsed header fields, and parsing the binary content.
An embodiment of the present invention provides a data content parsing device, comprising:
a first determining unit, configured to, when the data content in a data packet body is parsed, traverse each character of the data content in turn and determine the ASCII value and ASCII character corresponding to each character;
a second determining unit, configured to determine the matching value corresponding to each character according to the determined ASCII value, the ASCII character and a preset matching array;
a first parsing unit, configured to determine the start position of the header fields according to the matching value and ASCII character corresponding to each character, and to parse the header fields;
a second parsing unit, configured to determine the start position and the size of the binary content according to the parsed header fields, and to parse the binary content.
With the data content parsing method and device provided by embodiments of the present invention, when the content in a data packet body is parsed, each character in the data packet body is traversed in turn and the ASCII value and ASCII character corresponding to each character in the ASCII code table are determined. The ASCII value of each character is matched against the preset matching array to obtain the matching value corresponding to the character. After the start position of the header fields has been determined from the ASCII characters and matching values, the header fields are parsed. The start position and the size of the binary content can then be determined from the parsed header fields, so that the binary content can be parsed. Because the characters in the data packet body only need to be traversed once from beginning to end, the time complexity of data content parsing is reduced and the parsing time is shortened.
Other features and advantages of the present invention will be set forth in the following description, and in part will become apparent from the description or be understood by implementing the present invention. The objectives and other advantages of the present invention can be realized and obtained through the structures particularly pointed out in the written description, the claims and the accompanying drawings.
Brief description of the drawings
Fig. 1 is a flowchart of the data content parsing method in an embodiment of the present invention;
Fig. 2 is a flowchart of determining the matching value corresponding to any character in an embodiment of the present invention;
Fig. 3 is a flowchart of parsing a data packet body containing one piece of picture content in an embodiment of the present invention;
Fig. 4 is a flowchart of header field parsing in an embodiment of the present invention;
Fig. 5 is a structural diagram of the data content parsing device in an embodiment of the present invention.
Detailed description of the embodiments
To reduce the time complexity and the time of parsing when data content delivered by a server is parsed, embodiments of the present invention provide a data content parsing method and device.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain the present invention and are not intended to limit it, and that, where no conflict arises, the embodiments of the present invention and the features in the embodiments can be combined with one another.
Analysis of how data content is parsed in the prior art shows that the key to parsing data content is accurately locating the start position and the length of the binary content. A careful analysis of the MIME protocol shows that data content transmitted using the MIME protocol consists of ASCII codes. From the point of view of content parsing, the content represented by these ASCII codes can be divided into five classes: 1. the NUL character; 2. blank characters; 3. the colon (":"); 4. newline characters ("\r" or "\n"); 5. characters that express content (all ASCII characters other than the four classes above). Therefore, as long as ASCII codes are correctly matched to these five classes, the data content can be parsed correctly. For example, the data content starts with "--" + boundary + "\r\n". It is first judged whether the character string at the current position offset is "--" + boundary. If so, traversal continues, and when the "\r\n" characters are found to be newline characters, the content that follows can be determined to be the header fields of the data content. After the header fields have been processed correctly, the binary content that follows is processed.
Based on the above analysis, in order to match ASCII characters, an embodiment of the present invention provides a method for establishing a matching array: define the matching value corresponding to the NUL character in the ASCII code table as the first matching value, the matching value corresponding to blank characters as the second matching value, the matching value corresponding to the colon as the third matching value, the matching value corresponding to newline characters as the fourth matching value, and the matching value corresponding to all ASCII characters other than the NUL character, blank characters, the colon and newline characters as the fifth matching value.
For ease of understanding, take as an example a matching value of 1 for the NUL character, 2 for blank characters, 4 for the colon, 8 for newline characters and 0 for all other ASCII characters. The matching values corresponding to the ASCII characters are then as shown in Table 1:
Table 1
ASCII character      Matching value
NUL character        1
Blank character      2
Colon                4
Newline character    8
Other character      0
According to the above definition, the matching array A established from the ASCII code table, indexed by decimal ASCII value, can be expressed as follows:
A = {1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 8, 2, 2, 8, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}.
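A minimal sketch of how such a matching array could be built, written in Python for illustration (the patent does not name a language). The constant names and the helper `build_match_array` are hypothetical. The sketch assumes the array has 128 entries indexed by decimal ASCII value, and that codes 1 to 32 other than 10 and 13 form the class with matching value 2, which is how the listed values read.

```python
# Matching values from Table 1 (names are illustrative only).
NUL_CHAR, BLANK, COLON, NEWLINE, OTHER = 1, 2, 4, 8, 0


def build_match_array():
    """Return a 128-entry list mapping an ASCII code to its matching value."""
    table = [OTHER] * 128
    table[0] = NUL_CHAR
    for code in range(1, 33):      # control characters and the space
        table[code] = BLANK
    table[10] = NEWLINE            # "\n"
    table[13] = NEWLINE            # "\r"
    table[ord(":")] = COLON        # decimal 58
    return table


MATCH = build_match_array()
```

Because the values are distinct powers of two (plus 0), they could also be combined as bit flags, although the description only ever compares a character against one class at a time.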
Based on the matching array defined above, an embodiment of the present invention provides a data content parsing method: starting from the data content at position offset zero, the characters of the data content are traversed in turn until the end separator ("--" + boundary + "--") is reached, at which point the whole parsing process ends.
Fig. 1 is a flowchart of the data content parsing method provided by an embodiment of the present invention, which comprises the following steps:
S101: when parsing the data content in a data packet body, traverse each character of the data content in turn, and determine the ASCII value and ASCII character corresponding to each character;
S102: determine the matching value corresponding to each character according to the determined ASCII value, the ASCII character and the preset matching array;
S103: determine the start position of the header fields according to the matching value and ASCII character corresponding to each character, and parse the header fields;
S104: determine the start position and the size of the binary content according to the parsed header fields, and parse the binary content.
As shown in Fig. 2, in step S102 the matching value corresponding to any character can be determined as follows:
S1021: for each character, judge whether the ASCII value corresponding to the character exceeds a preset value; if so, perform step S1022; if not, perform step S1023.
Specifically, because the decimal ASCII value of an ASCII character does not exceed 127, the preset value can be set to 127 in this embodiment.
S1022: determine that the matching value corresponding to the character is the fifth matching value;
S1023: take the matching value that the ASCII character corresponding to the character has in the preset matching array as the matching value corresponding to the character.
For example, if the ASCII character corresponding to a character is ":", its decimal ASCII value is 58. Because 58 does not exceed 127, the matching value of ":" is looked up in the matching array established in this embodiment and is determined to be the third matching value.
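Steps S1021 to S1023 amount to a bounds check followed by a table lookup. The sketch below is one possible reading in Python; the names `TABLE` and `matching_value` are hypothetical, and the table uses the example values 1, 2, 4, 8 and 0 under the assumption that codes 1 to 32 other than 10 and 13 are blank characters.

```python
OTHER = 0                           # fifth matching value in the example
TABLE = [OTHER] * 128
TABLE[0] = 1                        # NUL character
for _code in range(1, 33):
    TABLE[_code] = 2                # blank characters
TABLE[10] = TABLE[13] = 8           # newline characters
TABLE[58] = 4                       # colon


def matching_value(byte_value: int, preset: int = 127) -> int:
    """Return the matching value for one byte of the data content."""
    if byte_value > preset:         # S1021 -> S1022: outside the ASCII range
        return OTHER
    return TABLE[byte_value]        # S1023: look the value up in the array
```

The check against the preset value also keeps the lookup inside the array when the binary content contains bytes above 127.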
In a specific implementation, the start position of the header fields can be determined as follows:
for the data content corresponding to the current position offset, determine that this data content is the preset start separator, where the data content corresponding to a position offset is any line of characters obtained by splitting at newline characters; and
determine that the matching value corresponding to the character after the start separator is the fourth matching value;
determine the character string after this character as the start position of the header fields.
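The three conditions above can be read as a small function that returns the offset of the first header field, or nothing if the line at the current offset is not a start separator. This is only a sketch: `header_start`, `match` and the default separator `b"--END"` are hypothetical names and values, and the table again assumes the example matching values.

```python
NEWLINE, OTHER = 8, 0
TABLE = [OTHER] * 128
TABLE[0] = 1
for _code in range(1, 33):
    TABLE[_code] = 2
TABLE[10] = TABLE[13] = NEWLINE
TABLE[58] = 4


def match(byte_value):
    """Matching value of one byte; bytes above 127 count as 'other'."""
    return TABLE[byte_value] if byte_value <= 127 else OTHER


def header_start(body: bytes, pos: int, start_sep: bytes = b"--END"):
    """Return the offset of the first header field, or None."""
    if not body.startswith(start_sep, pos):
        return None                          # not the start separator
    pos += len(start_sep)
    if pos >= len(body) or match(body[pos]) != NEWLINE:
        return None                          # e.g. the end separator "--END--"
    while pos < len(body) and match(body[pos]) == NEWLINE:
        pos += 1                             # step over "\r\n"
    return pos
```

For `b"--END\r\nContent-Type:image/jpeg\r\n"` and offset 0 this yields 7, the index of the "C" of Content-Type.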
To make the present invention easier to understand, an embodiment is described below using a data packet body that contains one piece of picture content. Suppose the data packet body delivered by the server is as follows:
HTTP/1.1 200 OK
X-Powered-By:Servlet/2.5
Server:Sun Java System Application Server 9.1_02
X-DP-next URI:/content/refresh/
Content-Type:multipart/mixed;boundary=End
Content-Length:7479
Date:Tue,26 May 2009 01:57:34 GMT
Connection:Keep-Alive
--END
Content-Type:image/jpeg
Content-Transfer-Encoding:binary
Content-Id:0526090000018182
Content-Length:1764
* * * * * * * * (binary content) * * * * * * * * * * * * *
--END--
In this embodiment, the start separator at the start position of the picture content is "--END", and the end separator at the end position is "--END--".
Fig. 3 is a flowchart of parsing the picture content in the received data packet body in an embodiment of the present invention, which comprises the following steps:
S301: judge whether the current position offset is greater than 0; if so, perform step S302; if not, perform step S304;
S302: judge whether the content is a newline character; if so, perform step S303; if not, perform step S304;
As the content of the above data packet body shows, the start separator of the picture content is preceded by newline characters ("\r\n"). These must therefore be excluded before the start separator of the picture content is located.
S303: add 1 to the position offset;
According to the preset matching array, it is determined whether the current character is a newline character. If so, 1 is added to the position offset, that is, traversal moves on to the characters of the next line of data content. This filters out the newline characters before the start separator, so that the position of the start separator "--END" of the picture content can be located accurately.
S304: judge whether the data content corresponding to the current position offset begins with "--END"; if so, perform step S305; if not, perform step S306;
Specifically, each character of the data content that the current position offset points to is traversed in turn, and it is determined whether the character string formed by the corresponding ASCII characters is the start separator "--END".
S305: judge whether, in the data content corresponding to the current position offset, the content that follows is "--"; if so, perform step S314; if not, perform step S306;
Specifically, the characters after "--END" are traversed, and it is judged whether the character string formed by the corresponding ASCII characters is "--", in order to determine whether the currently matched ASCII character string is the string "--END--" that marks the end position.
S306: judge whether this data content is a newline character; if so, perform step S308; if not, perform step S307;
After the start separator of the picture content has been determined, traversal continues with the data content that follows, checking whether it is "\r\n". If so, the content after the newline characters is the start position of the header fields.
S307: continue with the next piece of data content, and perform step S306;
S308: add 1 to the position offset;
S309: parse the header fields of the picture content;
Specifically, header field parsing also works by traversal. Starting from the position offset at which the header fields begin, the name and value of the header field on each line are determined from ":" and the newline characters. After one line has been parsed, traversal continues downwards. The character string before ":" is the header field name, and the character string between ":" and the newline characters is the header field value.
S310: judge whether header field parsing has finished; if so, perform step S311; if not, perform step S309;
Specifically, when a line is found to contain no parsable content, the header fields are determined to have ended.
S311: determine the start position and the size of the binary content;
In a specific implementation, the binary content starts with "--": it is judged whether the line of data content following the end position of the header fields begins with "--", and if so, that position is determined to be the start position of the binary content. The size of the binary content is obtained by parsing the header fields; in this example it is 1764.
S312: parse the binary content;
S313: judge whether the binary content has been completely parsed; if so, perform step S314; if not, perform step S312;
Specifically, the end of parsing is determined by finding the separator "--END--". Note that if the data packet body contains several pieces of picture content, the end of one piece of picture content is determined by finding the separator "--END"; that is, "--END" is both the end separator of the previous picture content and the start separator of the next.
S314: parsing is complete.
In the above process, steps S301 to S303 filter out the newline characters before the start separator so that its position can be located accurately; steps S304 and S305 judge whether the end position of the data content has been reached; steps S306 and S307 filter out the newline characters after the "--END" separator to locate the start position of the header fields; steps S308 to S310 parse the header fields and locate the start position and the size of the binary content; and steps S311 to S313 parse the binary content.
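The flow of Fig. 3 can be approximated by a loop in which the position offset only moves forward. The Python sketch below is an interpretation, not the patented implementation: `parse_parts` and `match` are hypothetical names, the table uses the example matching values, and two simplifying assumptions are made that the text does not state. First, Content-Length is treated as the last header line of each part, as in the sample packet, so header parsing stops once it has been read. Second, the binary content is taken as exactly Content-Length bytes rather than scanned for the next separator.

```python
OTHER, NUL_CHAR, BLANK, COLON, NEWLINE = 0, 1, 2, 4, 8
TABLE = [OTHER] * 128
TABLE[0] = NUL_CHAR
for _code in range(1, 33):
    TABLE[_code] = BLANK
TABLE[10] = TABLE[13] = NEWLINE
TABLE[58] = COLON


def match(byte_value):
    """Matching value of one byte; bytes above 127 count as 'other'."""
    return TABLE[byte_value] if byte_value <= 127 else OTHER


def parse_parts(body: bytes, boundary: bytes = b"END"):
    """Return a list of (headers, binary_content) pairs."""
    start_sep = b"--" + boundary
    parts, pos, n = [], 0, len(body)
    while pos < n:
        while pos < n and match(body[pos]) == NEWLINE:    # S301-S303
            pos += 1
        if not body.startswith(start_sep, pos):           # S304
            # Not a separator line (e.g. an HTTP header): skip the line.
            while pos < n and match(body[pos]) != NEWLINE:
                pos += 1
            continue
        pos += len(start_sep)
        if body.startswith(b"--", pos):                   # S305: "--END--"
            break
        while pos < n and match(body[pos]) == NEWLINE:    # S306-S308
            pos += 1
        headers = {}
        while pos < n and "Content-Length" not in headers:  # S309-S310
            line_start, colon = pos, -1
            while pos < n and match(body[pos]) != NEWLINE:
                if colon < 0 and match(body[pos]) == COLON:
                    colon = pos
                pos += 1
            if colon < 0:
                raise ValueError("header line without a colon")
            name = body[line_start:colon].decode("ascii")
            headers[name] = body[colon + 1:pos].decode("ascii").strip()
            pos += 2 if body.startswith(b"\r\n", pos) else 1
        if "Content-Length" not in headers:
            raise ValueError("missing Content-Length")
        size = int(headers["Content-Length"])             # S311
        parts.append((headers, body[pos:pos + size]))     # S312-S313
        pos += size
    return parts
```

Taking the binary content by length means that bytes inside a picture that happen to look like a newline or a separator are never inspected, which is why the sketch does not need the per-byte check described for S313.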
In a specific implementation, the header fields can be parsed as follows:
for each line of header field data contained in the header fields, traverse each character of the line in turn, and determine the characters whose matching values are the third matching value and the fourth matching value respectively;
take the character string formed by the ASCII characters corresponding to the characters before the character with the third matching value as the header field name of this line of header field data;
take the character string formed by the ASCII characters corresponding to the characters between the character with the third matching value and the character with the fourth matching value as the header field value of this line of header field data.
Fig. 4 is a flowchart of header field parsing, which comprises the following steps:
S401: for the header field data corresponding to the current position offset, traverse each character of the header field data in turn and judge whether the character is ":"; if so, perform step S403; otherwise perform step S402;
Specifically, whether a character is ":" is determined from the ASCII character corresponding to the character.
S402: continue with the next character, and perform step S401;
S403: record the name of the header field;
S404: judge whether the character is a space character; if so, perform step S405; if not, perform step S406;
S405: continue with the next character, and perform step S404;
S406: mark the start position of the header field value;
S407: judge whether the current character is a newline character; if so, perform step S409; if not, perform step S408;
S408: continue with the next character, and perform step S407;
S409: mark the end position of the header field value, and obtain the header field value.
In the above process, steps S401 to S403 locate the position of ":" and obtain the header field name; steps S404 to S406 skip the space characters before the header field value; and steps S407 to S409 locate the start and end positions of the header field value and obtain the header field value.
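The steps of Fig. 4 translate almost directly into three small loops over one header line. The sketch below is illustrative only: `parse_header_line` and `match` are hypothetical names, the table uses the example matching values, and the check that raises `ValueError` when a line contains no colon is an addition that the flowchart does not describe.

```python
OTHER, BLANK, COLON, NEWLINE = 0, 2, 4, 8
TABLE = [OTHER] * 128
TABLE[0] = 1
for _code in range(1, 33):
    TABLE[_code] = BLANK
TABLE[10] = TABLE[13] = NEWLINE
TABLE[58] = COLON


def match(byte_value):
    """Matching value of one byte; bytes above 127 count as 'other'."""
    return TABLE[byte_value] if byte_value <= 127 else OTHER


def parse_header_line(data: bytes, pos: int = 0):
    """Return (name, value, offset of the line's newline character)."""
    n, start = len(data), pos
    while pos < n and match(data[pos]) not in (COLON, NEWLINE):  # S401-S402
        pos += 1
    if pos >= n or match(data[pos]) != COLON:
        raise ValueError("no colon on this header line")  # added safeguard
    name = data[start:pos].decode("ascii")                 # S403
    pos += 1                                               # step over ":"
    while pos < n and match(data[pos]) == BLANK:           # S404-S405
        pos += 1
    value_start = pos                                      # S406
    while pos < n and match(data[pos]) != NEWLINE:         # S407-S408
        pos += 1
    value = data[value_start:pos].decode("ascii")          # S409
    return name, value, pos
```

Because newline characters have their own matching value, the loop that skips blank characters cannot run past the end of a line with an empty value.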
Based on the same inventive concept, an embodiment of the present invention also provides a data content parsing device. Because the device solves the problem on the same principle as the data content parsing method above, its implementation can refer to the implementation of the method, and repeated details are omitted.
Fig. 5 is a structural diagram of the data content parsing device provided by an embodiment of the present invention, which comprises:
a first determining unit 501, configured to, when the data content in a data packet body is parsed, traverse each character of the data content in turn and determine the ASCII value and ASCII character corresponding to each character;
a second determining unit 502, configured to determine the matching value corresponding to each character according to the determined ASCII value, the ASCII character and a preset matching array;
a first parsing unit 503, configured to determine the start position of the header fields according to the matching value and ASCII character corresponding to each character, and to parse the header fields;
a second parsing unit 504, configured to determine the start position and the size of the binary content according to the parsed header fields, and to parse the binary content.
In a specific implementation, the data content parsing device may further comprise:
a matching array establishing unit, configured to establish the matching array as follows: define the matching value corresponding to the NUL character in the ASCII code table as the first matching value, the matching value corresponding to blank characters as the second matching value, the matching value corresponding to the colon as the third matching value, the matching value corresponding to newline characters as the fourth matching value, and the matching value corresponding to all ASCII characters other than the NUL character, blank characters, the colon and newline characters as the fifth matching value.
In a specific implementation, the second determining unit 502 may comprise:
a judging module, configured to judge, for each character, whether the ASCII value corresponding to the character exceeds a preset value;
a first determining module, configured to determine, when the judgment result of the judging module is yes, that the matching value corresponding to the character is the fifth matching value;
a second determining module, configured to take, when the judgment result of the judging module is no, the matching value that the ASCII character corresponding to the character has in the preset matching array as the matching value corresponding to the character.
In a specific implementation, the first parsing unit 503 may comprise:
a separator determining module, configured to determine, for the data content corresponding to the current position offset, that this data content is the preset start separator, where the data content corresponding to a position offset is any line of characters obtained by splitting at newline characters;
a third determining module, configured to determine that the matching value corresponding to the character after the start separator is the fourth matching value;
a start position determining module, configured to determine the character string after this character as the start position of the header fields.
In a specific implementation, the first parsing unit 503 may comprise:
a character determining module, configured to traverse, for each line of header field data contained in the header fields, each character of the line in turn, and to determine the characters whose matching values are the third matching value and the fourth matching value respectively;
a header field name determining module, configured to take the character string formed by the ASCII characters corresponding to the characters before the character with the third matching value as the header field name of this line of header field data;
a header field value determining module, configured to take the character string formed by the ASCII characters corresponding to the characters between the character with the third matching value and the character with the fourth matching value as the header field value of this line of header field data.
Those skilled in the art should understand, embodiments of the invention can be provided as method, system or computer program.Therefore, the present invention can adopt the form of complete hardware embodiment, completely software implementation or the embodiment in conjunction with software and hardware aspect.And the present invention can adopt in one or more form wherein including the upper computer program implemented of computer-usable storage medium (including but not limited to magnetic disc store, CD-ROM, optical memory etc.) of computer usable program code.
The present invention describes with reference to according to the flow chart of the method for the embodiment of the present invention, equipment (system) and computer program and/or block diagram.Should understand can by the combination of the flow process in each flow process in computer program instructions realization flow figure and/or block diagram and/or square frame and flow chart and/or block diagram and/or square frame.These computer program instructions can being provided to the processor of all-purpose computer, special-purpose computer, Embedded Processor or other programmable data processing device to produce a machine, making the instruction performed by the processor of computer or other programmable data processing device produce device for realizing the function of specifying in flow chart flow process or multiple flow process and/or block diagram square frame or multiple square frame.
These computer program instructions also can be stored in can in the computer-readable memory that works in a specific way of vectoring computer or other programmable data processing device, the instruction making to be stored in this computer-readable memory produces the manufacture comprising command device, and this command device realizes the function of specifying in flow chart flow process or multiple flow process and/or block diagram square frame or multiple square frame.
These computer program instructions also can be loaded in computer or other programmable data processing device, make on computer or other programmable devices, to perform sequence of operations step to produce computer implemented process, thus the instruction performed on computer or other programmable devices is provided for the step realizing the function of specifying in flow chart flow process or multiple flow process and/or block diagram square frame or multiple square frame.
Although describe the preferred embodiments of the present invention, those skilled in the art once obtain the basic creative concept of cicada, then can make other change and amendment to these embodiments.So claims are intended to be interpreted as comprising preferred embodiment and falling into all changes and the amendment of the scope of the invention.
With the data content parsing method and device provided by the embodiments of the present invention, when the data content in a data packet body is parsed, each character of the data content is traversed in turn, and the ASCII value and ASCII character corresponding to each character in the ASCII table are determined. The ASCII value of each character is matched against the preset matching array to obtain the matching value corresponding to that character. After the starting position of the header field has been determined from the ASCII character and the matching value of the character, the header field is parsed. From the parsed header field, the starting position and the size of the binary content can be determined, so that the binary content can be parsed. Because the characters of the data content need to be traversed only once, from beginning to end, when the data content in the data packet body is parsed, the time complexity of parsing the data content is reduced and the parsing time is shortened.
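As a non-limiting illustration of the last step, the following Python sketch shows how parsed header fields might yield the starting position and size of the binary content. The function name `locate_binary_content`, the rule that an empty line closes the header field, and the header name `Content-Length` are assumptions made for this sketch; the embodiments above do not name them. For brevity the sketch splits lines with library calls rather than with the matching array.

```python
def locate_binary_content(data: bytes, header_start: int = 0):
    """Parse header lines from header_start and return (headers, binary_content)."""
    headers = {}
    pos = header_start
    while pos < len(data):
        end = data.find(b"\n", pos)
        if end == -1:
            end = len(data)
        line = data[pos:end]
        pos = min(end + 1, len(data))
        if not line.strip():
            break  # assumed: an empty line closes the header field
        name, sep, value = line.partition(b":")
        if sep:
            headers[name.strip().decode("ascii")] = value.strip().decode("ascii")
    # Assumed: a hypothetical "Content-Length" header gives the size in bytes;
    # without it, everything after the header is treated as binary content.
    size = int(headers.get("Content-Length", len(data) - pos))
    return headers, data[pos:pos + size]
```

Once the header lines have been consumed, `pos` is the starting position of the binary content, so the content is obtained with a single slice and no second scan of the data.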
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to cover them.

Claims (8)

1. A data content parsing method, characterized by comprising:
When parsing the data content in a data packet body, traversing in turn each character that the data content comprises, and determining the ASCII value and the ASCII character corresponding to each character;
Determining the matching value corresponding to each character according to the determined ASCII value, the ASCII character and a preset matching array, wherein it is predefined that the matching value corresponding to the NULL character in the ASCII table is a first matching value, the matching value corresponding to the null character (NUL) is a second matching value, the matching value corresponding to the colon is a third matching value, the matching value corresponding to the newline character is a fourth matching value, and the matching value corresponding to any ASCII character other than the NULL character, the null character (NUL), the colon and the newline character is a fifth matching value;
Determining the starting position of the header field according to the matching value and the ASCII character corresponding to each character, and parsing the header field;
Determining the starting position of the binary content and the size of the binary content according to the parsed header field, and parsing the binary content.
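By way of non-limiting illustration, the matching array of claim 1 could be sketched in Python as a 128-entry lookup table. The numeric matching values 0 to 4, the names `MATCH_*` and `build_match_table`, and the reading of the second character class as the space character (0x20) are assumptions of this sketch; the claim only requires five distinguishable matching values.

```python
# Hypothetical matching values; the claim does not fix the numbers.
MATCH_NULL = 0     # first matching value: NULL character (0x00)
MATCH_BLANK = 1    # second matching value: assumed here to be the space character
MATCH_COLON = 2    # third matching value: colon
MATCH_NEWLINE = 3  # fourth matching value: newline character
MATCH_OTHER = 4    # fifth matching value: every other ASCII character


def build_match_table():
    """Return a 128-entry list that maps an ASCII value to its matching value."""
    table = [MATCH_OTHER] * 128
    table[0x00] = MATCH_NULL
    table[ord(" ")] = MATCH_BLANK
    table[ord(":")] = MATCH_COLON
    table[ord("\n")] = MATCH_NEWLINE
    return table


MATCH_TABLE = build_match_table()
```

Because the table is indexed directly by ASCII value, classifying a character costs one array access, which is what allows the data content to be handled in a single traversal.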
2. The method of claim 1, characterized in that determining the matching value corresponding to each character according to the determined ASCII value, the ASCII character and the preset matching array specifically comprises:
For each character, judging whether the ASCII value corresponding to the character exceeds a preset value;
When the judgment result is yes, determining that the matching value corresponding to the character is the fifth matching value;
When the judgment result is no, determining the matching value that corresponds, in the preset matching array, to the ASCII character of the character as the matching value corresponding to the character.
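The judgment of claim 2 can be sketched as a bounds check placed in front of the table lookup. In this Python sketch the preset value is assumed to be 127 (the largest 7-bit ASCII code), and the matching values 0 to 4 and the names `ASCII_LIMIT`, `match_value` and `classify` are hypothetical.

```python
ASCII_LIMIT = 127  # assumed preset value: largest 7-bit ASCII code
FIFTH = 4          # assumed number for the fifth matching value

# Assumed matching array: 0 = NULL, 1 = space, 2 = colon, 3 = newline, 4 = other.
TABLE = [FIFTH] * (ASCII_LIMIT + 1)
TABLE[0] = 0
TABLE[ord(" ")] = 1
TABLE[ord(":")] = 2
TABLE[ord("\n")] = 3


def match_value(byte: int) -> int:
    """Return the matching value for one byte of the data content."""
    if byte > ASCII_LIMIT:
        return FIFTH      # value exceeds the preset value: fifth matching value
    return TABLE[byte]    # otherwise look the value up in the matching array


def classify(data: bytes):
    """Traverse the data once and return the matching value of every byte."""
    return [match_value(b) for b in data]
```

The check keeps bytes of the binary content, which may exceed 127, from indexing past the end of the array.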
3. The method of claim 1, characterized in that determining the starting position of the header field according to the matching value and the ASCII character corresponding to each character specifically comprises:
For the data content corresponding to the current position offset, determining that the data content is a preset start delimiter, wherein the data content corresponding to the position offset is any line of characters obtained by splitting on newline characters; and
Determining that the matching value corresponding to the character following the start delimiter is the fourth matching value;
Determining the character following the character whose matching value is the fourth matching value as the starting position of the header field.
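A minimal Python sketch of the search in claim 3 follows. It assumes lines end with a bare `\n`, uses a direct newline search in place of the fourth matching value, and uses a hypothetical delimiter `--boundary`; the function name `find_header_start` is likewise an assumption.

```python
def find_header_start(data: bytes, delimiter: bytes) -> int:
    """Return the offset of the first header byte, or -1 if no delimiter line exists."""
    offset = 0
    while offset < len(data):
        end = data.find(b"\n", offset)  # newline that terminates the current line
        if end == -1:
            return -1                   # no newline left, so no header can follow
        if data[offset:end] == delimiter:
            return end + 1              # character after the newline starts the header
        offset = end + 1                # advance to the next line
    return -1
```

For example, with the data `b"preamble\n--boundary\nContent-Type: text/plain\n"` and the delimiter `b"--boundary"`, the returned offset points at the `C` of `Content-Type`.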
4. The method of claim 1, 2 or 3, characterized in that parsing the header field specifically comprises:
For each line of header field data that the header field comprises, traversing in turn each character that the line of header field data comprises, and determining the characters whose matching values are the third matching value and the fourth matching value respectively;
Determining the character string formed by the ASCII characters corresponding to the characters before the character with the third matching value as the header field name of the line of header field data;
Determining the character string formed by the ASCII characters corresponding to the characters between the character with the third matching value and the character with the fourth matching value as the header field value of the line of header field data.
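The name and value split of claim 4 might look as follows in Python. The function name `parse_header_line` is hypothetical, only the first colon is treated as the separator, and trimming surrounding blanks from the name and value is an addition of this sketch that the claim does not require.

```python
def parse_header_line(line: bytes):
    """Split one header line into (name, value) in a single left-to-right scan."""
    colon = -1
    newline = len(line)
    for i, byte in enumerate(line):
        if byte == ord(":") and colon == -1:
            colon = i        # character with the third matching value
        elif byte == ord("\n"):
            newline = i      # character with the fourth matching value
            break
    if colon == -1:
        return None          # no colon before the newline: not a header line
    name = line[:colon].decode("ascii").strip()
    value = line[colon + 1:newline].decode("ascii").strip()
    return name, value
```

With this sketch, `parse_header_line(b"Content-Type: image/gif\n")` returns `("Content-Type", "image/gif")`, and a value that itself contains a colon is kept whole.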
5. A data content parsing device, characterized by comprising:
A matching array establishing unit, configured to establish the matching array as follows: defining that the matching value corresponding to the NULL character in the ASCII table is a first matching value, the matching value corresponding to the null character (NUL) is a second matching value, the matching value corresponding to the colon is a third matching value, the matching value corresponding to the newline character is a fourth matching value, and the matching value corresponding to any ASCII character other than the NULL character, the null character (NUL), the colon and the newline character is a fifth matching value;
A first determining unit, configured to, when the data content in a data packet body is parsed, traverse in turn each character that the data content comprises, and determine the ASCII value and the ASCII character corresponding to each character;
A second determining unit, configured to determine the matching value corresponding to each character according to the determined ASCII value, the ASCII character and the preset matching array;
A first parsing unit, configured to determine the starting position of the header field according to the matching value and the ASCII character corresponding to each character, and to parse the header field;
A second parsing unit, configured to determine the starting position of the binary content and the size of the binary content according to the parsed header field, and to parse the binary content.
6. The device of claim 5, characterized in that the second determining unit comprises:
A judging module, configured to judge, for each character, whether the ASCII value corresponding to the character exceeds a preset value;
A first determining module, configured to determine, when the judgment result of the judging module is yes, that the matching value corresponding to the character is the fifth matching value;
A second determining module, configured to determine, when the judgment result of the judging module is no, the matching value that corresponds, in the preset matching array, to the ASCII character of the character as the matching value corresponding to the character.
7. The device of claim 5, characterized in that the first parsing unit comprises:
A delimiter determining module, configured to determine, for the data content corresponding to the current position offset, that the data content is a preset start delimiter, wherein the data content corresponding to the position offset is any line of characters obtained by splitting on newline characters;
A third determining module, configured to determine that the matching value corresponding to the character following the start delimiter is the fourth matching value;
A starting position determining module, configured to determine the character following the character whose matching value is the fourth matching value as the starting position of the header field.
8. The device of claim 5, 6 or 7, characterized in that the first parsing unit comprises:
A character determining module, configured to, for each line of header field data that the header field comprises, traverse in turn each character that the line of header field data comprises, and determine the characters whose matching values are the third matching value and the fourth matching value respectively;
A header field name determining module, configured to determine the character string formed by the ASCII characters corresponding to the characters before the character with the third matching value as the header field name of the line of header field data;
A header field value determining module, configured to determine the character string formed by the ASCII characters corresponding to the characters between the character with the third matching value and the character with the fourth matching value as the header field value of the line of header field data.
CN201110334808.XA 2011-10-28 2011-10-28 A kind of data content analytic method and device Active CN103095644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110334808.XA CN103095644B (en) 2011-10-28 2011-10-28 A kind of data content analytic method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110334808.XA CN103095644B (en) 2011-10-28 2011-10-28 A kind of data content analytic method and device

Publications (2)

Publication Number Publication Date
CN103095644A CN103095644A (en) 2013-05-08
CN103095644B true CN103095644B (en) 2015-10-07

Family

ID=48207788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110334808.XA Active CN103095644B (en) 2011-10-28 2011-10-28 A kind of data content analytic method and device

Country Status (1)

Country Link
CN (1) CN103095644B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104767710B (en) * 2014-01-02 2018-08-07 中国科学院声学研究所 The transmission payload extracting method of HTTP block transmissions coding based on DFA
CN104572898B (en) * 2014-12-22 2017-09-22 上海找钢网信息科技股份有限公司 The data analysis method and system of a kind of steel trade industry stock resource
CN108021540B (en) * 2017-11-09 2023-05-02 中国科学院信息工程研究所 Hadoop-oriented general text format analysis method and tool
CN108055266A (en) * 2017-12-15 2018-05-18 南京邮电大学盐城大数据研究院有限公司 A kind of method and system of 8583 data message of parsing based on position offset

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1852320A (en) * 2006-01-26 2006-10-25 华为技术有限公司 Signaling message detecting method and system based on text coding
CN101179769A (en) * 2007-12-04 2008-05-14 南京吉美思系统集成有限公司 LBS position service based community rectification work management method
CN101227435A (en) * 2008-01-28 2008-07-23 浙江大学 Method for filtering Chinese junk mail based on Logistic regression
CN101345720A (en) * 2008-08-15 2009-01-14 浙江大学 Junk mail classification method based on partial match estimation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0228972D0 (en) * 2002-12-11 2003-01-15 Nokia Corp Downloading software applications

Also Published As

Publication number Publication date
CN103095644A (en) 2013-05-08

Similar Documents

Publication Publication Date Title
CN103095644B (en) A kind of data content analytic method and device
CN104866542A (en) POI data verification method and device
CN104580454A (en) Data synchronizing method, device and system
CN114205665B (en) Information processing method, device, electronic equipment and storage medium
CN104077294A (en) Information recommendation method, information recommendation device and information resource recommendation system
CN110390082B (en) Communication matrix comparison method and system
US8880108B2 (en) Short message processing method and apparatus
CN104105007A (en) Video loading method of mobile terminal, devices and system
CN103399965A (en) Reading content recommending method, reading content recommending system and server
CN104079623A (en) Method and system for controlling multilevel cloud storage synchrony
CN106161656B (en) Interface jumping method and device
CN114780519A (en) DBC file generation method, device, equipment and medium based on CAN communication
CN113204555B (en) Data table processing method, device, electronic equipment and storage medium
CN104052774A (en) Data transmission method and system
CN104252541A (en) Webpage information push method, data server and terminal
CN111949746A (en) Data processing method and device, electronic equipment and computer readable medium
CN111367689A (en) Interactive prompt information sending method and device of online document and electronic equipment
CN104079368B (en) A kind of the test data transmission method and server of application software
CN105488199A (en) Mixed form processing method, device and mobile terminal
CN104102728A (en) News list display method and device
CN114239501A (en) Contract generation method, apparatus, device and medium
CN111641690A (en) Session message processing method and device and electronic equipment
CN112417276A (en) Paging data acquisition method and device, electronic equipment and computer readable storage medium
CN112256700A (en) Data storage method and device, electronic equipment and computer readable storage medium
CN105187633A (en) Mobile phone number display method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant