CN104994128B - Method and device for data encoding type identification and transcoding - Google Patents

Method and device for data encoding type identification and transcoding

Info

Publication number
CN104994128B
Authority
CN
China
Prior art keywords
coding
data
type
character
section
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510249023.0A
Other languages
Chinese (zh)
Other versions
CN104994128A (en
Inventor
王照旗
刘岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING NETENTSEC Inc
Original Assignee
BEIJING NETENTSEC Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING NETENTSEC Inc filed Critical BEIJING NETENTSEC Inc
Priority to CN201510249023.0A priority Critical patent/CN104994128B/en
Publication of CN104994128A publication Critical patent/CN104994128A/en
Application granted granted Critical
Publication of CN104994128B publication Critical patent/CN104994128B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/02Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/30Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
    • H04L63/306Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information intercepting packet switched data communications, e.g. Web, Internet or IMS communications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention provides a method for data encoding type identification and transcoding, comprising: extracting key data from a network message generated by a user operation, and decoding the key data; determining the encoding type corresponding to the decoded key data; and transcoding the decoded key data according to the encoding type. The present invention also provides a device for data encoding type identification and transcoding.

Description

Method and device for data encoding type identification and transcoding
Technical field
The present invention relates to network security technology, and in particular to a method and device for identifying and transcoding the data encoding type of uniform resource locator (URL) data.
Background
With the rapid development of network technology, more and more users go online with devices such as mobile phones, computers, and tablets. Users generally browse web pages and submit data through a browser (such as IE, Firefox, or Chrome), or submit data through network application software (such as the Taobao, JD.com, and Dangdang clients). In the fields of network security and Internet access management, quickly stopping network crime often requires obtaining and analyzing the network data that users generate through browsers and application software. Most of this network data is first encoded in UTF8 or GB18030 and then encoded again with URLENCODE, where GB18030 encoding in turn includes GBK encoding and GB2312 encoding. Restoring user data therefore requires URLDECODE decoding of the network data, and the decoded user data may be in either UTF8 or GB18030 encoding. How to identify the encoding type of user data effectively and accurately, so that the user data can be displayed correctly, is an urgent problem to be solved.
Existing schemes for identifying the encoding of network data are mainly limited to the following:
1) When a user submits a form or downloads data, the data message may carry a charset field. The encoding type given by charset is extracted and used to encode and decode the data message; if no charset encoding type can be extracted, a preset encoding type is used instead. However, for a data message without a charset field, the data is garbled whenever no encoding type has been preset or the preset encoding type is wrong. This method also requires the default encoding type to be updated and maintained regularly, so its maintenance cost is high and its accuracy is low.
2) The encoding type of a web page to be encoded is determined from a reference encoding array of the web page and a locally preset candidate encoding array: an encoding type contained in the reference encoding array and the candidate encoding array is taken as the encoding type of the web page. However, this method depends heavily on the two arrays; if the data message does not conform to any reference or candidate encoding, the data is garbled. The method also depends on the browser and requires the user to select the "auto-detect character encoding" option, so it is highly visible to the user and cannot detect the character encoding transparently. In addition, the reference encoding array and the candidate encoding array must be updated and maintained continuously, which is costly.
3) The input URL string to be decoded is decoded with several different decoding methods to obtain different URL strings. Each of these strings is then re-encoded with the encoding method that corresponds to its decoding method, and the re-encoded strings are compared with the input URL string. If one re-encoded string is identical to the input, the encoding type of the input is taken to be the encoding type used for that string. However, if the input URL string falls in an interval where UTF8 encoding and GB18030 encoding overlap, or satisfies the encoding ranges of several encoding types at once, then more than one decode and re-encode round trip reproduces the original URL string, and the correct data encoding type cannot be identified.
Therefore, current encoding type identification schemes all have obvious defects: low accuracy, high maintenance cost, and a tendency to produce garbled data.
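The ambiguity described for scheme 3) can be reproduced in a few lines. The following is a minimal sketch, not part of the patent text: the function name round_trip_candidates and the two-entry candidate list are illustrative assumptions. It uses Python's built-in utf-8 and gb18030 codecs to show that a UTF8-encoded string can survive the decode and re-encode round trip under both encodings.

```python
def round_trip_candidates(raw: bytes, encodings=("utf-8", "gb18030")) -> list:
    """Return every candidate encoding whose decode/re-encode round trip reproduces raw."""
    matches = []
    for name in encodings:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue  # raw is not a valid byte sequence in this encoding
        if text.encode(name) == raw:
            matches.append(name)
    return matches


# "你好" encoded as UTF8 is the byte sequence E4 BD A0 E5 A5 BD, which also
# happens to parse as three valid two-byte GB18030 characters.
raw = "你好".encode("utf-8")
print(round_trip_candidates(raw))
```

Because both candidates reproduce the input bytes, the round-trip comparison alone cannot tell which encoding the user actually used.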
Summary of the invention
In view of this, embodiments of the present invention are intended to provide a method for data encoding type identification and transcoding that improves the accuracy of data encoding identification, reduces garbled text, improves the efficiency of encoding type identification and transcoding, and reduces maintenance cost.
To achieve the above objectives, the technical solution of the present invention is realized as follows:
An embodiment of the present invention provides a method for data encoding type identification and transcoding, the method comprising:
extracting key data from a network message generated by a user operation, and decoding the key data;
determining the encoding type corresponding to the decoded key data; and
transcoding the decoded key data according to the encoding type.
In the above scheme, extracting the key data from the network message generated by the user operation comprises: extracting the key data from the network message according to a keyword or a regular expression.
In the above scheme, the method further comprises:
dividing the different encoding types into multiple encoding intervals, and determining the priority relationship of the encoding intervals.
In the above scheme, determining the encoding type corresponding to the decoded key data comprises:
loading the configuration information of each encoding interval;
traversing the decoded data in a loop and counting the characters that satisfy each encoding interval;
judging the encoding type according to the number of characters in the decoded data that satisfy each encoding interval and the priority relationship of the encoding intervals, thereby determining the encoding type corresponding to the decoded data; and
releasing the configuration information of each encoding interval.
In the above scheme, traversing the decoded data in a loop and counting the characters that satisfy each encoding interval comprises:
judging in turn, according to a first preset priority of the encoding intervals, whether each character in the decoded data satisfies each encoding interval; and counting the number of characters in the decoded data that satisfy each encoding interval.
In the above scheme, judging the encoding type according to the number of characters in the decoded data that satisfy each encoding interval and the priority relationship of the encoding intervals, thereby determining the encoding type corresponding to the decoded data, comprises:
judging in turn, according to a second preset priority of the encoding intervals, the relationship between the number of characters in the decoded data that satisfy each encoding interval and the total length of the decoded data after null characters and 0 characters are deducted; and determining the encoding type corresponding to the key data according to that relationship.
An embodiment of the present invention also provides a device for data encoding type identification and transcoding, the device comprising: a key data extraction unit, a decoding unit, an encoding type identification unit, and a data transcoding unit, wherein
the key data extraction unit is configured to extract key data from a network message generated by a user operation, and to send the extracted key data to the decoding unit;
the decoding unit is configured to decode the key data and to send the decoded data to the encoding type identification unit;
the encoding type identification unit is configured to determine the encoding type corresponding to the decoded key data and to send the determined encoding type to the data transcoding unit; and
the data transcoding unit is configured to transcode the decoded key data according to the encoding type.
In the above scheme, the key data extraction unit is specifically configured to extract the key data from the network message according to a keyword or a regular expression.
In the above scheme, the device further comprises an encoding interval division unit, configured to divide the different encoding types into multiple encoding intervals and to determine the priority relationship of the encoding intervals.
In the above scheme, the encoding type identification unit comprises a configuration subunit, a statistics subunit, a decision subunit, and a revocation subunit, wherein
the configuration subunit is configured to load the configuration information of each encoding interval;
the statistics subunit is configured to traverse the decoded data in a loop and count the characters that satisfy each encoding interval;
the decision subunit is configured to judge the encoding type according to the number of characters in the decoded data that satisfy each encoding interval and the priority relationship of the encoding intervals, thereby determining the encoding type corresponding to the decoded data; and
the revocation subunit is configured to release the configuration information of each encoding interval.
In the above scheme, the statistics subunit is specifically configured to: judge in turn, according to a first preset priority of the encoding intervals, whether each character in the decoded data satisfies each encoding interval; and count the number of characters in the decoded data that satisfy each encoding interval.
In the above scheme, the decision subunit is specifically configured to: judge in turn, according to a second preset priority of the encoding intervals, the relationship between the number of characters in the decoded data that satisfy each encoding interval and the total length of the decoded data after null characters and 0 characters are deducted; and determine the encoding type corresponding to the key data according to that relationship.
With the method and device provided by the embodiments of the present invention, key data is first extracted from a network message generated by a user operation and decoded; the encoding type corresponding to the decoded key data is then determined; finally, the decoded key data is transcoded according to the encoding type. In this way, data can be identified and converted efficiently and accurately, which solves the problem of garbled Chinese text caused by misidentified encoding types and improves the accuracy of data encoding identification. Moreover, the encoding type identification and decoding process requires no manual analysis or maintenance, which reduces maintenance cost and improves the user experience.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the data encoding type identification and transcoding method of Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the data encoding type identification and transcoding method of Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of the data encoding type identification and transcoding device of an embodiment of the present invention.
Detailed description of the embodiments
In the prior art, the encoding type of key data is mostly obtained by extracting the charset from the request header or the response header of a Hypertext Transfer Protocol (HTTP) message. However, if neither the request header nor the response header of the HTTP message contains a charset, or the HTTP data message carries no relevant data encoding type, or the data encoding type in the HTTP data message is wrong, this method produces garbled data. If the encoding type of the key data is instead specified in a predefined manner, the data is likewise garbled once the predefined encoding type is wrong or the encoding type of the HTTP message changes, and staff must spend considerable time analyzing and maintaining the predefined encoding type.
The encoding type identification and transcoding method of the embodiments of the present invention analyzes the relationships between common encodings and, combining the characteristics of the various encodings, summarizes the rules among them to realize a method for identifying and transcoding URL encoding types. The method identifies and converts data effectively and accurately, is highly scalable, requires no manual analysis or maintenance, reduces maintenance cost, and gives users a good experience.
In the embodiments of the present invention, key data is first extracted from a network message generated by a user operation and decoded; the encoding type corresponding to the decoded key data is then determined; finally, the decoded key data is transcoded according to the encoding type.
In the encoding type identification and transcoding method of the embodiments of the present invention, before the encoding type corresponding to the decoded key data is determined, the different encoding types must first be divided into multiple encoding intervals and the priority relationship of the encoding intervals determined. The priority relationship of the encoding intervals includes a first priority and a second priority. The first priority is used when traversing the decoded data in a loop and counting the characters that satisfy each encoding interval. The second priority is used when judging the encoding type from the number of characters in the decoded data that satisfy each encoding interval, thereby determining the encoding type corresponding to the decoded data. The priorities can be determined from the relationships between the different encodings, the usage of the code words within each encoding interval, the characteristics of the encodings, and how uncommon the code words are.
In the embodiments of the present invention, the encoding type identification and transcoding method is described in detail using UTF8 encoding and GB18030 encoding as an example. First, according to the characteristics, relationships, and rules of UTF8 encoding and GB18030 encoding, and taking people's Internet usage habits into account, UTF8 encoding and GB18030 encoding are divided into several encoding intervals that are more specific, smaller in range, and each dedicated to a specific purpose, and the first priority relationship of the encoding intervals is arranged. After UTF8 and GB18030 have been divided into multiple encoding intervals and the first priority relationship has been determined, the encoding intervals are, from high to low first priority: the ASCII encoding interval; the displayable encoding interval (displayable character interval) and the non-displayable encoding interval (non-displayable character interval) within the interval where UTF8 encoding and GB18030 encoding overlap; the hole encoding interval (the interval whose code words are empty); the 4-byte encoding interval of GB18030 encoding; the uncommon UTF8 6-byte combination encoding interval of GB18030 encoding; the UTF8 Chinese character encoding and uncommon UTF8 6-byte combination encoding interval of GB18030 encoding; the displayable encoding interval within the common UTF8 encoding interval of GB18030 encoding; the UTF8 6-byte combination encoding interval; the UTF8 encoding interval; and the two-byte encoding interval of GB18030 encoding.
In the embodiments of the present invention, the second priority relationship of the encoding intervals, which is used for encoding type judgment, is not specifically limited. In practical applications it can be determined from the relationships between the different encodings, the usage of the code words within each encoding interval, the characteristics of the encodings, and how uncommon the code words are.
The implementation of the technical solution of the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Fig. 1 is a schematic flowchart of the data encoding type identification and transcoding method of Embodiment 1 of the present invention. As shown in Fig. 1, the method of this embodiment comprises the following steps:
Step 101: extract key data from a network message generated by a user operation, and decode the key data;
Specifically, extracting the key data from the network message generated by the user operation comprises: extracting the key data from the network message according to a keyword or a regular expression;
Decoding the key data comprises: performing URLDECODE decoding on the key data according to the URLENCODE encoding and decoding rules.
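Step 101 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the query parameter name wd, the pattern KEY_PATTERN, and the function extract_key_data are hypothetical, chosen only to show keyword-based extraction followed by URL decoding. The result is kept as raw bytes because the character encoding is not yet known at this point.

```python
import re
from typing import Optional
from urllib.parse import unquote_to_bytes

# Hypothetical keyword: the query parameter "wd" is assumed to carry the key data.
KEY_PATTERN = re.compile(rb"[?&]wd=([^&\s]*)")


def extract_key_data(message: bytes) -> Optional[bytes]:
    """Extract the percent-encoded key data from a raw message and URL-decode it to bytes."""
    match = KEY_PATTERN.search(message)
    if match is None:
        return None  # the message carries no key data for this keyword
    # Form-style data uses "+" for a space; percent-escapes are expanded to raw bytes.
    return unquote_to_bytes(match.group(1).replace(b"+", b" "))


message = b"GET /s?wd=%E4%BD%A0%E5%A5%BD&ie=x HTTP/1.1\r\nHost: example.com\r\n\r\n"
print(extract_key_data(message))
```

Returning bytes rather than text is a deliberate choice in this sketch: decoding to text would require the very encoding type that step 102 is meant to determine.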
Step 102: determine the encoding type corresponding to the decoded key data;
Specifically, determining the encoding type corresponding to the decoded key data comprises: loading the configuration information of each encoding interval; traversing the decoded data in a loop and counting the characters that satisfy each encoding interval; judging the encoding type according to the number of characters in the decoded data that satisfy each encoding interval and the priority relationship of the encoding intervals, thereby determining the encoding type corresponding to the decoded data; and releasing the configuration information of each encoding interval;
The configuration information of each encoding interval includes, but is not limited to, how the encoding intervals are divided and the corresponding priority relationship;
Traversing the decoded data in a loop and counting the characters that satisfy each encoding interval comprises: judging in turn, according to the first preset priority of the encoding intervals, whether each character in the decoded data satisfies each encoding interval; and counting the number of characters in the decoded data that satisfy each encoding interval;
Specifically, the total length of data traversed may be limited by a configured maximum offset length. Following the priority relationship of the encoding type intervals, the characters that satisfy and do not satisfy each encoding interval are counted in turn: if a character satisfies an interval, it is counted and traversal continues with the subsequent data after the offset is advanced; if not, matching continues with the next encoding interval;
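The counting pass described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the interval table of the patent: only four intervals are modeled (ASCII, GB18030 4-byte, UTF8 3-byte, GB18030 2-byte), their byte-range checks are coarse, and the names INTERVALS, count_intervals, and max_offset are hypothetical. The list order plays the role of the first priority.

```python
from collections import Counter


def is_ascii(b: bytes) -> bool:
    return b[0] < 0x80


def is_gb18030_4byte(b: bytes) -> bool:
    return (0x81 <= b[0] <= 0xFE and 0x30 <= b[1] <= 0x39
            and 0x81 <= b[2] <= 0xFE and 0x30 <= b[3] <= 0x39)


def is_utf8_3byte(b: bytes) -> bool:
    # Coarse check: lead byte E0..EF followed by two continuation bytes.
    return 0xE0 <= b[0] <= 0xEF and all(0x80 <= x <= 0xBF for x in b[1:])


def is_gb18030_2byte(b: bytes) -> bool:
    return 0x81 <= b[0] <= 0xFE and 0x40 <= b[1] <= 0xFE and b[1] != 0x7F


# (name, width in bytes, predicate), listed from high to low first priority (assumed order).
INTERVALS = [
    ("ascii", 1, is_ascii),
    ("gb18030_4byte", 4, is_gb18030_4byte),
    ("utf8_3byte", 3, is_utf8_3byte),
    ("gb18030_2byte", 2, is_gb18030_2byte),
]


def count_intervals(data: bytes, max_offset: int = 1024) -> Counter:
    """Count, per interval, the characters found within the first max_offset bytes."""
    counts = Counter()
    pos = 0
    limit = min(len(data), max_offset)
    while pos < limit:
        for name, width, predicate in INTERVALS:
            chunk = data[pos:pos + width]
            if len(chunk) == width and predicate(chunk):
                counts[name] += 1   # satisfied: count it and advance the offset
                pos += width
                break
        else:
            counts["unmatched"] += 1  # no interval matched: skip one byte
            pos += 1
    return counts


print(count_intervals(b"ab" + "你好".encode("utf-8")))
```

Note that the "current character" has a different width for each interval, so the offset advances by the width of whichever interval matched.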
Judging the encoding type according to the number of characters in the decoded data that satisfy each encoding interval and the priority relationship of the encoding intervals, thereby determining the encoding type corresponding to the decoded data, comprises: judging in turn, according to the second preset priority of the encoding intervals, the relationship between the number of characters in the decoded data that satisfy each encoding interval and the total length of the decoded data after null characters and 0 characters are deducted; and determining the encoding type corresponding to the key data according to that relationship.
Specifically, the encoding type corresponding to the decoded data is judged in turn, following the second priority relationship of the encoding intervals from high to low and using the number of characters in the decoded data that satisfy each encoding interval. If the encoding type can be determined, the result is output; otherwise, the next encoding type judgment is performed, until the encoding type corresponding to the decoded data is determined or, if no encoding type can be determined, the default data encoding type is output.
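The judgment step can be sketched as follows. The text does not state the exact relationship that is tested, so this sketch makes several loudly labeled assumptions: the relationship is taken to be "ASCII bytes plus the bytes covered by one interval account for the whole effective length"; the source's "null characters and 0 characters" are represented by an assumed set IGNORED (NUL and space); and the order SECOND_PRIORITY, the default gb18030, and all function names are hypothetical.

```python
# Assumption: these byte values stand in for the "null characters and 0 characters"
# that are deducted from the total length.
IGNORED = frozenset({0x00, 0x20})

# (interval name, resulting encoding), from high to low second priority (assumed order).
SECOND_PRIORITY = [
    ("utf8_3byte", "utf-8"),
    ("gb18030_4byte", "gb18030"),
    ("gb18030_2byte", "gb18030"),
]


def effective_length(data: bytes, ignored=IGNORED) -> int:
    """Total length of the decoded data after the ignored bytes are deducted."""
    return sum(1 for b in data if b not in ignored)


def decide(covered: dict, effective_total: int, default: str = "gb18030") -> str:
    """Pick an encoding from the bytes covered per interval (ignored bytes excluded)."""
    ascii_bytes = covered.get("ascii", 0)
    if ascii_bytes == effective_total:
        return "ascii"
    for interval, encoding in SECOND_PRIORITY:
        n = covered.get(interval, 0)
        # Assumed rule: ASCII plus this one interval explains every remaining byte.
        if n and ascii_bytes + n == effective_total:
            return encoding
    return default  # no judgment succeeded: fall back to the default encoding type


data = b"a b\x00" + "你好".encode("utf-8")
print(decide({"ascii": 2, "utf8_3byte": 6}, effective_length(data)))
```

The fallback to a default mirrors the last sentence of the paragraph above: when no judgment in the priority order succeeds, a default data encoding type is output.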
In the embodiments of the present invention, while the decoded data is traversed in a loop and the characters satisfying each encoding interval are counted, both the number of characters that satisfy each encoding interval and the number that do not are counted. If a character that does not satisfy an encoding interval appears during traversal and the encoding type of the data can already be determined, traversal ends and the encoding type is given. Otherwise traversal continues with the subsequent data until the maximum offset is reached; after traversal, the encoding type of the decoded data is judged from the character counts of the encoding intervals according to their priority relationship, and the encoding type is given.
Step 103: transcode the decoded key data according to the encoding type;
In this step, the decoded key data is transcoded from its identified encoding type according to the requirements of an interface or of another encoding format.
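Step 103 can be sketched with Python's built-in codecs. This is a minimal illustration, assuming the target format is UTF8 by default; the function name transcode and its parameters are hypothetical.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Re-encode data from the identified source encoding into the target encoding."""
    # Undecodable bytes become U+FFFD instead of raising, so one bad byte
    # does not discard the whole record.
    text = data.decode(source_encoding, errors="replace")
    return text.encode(target_encoding, errors="replace")


gb_bytes = "你好".encode("gb18030")
print(transcode(gb_bytes, "gb18030"))
```

Using errors="replace" is a design choice of this sketch; a stricter pipeline could let the exception propagate and treat it as a sign that the identified encoding type was wrong.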
In the embodiments of the present invention, when the different encoding types are divided into multiple encoding intervals and the priority relationship of the encoding intervals is determined, a weight for each encoding interval can also be determined according to actual conditions. During encoding type judgment, the judgment is then made according to the number of characters in the decoded data that satisfy each encoding interval, the priority relationship of the encoding intervals, and the weights. For example, when the character currently being judged falls in the overlap of two encoding intervals, the weights decide which encoding interval the character satisfies: the interval with the larger weight is taken as the encoding interval satisfied by the current character.
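The weight-based tie-break can be sketched as follows. This is an illustration only: the two intervals, their coarse byte-range checks, the weight values, and the names intervals_matching and pick_by_weight are all assumptions made for the example.

```python
def intervals_matching(chunk: bytes, predicates: dict) -> list:
    """Return the names of all intervals whose predicate accepts the chunk."""
    return [name for name, predicate in predicates.items() if predicate(chunk)]


def pick_by_weight(matches: list, weights: dict):
    """Among overlapping intervals, choose the one with the largest weight."""
    if not matches:
        return None
    return max(matches, key=lambda name: weights.get(name, 0))


# Two simplified intervals that overlap on some two-byte sequences (assumed ranges).
PREDICATES = {
    "utf8_2byte": lambda b: 0xC2 <= b[0] <= 0xDF and 0x80 <= b[1] <= 0xBF,
    "gb18030_2byte": lambda b: 0x81 <= b[0] <= 0xFE and 0x40 <= b[1] <= 0xFE and b[1] != 0x7F,
}
WEIGHTS = {"utf8_2byte": 2, "gb18030_2byte": 1}  # hypothetical weights

chunk = b"\xc2\xa9"  # falls in both intervals
print(pick_by_weight(intervals_matching(chunk, PREDICATES), WEIGHTS))
```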
Fig. 2 is a schematic flowchart of the data encoding type identification and transcoding method of Embodiment 2 of the present invention. As shown in Fig. 2, the method of this embodiment comprises the following steps:
Step 200: receive a network message generated by a user operation;
In this step, the network message includes, but is not limited to, an HTTP message;
Step 201: extract the key data from the message;
In this step, the key data is extracted from the network message according to a keyword or a regular expression;
Step 202: decode the key data;
In this step, URLDECODE decoding is performed on the key data according to the URLENCODE encoding and decoding rules;
Step 203: divide the different encoding types into multiple encoding intervals, and determine the priority relationship of the encoding intervals.
Step 204: load the configuration information of each encoding interval;
The configuration information of each encoding interval includes, but is not limited to, how the encoding intervals are divided and the corresponding priority relationship; it may also include the weight of each encoding interval;
In the embodiments of the present invention, steps 203 and 204 may be preset steps. That is, before the encoding type identification and transcoding method of the embodiments is carried out, the different encoding types are divided into multiple encoding intervals in advance, the priority relationship of the encoding intervals is determined, and the configuration information of each encoding interval is loaded successfully. When steps 203 and 204 are preset steps, in this embodiment step 205 is executed after step 202, and the decoded data is traversed in a loop directly using the preset priority information of the encoding intervals;
Steps 205 to 240 are the process of determining the encoding type corresponding to the decoded key data; in the following steps, the decoded key data is referred to simply as the decoded data. Specifically,
Within steps 205 to 240, steps 205 to 229 are the process of traversing the decoded data in a loop and counting the characters that satisfy each encoding interval. The "current character" mentioned in steps 206 to 240 may be a single character or several characters, depending on the encoding interval being tested. For example, when judging whether the current character satisfies the ASCII encoding interval, a character in the ASCII encoding interval is 1 character long, so "current character" refers to one character in the decoded data. When judging whether the current character satisfies the uncommon UTF8 6-byte combination encoding interval of GB18030 encoding, a character in that interval is 6 characters long, so "current character" refers to 6 characters in the decoded data. Specifically,
Step 205: judge whether traversal of the decoded data is complete; if traversal is not complete, execute step 206; otherwise, execute step 230;
Step 206: judge whether the offset length exceeds the limit; if the offset length does not exceed the limit, execute step 207; otherwise, execute step 230;
Step 207: judge whether the current character satisfies the ASCII encoding interval; if it does, execute step 208; otherwise, execute step 209;
Step 208: count the characters that satisfy the ASCII encoding interval; execute step 229;
Specifically, current number of characters satisfying the ASCII encoding interval = previously counted number of characters satisfying the ASCII encoding interval + number of characters currently judged;
The initial value of the number of characters satisfying the ASCII encoding interval is 0;
Step 209: judge whether the current character satisfies the displayable encoding interval within the interval where UTF8 encoding and GB18030 encoding overlap; if it does, execute step 210; otherwise, execute step 211;
Step 210: count the characters that satisfy the displayable encoding interval within the interval where UTF8 encoding and GB18030 encoding overlap; execute step 229;
Specifically, current number of characters satisfying the displayable encoding interval within the UTF8 and GB18030 overlap = previously counted number of characters satisfying that interval + number of characters currently judged;
The initial value of the number of characters satisfying the displayable encoding interval within the UTF8 and GB18030 overlap is 0;
Step 211: judge whether the current character satisfies the non-displayable encoding interval within the interval where UTF8 encoding and GB18030 encoding overlap; if it does, execute step 212; otherwise, execute step 213;
Step 212: count the characters that satisfy the non-displayable encoding interval within the interval where UTF8 encoding and GB18030 encoding overlap; execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of the number of characters satisfying the non-displayable encoding interval within the UTF8 and GB18030 overlap is 0;
Step 213: judge whether the current character satisfies the hole encoding interval; if it does, execute step 214; otherwise, execute step 215;
Step 214: count the characters that satisfy the hole encoding interval; execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of the number of characters satisfying the hole encoding interval is 0;
Step 215: judge whether the current character satisfies the 4-byte encoding interval of GB18030 encoding; if it does, execute step 228; otherwise, execute step 216;
Step 216: judge whether the current character satisfies the uncommon UTF8 6-byte combination encoding interval of GB18030 encoding; if it does, execute step 217; otherwise, execute step 218;
Step 217: count the characters that satisfy the uncommon UTF8 6-byte combination encoding interval of GB18030 encoding; execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of the number of characters satisfying the uncommon UTF8 6-byte combination encoding interval of GB18030 encoding is 0;
Step 218: judge whether the current character satisfies the encoding interval of UTF8 Chinese-character codes and uncommon UTF8 6-byte combinations of GB18030 encoding; if it does, execute step 219; otherwise, execute step 220;
Step 219: count the characters that satisfy the encoding interval of UTF8 Chinese-character codes and uncommon UTF8 6-byte combinations of GB18030 encoding; execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of the number of characters satisfying the encoding interval of UTF8 Chinese-character codes and uncommon UTF8 6-byte combinations of GB18030 encoding is 0;
Step 220: judge whether the current character satisfies the displayable encoding interval within the common UTF8 encoding interval of GB18030 encoding; if it does, execute step 221; otherwise, execute step 222;
Step 221: count the characters that satisfy the displayable encoding interval within the common UTF8 encoding interval of GB18030 encoding; execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of the number of characters satisfying the displayable encoding interval within the common UTF8 encoding interval of GB18030 encoding is 0;
Step 222: judge whether the current character satisfies the 6-byte combination interval of UTF8 encoding; if it does, execute step 223; otherwise, execute step 224;
Step 223: count the characters that satisfy the 6-byte combination interval of UTF8 encoding; execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of the number of characters satisfying the 6-byte combination interval of UTF8 encoding is 0;
Step 224: judge whether the current character satisfies the UTF8 encoding interval; if it does, execute step 225; otherwise, execute step 226;
Step 225: count the characters that satisfy the UTF8 encoding interval; execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of the number of characters satisfying the UTF8 encoding interval is 0;
Step 226: judge whether the current character satisfies the two-byte encoding interval of GB18030 encoding; if it does, execute step 227; otherwise, execute step 228;
Step 227: count the characters that satisfy the two-byte encoding interval of GB18030 encoding; execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of the number of characters satisfying the two-byte encoding interval of GB18030 encoding is 0;
Step 228: determine that the encoding type of the decoded data is GB18030 encoding, and execute step 240;
In this step, if a character in the decoded data satisfies none of the encoding intervals, the encoding type of the decoded data can be determined to be GB18030 encoding. This is only one of the cases in which GB18030 can be determined; the other cases are covered by the following steps.
Step 229: offset by the number of characters judged in the current step; return to step 205 and continue to judge whether, after the offset, the decoded data still contains characters that have not been traversed;
In this step, when the current character is judged to satisfy an encoding interval, the position is offset by the number of characters judged in the current step. For example, when the current character satisfies the uncommon UTF8 6-byte combination encoding interval of GB18030 encoding, the position is offset by 6 and traversal statistics continue on the following data.
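The traverse, count, and offset loop of steps 205 to 229 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `tally`, `MATCHERS`, `is_ascii`, and `is_utf8_three_byte` are hypothetical, and the two predicates are simple stand-ins for the configured encoding intervals, whose actual byte ranges are defined elsewhere in the description.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (interval name, width in bytes, predicate over exactly `width` bytes).
Matcher = Tuple[str, int, Callable[[bytes], bool]]


def is_ascii(chunk: bytes) -> bool:
    # Assumption: the ASCII interval is every byte below 0x80.
    return chunk[0] < 0x80


def is_utf8_three_byte(chunk: bytes) -> bool:
    # Assumption: structural UTF-8 lead/continuation pattern only, used as a
    # stand-in for the configured UTF8 interval.
    return (0xE0 <= chunk[0] <= 0xEF
            and 0x80 <= chunk[1] <= 0xBF
            and 0x80 <= chunk[2] <= 0xBF)


# Matchers are listed from high priority to low priority.
MATCHERS: List[Matcher] = [
    ("ascii", 1, is_ascii),
    ("utf8", 3, is_utf8_three_byte),
]


def tally(data: bytes,
          matchers: List[Matcher] = MATCHERS) -> Tuple[Dict[str, int], Optional[int]]:
    """Return per-interval counts and the first unmatched position (or None)."""
    counts = {name: 0 for name, _, _ in matchers}
    pos = 0
    while pos < len(data):
        for name, width, pred in matchers:
            chunk = data[pos:pos + width]
            if len(chunk) == width and pred(chunk):
                counts[name] += width   # count = accumulated count + judged characters
                pos += width            # offset by the number of characters judged
                break
        else:
            # No interval matched: the situation that step 228 handles.
            return counts, pos
    return counts, None


counts, unmatched = tally(bytes.fromhex("E8BF9920"))
print(counts, unmatched)
```

Because each successful match advances the position by the full width of the matched interval, a 6-byte combination is examined once rather than byte by byte, which is the efficiency point made in step 229.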
Among steps 204 to 240, steps 230 to 240 form the process of judging the encoding type according to the number of characters in the decoded data that satisfy each encoding interval and the priority relationship of the encoding intervals, and thereby determining the encoding type of the decoded data. In this embodiment, the encoding type of the decoded data is judged in the order of the second priority: for example, first judge the relationship between N and the number of characters satisfying a single encoding interval (the ASCII encoding interval, the GB18030 encoding interval, or the UTF8 encoding interval), then judge in turn the relationships among the character counts of the individual encoding intervals and the relationship between N and the character counts of the other encoding intervals, so as to determine which encoding interval the decoded data satisfies. This embodiment merely takes this order as an example and is not limited to this second-priority order.
Specifically:
Step 230: calculate the total length N of the decoded data after deducting null characters and 0 characters;
Step 231: judge whether N is equal to the number of characters satisfying the ASCII encoding interval; if it is, execute step 232; otherwise, execute step 233;
Step 232: determine that the encoding type of the decoded data is ASCII encoding; execute step 240;
Step 233: judge whether N is equal to the sum of the number of characters satisfying the ASCII encoding interval and the number of characters satisfying the GB18030 encoding interval; if it is, execute step 228; otherwise, execute step 234;
Step 234: judge whether N is equal to the sum of the number of characters satisfying the ASCII encoding interval and the number of characters satisfying the UTF8 encoding interval; if it is, execute step 235; otherwise, execute step 236;
Step 235: determine that the encoding type of the decoded data is UTF8 encoding; execute step 240;
Step 236: judge whether N is equal to the sum of the number of characters satisfying the ASCII encoding interval and the number of characters satisfying the 6-byte encoding interval, and whether the number of characters satisfying the 6-byte encoding interval is 6. When both conditions are met, execute step 228. Otherwise: in case 1, N is equal to that sum and the number of characters satisfying the 6-byte encoding interval is greater than 6, so execute step 235; in case 2, N is not equal to that sum, so execute step 237;
This step only takes the above judgment as an example. In practical applications, the judgment may instead be based on whether the number of characters satisfying the 6-byte encoding interval is 12; the specific value can be set according to the actual situation and may be any other integer multiple of 6.
Step 237: judge whether the number of characters satisfying the UTF8 encoding interval is greater than 0, and whether N minus the number of characters satisfying the ASCII encoding interval is equal to the sum of the number of characters satisfying the UTF8 encoding interval and the number of characters satisfying each overlapping interval or each 6-byte combination interval; when the condition is met, execute step 235; otherwise, execute step 238;
Step 238: judge whether the number of characters satisfying GB18030 is greater than or equal to the number of characters satisfying the UTF8 encoding interval; if it is, execute step 228; otherwise, execute step 239;
Step 239: judge whether the number of characters satisfying GB18030 is greater than 0 and the number of characters satisfying the non-displayable encoding interval within the UTF8/GB18030 overlapping interval is greater than 0, or whether the number of characters satisfying the non-displayable encoding interval within the uncommon UTF8 encoding interval of GB18030 encoding and the number of characters satisfying the 6-byte combination interval of UTF8 encoding are greater than 0; when the condition is met, execute step 228; otherwise, execute step 235;
Step 240: output the determined encoding type;
Step 241: release the configuration information of each encoding interval;
Step 242: transcode the decoded data of the critical data according to the encoding type;
In this step, the decoded critical data is transcoded from its identified encoding type according to the requirements of the interface or of other encoding formats.
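The first part of the decision chain, steps 231 to 236, can be sketched as a small function. This is an illustrative reading of those steps only: the function name `decide` and its parameter names are hypothetical, the threshold 6 is the example value given for step 236, and steps 237 to 239 are deliberately not modelled, so the function returns `None` when the flow would continue to them.

```python
from typing import Optional


def decide(n: int, ascii_n: int, gb_n: int, utf8_n: int, six_n: int) -> Optional[str]:
    """Partial sketch of steps 231-236.

    n       -- total length N after deducting null and 0 characters
    ascii_n -- characters counted in the ASCII interval
    gb_n    -- characters counted in the GB18030 interval
    utf8_n  -- characters counted in the UTF8 interval
    six_n   -- characters counted in the 6-byte interval
    """
    if n == ascii_n:                 # steps 231-232
        return "ASCII"
    if n == ascii_n + gb_n:          # step 233 -> step 228
        return "GB18030"
    if n == ascii_n + utf8_n:        # steps 234-235
        return "UTF8"
    if n == ascii_n + six_n:         # step 236
        if six_n == 6:
            return "GB18030"         # -> step 228
        if six_n > 6:
            return "UTF8"            # case 1 -> step 235
    return None                      # case 2: continue with steps 237-239 (not modelled)


print(decide(6, 0, 0, 0, 6))
print(decide(15, 1, 0, 12, 0))
```

With the counts of the first specific embodiment (N = 15, ASCII 1, UTF8 12), none of these early tests is met, which matches the description: the decision is only reached once the overlapping-interval count is also taken into account.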
The embodiment of the present invention describes only the steps of the encoding type identification and transcoding method shown in Fig. 2, but is not limited to this scope. In practical applications, steps may be added to, deleted from, or merged in the flow of Fig. 2 as needed, or the priorities of the encoding intervals may be redefined to adjust the execution order of the steps in Fig. 2. Weights may also be added to the encoding intervals, so that the encoding type is judged according to the number of characters in the decoded data that satisfy each encoding interval together with the priority relationship and the weights of the encoding intervals, thereby determining the encoding type of the decoded data. All of these technical solutions fall within the protection scope of the present invention.
Based on the relationship between the code ranges of UTF8 encoding and GB18030 encoding and on the characteristics of each encoding type, the embodiment of the present invention implements a scheme for URL encoding type identification and transcoding. It can effectively improve the accuracy of URL encoding type identification and avoid garbled display of data caused by a wrong encoding type.
The technical solution of the present invention is described completely below, using the flow shown in Fig. 2 in combination with specific network messages. The network message analyzed in the first specific embodiment of the present invention is as follows. This embodiment takes an HTTP message as an example but is not limited to HTTP messages; it represents only part of the inventive concept, not the whole.
In this embodiment, the network message is as follows:
GET /s?wd=%E8%BF%99%E5%B0%B1%E6%98%AF%E6%88%91%20%C2%87&rsv_spt=1&issp=1&f=8&rsv_bp=0&ie=utf-8&tn=monline_5_dg&rsv_enter=0&rsv_sug3=89&rsv_sug4=8344&rsv_sug1=30&rsv_sug2=0&inputT=2237639 HTTP/1.1;
The flow of the method of this embodiment is shown in Fig. 2 and specifically includes the following steps:
Step 1: extract the critical data from the network message;
In this step, the data after the keyword wd= is extracted, so the critical data is:
%E8%BF%99%E5%B0%B1%E6%98%AF%E6%88%91%20%C2%87.
Step 2: decode the extracted critical data;
Decoding %E8%BF%99%E5%B0%B1%E6%98%AF%E6%88%91%20%C2%87 gives the hexadecimal data: E8 BF 99 E5 B0 B1 E6 98 AF E6 88 91 20 C2 87;
Step 3: perform data encoding type identification on the decoded data;
In this embodiment, step 3 includes the following sub-steps:
Step 3.1: perform encoded-character statistics on the decoded data;
In this step, according to the preset encoding intervals, priority relationship, and weights, traversal statistics are performed on the data from high priority to low priority, as follows:
Step 3.1.1: judge whether the first character (example: E8) satisfies the ASCII encoding interval; it does not, so proceed to the next step;
In this step, if the first character satisfied the ASCII encoding interval, the number of characters satisfying the ASCII encoding interval would be increased by 1; the position is then offset by 1 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached, that is, until the decoded data has been fully traversed;
Step 3.1.2: judge whether the first two characters (example: E8 BF) satisfy the overlapping interval of UTF8 and GB18030 (including the displayable interval and the non-displayable interval); they do not, so proceed to the next step;
In this step, if the first two characters satisfied the overlapping interval of UTF8 and GB18030, the number of characters satisfying that overlapping interval would be increased by 2; the position is then offset by 2 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.3: judge whether the first two characters (example: E8 BF) satisfy the hole encoding interval; they do not, so proceed to the next step;
In this step, if the first two characters satisfied the hole encoding interval, the number of characters satisfying the hole encoding interval would be increased by 2; the position is then offset by 2 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.4: judge whether the first four characters (example: E8 BF 99 E5) satisfy the four-byte encoding interval of GB18030; they do not, so proceed to the next step; if they did, GB18030 would be determined as the final encoding type;
Step 3.1.5: judge whether the first six characters (example: E8 BF 99 E5 B0 B1) simultaneously satisfy GB18030 encoding and the uncommon UTF8 6-byte combination encoding interval; they do not, so proceed to the next step;
In this step, if the first six characters simultaneously satisfied GB18030 encoding and the uncommon UTF8 6-byte combination encoding interval, the number of characters satisfying that interval would be increased by 6; the position is then offset by 6 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.6: judge whether the first six characters (example: E8 BF 99 E5 B0 B1) simultaneously satisfy GB18030 encoding and the interval of UTF8 Chinese-character codes combined with uncommon UTF8 6-byte codes; they do not, so proceed to the next step;
In this step, if the first six characters simultaneously satisfied GB18030 encoding and the interval of UTF8 Chinese-character codes combined with uncommon UTF8 6-byte codes, the number of characters satisfying that interval would be increased by 6; the position is then offset by 6 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.7: judge whether the first six characters (example: E8 BF 99 E5 B0 B1) simultaneously satisfy the displayable encoding interval within the common UTF8 encoding interval of GB18030 encoding and the 6-byte combination interval of UTF8 encoding; they do not, so proceed to the next step;
In this step, if the first six characters simultaneously satisfied the two intervals judged in this step, the number of characters counted for them would be increased by 6; the position is then offset by 6 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.8: judge whether the first three characters (example: E8 BF 99) satisfy the UTF8 encoding interval; they do, so the number of characters satisfying the UTF8 encoding interval is increased by 3; the position is offset by 3 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached.
In this step, if the first three characters did not satisfy the UTF8 encoding interval, the current character would satisfy none of the encoding intervals, and the encoding type of the decoded data would be determined to be GB18030 encoding.
The steps in step 3.1 are executed in a loop until all characters in the decoded data have been traversed. In this embodiment, the statistical result for the decoded data is: the number of characters satisfying the ASCII encoding interval is 1, the number of characters satisfying the UTF8 encoding interval is 12, the number of characters satisfying the displayable encoding interval of the UTF8/GB18030 overlapping interval is 2, and the number of characters satisfying every other encoding interval is 0.
Step 3.2: make the encoding type decision according to the statistical result;
Specifically, in this embodiment the total length of the decoded data is M = 15, and the length after subtracting null characters and 0 characters from M is N = 15;
The encoding type decision process of this embodiment includes the following steps:
Step 3.2.1: judge whether N is equal to the character length of the ASCII interval, 1; the result is not equal, so proceed to the next step;
Step 3.2.2: judge whether N is equal to the sum, 13, of the character length of the UTF8 encoding interval, 12, and the character length of the ASCII encoding interval, 1; the result is not equal, so proceed to the next step;
Step 3.2.3: judge whether N is equal to the sum, 1, of the character length of the GB18030 encoding interval, 0, and the character length of the ASCII encoding interval, 1; the result is not equal, so proceed to the next step;
Step 3.2.4: judge whether N is equal to the sum, 15, of the character length of the UTF8 encoding interval, 12, the character length of the ASCII encoding interval, 1, and the character length of the UTF8/GB18030 overlapping interval, 2; the result is equal, so the encoding type of the decoded data is considered to be UTF8, and UTF8 is reported as the encoding type of the data to be decoded.
Step 3.3: output UTF8 as the encoding type of the critical data;
Step 4: according to the data encoding type, transcode the data to the encoding format of the browser or to another required encoding format.
In this step, if the identified encoding type is UTF8 and the encoding format of the browser is GB18030, the data in UTF8 format must be converted to GB18030 format before it can be displayed; otherwise it will appear garbled.
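Steps 1, 2, and 4 of this embodiment can be reproduced with the Python standard library. The sketch below is an illustration under stated assumptions: the helper names `extract_value` and `transcode` are hypothetical, the request target is shortened to two parameters, and the encoding type is passed in directly rather than identified by the method of step 3.

```python
from urllib.parse import unquote_to_bytes


def extract_value(request_target: str, key: str) -> str:
    """Return the still percent-encoded value that follows `key=` in the query."""
    query = request_target.split("?", 1)[1]
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        if name == key:
            return value
    raise KeyError(key)


def transcode(raw: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from the identified encoding to the required one."""
    return raw.decode(source).encode(target)


# Shortened request target from the first embodiment (most parameters dropped).
request_target = "/s?wd=%E8%BF%99%E5%B0%B1%E6%98%AF%E6%88%91%20%C2%87&ie=utf-8"

raw = unquote_to_bytes(extract_value(request_target, "wd"))   # steps 1 and 2
print(raw.hex(" ").upper())

converted = transcode(raw, "utf-8", "gb18030")                # step 4
print(converted.hex(" ").upper())
```

Decoding to bytes rather than to text is the important design choice here: the encoding is not yet known after step 2, so the percent-decoded data has to stay as raw bytes until step 3 has identified its type.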
The network message analyzed in the second specific embodiment of the present invention is as follows. This embodiment takes an HTTP message as an example but is not limited to HTTP messages; it represents only part of the inventive concept, not the whole.
In this embodiment, the network message is as follows:
POST /aj/mblog/add?domain=2869929424&ajwvr=6&__rnd=1416799662398 HTTP/1.1
Host: weibo.com
Connection: keep-alive
...... (omitted)
location=v6_content_home&appkey=&style_type=1&pic_id=&text=%EA%89%81%ED%84%87&pdetail=&rank=0&rankid=&module=stissue&pub_type=dialog&_t=0
The flow of the method of this embodiment is shown in Fig. 2 and specifically includes the following steps:
Step 1: extract the critical data after the keyword "text=" in the message;
In this step, the critical data is: %EA%89%81%ED%84%87;
Step 2: decode the extracted critical data %EA%89%81%ED%84%87;
In this step, the decoded hexadecimal data is: EA 89 81 ED 84 87;
Step 3: perform data encoding type identification on the decoded data;
In this embodiment, step 3 includes the following sub-steps:
Step 3.1: perform encoded-character statistics on the decoded data;
In this step, according to the preset encoding intervals, priority relationship, and weights, traversal statistics are performed on the data from high priority to low priority, as follows:
Step 3.1.1: judge whether the first character (example: EA) satisfies the ASCII encoding interval; it does not, so proceed to the next step;
In this step, if the first character EA satisfied the ASCII encoding interval, the number of characters satisfying the ASCII encoding interval would be increased by 1; the position is then offset by 1 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.2: judge whether the first two characters (example: EA 89) satisfy the overlapping interval of UTF8 and GB18030 (including the displayable interval and the non-displayable interval); they do not, so proceed to the next step;
In this step, if the first two characters satisfied the overlapping interval of UTF8 and GB18030, the number of characters satisfying that overlapping interval would be increased by 2; the position is then offset by 2 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.3: judge whether the first two characters (example: EA 89) satisfy the hole encoding interval; they do not, so proceed to the next step;
In this step, if the first two characters satisfied the hole encoding interval, the number of characters satisfying the hole encoding interval would be increased by 2; the position is then offset by 2 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.4: judge whether the first four characters (example: EA 89 81 ED) satisfy the four-byte encoding interval of GB18030; they do not, so proceed to the next step; if they did, GB18030 would be determined as the final encoding type;
Step 3.1.5: judge whether the first six characters (example: EA 89 81 ED 84 87) simultaneously satisfy GB18030 encoding and the uncommon UTF8 6-byte combination encoding interval; they do, so the number of characters satisfying that interval is increased by 6, the position is offset by 6, and the flow returns to step 3.1.1 to continue traversal statistics on the following data;
In this step, if the first six characters did not simultaneously satisfy GB18030 encoding and the uncommon UTF8 6-byte combination encoding interval, the current character would satisfy none of the encoding intervals, and the encoding type of the decoded data would be determined to be GB18030 encoding.
The steps in step 3.1 are executed in a loop until all characters in the decoded data have been traversed. In this embodiment, the statistical result for the decoded data is: the number of characters satisfying the uncommon UTF8 6-byte combination encoding interval is 6, and the number of characters satisfying every other encoding interval is 0.
Step 3.2: make the encoding type decision according to the statistical result;
Specifically, in this embodiment the total length of the decoded data is M = 6, and the length after subtracting null characters and 0 characters is N = 6;
The encoding type decision process of this embodiment includes the following steps:
Step 3.2.1: judge whether N is equal to the character length of the ASCII interval, 0; the result is not equal, so proceed to the next step;
Step 3.2.2: judge whether N is equal to the sum, 0, of the character length of the UTF8 encoding interval, 0, and the character length of the ASCII encoding interval, 0; the result is not equal, so proceed to the next step;
Step 3.2.3: judge whether N is equal to the sum, 0, of the character length of the GB18030 encoding interval, 0, and the character length of the ASCII encoding interval, 0; the result is not equal, so proceed to the next step;
Step 3.2.4: judge whether N is equal to the sum, 0, of the character length of the UTF8 encoding interval, 0, the character length of the ASCII encoding interval, 0, and the character length of the UTF8/GB18030 overlapping interval, 0; the result is not equal, so proceed to the next step;
Step 3.2.5: judge whether N is equal to the character length, 6, of the interval that simultaneously satisfies GB18030 encoding and the uncommon UTF8 6-byte combination; the result is equal, so the encoding type of the decoded data is considered to be GB18030, and GB18030 is reported as the encoding type of the data to be decoded.
Step 3.3: output GB18030 as the encoding type of the critical data;
Step 4: according to the encoding type, transcode the data to the encoding format of the browser or to another required encoding format.
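The six bytes of this embodiment illustrate why interval statistics are used rather than simple trial decoding. The check below is a side illustration, not part of the patented flow; `decodes_cleanly` is a hypothetical helper, and it relies only on Python's built-in `utf-8` and `gb18030` codecs in strict mode.

```python
def decodes_cleanly(data: bytes, codec: str) -> bool:
    """Return True when `data` is a valid byte sequence under `codec`."""
    try:
        data.decode(codec)   # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# Decoded critical data of the second embodiment.
raw = bytes.fromhex("EA 89 81 ED 84 87")

results = {codec: decodes_cleanly(raw, codec) for codec in ("utf-8", "gb18030")}
print(results)
print(len(raw.decode("utf-8")), len(raw.decode("gb18030")))
```

Both codecs accept the sequence: it reads as two 3-byte UTF8 characters or as three 2-byte GB18030 characters. Validity alone therefore cannot separate the two candidates, so the method falls back on how common each 6-byte combination is in practice.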
The network message analyzed in the third specific embodiment of the present invention is as follows. This embodiment takes an HTTP message as an example but is not limited to HTTP messages; it represents only part of the inventive concept, not the whole.
In this embodiment, the network message is as follows:
POST /f/commit/post/add HTTP/1.1
Host: tieba.baidu.com
Connection: keep-alive
………
content=%EA%89%81%ED%84%87%ED%84%87%ED%84%87%ED%84%87&files=%5B%5D&mouse_pwd_isclick=0&__type__=reply
The flow of the method of this embodiment is shown in Fig. 2 and specifically includes the following steps:
Step 1: extract the critical data from the network message;
In this step, the critical data after the keyword content= in the message is extracted, so the critical data is: %EA%89%81%ED%84%87%ED%84%87%ED%84%87%ED%84%87;
Step 2: decode the extracted critical data;
The decoded hexadecimal data is: EA 89 81 ED 84 87 ED 84 87 ED 84 87 ED 84 87;
Step 3: perform data encoding type identification on the decoded data;
In this embodiment, step 3 includes the following sub-steps:
Step 3.1: perform encoded-character statistics on the decoded data;
In this step, according to the preset encoding intervals, priority relationship, and weights, traversal statistics are performed on the data from high priority to low priority, as follows:
Step 3.1.1: judge whether the first character (example: EA, ED) satisfies the ASCII encoding interval; it does not, so proceed to the next step;
In this step, if the first character EA satisfied the ASCII encoding interval, the number of characters satisfying the ASCII encoding interval would be increased by 1; the position is then offset by 1 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.2: judge whether the first two characters (example: EA 89, ED 84) satisfy the overlapping interval of UTF8 and GB18030 (including the displayable interval and the non-displayable interval); they do not, so proceed to the next step;
In this step, if the first two characters satisfied the overlapping interval of UTF8 and GB18030, the number of characters satisfying that overlapping interval would be increased by 2; the position is then offset by 2 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.3: judge whether the first two characters (example: EA 89, ED 84) satisfy the hole encoding interval; they do not, so proceed to the next step;
In this step, if the first two characters satisfied the hole encoding interval, the number of characters satisfying the hole encoding interval would be increased by 2; the position is then offset by 2 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.4: judge whether the first four characters (example: EA 89 81 ED, ED 84 87 ED) satisfy the four-byte encoding interval of GB18030; they do not, so proceed to the next step; if they did, GB18030 would be determined as the final encoding type;
Step 3.1.5: judge whether the first six characters (example: EA 89 81 ED 84 87 and ED 84 87 ED 84 87 both satisfy this step) simultaneously satisfy GB18030 encoding and the uncommon UTF8 6-byte combination encoding interval; if they do, the number of characters satisfying that interval is increased by 6, the position is offset by 6, and the flow returns to step 3.1.1 to continue traversal statistics on the following data; if they do not, proceed to the next step;
Step 3.1.6: judge whether the first six characters (example: the last three bytes ED 84 87) simultaneously satisfy GB18030 encoding and the interval of UTF8 Chinese-character codes combined with uncommon UTF8 6-byte codes; they do not, so proceed to the next step;
In this step, if the first six characters simultaneously satisfied GB18030 encoding and the interval of UTF8 Chinese-character codes combined with uncommon UTF8 6-byte codes, the number of characters satisfying that interval would be increased by 6; the position is then offset by 6 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.7: judge whether the first six characters (example: the last three bytes ED 84 87) simultaneously satisfy the displayable encoding interval within the common UTF8 encoding interval of GB18030 encoding and the 6-byte combination interval of UTF8 encoding; they do not, so proceed to the next step;
In this step, if the first six characters simultaneously satisfied the two intervals judged in this step, the number of characters counted for them would be increased by 6; the position is then offset by 6 and the flow returns to step 3.1.1 to continue traversal statistics on the following data until the maximum offset length is reached;
Step 3.1.8: judge whether the first three characters (example: the last three bytes ED 84 87) satisfy the UTF8 encoding interval; they do, so the number of characters satisfying the UTF8 encoding interval is increased by 3; the position is offset by 3 and the flow returns to step 3.1.1 to continue traversal statistics on the following data.
Step 3.1.9: judge whether the first two characters satisfy the two-byte encoding interval of GB18030 encoding; if they do, the number of characters satisfying the two-byte encoding interval of GB18030 encoding is increased by 2; the position is offset by 2 and the flow returns to step 3.1.1 to continue the traversal until it is complete;
Step 3.1.10: judge whether any character satisfies none of the encoding intervals; if so, the encoding type of the data is considered to be GB18030, and GB18030 is reported as the final encoding type.
The steps in step 3.1 are executed in a loop until all characters in the decoded data have been traversed. In this embodiment, the statistical result for the decoded data is: the number of characters that simultaneously satisfy GB18030 encoding and the uncommon UTF8 6-byte combination encoding interval is 12, the number of characters satisfying the UTF8 encoding interval is 3, and the number of characters satisfying every other encoding interval is 0.
Step 3.2: make the encoding type decision according to the statistical result;
Specifically, in this embodiment the total length of the decoded data is M = 15, and the length after subtracting null characters and 0 characters from M is N = 15;
The embodiment of the present invention carry out type of coding decision process the following steps are included:
Step 3.2.1: judging whether N is equal to the character length 0 in the section ASCII, and judging result is not equal to progress is next Step;
Step 3.2.2: judge whether N is equal to the character length 3 in UTF8 coding section and ASCII encodes section character length The sum of 00, judging result is not equal to progress is in next step;
Step 3.2.3: judge whether N is equal to the character length 0 in GB18030 coding section and ASCII encodes section character The sum of length 00, judging result are not equal to progress is in next step;
Step 3.2.4: judge whether N is equal to the character length 3 in UTF8 coding section, ASCII coding section character length 0 And the sum of UTF8 coding and GB18030 coding overlapping interval character length 03, judging result are not equal to progress is in next step;
Step 3.2.5: judge whether N is equal to while meeting GB18030 coding and the non-common 6 byte code combined characters of UTF8 According with length is 12, and judging result is not equal to progress is in next step;
Step 3.2.6: judging while meeting GB18030 coding and UTF8 coding section character length and the non-common volume of UTF8 Whether 6 byte code pattern length 12 of code is greater than 6, and if it is greater than 6, then decision goes out the corresponding coding class of the decoded data Type is UTF8, and provides the type of coding.
In this step, due to UTF8 coding section weight it is higher, when simultaneously meet GB18030 coding and UTF8 When coding section character length and 6 byte code pattern lengths 12 of the non-common coding of UTF8 are greater than 6, determine described decoded The corresponding type of coding of data is UTF8.
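The decision sequence of steps 3.2.1 to 3.2.6 can be traced with a short Python sketch, given here only as an illustration. The function name `decide` and its parameter names are hypothetical. The coding type that steps 3.2.1 to 3.2.5 would output is not stated in this passage, so the sketch returns `None` for them instead of guessing; only the UTF8 outcome of step 3.2.6 is taken from the text.

```python
def decide(n, ascii_len, utf8_len, gb_len, overlap_len, combo6_len, threshold=6):
    """Return (step, coding_type) for the first check that fires.

    n           : data length after removing null and 0 characters
    combo6_len  : characters satisfying both GB18030 and the UTF8
                  non-common 6-byte combination
    threshold   : the value 6 used in step 3.2.6
    """
    equality_checks = [
        ("3.2.1", n == ascii_len),
        ("3.2.2", n == utf8_len + ascii_len),
        ("3.2.3", n == gb_len + ascii_len),
        ("3.2.4", n == utf8_len + ascii_len + overlap_len),
        ("3.2.5", n == combo6_len),
    ]
    for step, hit in equality_checks:
        if hit:
            # Outcome of this step is defined elsewhere in the patent.
            return step, None
    if combo6_len > threshold:
        return "3.2.6", "UTF8"
    return None, None


# Values of the embodiment: N = 15, ASCII 0, UTF8 3, GB18030 0, overlap 0, combination 12.
print(decide(15, 0, 3, 0, 0, 12))  # ('3.2.6', 'UTF8')
```

With the embodiment's numbers, every equality check fails (15 is compared against 0, 3, 0, 3, and 12 in turn), and the final threshold test 12 > 6 selects UTF8.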
Step 3.3: output UTF8 as the coding type of the key data;
Step 4: according to the data coding type, transcode the data to the coding format of the browser or to another required coding format.
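Step 4 can be sketched in Python as a decode followed by a re-encode. This is a minimal illustration that assumes the detected type and the target format are both available as Python codec names; the function name `transcode` and the `errors="replace"` policy are assumptions, not details from the patent.

```python
def transcode(data, detected, target="utf-8"):
    """Decode bytes with the detected coding type, then re-encode them
    in the format required by the browser or other consumer."""
    text = data.decode(detected, errors="replace")
    return text.encode(target)


gb_bytes = "中文".encode("gb18030")
print(transcode(gb_bytes, "gb18030", "utf-8"))  # b'\xe4\xb8\xad\xe6\x96\x87'
```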
The advantages of the URL data coding type identification and transcoding method of this embodiment are as follows: data messages do not need to be encoded and decoded repeatedly, and no data comparison is needed to obtain the coding type of the data, so the invention is comparatively efficient and performs well; no reference coding array or candidate coding array needs to be set, and the user does not need to manually select an automatic character-encoding detection tool, which reduces maintenance cost, improves the user experience, avoids garbled data caused by errors in the reference or candidate coding arrays, gives high accuracy, and greatly reduces the garbling rate; no preset coding type needs to be configured, which reduces maintenance cost and solves the problem of garbled data caused by a wrong preset coding; by determining the overlapping intervals of the codings and dividing each overlapping interval into a displayable code word segment (displayable code words) and a non-displayable code word segment (non-displayable code words), the coding type of the data is determined from the combination relationship among the displayable segment, the non-displayable segment, and the other non-overlapping intervals, which solves the problem of garbled Chinese in overlapping intervals; based on user behavior, the code words of each coding are graded by how uncommon they are and the code word segments commonly used by users are summarized, which solves the garbling that arises when a multi-byte code word combination satisfies the intervals of several coding types; according to the characteristics of the different coding types, offsets of different sizes can be applied during coding type detection, which improves the running efficiency of the program; and for the difficult problem of short-text identification, the invention can better identify the coding type of short text. The data coding type can thus be identified effectively and accurately, which reduces the garbling rate, reduces maintenance cost, and improves the user experience.
An embodiment of the invention further provides a coding type identification and transcoding device. Fig. 3 is a schematic structural diagram of the coding type identification and transcoding device of the embodiment. As shown in Fig. 3, the device includes: a key data extraction unit 31, a decoding unit 32, a coding type identification unit 33, and a data transcoding unit 34, wherein
The key data extraction unit 31 is configured to extract the key data from the network message generated by a user operation, and to send the extracted key data to the decoding unit;
Specifically, the key data extraction unit 31 extracting the key data from the network message generated by a user operation includes: the key data extraction unit 31 extracts the key data from the network message according to a keyword or a regular expression;
The decoding unit 32 is configured to decode the key data, and to send the decoded data to the coding type identification unit;
Specifically, the decoding unit 32 decoding the key data includes: the decoding unit 32 performs URLDECODE decoding on the key data according to the URLENCODE encoding and decoding rules.
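The extraction and decoding performed by units 31 and 32 can be illustrated with Python's standard library. This is a sketch under assumptions: the request line, the keyword `wd`, and the function names `extract_key_data` and `url_decode` are hypothetical examples, not taken from the patent.

```python
import re
from urllib.parse import unquote_to_bytes


def extract_key_data(message, keyword):
    """Pull the raw (still percent-encoded) value of one query parameter
    out of a network message using a regular expression."""
    match = re.search(r"[?&]" + re.escape(keyword) + r"=([^&\s#]*)", message)
    return match.group(1) if match else None


def url_decode(value):
    """Percent-decode to raw bytes; the coding type is still unknown here,
    so the result is deliberately not decoded to text."""
    return unquote_to_bytes(value)


request_line = "GET /search?wd=%E4%B8%AD%E6%96%87&ie=x HTTP/1.1"  # assumed example
raw = extract_key_data(request_line, "wd")
print(url_decode(raw))  # b'\xe4\xb8\xad\xe6\x96\x87'
```

Keeping the decoded value as bytes matters: choosing a text codec at this point would presuppose the very coding type that the identification unit is meant to determine.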
The device further includes a coding interval division unit 35, configured to divide the different coding types into multiple coding intervals and to determine the priority relationship of the coding intervals.
The coding type identification unit 33 is configured to determine the coding type corresponding to the decoded data of the key data, and to send the determined coding type to the data transcoding unit;
The coding type identification unit includes a configuration subunit 331, a statistics subunit 332, a decision subunit 333, and a revocation subunit 334, wherein
The configuration subunit 331 is configured to load the configuration information of each coding interval;
The configuration information of each coding interval includes, but is not limited to, the division of the coding intervals and the corresponding priority relationship;
The statistics subunit 332 is configured to loop through the decoded data and calculate the number of characters that satisfy each coding interval;
The statistics subunit 332 is specifically configured to: according to the first preset priority of the coding intervals, successively judge whether the characters in the decoded data satisfy each coding interval; and count the number of characters in the decoded data that satisfy each coding interval.
Specifically, the statistics subunit 332 may limit the total length of data traversed by means of a configured maximum offset length and, following the priority relationship of the coding type intervals, successively count the characters that satisfy and that do not satisfy each coding interval; if a character satisfies an interval, it is counted and traversal continues with the subsequent data after the offset; if not, matching continues with the next coding interval;
The decision subunit 333 is configured to perform the coding type judgment according to the number of characters in the decoded data that satisfy each coding interval and the priority relationship of the coding intervals, and to determine the coding type corresponding to the decoded data;
The decision subunit 333 is specifically configured to: according to the second preset priority of the coding intervals, successively judge the relationship between the number of characters in the decoded data that satisfy each coding interval and the total length of the decoded data after null characters and 0 characters are deducted, and determine the coding type corresponding to the key data according to that relationship.
Specifically, the decision subunit 333 successively judges the coding type corresponding to the decoded data according to the number of characters in the decoded data that satisfy each coding interval, taking the coding intervals in order of priority from high to low; if the coding type corresponding to the decoded data can be determined, the result is output; otherwise, the next coding type judgment is carried out, until the coding type corresponding to the decoded data can be determined, or, if no coding type can be determined, the default data coding type is output.
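The behavior of the decision subunit, trying judgments in priority order and falling back to a default type, can be sketched generically in Python. The rule functions, the outcome labels they return, and the default value shown here are illustrative assumptions; the patent does not list them in this passage.

```python
def decide_with_default(rules, counts, n, default):
    """Apply judgment rules from highest to lowest priority and return the
    first coding type a rule can determine, else the default type."""
    for rule in rules:
        result = rule(counts, n)
        if result is not None:
            return result
    return default


# Hypothetical rules, ordered by priority; each returns a type or None.
rules = [
    lambda c, n: "ASCII" if n == c["ascii"] else None,
    lambda c, n: "UTF8" if n == c["utf8"] + c["ascii"] else None,
]

print(decide_with_default(rules, {"ascii": 0, "utf8": 3}, 3, default="GB18030"))  # UTF8
print(decide_with_default(rules, {"ascii": 0, "utf8": 3}, 5, default="GB18030"))  # GB18030
```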
The revocation subunit 334 is configured to release the configuration information of each coding interval.
In this embodiment, while the statistics subunit 332 loops through the decoded data and calculates the number of characters that satisfy each coding interval, it traverses and counts both the characters that satisfy each coding interval and the characters that do not. If a character that satisfies none of the coding intervals is encountered during traversal and the coding type of the data can already be determined, the traversal ends and the coding type is output; otherwise the statistics subunit 332 continues to traverse the subsequent data until the maximum offset is reached. After the traversal, the decision subunit 333 judges the character counts of the coding intervals according to the priority relationship of the coding intervals, determines the coding type of the decoded data, and outputs the coding type.
The data transcoding unit 34 is configured to transcode the decoded data of the key data according to the coding type.
In this embodiment, when dividing the different coding types into multiple coding intervals and determining the priority relationship of the coding intervals, the coding interval division unit 35 is further configured to determine the weight corresponding to each coding interval according to actual conditions. During the coding type judgment, the coding type identification unit 33 performs the judgment according to the number of characters in the decoded data that satisfy each coding interval together with the priority relationship and weights of the coding intervals. For example, when the character currently being judged falls in the overlapping interval of two coding intervals, the coding interval that the character satisfies is determined by weight, for instance by taking the coding interval with the larger weight as the one satisfied by the current character.
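The weight-based choice for a character in an overlapping interval can be shown with a very small Python sketch. The function name `resolve_overlap` and the weight values are hypothetical; the patent only states that the interval with the larger weight is taken.

```python
def resolve_overlap(matching_intervals, weights):
    """Among the coding intervals a character satisfies, pick the one
    with the largest configured weight."""
    return max(matching_intervals, key=lambda name: weights[name])


weights = {"UTF8": 2, "GB18030": 1}  # assumed example weights
print(resolve_overlap(["GB18030", "UTF8"], weights))  # UTF8
```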
The functions implemented by the processing units in the coding type identification and transcoding device shown in Fig. 3 can be understood with reference to the foregoing description of the coding type identification and transcoding method. Those skilled in the art will appreciate that the functions of the processing units in the coding type identification and transcoding device shown in Fig. 3 may be implemented by a program running on a processor, or by specific logic circuits, for example by a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA); the storage unit may likewise be implemented by various memories or storage media.
In the several embodiments provided by the invention, it should be understood that the disclosed method, device, and system may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the communication connections between the components shown or discussed may be made through interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical, or of other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the invention may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware, or in hardware plus software functional units.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disc.
Alternatively, if the integrated unit of the embodiments of the invention is implemented as a software functional unit and sold or used as an independent product, it may also be stored in a computer-readable storage medium. On this understanding, the technical solution of the embodiments of the invention, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods of the embodiments of the invention. The storage medium includes any medium that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.
The invention takes the data coding type identification and transcoding method and device described in the above embodiments as examples, but is not limited to them. Those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention.
The foregoing are merely preferred embodiments of the invention and are not intended to limit its scope of protection.

Claims (6)

1. A data coding type identification and transcoding method, characterized in that the method comprises:
extracting key data from a network message generated by a user operation, and decoding the key data;
determining the coding type corresponding to the decoded data of the key data;
transcoding the decoded data of the key data according to the coding type;
wherein determining the coding type corresponding to the decoded data of the key data comprises: loading the configuration information of each coding interval; looping through the decoded data and calculating the number of characters that satisfy each coding interval; performing the coding type judgment according to the number of characters in the decoded data that satisfy each coding interval and the priority relationship of the coding intervals, and determining the coding type corresponding to the decoded data; and releasing the configuration information of each coding interval;
wherein looping through the decoded data and calculating the number of characters that satisfy each coding interval comprises:
according to the first preset priority of the coding intervals, successively judging whether the characters in the decoded data satisfy each coding interval; and counting the number of characters in the decoded data that satisfy each coding interval;
wherein performing the coding type judgment according to the number of characters in the decoded data that satisfy each coding interval and the priority relationship of the coding intervals, and determining the coding type corresponding to the decoded data, comprises:
according to the second preset priority of the coding intervals, successively judging the relationship between the number of characters in the decoded data that satisfy each coding interval and the total length of the decoded data after null characters and 0 characters are deducted, and determining the coding type corresponding to the key data according to that relationship.
2. The method according to claim 1, characterized in that extracting the key data from the network message generated by a user operation comprises: extracting the key data from the network message according to a keyword or a regular expression.
3. The method according to claim 1, characterized in that the method further comprises:
dividing the different coding types into multiple coding intervals, and determining the priority relationship of the coding intervals.
4. A data coding type identification and transcoding device, characterized in that the device comprises: a key data extraction unit, a decoding unit, a coding type identification unit, and a data transcoding unit, wherein
the key data extraction unit is configured to extract key data from a network message generated by a user operation, and to send the extracted key data to the decoding unit;
the decoding unit is configured to decode the key data, and to send the decoded data to the coding type identification unit;
the coding type identification unit is configured to determine the coding type corresponding to the decoded data of the key data, and to send the determined coding type to the data transcoding unit;
the data transcoding unit is configured to transcode the decoded data of the key data according to the coding type;
wherein the coding type identification unit comprises a configuration subunit, a statistics subunit, a decision subunit, and a revocation subunit, wherein the configuration subunit is configured to load the configuration information of each coding interval; the statistics subunit is configured to loop through the decoded data and calculate the number of characters that satisfy each coding interval; the decision subunit is configured to perform the coding type judgment according to the number of characters in the decoded data that satisfy each coding interval and the priority relationship of the coding intervals, and to determine the coding type corresponding to the decoded data; and the revocation subunit is configured to release the configuration information of each coding interval;
the statistics subunit is specifically configured to: according to the first preset priority of the coding intervals, successively judge whether the characters in the decoded data satisfy each coding interval; and count the number of characters in the decoded data that satisfy each coding interval;
the decision subunit is specifically configured to: according to the second preset priority of the coding intervals, successively judge the relationship between the number of characters in the decoded data that satisfy each coding interval and the total length of the decoded data after null characters and 0 characters are deducted, and determine the coding type corresponding to the key data according to that relationship.
5. The device according to claim 4, characterized in that the key data extraction unit is specifically configured to: extract the key data from the network message according to a keyword or a regular expression.
6. The device according to claim 4, characterized in that the device further comprises a coding interval division unit, configured to divide the different coding types into multiple coding intervals and to determine the priority relationship of the coding intervals.
CN201510249023.0A 2015-05-15 2015-05-15 A kind of identification of data encoding type and code-transferring method and device Active CN104994128B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510249023.0A CN104994128B (en) 2015-05-15 2015-05-15 A kind of identification of data encoding type and code-transferring method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510249023.0A CN104994128B (en) 2015-05-15 2015-05-15 A kind of identification of data encoding type and code-transferring method and device

Publications (2)

Publication Number Publication Date
CN104994128A CN104994128A (en) 2015-10-21
CN104994128B true CN104994128B (en) 2019-04-26

Family

ID=54305879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510249023.0A Active CN104994128B (en) 2015-05-15 2015-05-15 A kind of identification of data encoding type and code-transferring method and device

Country Status (1)

Country Link
CN (1) CN104994128B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766305A (en) * 2017-10-23 2018-03-06 广东欧珀移动通信有限公司 Decoding algorithm determines method, apparatus, terminal and storage medium
CN107729302B (en) * 2017-10-23 2021-10-15 Oppo广东移动通信有限公司 Decoding algorithm determination method, device, terminal and storage medium
CN107770844B (en) * 2017-10-23 2020-12-29 Oppo广东移动通信有限公司 Decoding algorithm determination method, device, terminal and storage medium
CN107797976A (en) * 2017-10-23 2018-03-13 广东欧珀移动通信有限公司 Decoding algorithm determines method, apparatus, terminal and storage medium
CN109625079B (en) * 2018-10-24 2021-09-14 蔚来(安徽)控股有限公司 Control method and controller for Electric Power Steering (EPS) system of automobile
CN109495214B (en) * 2018-11-26 2020-03-24 电子科技大学 Channel coding type identification method based on one-dimensional inclusion structure
CN113595683A (en) * 2021-07-07 2021-11-02 西安震有信通科技有限公司 Conversion processing method, device, terminal and medium based on various encoding files

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101526963A (en) * 2009-04-17 2009-09-09 深圳华为通信技术有限公司 Method for identifying web page coding, device and terminal equipment
CN103207877A (en) * 2012-01-17 2013-07-17 阿里巴巴集团控股有限公司 Decoding method and device
CN103593277A (en) * 2012-08-15 2014-02-19 深圳市世纪光速信息技术有限公司 Log processing method and system
CN104361021A (en) * 2014-10-21 2015-02-18 小米科技有限责任公司 Webpage encoding identifying method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8775919B2 (en) * 2006-04-25 2014-07-08 Adobe Systems Incorporated Independent actionscript analytics tools and techniques

Also Published As

Publication number Publication date
CN104994128A (en) 2015-10-21

Similar Documents

Publication Publication Date Title
CN104994128B (en) A kind of identification of data encoding type and code-transferring method and device
CN108737333B (en) Data detection method and device
CN109246064B (en) Method, device and equipment for generating security access control and network access rule
JP6055548B2 (en) Apparatus, method, and network server for detecting data pattern in data stream
CN107341399B (en) Method and device for evaluating security of code file
CN105322969B (en) The method and device of data compression and decompression
CN111352907A (en) Method and device for analyzing pipeline file, computer equipment and storage medium
CN107545451B (en) Advertisement pushing method and device
CN105224600B (en) A kind of detection method and device of Sample Similarity
EP2585962A1 (en) Password checking
CN108234347A (en) A kind of method, apparatus, the network equipment and storage medium for extracting feature string
CN112163008A (en) Big data analysis-based user behavior data processing method and cloud computing platform
CN104765882B (en) A kind of internet site statistical method based on web page characteristics character string
CN111708921B (en) Number selection method, device, equipment and storage medium
CN110598109A (en) Information recommendation method, device, equipment and storage medium
CN111563560A (en) Data stream classification method and device based on time sequence feature learning
CN109558531A (en) News information method for pushing, device and computer equipment
CN113364784B (en) Detection parameter generation method and device, electronic equipment and storage medium
WO2018077059A1 (en) Barcode identification method and apparatus
CN110830499B (en) Network attack application detection method and system
CN112631945A (en) Test case generation method and device and storage medium
CN117294480A (en) Account security detection method and device, electronic equipment and storage medium
CN115146174B (en) Multi-dimensional weight model-based key clue recommendation method and system
CN111049813A (en) Message assembling method, message analyzing method, message assembling device, message analyzing device and storage medium
CN114390015B (en) Data pushing system, method, equipment and storage medium based on object model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant