CN104994128B - Method and device for identifying data encoding type and transcoding - Google Patents
- Publication number
- CN104994128B (application number CN201510249023.0A)
- Authority
- CN
- China
- Prior art keywords
- coding
- data
- type
- character
- section
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/30—Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
- H04L63/306—Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information intercepting packet switched data communications, e.g. Web, Internet or IMS communications
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Technology Law (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The present invention provides a data encoding type identification and transcoding method, comprising: extracting key data from a network message generated by a user operation and decoding the key data; determining the encoding type of the decoded key data; and transcoding the decoded key data according to that encoding type. The present invention also provides a corresponding data encoding type identification and transcoding device.
Description
Technical field
The present invention relates to network security technology, and in particular to a method and device for identifying the encoding type of Uniform Resource Locator (URL) data and transcoding it.
Background
With the rapid development of network technology, more and more users go online with devices such as mobile phones, computers and tablets. Users typically browse web pages and submit data through a browser (such as Internet Explorer, Firefox or Chrome), or submit data through network application software (such as the Taobao, JD.com and Dangdang apps). In network security and network login management, quickly preventing cybercrime often requires capturing and analyzing the network data that users generate through browsers and application software. Most of this data is first encoded as UTF8 or GB18030 and then encoded again with URLENCODE. GB18030 itself covers GBK and GB2312. Restoring user data therefore requires URLDECODE decoding. The decoded data is usually, but not consistently, UTF8 or GB18030. An urgent problem is therefore how to identify the encoding type of user data effectively and accurately, so that the data can be displayed correctly.
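The short Python sketch below shows the ambiguity described above. The same text, URL-encoded after UTF8 encoding and after GB18030 encoding, produces two different percent-encoded strings. URLDECODE returns only raw bytes and does not record which charset produced them. The sample string is illustrative. The calls used are `quote` and `unquote_to_bytes` from the standard `urllib.parse` module.

```python
from urllib.parse import quote, unquote_to_bytes

text = "中文"  # illustrative sample: the word "Chinese"

# URLENCODE the same text after encoding it with each character set
utf8_form = quote(text, encoding="utf-8")
gb_form = quote(text, encoding="gb18030")

# URLDECODE recovers raw bytes only; the original charset is lost
raw = unquote_to_bytes(gb_form)

# Decoding with the wrong charset yields garbled text (replacement characters)
wrong = raw.decode("utf-8", errors="replace")
right = raw.decode("gb18030")
```

Here `utf8_form` is `%E4%B8%AD%E6%96%87` and `gb_form` is `%D6%D0%CE%C4`. Only the GB18030 decode of `raw` restores the original text.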
Existing schemes for identifying the encoding of network data fall mainly into the following categories:
1) When a user submits a form or downloads data, the data message may carry a charset label. The encoding type named by charset is extracted and used to encode and decode the message. If no charset label can be extracted, a preset encoding type is used instead. For messages without a charset label, however, the data becomes garbled whenever no encoding type is preset or the preset type is wrong. The preset default encoding type must also be updated and maintained regularly, so maintenance cost is high and accuracy is low.
2) The reference encoding array of the web page to be encoded is compared with a locally preset candidate encoding array. An encoding type contained in one of the reference encoding array and the candidate encoding array is taken as the encoding type of the web page. This approach depends heavily on the two arrays: if the encoding type of the data message matches neither a reference encoding nor a candidate encoding, the data becomes garbled. It also depends on the browser. The user must select the "Auto-detect text encoding" option, which is intrusive, so the text encoding is not detected automatically without the user's involvement. The reference and candidate encoding arrays must also be continuously updated and maintained, which is costly.
3) The URL string to be decoded is decoded with each of several decoding schemes, yielding different strings. Each of these is then re-encoded with the corresponding encoding scheme, and the re-encoded strings are compared with the original input. If one of them is identical to the input, the input's encoding type is taken to be that encoding. However, the input URL string may fall in a range where UTF8 and GB18030 encoding overlap, or may satisfy the coding ranges of several encoding types at once. In that case, decoding and re-encoding it with several schemes yields several results identical to the original string, and the correct data encoding type cannot be determined.
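Scheme 3 can be illustrated with a minimal round-trip check, sketched below in Python. The helper names `round_trips` and `candidate_codecs` are hypothetical. Consider the byte pair `C3 A9`. It is a valid UTF8 character and also a valid two-byte GB18030 character, so both codecs reproduce the input exactly and the check cannot choose between them.

```python
def round_trips(raw: bytes, codec: str) -> bool:
    """Scheme 3 check: decode with `codec`, re-encode, compare with the input."""
    try:
        return raw.decode(codec).encode(codec) == raw
    except UnicodeError:  # covers both decode and encode failures
        return False


def candidate_codecs(raw: bytes, codecs=("utf-8", "gb18030")) -> list:
    """Every codec under which the bytes survive a decode/encode round trip."""
    return [c for c in codecs if round_trips(raw, c)]
```

`candidate_codecs(b"\xc3\xa9")` returns both codecs, which is exactly the ambiguity described above. By contrast, `"中".encode("gb18030")` (`D6 D0`) is not valid UTF8, so only GB18030 survives.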
Current encoding type identification schemes therefore all have clear defects: accuracy is low, maintenance cost is high, and garbled data is easily produced.
Summary of the invention
In view of this, embodiments of the present invention aim to provide a data encoding type identification and transcoding method and device. The method and device improve the accuracy of data encoding identification, reduce garbled text, make encoding type identification and transcoding more efficient, and reduce maintenance cost.
To achieve the above objectives, the technical solution of the present invention is implemented as follows:
An embodiment of the present invention provides a data encoding type identification and transcoding method, comprising:
extracting key data from a network message generated by a user operation, and decoding the key data;
determining the encoding type corresponding to the decoded key data;
transcoding the decoded key data according to the encoding type.
In the above solution, extracting the key data from the network message generated by the user operation comprises: extracting the key data from the network message according to a keyword or a regular expression.
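A minimal sketch of keyword/regular-expression extraction in Python follows. It assumes the key data is a `key=value` parameter that appears in a request line or in a form body. The keyword names (`q`, `kw`) and the helper `extract_key_data` are hypothetical; the patent does not name the keywords it uses. The value is returned still URL-encoded, because decoding is a separate step.

```python
import re
from typing import Dict, Iterable


def extract_key_data(message: str, keys: Iterable[str]) -> Dict[str, str]:
    """Pull the raw (still URL-encoded) value of each keyword parameter."""
    found = {}
    for key in keys:
        # Parameter at a line start, after '?' or '&', or after whitespace;
        # the value runs until '&', whitespace or '#'.
        pattern = r"(?:^|[?&\s])" + re.escape(key) + r"=([^&\s#]*)"
        match = re.search(pattern, message, re.MULTILINE)
        if match:
            found[key] = match.group(1)
    return found


message = (
    "GET /search?q=%E4%B8%AD&page=1 HTTP/1.1\r\n"
    "Host: example.com\r\n\r\n"
    "kw=%D6%D0"
)
key_data = extract_key_data(message, ["q", "kw", "missing"])
```

Keywords that do not occur (`missing` here) are simply absent from the result.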
In the above solution, the method further comprises:
dividing different encoding types into multiple coding intervals, and determining the priority relationship among the coding intervals.
In the above solution, determining the encoding type corresponding to the decoded key data comprises:
loading the configuration information of each coding interval;
traversing the decoded data, and counting the characters that fall within each coding interval;
performing an encoding type judgment according to the number of characters in the decoded data falling within each coding interval and the priority relationship among the coding intervals, to determine the encoding type of the decoded data;
releasing the configuration information of each coding interval.
In the above solution, traversing the decoded data and counting the characters that fall within each coding interval comprises:
checking, in the order of the first preset priority of the coding intervals, whether each character in the decoded data falls within each coding interval; and counting the number of characters in the decoded data that fall within each coding interval.
In the above solution, performing the encoding type judgment according to the number of characters in the decoded data falling within each coding interval and the priority relationship among the coding intervals, to determine the encoding type of the decoded data, comprises:
in the order of the second preset priority of the coding intervals, comparing the number of characters in the decoded data that fall within each coding interval with the total length of the decoded data after null characters and 0 characters are excluded, and determining the encoding type corresponding to the key data from that relationship.
An embodiment of the present invention further provides a data encoding type identification and transcoding device. The device comprises a key data extraction unit, a decoding unit, an encoding type recognition unit and a transcoding unit, wherein:
the key data extraction unit is configured to extract key data from a network message generated by a user operation, and to send the extracted key data to the decoding unit;
the decoding unit is configured to decode the key data, and to send the decoded data to the encoding type recognition unit;
the encoding type recognition unit is configured to determine the encoding type corresponding to the decoded key data, and to send the determined encoding type to the transcoding unit;
the transcoding unit is configured to transcode the decoded key data according to the encoding type.
In the above solution, the key data extraction unit is specifically configured to extract the key data from the network message according to a keyword or a regular expression.
In the above solution, the device further comprises a coding interval division unit, configured to divide different encoding types into multiple coding intervals and to determine the priority relationship among the coding intervals.
In the above solution, the encoding type recognition unit comprises a configuration subunit, a statistics subunit, a decision subunit and a revocation subunit, wherein:
the configuration subunit is configured to load the configuration information of each coding interval;
the statistics subunit is configured to traverse the decoded data and count the characters that fall within each coding interval;
the decision subunit is configured to perform the encoding type judgment according to the number of characters in the decoded data falling within each coding interval and the priority relationship among the coding intervals, and to determine the encoding type of the decoded data;
the revocation subunit is configured to release the configuration information of each coding interval.
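The unit structure above can be sketched as a small Python pipeline. Everything in it is hypothetical: the class names only mirror the unit names, and the configuration and revocation subunits are omitted. The recognition unit is a placeholder that tries strict UTF8 first and falls back to GB18030. This placeholder is a different, simpler technique from the patent's interval-counting method and is there only so the pipeline runs end to end.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote_to_bytes


class KeyDataExtractionUnit:
    def __init__(self, key: str):
        self.pattern = re.compile(
            r"(?:^|[?&\s])" + re.escape(key) + r"=([^&\s#]*)", re.MULTILINE)

    def extract(self, message: str) -> Optional[str]:
        match = self.pattern.search(message)
        return match.group(1) if match else None


class DecodingUnit:
    def decode(self, key_data: str) -> bytes:
        return unquote_to_bytes(key_data)  # URLDECODE to raw bytes


class EncodingTypeRecognitionUnit:
    # Placeholder only (NOT the patent's interval method): strict UTF-8,
    # else GB18030. Swap in an interval-counting recognizer here.
    def recognize(self, raw: bytes) -> str:
        try:
            raw.decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            return "gb18030"


class TranscodingUnit:
    def __init__(self, target: str = "utf-8"):
        self.target = target

    def transcode(self, raw: bytes, encoding: str) -> bytes:
        return raw.decode(encoding, errors="replace").encode(self.target)


class IdentificationTranscodingDevice:
    """Wires the four units in the order described above."""

    def __init__(self, key: str, target: str = "utf-8"):
        self.extractor = KeyDataExtractionUnit(key)
        self.decoder = DecodingUnit()
        self.recognizer = EncodingTypeRecognitionUnit()
        self.transcoder = TranscodingUnit(target)

    def process(self, message: str) -> Optional[Tuple[str, bytes]]:
        key_data = self.extractor.extract(message)
        if key_data is None:
            return None
        raw = self.decoder.decode(key_data)
        encoding = self.recognizer.recognize(raw)
        return encoding, self.transcoder.transcode(raw, encoding)
```

Because each unit hands its result to the next, the recognition unit can be replaced without touching extraction, decoding or transcoding.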
In the above solution, the statistics subunit is specifically configured to: check, in the order of the first preset priority of the coding intervals, whether each character in the decoded data falls within each coding interval; and count the number of characters in the decoded data that fall within each coding interval.
In the above solution, the decision subunit is specifically configured to: in the order of the second preset priority of the coding intervals, compare the number of characters in the decoded data that fall within each coding interval with the total length of the decoded data after null characters and 0 characters are excluded, and determine the encoding type corresponding to the key data from that relationship.
With the method and device of the embodiments of the present invention, key data is first extracted from the network message generated by a user operation and decoded. The encoding type of the decoded key data is then determined. Finally, the decoded key data is transcoded according to that encoding type. Data can thus be identified and converted efficiently and accurately. This solves the problem of garbled Chinese text caused by misidentified encoding types and improves the accuracy of data encoding identification. In addition, encoding type identification and decoding need no manual analysis or maintenance, which reduces maintenance cost and improves the user experience.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the data encoding type identification and transcoding method according to Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the data encoding type identification and transcoding method according to Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of the data encoding type identification and transcoding device according to an embodiment of the present invention.
Detailed description of the embodiments
In the prior art, the encoding type of key data is mostly obtained by extracting the charset from the request header of a HyperText Transfer Protocol (HTTP) message, or from the response header of the HTTP message. This approach produces garbled data in several cases: the request header or response header has no charset, the HTTP message carries no relevant encoding type, or the encoding type it carries is wrong. The encoding type of the key data can instead be specified in a predefined way. However, garbled data then results whenever the predefined type is wrong or the encoding type of the HTTP messages changes. Staff must also maintain the predefined encoding type, which takes considerable analysis and maintenance time.
The encoding type identification and transcoding method of the embodiments analyzes the relationships among common encodings, takes the characteristics of each encoding into account, and summarizes the rules between encodings. On this basis it identifies and transcodes the encoding type of URL data. Data can be identified and converted effectively and accurately, the approach is highly extensible, and no manual analysis or maintenance is needed. Maintenance cost is therefore reduced and users get a good experience.
In the embodiments of the present invention, the key data in the network message generated by a user operation is first extracted and decoded. The encoding type of the decoded key data is then determined. Finally, the decoded key data is transcoded according to that encoding type.
Before the encoding type of the decoded key data is determined, different encoding types must first be divided into multiple coding intervals, and the priority relationship among the coding intervals must be determined. This priority relationship comprises a first priority and a second priority. The first priority governs the traversal of the decoded data, in which the characters falling within each coding interval are counted. The second priority governs the encoding type judgment, which uses those counts to determine the encoding type of the decoded data. The priorities can be determined from four factors: the relationships between different encodings, how code words within each coding interval are used, the characteristics of each encoding, and how uncommon the code words are.
The embodiments take UTF8 and GB18030 encoding as an example to describe the encoding type identification and transcoding method in detail. First, UTF8 and GB18030 are divided into several more specific, narrower coding intervals, each with a specific use. The division is based on the characteristics of, relationships between, and rules of UTF8 and GB18030 encoding, combined with people's Internet usage habits. The first priority relationship among the coding intervals is then fixed. After this division, the coding intervals, from highest to lowest first priority, are:
1) the ASCII coding interval;
2) the displayable coding interval (displayable character interval) within the overlap of UTF8 and GB18030 encoding;
3) the non-displayable coding interval (non-displayable character interval) within the overlap of UTF8 and GB18030 encoding;
4) the hole coding interval (interval of empty code words);
5) the GB18030 4-byte coding interval;
6) the GB18030-encoded UTF8 uncommon 6-byte combination coding interval;
7) the GB18030-encoded UTF8 Chinese character and UTF8 uncommon 6-byte combination coding interval;
8) the displayable coding interval within the GB18030-encoded UTF8 common coding interval;
9) the UTF8 6-byte combination coding interval;
10) the UTF8 coding interval;
11) the GB18030 2-byte coding interval.
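One way to hold this ordering in a configuration is an `IntEnum`, sketched below in Python. A smaller value means a higher first priority, so sorting the intervals yields the order in which they are checked. The member names are hypothetical shorthand for the eleven intervals listed above. Byte-range matchers are left out because this section does not state the exact ranges of most intervals.

```python
from enum import IntEnum


class CodingInterval(IntEnum):
    """First-priority order of the coding intervals (1 = checked first)."""
    ASCII = 1
    OVERLAP_DISPLAYABLE = 2          # UTF8/GB18030 overlap, displayable
    OVERLAP_NON_DISPLAYABLE = 3      # UTF8/GB18030 overlap, non-displayable
    HOLE = 4                         # empty code words
    GB18030_FOUR_BYTE = 5
    GB18030_UTF8_UNCOMMON_SIX_BYTE = 6
    GB18030_UTF8_HANZI_AND_UNCOMMON_SIX_BYTE = 7
    GB18030_UTF8_COMMON_DISPLAYABLE = 8
    UTF8_SIX_BYTE = 9
    UTF8 = 10
    GB18030_TWO_BYTE = 11


def check_order(intervals):
    """Return configured intervals sorted into first-priority order."""
    return sorted(intervals)
```

Keeping the order in data rather than in a fixed if/else chain means an interval can be reprioritized by changing one value.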
The embodiments do not specifically limit the second priority relationship among the coding intervals that is used for encoding type judgment. In practical applications, it can be determined from the relationships between different encodings, how code words within each coding interval are used, the characteristics of each encoding, and how uncommon the code words are.
The implementation of the technical solution of the present invention is described in further detail below with reference to the drawings and specific embodiments. Fig. 1 is a schematic flowchart of the data encoding type identification and transcoding method according to Embodiment 1 of the present invention. As shown in Fig. 1, the method of this embodiment comprises the following steps:
Step 101: extract the key data from the network message generated by a user operation, and decode the key data;
Specifically, extracting the key data from the network message generated by the user operation comprises: extracting the key data from the network message according to a keyword or a regular expression;
Decoding the key data comprises: performing URLDECODE decoding on the key data according to the URLENCODE encoding and decoding rules.
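In Python, the URLDECODE step can be sketched as shown below. Python's `urllib.parse.unquote` decodes percent-escapes as UTF8 by default (with `errors="replace"`). That default would silently damage GB18030 data before its encoding type is known. This sketch therefore uses `unquote_to_bytes` and keeps the result as raw bytes. The `+` to space handling for form bodies is an added assumption about `application/x-www-form-urlencoded` input, not something the patent specifies. The helper name is hypothetical.

```python
from urllib.parse import unquote_to_bytes


def url_decode_raw(value: str, form_encoded: bool = False) -> bytes:
    """URLDECODE to raw bytes without committing to a charset yet."""
    if form_encoded:
        # Form bodies encode spaces as '+'; a literal '+' arrives as %2B,
        # so replacing before unquoting is safe.
        value = value.replace("+", " ")
    return unquote_to_bytes(value)
```

The bytes returned here are what the encoding type recognition in Step 102 examines.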
Step 102: determine the encoding type corresponding to the decoded key data;
Specifically, determining the encoding type corresponding to the decoded key data comprises: loading the configuration information of each coding interval; traversing the decoded data and counting the characters that fall within each coding interval; performing an encoding type judgment according to the number of characters in the decoded data falling within each coding interval and the priority relationship among the coding intervals, to determine the encoding type of the decoded data; and releasing the configuration information of each coding interval;
The configuration information of each coding interval includes, but is not limited to, how each coding interval is divided and the corresponding priority relationship;
Traversing the decoded data and counting the characters that fall within each coding interval comprises: checking, in the order of the first preset priority of the coding intervals, whether each character in the decoded data falls within each coding interval; and counting the number of characters in the decoded data that fall within each coding interval;
Specifically, the total length of data traversed can be limited by a configured maximum offset length. Following the priority relationship of the coding intervals of each encoding type, the characters that do and do not fall within each coding interval are counted in turn. If a character falls within an interval, it is counted and traversal continues with the data after the new offset. Otherwise, matching continues with the next coding interval;
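The first-priority traversal can be sketched in Python as shown below. The patent defines eleven intervals; this sketch substitutes four simplified, hypothetical intervals. Their byte ranges follow the standard UTF8 and GB18030 byte layouts but are not the patent's exact intervals. At each offset, the intervals are tried in priority order. The first match is counted and the offset advances by its width. A byte that matches nothing is counted as `unmatched` and skipped. Counts are in bytes, in line with the patent's usage, where a "current character" can span several characters. `max_offset` bounds the starting offsets, playing the role of the configured maximum offset length.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def _in(b: int, lo: int, hi: int) -> bool:
    return lo <= b <= hi


# Hypothetical simplified intervals, highest first priority first:
# (name, width in bytes, matcher on exactly `width` bytes)
INTERVALS: List[Tuple[str, int, Callable[[bytes], bool]]] = [
    ("ascii", 1, lambda c: c[0] <= 0x7F),
    ("gb18030_4byte", 4, lambda c: _in(c[0], 0x81, 0xFE) and _in(c[1], 0x30, 0x39)
        and _in(c[2], 0x81, 0xFE) and _in(c[3], 0x30, 0x39)),
    ("utf8_3byte", 3, lambda c: _in(c[0], 0xE0, 0xEF) and _in(c[1], 0x80, 0xBF)
        and _in(c[2], 0x80, 0xBF)),
    ("gb18030_2byte", 2, lambda c: _in(c[0], 0x81, 0xFE)
        and (_in(c[1], 0x40, 0x7E) or _in(c[1], 0x80, 0xFE))),
]


def count_intervals(data: bytes, max_offset: int = 4096) -> Dict[str, int]:
    counts: Counter = Counter()
    offset = 0
    limit = min(len(data), max_offset)
    while offset < limit:
        for name, width, matches in INTERVALS:
            chunk = data[offset:offset + width]
            if len(chunk) == width and matches(chunk):
                counts[name] += width  # count the bytes this interval claimed
                offset += width
                break
        else:  # no interval matched this byte
            counts["unmatched"] += 1
            offset += 1
    return dict(counts)
```

Putting `utf8_3byte` ahead of `gb18030_2byte` in this sketch means that byte runs valid in both encodings are attributed to UTF8. This is the kind of choice the first priority is meant to settle.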
Performing the encoding type judgment according to the number of characters in the decoded data falling within each coding interval and the priority relationship among the coding intervals, to determine the encoding type of the decoded data, comprises: in the order of the second preset priority of the coding intervals, comparing the number of characters in the decoded data that fall within each coding interval with the total length of the decoded data after null characters and 0 characters are excluded, and determining the encoding type corresponding to the key data from that relationship.
Specifically, the encoding type of the decoded data is judged from the number of characters falling within each coding interval, testing the intervals in descending order of their second priority. If the encoding type can be determined, the result is output. Otherwise, the next judgment is performed. This repeats until the encoding type of the decoded data is determined; if none can be determined, the default data encoding type is output.
In the embodiments, while the decoded data is traversed, the characters that fall within each coding interval and those that fall within none are both counted. A character may appear during traversal that falls within no coding interval. If the encoding type of the data can already be determined at that point, traversal ends and the encoding type is output. Otherwise, traversal continues with the subsequent data until the maximum offset is reached. After traversal, the encoding type of the decoded data is judged from the character counts of the coding intervals, according to their priority relationship, and the encoding type is output.
Step 103: transcode the decoded key data according to the encoding type;
In this step, the decoded key data is transcoded from its identified encoding type into the format required by the interface or by another target encoding.
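Once the encoding type is known, transcoding is a decode followed by an encode, as sketched below in Python. The helper name and the UTF8 default target are assumptions. `errors="replace"` is a design choice so that one malformed byte does not abort the whole record; a stricter pipeline might prefer to raise instead.

```python
def transcode(raw: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode bytes from the identified `source` encoding into `target`."""
    # errors="replace" keeps one bad byte from aborting the whole record
    return raw.decode(source, errors="replace").encode(target)


gb_bytes = "中文".encode("gb18030")
utf8_bytes = transcode(gb_bytes, "gb18030")  # GB18030 -> UTF-8 for display
```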
In the embodiments, when different encoding types are divided into multiple coding intervals and their priority relationship is determined, a weight can also be assigned to each coding interval according to actual conditions. The encoding type judgment then uses the number of characters falling within each coding interval, the priority relationship among the intervals, and the weights. For example, the character currently being judged may fall within the overlap of two coding intervals. The weights then decide which interval it is assigned to; for instance, the coding interval with the larger weight is taken as the interval of the current character.
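Weight-based assignment for overlapping intervals can be sketched in Python as follows. The two intervals, the byte ranges (standard UTF8 two-byte and GB18030 two-byte layouts) and the weights are all hypothetical. The byte pair `C3 A9` is valid in both intervals, so the weights alone decide. `max()` returns the first of several equal-weight candidates, so the list order (first priority) breaks ties.

```python
from typing import Callable, List, NamedTuple, Optional


class WeightedInterval(NamedTuple):
    name: str
    width: int                        # bytes examined
    weight: float                     # hypothetical, tuned per deployment
    matches: Callable[[bytes], bool]


def utf8_two_byte(c: bytes) -> bool:
    return 0xC2 <= c[0] <= 0xDF and 0x80 <= c[1] <= 0xBF


def gb18030_two_byte(c: bytes) -> bool:
    return 0x81 <= c[0] <= 0xFE and (0x40 <= c[1] <= 0x7E or 0x80 <= c[1] <= 0xFE)


def assign_interval(data: bytes, offset: int,
                    intervals: List[WeightedInterval]) -> Optional[str]:
    """Pick the interval for the character at `offset`; larger weight wins."""
    candidates = [
        iv for iv in intervals
        if len(data[offset:offset + iv.width]) == iv.width
        and iv.matches(data[offset:offset + iv.width])
    ]
    if not candidates:
        return None
    # max() keeps the first of equal weights, so list order breaks ties
    return max(candidates, key=lambda iv: iv.weight).name


INTERVALS = [
    WeightedInterval("utf8_2byte", 2, 0.4, utf8_two_byte),
    WeightedInterval("gb18030_2byte", 2, 0.6, gb18030_two_byte),
]
```

With these weights, `assign_interval(b"\xc3\xa9", 0, INTERVALS)` assigns the pair to the GB18030 interval. Raising the UTF8 weight above 0.6 flips that decision.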
Fig. 2 is a schematic flowchart of the data encoding type identification and transcoding method according to Embodiment 2 of the present invention. As shown in Fig. 2, the method of this embodiment comprises the following steps:
Step 200: receive the network message generated by a user operation;
In this step, the network message includes, but is not limited to, an HTTP message;
Step 201: extract the key data from the message;
In this step, the key data is extracted from the network message according to a keyword or a regular expression;
Step 202: decode the key data;
In this step, URLDECODE decoding is performed on the key data according to the URLENCODE encoding and decoding rules;
Step 203: divide different encoding types into multiple coding intervals, and determine the priority relationship among the coding intervals;
Step 204: load the configuration information of each coding interval;
The configuration information of each coding interval includes, but is not limited to, how each coding interval is divided and the corresponding priority relationship. It can also include the weight of each coding interval;
In the embodiments, steps 203 and 204 can be preset steps. That is, before the encoding type identification and transcoding method is carried out, different encoding types are divided into multiple coding intervals in advance, the priority relationship among the coding intervals is determined, and the configuration information of each coding interval is loaded. When steps 203 and 204 are preset, step 205 is executed directly after step 202 in this embodiment, and the decoded data is traversed in a loop using the preconfigured priority information of each coding interval;
Steps 205-240 make up the process of determining the encoding type of the decoded key data, which the following steps call simply the decoded data. Specifically,
within steps 205-240, steps 205-229 traverse the decoded data and count the characters that fall within each coding interval. The "current character" in steps 206-240 can be one character or several characters, depending on the coding interval being checked. For example, a character in the ASCII coding interval is 1 character, so when checking the ASCII coding interval, "current character" means one character of the decoded data. A character in the GB18030-encoded UTF8 uncommon 6-byte combination coding interval is 6 characters, so when checking that interval, "current character" means 6 characters of the decoded data. Specifically,
Step 205: judge whether traversal of the decoded data is complete; if it is not, execute step 206; otherwise, execute step 230;
Step 206: judge whether the offset length exceeds the limit; if it does not, execute step 207; otherwise, execute step 230;
Step 207: judge whether the current character falls within the ASCII coding interval; if it does, execute step 208; otherwise, execute step 209;
Step 208: count the characters falling within the ASCII coding interval; then execute step 229;
Specifically, the number of characters falling within the ASCII coding interval = the number already counted for the ASCII coding interval + the number of characters currently being judged;
The initial value of the number of characters falling within the ASCII coding interval is 0;
Step 209: judge whether the current character falls within the displayable coding interval in the overlap of UTF8 and GB18030 encoding; if it does, execute step 210; otherwise, execute step 211;
Step 210: count the characters falling within the displayable coding interval in the overlap of UTF8 and GB18030 encoding; then execute step 229;
Specifically, the number of characters falling within the displayable coding interval in the UTF8/GB18030 overlap = the number already counted for that interval + the number of characters currently being judged;
The initial value of the number of characters falling within the displayable coding interval in the UTF8/GB18030 overlap is 0;
Step 211: judge whether the current character falls within the non-displayable coding interval in the overlap of UTF8 and GB18030 encoding; if it does, execute step 212; otherwise, execute step 213;
Step 212: count the characters falling within the non-displayable coding interval in the overlap of UTF8 and GB18030 encoding; then execute step 229;
The counting works as in steps 208 and 210;
The initial value of the number of characters falling within the non-displayable coding interval in the UTF8/GB18030 overlap is 0;
Step 213: judge whether the current character falls within the hole coding interval; if it does, execute step 214; otherwise, execute step 215;
Step 214: count the characters falling within the hole coding interval; then execute step 229;
The counting works as in steps 208 and 210;
The initial value of the number of characters falling within the hole coding interval is 0;
Step 215: judge whether the current character falls within the GB18030 4-byte coding interval; if it does, execute step 228; otherwise, execute step 216;
Step 216: judge whether the current character falls within the GB18030-encoded UTF8 uncommon 6-byte combination coding interval; if it does, execute step 217; otherwise, execute step 218;
Step 217: count the characters falling within the GB18030-encoded UTF8 uncommon 6-byte combination coding interval; then execute step 229;
The counting works as in steps 208 and 210;
The initial value of the number of characters falling within the GB18030-encoded UTF8 uncommon 6-byte combination coding interval is 0;
Step 218: judge whether the current character falls within the GB18030-encoded UTF8 Chinese character and UTF8 uncommon 6-byte combination coding interval; if it does, execute step 219; otherwise, execute step 220;
Step 219: by the non-common 6 combination of bytes coding section of the UTF8 Chinese character code and UTF8 that meet GB18030 coding
Number of characters counted;Execute step 229;
Detailed process is referring to step 208,210;
The character of the UTF8 Chinese character code for meeting GB18030 coding and the non-common 6 combination of bytes coding section UTF8
Several initial values is 0;
Step 220: judge whether the current character falls in the displayable code range within the common UTF8 code range of GB18030 encoding. If it does, execute step 221; otherwise, execute step 222;
Step 221: add the matched characters to the count for the displayable code range within the common UTF8 code range of GB18030 encoding, then execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of this count is 0;
Step 222: judge whether the current character falls in the UTF8 6-byte combination range. If it does, execute step 223; otherwise, execute step 224;
Step 223: add the matched characters to the count for the UTF8 6-byte combination range, then execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of this count is 0;
Step 224: judge whether the current character falls in the UTF8 code range. If it does, execute step 225; otherwise, execute step 226;
Step 225: add the matched characters to the count for the UTF8 code range, then execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of this count is 0;
Step 226: judge whether the current character falls in the GB18030 two-byte code range. If it does, execute step 227; otherwise, execute step 228;
Step 227: add the matched characters to the count for the GB18030 two-byte code range, then execute step 229;
For the detailed process, refer to steps 208 and 210;
The initial value of this count is 0;
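The byte-shape tests behind the GB18030 four-byte range (step 215), the UTF8 code range (step 224), and the GB18030 two-byte range (step 226) can be sketched as follows. This is a minimal sketch under stated assumptions: the exact range tables are configured elsewhere in this document, so the sketch uses the general GB18030 byte ranges and assumes the "UTF8 code range" means a well-formed three-byte UTF8 sequence (the form that covers common Chinese characters). The function names are hypothetical, and the narrower overlap, hole, and uncommon-combination ranges are not reproduced.

```python
def is_gb18030_four_byte(data: bytes, i: int) -> bool:
    """GB18030 four-byte shape: bytes 1 and 3 in 0x81-0xFE, bytes 2 and 4 in 0x30-0x39."""
    if i + 4 > len(data):
        return False
    b1, b2, b3, b4 = data[i:i + 4]
    return (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39)


def is_utf8_three_byte(data: bytes, i: int) -> bool:
    """Well-formed three-byte UTF8 sequence (assumed meaning of the UTF8 code range)."""
    if i + 3 > len(data):
        return False
    b1, b2, b3 = data[i:i + 3]
    if not 0xE0 <= b1 <= 0xEF:
        return False
    low, high = 0x80, 0xBF
    if b1 == 0xE0:
        low = 0xA0   # reject overlong encodings
    elif b1 == 0xED:
        high = 0x9F  # reject UTF-16 surrogate code points
    return low <= b2 <= high and 0x80 <= b3 <= 0xBF


def is_gb18030_two_byte(data: bytes, i: int) -> bool:
    """GB18030 two-byte shape: lead 0x81-0xFE, trail 0x40-0x7E or 0x80-0xFE."""
    if i + 2 > len(data):
        return False
    lead, trail = data[i], data[i + 1]
    return 0x81 <= lead <= 0xFE and (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE)
```

A byte in 0x30-0x39 is never a UTF8 continuation byte, so a four-byte GB18030 match cannot also be read as multi-byte UTF8 at the same position, which is presumably why step 215 jumps straight to step 228. By contrast, EA 89 passes is_gb18030_two_byte and EA 89 81 passes is_utf8_three_byte, which is exactly the kind of overlap the remaining ranges have to resolve.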
Step 228: determine that the encoding type of the decoded data is GB18030, then execute step 240;
In this step, if a character of the decoded data falls in none of the code ranges, the encoding type of the decoded data can be determined to be GB18030. This is, however, only one of the cases in which the encoding type is determined to be GB18030; for the other cases, refer to the following steps.
Step 229: offset the data by the number of characters (bytes) judged in the current step, and return to step 205 to continue judging whether the decoded data still contain characters to be traversed after the offset;
In this step, when the current character is judged to fall in a given code range, the offset equals the number of bytes that range covers. For example, if the current character falls in the GB18030 / uncommon UTF8 6-byte combination range, the offset is 6, and traversal and counting continue with the subsequent data.
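The traversal of steps 205 to 229 (test the code ranges in priority order, add the matched width to that range's count, offset by the same width, and fall through to step 228 when nothing matches) can be sketched as below. This is a hedged illustration, not the patent's implementation: the range names and the three predicates in CHECKS (ASCII, a two-byte "overlap" shape with lead 0xC2-0xDF and trail 0x80-0xBF, and a simplified three-byte UTF8 shape) are hypothetical stand-ins for the configured ranges.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# (hypothetical range name, width in bytes, predicate on (data, offset))
Check = Tuple[str, int, Callable[[bytes, int], bool]]


def tally_ranges(data: bytes, checks: Sequence[Check],
                 decisive: frozenset = frozenset()) -> Tuple[Counter, Optional[str]]:
    """Count bytes per code range; return (counts, "GB18030") on an early decision."""
    counts: Counter = Counter()
    i = 0
    while i < len(data):                        # step 205: bytes left to traverse?
        for name, width, matches in checks:     # checks run in priority order
            if i + width <= len(data) and matches(data, i):
                if name in decisive:            # e.g. step 215 -> step 228
                    return counts, "GB18030"
                counts[name] += width           # counts are kept in bytes
                i += width                      # step 229: offset by the matched width
                break
        else:                                   # no range matched -> step 228
            return counts, "GB18030"
    return counts, None                         # undecided: continue with steps 230-239


# Hypothetical, simplified ranges (the UTF8 check skips the E0/ED refinements).
CHECKS = [
    ("ascii", 1, lambda d, i: d[i] < 0x80),
    ("overlap", 2, lambda d, i: 0xC2 <= d[i] <= 0xDF and 0x80 <= d[i + 1] <= 0xBF),
    ("utf8", 3, lambda d, i: 0xE0 <= d[i] <= 0xEF
     and all(0x80 <= b <= 0xBF for b in d[i + 1:i + 3])),
]

counts, decided = tally_ranges(bytes.fromhex("E8BF99E5B0B1E698AFE6889120C287"), CHECKS)
print(sorted(counts.items()), decided)
# [('ascii', 1), ('overlap', 2), ('utf8', 12)] None
```

With these stand-in ranges, the decoded data of the first specific embodiment yield the same tallies that embodiment reports (1 ASCII byte, 2 overlap bytes, 12 UTF8 bytes). Keeping counts in bytes rather than characters is what lets the later steps compare them directly with the length N.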
Of steps 204 to 240, steps 230 to 240 determine the encoding type of the decoded data from the number of characters falling in each code range and the priority relationships among the code ranges. In this embodiment, the encoding type is judged in a second priority order: first, the relationship between N and the number of characters falling in a single code range (the ASCII, GB18030, or UTF8 code range) is judged; then the relationships among the counts of the various code ranges, and between N and the counts of the other code ranges, are judged in turn, to determine which code ranges the decoded data satisfy. This embodiment uses this order only as an example; the method is not limited to this second priority order.
Specifically,
Step 230: calculate the total length N of the decoded data after null characters and 0 characters are removed;
Step 231: judge whether N equals the number of characters falling in the ASCII code range. If it does, execute step 232; otherwise, execute step 233;
Step 232: determine that the encoding type of the decoded data is ASCII, then execute step 240;
Step 233: judge whether N equals the sum of the number of characters falling in the ASCII code range and the number falling in the GB18030 code range. If it does, execute step 228; otherwise, execute step 234;
Step 234: judge whether N equals the sum of the number of characters falling in the ASCII code range and the number falling in the UTF8 code range. If it does, execute step 235; otherwise, execute step 236;
Step 235: determine that the encoding type of the decoded data is UTF8, then execute step 240;
Step 236: judge whether N equals the sum of the number of characters falling in the ASCII code range and the number falling in the 6-byte code ranges, and whether the number falling in the 6-byte code ranges is 6. If both conditions hold, execute step 228. Otherwise, in case 1, where N equals that sum and the number falling in the 6-byte code ranges is greater than 6, execute step 235; in case 2, where N does not equal that sum, execute step 237;
The judgment in this step is only an example. In practical applications, the judgment can instead test whether the number of characters falling in the 6-byte code ranges is 12; the specific value can be set according to the actual situation, for example to another integer multiple of 6.
Step 237: judge whether the number of characters falling in the UTF8 code range is greater than 0 and N minus the number falling in the ASCII code range equals the sum of the number falling in the UTF8 code range and the numbers falling in each overlap range or each 6-byte combination range. If these conditions hold, execute step 235; otherwise, execute step 238;
Step 238: judge whether the number of characters falling in GB18030 is greater than or equal to the number falling in the UTF8 code range. If it is, execute step 228; otherwise, execute step 239;
Step 239: judge whether the number of characters falling in GB18030 is greater than 0 and either the number falling in the non-displayable code range of the UTF8/GB18030 overlap range is greater than 0, or the number falling in the non-displayable code range within the uncommon UTF8 code range of GB18030 together with the number falling in the UTF8 6-byte combination ranges is greater than 0. If these conditions hold, execute step 228; otherwise, execute step 235;
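Steps 231 to 239 can be condensed into a single decision function. This is a sketch under one reading of the text, not a definitive implementation: the count keys (ascii, gb18030, utf8, six_byte, overlap_or_six, nondisplay_overlap, nondisplay_uncommon_six) are hypothetical names for the tallies described above, N is assumed to be computed by the caller as in step 230, and the "or" clause of step 239 is read as "either count is positive". The six_threshold parameter reflects the note that the value 6 in step 236 is configurable.

```python
from typing import Mapping

ASCII, GB18030, UTF8 = "ASCII", "GB18030", "UTF8"


def decide_encoding(n: int, counts: Mapping[str, int], six_threshold: int = 6) -> str:
    """Steps 231-239: map per-range byte counts (hypothetical keys) to an encoding type."""
    def c(key: str) -> int:
        return counts.get(key, 0)                # ranges never hit count as 0

    if n == c("ascii"):                          # step 231 -> 232
        return ASCII
    if n == c("ascii") + c("gb18030"):           # step 233 -> 228
        return GB18030
    if n == c("ascii") + c("utf8"):              # step 234 -> 235
        return UTF8
    if n == c("ascii") + c("six_byte"):          # step 236
        if c("six_byte") == six_threshold:
            return GB18030                       # -> step 228
        if c("six_byte") > six_threshold:
            return UTF8                          # case 1 -> step 235
    # step 237 (case 2 of step 236 arrives here)
    if c("utf8") > 0 and n - c("ascii") == c("utf8") + c("overlap_or_six"):
        return UTF8                              # -> step 235
    if c("gb18030") >= c("utf8"):                # step 238 -> 228
        return GB18030
    if c("gb18030") > 0 and (c("nondisplay_overlap") > 0
                             or c("nondisplay_uncommon_six") > 0):
        return GB18030                           # step 239 -> 228
    return UTF8                                  # step 239 -> 235


print(decide_encoding(15, {"ascii": 1, "utf8": 12, "overlap_or_six": 2}))        # UTF8
print(decide_encoding(6, {"six_byte": 6}))                                       # GB18030
print(decide_encoding(15, {"utf8": 3, "six_byte": 12, "overlap_or_six": 12}))    # UTF8
```

The three calls map the statistics of the three specific embodiments of this document onto these keys and reproduce their results. For the third embodiment the function reaches UTF8 through step 237, whereas that embodiment's own steps 3.2.5 and 3.2.6 reach it by comparing the 6-byte count 12 with 6; the outcome is the same.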
Step 240: output the determined encoding type;
Step 241: release the configuration information of each code range;
Step 242: transcode the decoded critical data according to the encoding type;
In this step, the decoded critical data are transcoded from their identified encoding type into the format required by the interface or by another encoding format.
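Step 242 corresponds to a decode/encode round trip. A minimal Python sketch using the standard codecs "utf-8" and "gb18030"; the helper name transcode and the label-to-codec table are assumptions for illustration. Because GB18030 is designed to cover all of Unicode, identified UTF8 text can be re-encoded as GB18030, whereas decoding with the wrong codec either raises an error (as here, with strict decoding) or produces garbled text.

```python
# Hypothetical mapping from the method's encoding-type labels to Python codec names.
CODECS = {"ASCII": "ascii", "UTF8": "utf-8", "GB18030": "gb18030"}


def transcode(data: bytes, detected: str, target_codec: str = "gb18030") -> bytes:
    """Re-encode data from the detected encoding type into the target codec."""
    text = data.decode(CODECS[detected])   # strict: raises if the detection was wrong
    return text.encode(target_codec)


utf8_bytes = "这就是我".encode("utf-8")
gb_bytes = transcode(utf8_bytes, "UTF8")                     # UTF8 -> GB18030
assert transcode(gb_bytes, "GB18030", "utf-8") == utf8_bytes  # and back again
```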
Each step of the encoding-type identification and transcoding method described in Fig. 2 is only one embodiment of the present invention and does not limit its scope. In practical applications, the steps in Fig. 2 can be added, deleted, or merged as needed, or the priorities of the code ranges can be redefined to adjust the execution order of the steps in Fig. 2. Weights can also be assigned to the code ranges, so that the encoding type of the decoded data is determined from the number of characters falling in each code range together with the priority relationships and weights of the code ranges. These technical solutions all fall within the protection scope of the present invention.
The embodiments of the present invention implement URL encoding-type identification and transcoding based on the coding ranges of UTF8 and GB18030, the relationships between those ranges, and the characteristics of each encoding type. This effectively improves the accuracy of URL encoding-type identification and avoids garbled display caused by a wrongly identified encoding type.
The technical solution of the present invention is now described in full, using the process in Fig. 2 together with concrete network messages. The network message analyzed in the first specific embodiment of the present invention is shown below. This embodiment uses an HTTP message as an example, but the method is not limited to HTTP messages; this embodiment covers only part of the inventive concept, not all of it.
In this embodiment, the network message is as follows:
GET /s?wd=%E8%BF%99%E5%B0%B1%E6%98%AF%E6%88%91%20%C2%87&rsv_spt=1&issp=1&f=8&rsv_bp=0&ie=utf-8&tn=monline_5_dg&rsv_enter=0&rsv_sug3=89&rsv_sug4=8344&rsv_sug1=30&rsv_sug2=0&inputT=2237639 HTTP/1.1
Following the flow shown in Fig. 2, the method of this embodiment includes the following steps:
Step 1: extract the critical data from the network message;
In this step, the data after the keyword wd= are extracted, so the critical data are:
%E8%BF%99%E5%B0%B1%E6%98%AF%E6%88%91%20%C2%87
Step 2: decode the extracted critical data;
Decoding %E8%BF%99%E5%B0%B1%E6%98%AF%E6%88%91%20%C2%87 yields the hexadecimal bytes: E8 BF 99 E5 B0 B1 E6 98 AF E6 88 91 20 C2 87;
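Steps 1 and 2 of this embodiment can be reproduced with Python's urllib.parse. The helper below is a hypothetical sketch: it reads the raw value of one query parameter and percent-decodes it to bytes with unquote_to_bytes rather than parse_qs, because parse_qs decodes values to text as UTF8 by default, which would presuppose the very encoding the method is trying to identify.

```python
from urllib.parse import unquote_to_bytes, urlsplit


def extract_param_bytes(request_target: str, key: str) -> bytes:
    """Return the percent-decoded bytes of one query parameter (steps 1 and 2)."""
    query = urlsplit(request_target).query
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        if name == key:
            # '+' means space in form-encoded data; unquote_to_bytes leaves it alone
            return unquote_to_bytes(value.replace("+", " "))
    raise KeyError(key)


target = ("/s?wd=%E8%BF%99%E5%B0%B1%E6%98%AF%E6%88%91%20%C2%87"
          "&rsv_spt=1&ie=utf-8")
print(extract_param_bytes(target, "wd").hex(" ").upper())
# E8 BF 99 E5 B0 B1 E6 98 AF E6 88 91 20 C2 87
```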
Step 3: identify the encoding type of the decoded data;
In this embodiment, step 3 includes the following sub-steps:
Step 3.1: tally the decoded data by code range;
In this step, the data are traversed and counted against the preset code ranges, priority relationships, and weights, from high priority to low priority, as follows:
Step 3.1.1: judge whether the first byte (e.g., E8) falls in the ASCII code range. It does not, so proceed to the next step;
In this step, if the first byte E8 fell in the ASCII code range, the ASCII-range count would be incremented by 1 and the data offset by 1; the process would then return to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached, i.e., until the decoded data have been fully traversed;
Step 3.1.2: judge whether the first two bytes (e.g., E8 BF) fall in the overlap range of UTF8 and GB18030 (which includes both a displayable and a non-displayable range). They do not, so proceed to the next step;
In this step, if the first two bytes fall in the overlap range of UTF8 and GB18030, the count for that overlap range is incremented by 2 and the data are offset by 2; the process then returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.3: judge whether the first two bytes (e.g., E8 BF) fall in the hole code range. They do not, so proceed to the next step;
In this step, if the first two bytes fall in the hole code range, the hole-range count is incremented by 2 and the data are offset by 2; the process then returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.4: judge whether the first four bytes (e.g., E8 BF 99 E5) fall in the GB18030 four-byte code range. They do not, so proceed to the next step; if they did, GB18030 would be determined as the final encoding type;
Step 3.1.5: judge whether the first six bytes (e.g., E8 BF 99 E5 B0 B1) fall in the range that is both GB18030 and an uncommon UTF8 6-byte combination. They do not, so proceed to the next step;
In this step, if the first six bytes fall in that range, its count is incremented by 6 and the data are offset by 6; the process then returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.6: judge whether the first six bytes (e.g., E8 BF 99 E5 B0 B1) fall in the range that is GB18030, a UTF8 Chinese character code, and an uncommon UTF8 6-byte combination all at once. They do not, so proceed to the next step;
In this step, if the first six bytes fall in that range, its count is incremented by 6 and the data are offset by 6; the process then returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.7: judge whether the first six bytes (e.g., E8 BF 99 E5 B0 B1) fall both in the displayable code range within the common UTF8 code range of GB18030 and in the UTF8 6-byte combination range. They do not, so proceed to the next step;
In this step, if the first six bytes fall in both ranges, the count for this combined range is incremented by 6 and the data are offset by 6; the process then returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.8: judge whether the first three bytes (e.g., E8 BF 99) fall in the UTF8 code range. They do, so the UTF8-range count is incremented by 3 and the data are offset by 3; the process returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached.
In this step, if the first three bytes did not fall in the UTF8 code range, the current bytes would fall in no code range, and the encoding type of the decoded data would be determined to be GB18030.
The steps of step 3.1 are executed in a loop until all bytes of the decoded data have been traversed. The statistics for the decoded data in this embodiment are: 1 byte in the ASCII code range, 12 bytes in the UTF8 code range, 2 bytes in the displayable part of the UTF8/GB18030 overlap range, and 0 bytes in every other code range.
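As a cross-check of these statistics (not part of the patented method), the bytes can be decoded as UTF8 and each character's sequence length tallied. Under UTF8 the data split into four three-byte characters (这就是我, 12 bytes), one ASCII space (1 byte), and the two-byte sequence C2 87 (U+0087, a C1 control character), which are the two bytes this embodiment counts in the overlap range. The helper name utf8_width_tally is hypothetical.

```python
from collections import Counter


def utf8_width_tally(data: bytes) -> Counter:
    """Map UTF8 sequence width (1-4 bytes) to the number of bytes at that width."""
    tally: Counter = Counter()
    for ch in data.decode("utf-8"):          # strict decoding: raises if not UTF8
        width = len(ch.encode("utf-8"))
        tally[width] += width
    return tally


data = bytes.fromhex("E8BF99E5B0B1E698AFE6889120C287")
print(data.decode("utf-8") == "这就是我 \u0087")  # True
print(sorted(utf8_width_tally(data).items()))      # [(1, 1), (2, 2), (3, 12)]
```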
Step 3.2: decide the encoding type from the statistics;
Specifically, the total length of the decoded data in this embodiment is M = 15; subtracting null characters and 0 characters from M leaves N = 15;
The decision process in this embodiment includes the following steps:
Step 3.2.1: judge whether N equals the ASCII-range byte count, 1. It does not, so proceed to the next step;
Step 3.2.2: judge whether N equals 13, the sum of the UTF8-range byte count (12) and the ASCII-range byte count (1). It does not, so proceed to the next step;
Step 3.2.3: judge whether N equals 1, the sum of the GB18030-range byte count (0) and the ASCII-range byte count (1). It does not, so proceed to the next step;
Step 3.2.4: judge whether N equals 15, the sum of the UTF8-range byte count (12), the ASCII-range byte count (1), and the UTF8/GB18030 overlap-range byte count (2). It does, so the encoding type of the decoded data is considered to be UTF8, and UTF8 is given as the encoding type of the data to be decoded.
Step 3.3: output UTF8 as the encoding type of the critical data;
Step 4: transcode the data according to the identified encoding type and the encoding format of the browser or another required encoding format.
In this step, if the identified encoding type is UTF8 and the browser's encoding format is GB18030, the UTF8 data must be converted into GB18030 data before they can be displayed; otherwise, they will be garbled.
The network message analyzed in the second specific embodiment of the present invention is shown below. This embodiment uses an HTTP message as an example, but the method is not limited to HTTP messages; this embodiment covers only part of the inventive concept, not all of it.
In this embodiment, the network message is as follows:
POST /aj/mblog/add?domain=2869929424&ajwvr=6&__rnd=1416799662398 HTTP/1.1
Host: weibo.com
Connection: keep-alive
... (omitted)
location=v6_content_home&appkey=&style_type=1&pic_id=&text=%EA%89%81%ED%84%87&pdetail=&rank=0&rankid=&module=stissue&pub_type=dialog&_t=0
Following the flow shown in Fig. 2, the method of this embodiment includes the following steps:
Step 1: extract the critical data following the keyword "text=" in the message;
In this step, the critical data are: %EA%89%81%ED%84%87;
Step 2: decode the extracted critical data %EA%89%81%ED%84%87;
In this step, the decoded hexadecimal bytes are: EA 89 81 ED 84 87;
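These six bytes show why the method needs its uncommon-combination ranges: they are well-formed in both encodings. The check below is a cross-check with Python's standard codecs, not the patented method. Decoded as UTF8 they give two three-byte characters, U+A241 (a Yi syllable) and U+D107 (a Hangul syllable); decoded as GB18030 they give three two-byte characters (EA 89, 81 ED, 84 87). Neither decode fails, so a plain "does it decode?" test cannot tell the two encodings apart.

```python
data = bytes.fromhex("EA8981ED8487")

as_utf8 = data.decode("utf-8")        # EA 89 81 | ED 84 87
as_gb18030 = data.decode("gb18030")   # EA 89 | 81 ED | 84 87

print([hex(ord(ch)) for ch in as_utf8])  # ['0xa241', '0xd107']
print(len(as_gb18030))                   # 3
```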
Step 3: identify the encoding type of the decoded data;
In this embodiment, step 3 includes the following sub-steps:
Step 3.1: tally the decoded data by code range;
In this step, the data are traversed and counted against the preset code ranges, priority relationships, and weights, from high priority to low priority, as follows:
Step 3.1.1: judge whether the first byte (e.g., EA) falls in the ASCII code range. It does not, so proceed to the next step;
In this step, if the first byte fell in the ASCII code range, the ASCII-range count would be incremented by 1 and the data offset by 1; the process would then return to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.2: judge whether the first two bytes (e.g., EA 89) fall in the overlap range of UTF8 and GB18030 (which includes both a displayable and a non-displayable range). They do not, so proceed to the next step;
In this step, if the first two bytes fall in the overlap range of UTF8 and GB18030, the count for that overlap range is incremented by 2 and the data are offset by 2; the process then returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.3: judge whether the first two bytes (e.g., EA 89) fall in the hole code range. They do not, so proceed to the next step;
In this step, if the first two bytes fall in the hole code range, the hole-range count is incremented by 2 and the data are offset by 2; the process then returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.4: judge whether the first four bytes (e.g., EA 89 81 ED) fall in the GB18030 four-byte code range. They do not, so proceed to the next step; if they did, GB18030 would be determined as the final encoding type;
Step 3.1.5: judge whether the first six bytes (e.g., EA 89 81 ED 84 87) fall in the range that is both GB18030 and an uncommon UTF8 6-byte combination. They do, so that range's count is incremented by 6 and the data are offset by 6; the process returns to step 3.1.1 to continue traversing and counting the subsequent data;
In this step, if the first six bytes did not fall in that range, the current bytes would fall in no code range, and the encoding type of the decoded data would be determined to be GB18030.
The steps of step 3.1 are executed in a loop until all bytes of the decoded data have been traversed. The statistics for the decoded data in this embodiment are: 6 bytes in the range that is both GB18030 and an uncommon UTF8 6-byte combination, and 0 bytes in every other code range.
Step 3.2: decide the encoding type from the statistics;
Specifically, the total length of the decoded data in this embodiment is M = 6; subtracting null characters and 0 characters leaves N = 6;
The decision process in this embodiment includes the following steps:
Step 3.2.1: judge whether N equals the ASCII-range byte count, 0. It does not, so proceed to the next step;
Step 3.2.2: judge whether N equals 0, the sum of the UTF8-range byte count (0) and the ASCII-range byte count (0). It does not, so proceed to the next step;
Step 3.2.3: judge whether N equals 0, the sum of the GB18030-range byte count (0) and the ASCII-range byte count (0). It does not, so proceed to the next step;
Step 3.2.4: judge whether N equals 0, the sum of the UTF8-range byte count (0), the ASCII-range byte count (0), and the UTF8/GB18030 overlap-range byte count (0). It does not, so proceed to the next step;
Step 3.2.5: judge whether N equals the byte count of the range that is both GB18030 and an uncommon UTF8 6-byte combination, which is 6. It does, so the encoding type of the decoded data is considered to be GB18030, and GB18030 is given as the encoding type of the data to be decoded.
Step 3.3: output GB18030 as the encoding type of the critical data;
Step 4: transcode the data according to the identified encoding type and the encoding format of the browser or another required encoding format.
The network message analyzed in the third specific embodiment of the present invention is shown below. This embodiment uses an HTTP message as an example, but the method is not limited to HTTP messages; this embodiment covers only part of the inventive concept, not all of it.
In this embodiment, the network message is as follows:
POST /f/commit/post/add HTTP/1.1
Host: tieba.baidu.com
Connection: keep-alive
………
content=%EA%89%81%ED%84%87%ED%84%87%ED%84%87%ED%84%87&files=%5B%5D&mouse_pwd_isclick=0&__type__=reply
Following the flow shown in Fig. 2, the method of this embodiment includes the following steps:
Step 1: extract the critical data from the network message;
In this step, the data following the keyword content= in the message are extracted, so the critical data are: %EA%89%81%ED%84%87%ED%84%87%ED%84%87%ED%84%87;
Step 2: decode the extracted critical data;
The decoded hexadecimal bytes are: EA 89 81 ED 84 87 ED 84 87 ED 84 87 ED 84 87;
Step 3: identify the encoding type of the decoded data;
In this embodiment, step 3 includes the following sub-steps:
Step 3.1: tally the decoded data by code range;
In this step, the data are traversed and counted against the preset code ranges, priority relationships, and weights, from high priority to low priority, as follows:
Step 3.1.1: judge whether the first byte (e.g., EA or ED) falls in the ASCII code range. It does not, so proceed to the next step;
In this step, if the first byte fell in the ASCII code range, the ASCII-range count would be incremented by 1 and the data offset by 1; the process would then return to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.2: judge whether the first two bytes (e.g., EA 89 or ED 84) fall in the overlap range of UTF8 and GB18030 (which includes both a displayable and a non-displayable range). They do not, so proceed to the next step;
In this step, if the first two bytes fall in the overlap range of UTF8 and GB18030, the count for that overlap range is incremented by 2 and the data are offset by 2; the process then returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.3: judge whether the first two bytes (e.g., EA 89 or ED 84) fall in the hole code range. They do not, so proceed to the next step;
In this step, if the first two bytes fall in the hole code range, the hole-range count is incremented by 2 and the data are offset by 2; the process then returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.4: judge whether the first four bytes (e.g., EA 89 81 ED or ED 84 87 ED) fall in the GB18030 four-byte code range. They do not, so proceed to the next step; if they did, GB18030 would be determined as the final encoding type;
Step 3.1.5: judge whether the first six bytes fall in the range that is both GB18030 and an uncommon UTF8 6-byte combination (here EA 89 81 ED 84 87 and ED 84 87 ED 84 87 both do). If they do, that range's count is incremented by 6 and the data are offset by 6; the process returns to step 3.1.1 to continue traversing and counting the subsequent data. If they do not, proceed to the next step;
Step 3.1.6: judge whether the next six bytes (here only the last three bytes, ED 84 87, remain) fall in the range that is GB18030, a UTF8 Chinese character code, and an uncommon UTF8 6-byte combination all at once. They do not, so proceed to the next step;
In this step, if the six bytes fall in that range, its count is incremented by 6 and the data are offset by 6; the process then returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.7: judge whether the next six bytes (here only the last three bytes, ED 84 87, remain) fall both in the displayable code range within the common UTF8 code range of GB18030 and in the UTF8 6-byte combination range. They do not, so proceed to the next step;
In this step, if the six bytes fall in both ranges, the count for this combined range is incremented by 6 and the data are offset by 6; the process then returns to step 3.1.1 to continue traversing and counting the subsequent data until the maximum offset is reached;
Step 3.1.8: judge whether the next three bytes (here the last three bytes, ED 84 87) fall in the UTF8 code range. They do, so the UTF8-range count is incremented by 3 and the data are offset by 3; the process returns to step 3.1.1 to continue traversing and counting the subsequent data.
Step 3.1.9: judge whether the first two bytes fall in the GB18030 two-byte code range. If they do, that range's count is incremented by 2 and the data are offset by 2; the process returns to step 3.1.1 and continues traversing until done;
Step 3.1.10: judge whether any byte falls in no code range. If so, the encoding type of the data is considered to be GB18030, and GB18030 is given as the final encoding type.
The steps of step 3.1 are executed in a loop until all bytes of the decoded data have been traversed. The statistics for the decoded data in this embodiment are: 12 bytes in the range that is both GB18030 and an uncommon UTF8 6-byte combination, 3 bytes in the UTF8 code range, and 0 bytes in every other code range.
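A similar cross-check, again using Python's standard codecs rather than the patented range statistics, is consistent with the outcome of this embodiment: the 15 bytes form five well-formed three-byte UTF8 sequences, but read as GB18030 two-byte pairs they leave a dangling final byte 87, so strict GB18030 decoding fails. The helper name decodes_strictly is hypothetical.

```python
def decodes_strictly(data: bytes, codec: str) -> bool:
    """True if data decode without error under the given codec."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


# EA 89 81 followed by four copies of ED 84 87
data = bytes.fromhex("EA8981" + "ED8487" * 4)
print(len(data), decodes_strictly(data, "utf-8"), decodes_strictly(data, "gb18030"))
# 15 True False
```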
Step 3.2: decide the encoding type from the statistics;
Specifically, the total length of the decoded data in this embodiment is M = 15; subtracting null characters and 0 characters from M leaves N = 15;
The decision process in this embodiment includes the following steps:
Step 3.2.1: judge whether N equals the ASCII-range byte count, 0. It does not, so proceed to the next step;
Step 3.2.2: judge whether N equals 3, the sum of the UTF8-range byte count (3) and the ASCII-range byte count (0). It does not, so proceed to the next step;
Step 3.2.3: judge whether N equals 0, the sum of the GB18030-range byte count (0) and the ASCII-range byte count (0). It does not, so proceed to the next step;
Step 3.2.4: judge whether N equals 3, the sum of the UTF8-range byte count (3), the ASCII-range byte count (0), and the UTF8/GB18030 overlap-range byte count (0). It does not, so proceed to the next step;
Step 3.2.5: judge whether N equals the byte count of the range that is both GB18030 and an uncommon UTF8 6-byte combination, which is 12. It does not, so proceed to the next step;
Step 3.2.6: judge whether the byte count of the range that is both GB18030 and an uncommon UTF8 6-byte combination, 12, is greater than 6. It is, so the encoding type of the decoded data is decided to be UTF8, and this encoding type is given.
In this step, because the UTF8 coding section carries a higher weight, when the character length satisfying both the GB18030 and UTF8 coding sections, namely the pattern length of non-common UTF8 6-byte codes (12), is greater than 6, the coding type of the decoded data is determined to be UTF8.
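The cascade in steps 3.2.1 to 3.2.6 can be sketched in Python as below. This is a hedged illustration, not the patented implementation: the source only walks through the "not equal" branches, so the coding type returned by each equality check (ASCII, UTF8, GB18030, or a caller-supplied default for step 3.2.5) is an assumption, as are the function and parameter names. Feeding in the counts of this embodiment reproduces the UTF8 decision.

```python
from typing import Optional


def decide_encoding(n: int, ascii_len: int, utf8_len: int, gb_len: int,
                    overlap_len: int, combo_len: int,
                    default: Optional[str] = None) -> Optional[str]:
    """Hypothetical sketch of the step 3.2.1-3.2.6 cascade.

    n is the data length after null and 0 characters are deducted; the
    other arguments are per-section character counts. The value returned
    on each equality match is an assumption, not stated in the source.
    """
    # Step 3.2.1: everything counted is ASCII (assumed outcome: ASCII).
    if n == ascii_len:
        return "ASCII"
    # Step 3.2.2: UTF8 section plus ASCII covers the data (assumed: UTF8).
    if n == utf8_len + ascii_len:
        return "UTF8"
    # Step 3.2.3: GB18030 section plus ASCII covers the data (assumed: GB18030).
    if n == gb_len + ascii_len:
        return "GB18030"
    # Step 3.2.4: UTF8 + ASCII + UTF8/GB18030 overlap covers the data (assumed: UTF8).
    if n == utf8_len + ascii_len + overlap_len:
        return "UTF8"
    # Step 3.2.5: only ambiguous 6-byte combinations; the source gives no
    # outcome, so this sketch falls back to the default coding.
    if n == combo_len:
        return default
    # Step 3.2.6: UTF8 is weighted higher, so a combined length above 6 means UTF8.
    if combo_len > 6:
        return "UTF8"
    return default


# Counts from this embodiment: N = 15, ASCII 0, UTF8 3, GB18030 0, overlap 0, combo 12.
print(decide_encoding(15, 0, 3, 0, 0, 12))
```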
Step 3.3: Output UTF8 as the coding type of the critical data.
Step 4: According to the data coding type, transcode the data into the coding format of the browser or into any other required coding format.
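Step 4 amounts to decoding the bytes with the detected coding and re-encoding them in the target coding. The minimal Python sketch below assumes the browser's coding is UTF-8 and maps the coding names used above to Python codec names; the function name `transcode` and the name mapping are hypothetical, not taken from the source.

```python
def transcode(data: bytes, detected: str, target: str = "utf-8") -> bytes:
    """Re-encode decoded critical data from the detected coding into the target coding."""
    # Assumed mapping from the coding names used in this document to Python codecs.
    codec = {"UTF8": "utf-8", "GB18030": "gb18030", "ASCII": "ascii"}.get(detected, detected)
    # errors="replace" keeps a single bad byte from aborting the whole conversion.
    text = data.decode(codec, errors="replace")
    return text.encode(target)


# GB18030 bytes for the two Chinese characters "中文", re-encoded as UTF-8.
print(transcode(b"\xd6\xd0\xce\xc4", "GB18030"))
```

Decoding to text first and then encoding keeps the two codings independent, so any detected coding can be paired with any target the caller requires.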
The advantages of the URL data coding-type identification and transcoding method described in this embodiment are as follows. The data message does not need to be encoded and decoded repeatedly, and no data comparison is needed to obtain the coding type of the data, so the present invention is comparatively efficient and performs better. No reference encoding array or alternative encoding array needs to be set up, and the user does not need to manually select an automatic text-encoding detection tool; this reduces maintenance cost, improves the user experience and avoids garbled data caused by errors in the reference or alternative encoding arrays, so accuracy is high and the garbled-text rate is greatly reduced. No preset coding type needs to be configured, which reduces maintenance cost and avoids garbled data caused by a wrong preset coding. By determining the overlapping intervals of the codings and separating, within each overlapping interval, the displayable code word section (displayable code words) from the non-displayable code word section (non-displayable code words), the coding type of the data is determined from how the displayable and non-displayable sections combine with the other, non-overlapping intervals, which solves the problem of garbled Chinese characters in overlapping intervals. Based on user behavior, the rarity of each code word is graded and the code word sections commonly used by users are summarized, which resolves the confusion that arises when multibyte code-word combinations satisfy the sections of several coding types. According to the characteristics of the different coding types, offsets of different sizes can be applied during coding-type detection, which improves the running efficiency of the program. Short text, which is difficult to identify, is also handled well. In short, the method identifies the data coding type effectively and accurately, reduces the garbled-text rate, reduces maintenance cost and improves the user experience.
An embodiment of the present invention also provides a coding type identification and transcoding device. Fig. 3 is a schematic structural diagram of the coding type identification and transcoding device according to an embodiment of the present invention. As shown in Fig. 3, the device includes a critical data extraction unit 31, a decoding unit 32, a coding type recognition unit 33 and a data transcoding unit 34, wherein:
The critical data extraction unit 31 is configured to extract the critical data from the network messages generated by user operations, and to send the extracted critical data to the decoding unit.
Specifically, the critical data extraction unit 31 extracting the critical data from the network message generated by the user operation includes: the critical data extraction unit 31 extracts the critical data from the network message according to keywords or regular expressions.
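As an illustration of keyword-based extraction, the sketch below pulls the values of a few query parameters out of a raw HTTP request line with a regular expression. The parameter names `q`, `wd` and `keyword`, and the function name, are hypothetical examples rather than keywords named in this document. The values are returned still URL-encoded, because decoding is the decoding unit's job.

```python
import re
from typing import List

# Hypothetical keywords: query parameters that commonly carry user-typed text.
KEYWORD_PATTERN = re.compile(rb"[?&](?:q|wd|keyword)=([^&\s]*)")


def extract_critical_data(message: bytes) -> List[bytes]:
    """Return the raw (still URL-encoded) values of the keyword parameters."""
    # With one capture group, findall returns the list of captured values.
    return KEYWORD_PATTERN.findall(message)


print(extract_critical_data(b"GET /s?wd=%E4%B8%AD&ie=utf-8 HTTP/1.1"))
```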
The decoding unit 32 is configured to decode the critical data and send the decoded data to the coding type recognition unit.
Specifically, the decoding unit 32 decoding the critical data includes: the decoding unit 32 applies URLDECODE decoding to the critical data according to the URLENCODE encoding and decoding rules.
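A minimal Python sketch of the URLDECODE step, assuming form-style encoding in which `+` stands for a space. It deliberately returns raw bytes rather than text: the character set is not known until the coding type recognition unit has run. The function name `url_decode` is hypothetical; `urllib.parse.unquote_to_bytes` is the standard-library call.

```python
from urllib.parse import unquote_to_bytes


def url_decode(value: bytes) -> bytes:
    """URLDECODE the critical data to raw bytes, leaving the charset undetermined."""
    # Form encoding uses '+' for spaces; unquote_to_bytes does not translate it.
    return unquote_to_bytes(value.replace(b"+", b" "))


print(url_decode(b"%E4%B8%AD+a"))
```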
The device further includes a coding section division unit 35, configured to divide the different coding types into multiple coding sections and to determine the priority relationship among the coding sections.
The coding type recognition unit 33 is configured to determine the coding type corresponding to the decoded critical data, and to send the determined coding type to the data transcoding unit.
The coding type recognition unit includes a configuration subunit 331, a statistics subunit 332, a decision subunit 333 and a revocation subunit 334, wherein:
The configuration subunit 331 is configured to load the configuration information of each coding section.
The configuration information of each coding section includes, but is not limited to, how each coding section is divided and the corresponding priority relationship.
The statistics subunit 332 is configured to loop through the decoded data and count the characters that satisfy each coding section.
The statistics subunit 332 is specifically configured to: judge in turn, according to a first preset priority of the coding sections, whether each character in the decoded data satisfies each coding section; and count the number of characters in the decoded data that satisfy each coding section.
Specifically, the statistics subunit 332 can cap the total length of data traversed by a configured maximum offset length and, following the priority relationship of the coding type sections, count in turn the characters that do and do not satisfy each coding section. If a character satisfies a section, it is counted and traversal continues with the subsequent data after the offset; if it does not, matching continues with the next coding section.
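The counting loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented section layout: each matcher uses the standard byte ranges of ASCII, UTF-8 (2 to 4 bytes, without the overlong and surrogate checks a strict validator performs) and GB18030 (2- and 4-byte forms), while the patent's finer sections (overlap intervals, displayable and non-displayable code words, 6-byte combinations) are not modelled. The section names, the priority list `DEFAULT_PRIORITY` and the `max_offset` parameter, standing in for the configured maximum offset length, are hypothetical. The first section in priority order that matches claims the character, and traversal jumps past its bytes.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A matcher returns the byte length of the character starting at i, or 0 if no match.
Matcher = Callable[[bytes, int], int]


def match_ascii(b: bytes, i: int) -> int:
    return 1 if b[i] < 0x80 else 0


def match_utf8(b: bytes, i: int) -> int:
    lead = b[i]
    if 0xC2 <= lead <= 0xDF:
        need = 1
    elif 0xE0 <= lead <= 0xEF:
        need = 2
    elif 0xF0 <= lead <= 0xF4:
        need = 3
    else:
        return 0
    tail = b[i + 1:i + 1 + need]
    # Every continuation byte must lie in 0x80..0xBF (simplified check).
    if len(tail) == need and all(0x80 <= c <= 0xBF for c in tail):
        return need + 1
    return 0


def match_gb18030(b: bytes, i: int) -> int:
    if not (0x81 <= b[i] <= 0xFE) or i + 1 >= len(b):
        return 0
    t = b[i + 1]
    if 0x40 <= t <= 0xFE and t != 0x7F:
        return 2  # two-byte form
    if (0x30 <= t <= 0x39 and i + 3 < len(b)
            and 0x81 <= b[i + 2] <= 0xFE and 0x30 <= b[i + 3] <= 0x39):
        return 4  # four-byte form
    return 0


# Hypothetical "first preset priority" of the coding sections.
DEFAULT_PRIORITY: List[Tuple[str, Matcher]] = [
    ("ASCII", match_ascii), ("UTF8", match_utf8), ("GB18030", match_gb18030)]


def count_sections(data: bytes, priority: List[Tuple[str, Matcher]],
                   max_offset: Optional[int] = None) -> Tuple[Dict[str, int], int]:
    """Count characters per section; also return the count of unmatched bytes."""
    # Characters starting before the limit are counted even if they extend past it.
    limit = len(data) if max_offset is None else min(len(data), max_offset)
    counts = {name: 0 for name, _ in priority}
    unmatched = 0
    i = 0
    while i < limit:
        for name, match in priority:
            length = match(data, i)
            if length:
                counts[name] += 1
                i += length  # offset past the matched character
                break
        else:  # no section matched this byte
            unmatched += 1
            i += 1
    return counts, unmatched


print(count_sections(b"ab\xe4\xb8\xad", DEFAULT_PRIORITY))
```

Because UTF8 precedes GB18030 in this priority list, a byte pair such as E4 B8, which opens a UTF-8 character and is also a valid GB18030 two-byte code, is claimed by UTF8; reordering the list changes that outcome.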
The decision subunit 333 is configured to judge the coding type according to the number of characters in the decoded data that satisfy each coding section and the priority relationship of the coding sections, and to determine the coding type corresponding to the decoded data.
The decision subunit 333 is specifically configured to: judge in turn, according to a second preset priority of the coding sections, the relationship between the number of characters in the decoded data that satisfy each coding section and the total length of the decoded data after null characters and 0 characters are deducted, and determine the coding type corresponding to the critical data according to that relationship.
Specifically, the decision subunit 333 judges the coding type of the decoded data in descending order of coding-section priority, using the number of characters that satisfy each coding section. If a judgment determines the coding type of the decoded data, the result is output; otherwise the next judgment is made, until the coding type of the decoded data is determined or, if it cannot be determined, a default data coding type is output.
The revocation subunit 334 is configured to release the configuration information of each coding section.
In this embodiment, while the statistics subunit 332 loops through the decoded data and counts the characters satisfying each coding section, it counts both the characters that satisfy each coding section and those that do not. If, during traversal, a character is found that satisfies none of the coding sections and the coding type of the data can already be determined, traversal ends and the coding type is output. Otherwise the statistics subunit 332 continues through the subsequent data until the maximum offset is reached; after traversal, the decision subunit 333 evaluates the per-section character counts according to the priority relationship of the coding sections, determines the coding type of the decoded data, and outputs it.
The data transcoding unit 34 is configured to transcode the decoded critical data according to the coding type.
In this embodiment, when the coding section division unit 35 divides the different coding types into multiple coding sections and determines their priority relationship, it also determines a weight for each coding section according to the actual situation. During the coding-type judgment, the coding type recognition unit 33 then uses the number of characters in the decoded data that satisfy each coding section together with the priority relationship and weight of each section. For example, when the character currently being judged falls in the overlapping interval of two coding sections, the section it is assigned to is decided by weight: the section with the larger weight is taken as the one the current character satisfies.
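When a character falls in the overlapping interval of two coding sections, the weight rule above reduces to picking the candidate section with the largest weight. A minimal sketch, assuming the weights are plain numbers kept in a dictionary; the function name and the weight values in the usage line (UTF8 above GB18030, consistent with step 3.2.6) are hypothetical.

```python
from typing import Dict, List


def pick_section(candidates: List[str], weights: Dict[str, float]) -> str:
    """Return the candidate coding section with the largest weight."""
    # Sections missing from the weight table count as weight 0; on a tie,
    # max() keeps the first candidate, so list order acts as the priority.
    return max(candidates, key=lambda name: weights.get(name, 0.0))


print(pick_section(["GB18030", "UTF8"], {"UTF8": 2.0, "GB18030": 1.0}))
```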
For the functions implemented by each processing unit of the coding type identification and transcoding device shown in Fig. 3, refer to the related description of the coding type identification and transcoding method above. Those skilled in the art will understand that the function of each processing unit in the device shown in Fig. 3 may be implemented by a program running on a processor or by a dedicated logic circuit, for example a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP) or a field programmable gate array (FPGA); the storage unit may be implemented by various memories or storage media.
In the several embodiments provided by the present invention, it should be understood that the disclosed method, apparatus and system may be implemented in other ways. The apparatus embodiments described above are merely exemplary. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual communication connections between the components shown or discussed may be made through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or of other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware plus software functional units.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM, Read-Only Memory), a magnetic disk or an optical disc.
Alternatively, if the integrated unit of the embodiments of the present invention is implemented as a software functional unit and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk or an optical disc.
The data encoding type identification and transcoding method and device described in the embodiments of the present invention are illustrated only with the above embodiments and are not limited to them. Those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced with equivalents, and such modifications or replacements do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the present invention.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the scope of protection of the present invention.
Claims (6)
1. A data encoding type identification and transcoding method, characterized in that the method comprises:
Extracting critical data from a network message generated by a user operation, and decoding the critical data;
Determining the coding type corresponding to the decoded critical data;
Transcoding the decoded critical data according to the coding type;
Wherein determining the coding type corresponding to the decoded critical data comprises: loading configuration information of each coding section; looping through the decoded data and counting the characters that satisfy each coding section; judging the coding type according to the number of characters in the decoded data that satisfy each coding section and the priority relationship of the coding sections, and determining the coding type corresponding to the decoded data; and releasing the configuration information of each coding section;
Looping through the decoded data and counting the characters that satisfy each coding section comprises:
Judging in turn, according to a first preset priority of the coding sections, whether each character in the decoded data satisfies each coding section; and counting the number of characters in the decoded data that satisfy each coding section;
Judging the coding type according to the number of characters in the decoded data that satisfy each coding section and the priority relationship of the coding sections, and determining the coding type corresponding to the decoded data, comprises:
Judging in turn, according to a second preset priority of the coding sections, the relationship between the number of characters in the decoded data that satisfy each coding section and the total length of the decoded data after null characters and 0 characters are deducted, and determining the coding type corresponding to the critical data according to that relationship.
2. The method according to claim 1, characterized in that extracting the critical data from the network message generated by the user operation comprises: extracting the critical data from the network message according to keywords or regular expressions.
3. The method according to claim 1, characterized in that the method further comprises:
Dividing the different coding types into multiple coding sections, and determining the priority relationship of the coding sections.
4. A data encoding type identification and transcoding device, characterized in that the device comprises a critical data extraction unit, a decoding unit, a coding type recognition unit and a data transcoding unit, wherein
The critical data extraction unit is configured to extract critical data from a network message generated by a user operation, and to send the extracted critical data to the decoding unit;
The decoding unit is configured to decode the critical data and to send the decoded data to the coding type recognition unit;
The coding type recognition unit is configured to determine the coding type corresponding to the decoded critical data, and to send the determined coding type to the data transcoding unit;
The data transcoding unit is configured to transcode the decoded critical data according to the coding type;
Wherein the coding type recognition unit comprises a configuration subunit, a statistics subunit, a decision subunit and a revocation subunit,
Wherein the configuration subunit is configured to load configuration information of each coding section; the statistics subunit is configured to loop through the decoded data and count the characters that satisfy each coding section; the decision subunit is configured to judge the coding type according to the number of characters in the decoded data that satisfy each coding section and the priority relationship of the coding sections, and to determine the coding type corresponding to the decoded data; and the revocation subunit is configured to release the configuration information of each coding section;
The statistics subunit is specifically configured to: judge in turn, according to a first preset priority of the coding sections, whether each character in the decoded data satisfies each coding section; and count the number of characters in the decoded data that satisfy each coding section;
The decision subunit is specifically configured to: judge in turn, according to a second preset priority of the coding sections, the relationship between the number of characters in the decoded data that satisfy each coding section and the total length of the decoded data after null characters and 0 characters are deducted, and determine the coding type corresponding to the critical data according to that relationship.
5. The device according to claim 4, characterized in that the critical data extraction unit is specifically configured to extract the critical data from the network message according to keywords or regular expressions.
6. The device according to claim 4, characterized in that the device further comprises a coding section division unit, configured to divide the different coding types into multiple coding sections and to determine the priority relationship of the coding sections.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510249023.0A CN104994128B (en) | 2015-05-15 | 2015-05-15 | A kind of identification of data encoding type and code-transferring method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104994128A (en) | 2015-10-21 |
CN104994128B (en) | 2019-04-26 |
Family ID: 54305879
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510249023.0A Active CN104994128B (en) | 2015-05-15 | 2015-05-15 | A kind of identification of data encoding type and code-transferring method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104994128B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766305A (en) * | 2017-10-23 | 2018-03-06 | 广东欧珀移动通信有限公司 | Decoding algorithm determines method, apparatus, terminal and storage medium |
CN107729302B (en) * | 2017-10-23 | 2021-10-15 | Oppo广东移动通信有限公司 | Decoding algorithm determination method, device, terminal and storage medium |
CN107770844B (en) * | 2017-10-23 | 2020-12-29 | Oppo广东移动通信有限公司 | Decoding algorithm determination method, device, terminal and storage medium |
CN107797976A (en) * | 2017-10-23 | 2018-03-13 | 广东欧珀移动通信有限公司 | Decoding algorithm determines method, apparatus, terminal and storage medium |
CN109625079B (en) * | 2018-10-24 | 2021-09-14 | 蔚来(安徽)控股有限公司 | Control method and controller for Electric Power Steering (EPS) system of automobile |
CN109495214B (en) * | 2018-11-26 | 2020-03-24 | 电子科技大学 | Channel coding type identification method based on one-dimensional inclusion structure |
CN113595683A (en) * | 2021-07-07 | 2021-11-02 | 西安震有信通科技有限公司 | Conversion processing method, device, terminal and medium based on various encoding files |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101526963A (en) * | 2009-04-17 | 2009-09-09 | 深圳华为通信技术有限公司 | Method for identifying web page coding, device and terminal equipment |
CN103207877A (en) * | 2012-01-17 | 2013-07-17 | 阿里巴巴集团控股有限公司 | Decoding method and device |
CN103593277A (en) * | 2012-08-15 | 2014-02-19 | 深圳市世纪光速信息技术有限公司 | Log processing method and system |
CN104361021A (en) * | 2014-10-21 | 2015-02-18 | 小米科技有限责任公司 | Webpage encoding identifying method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8775919B2 (en) * | 2006-04-25 | 2014-07-08 | Adobe Systems Incorporated | Independent actionscript analytics tools and techniques |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |