CN104361021B - Method for identifying web page coding and device - Google Patents

Publication number
CN104361021B
Authority
CN
China
Prior art keywords
coding mode
resources
mode
web page
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410562477.9A
Other languages
Chinese (zh)
Other versions
CN104361021A (en)
Inventor
左景龙
范金松
田凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaomi Inc
Original Assignee
Xiaomi Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to CN201410562477.9A (CN104361021B)
Application filed by Xiaomi Inc
Priority to MX2015003807A (MX361564B)
Priority to JP2016554794A (JP6130976B2)
Priority to RU2015110973A (RU2610245C2)
Priority to KR1020157007129A (KR20160059455A)
Priority to PCT/CN2015/071308 (WO2016061930A1)
Priority to BR112015006725A (BR112015006725A2)
Publication of CN104361021A
Priority to US14/684,855 (US20160112491A1)
Priority to EP15178533.4A (EP3012750A1)
Application granted
Publication of CN104361021B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Document Processing Apparatus (AREA)
  • Digital Computer Display Output (AREA)

Abstract

The present disclosure relates to a method and device for identifying web page encoding, and belongs to the field of computer networks. The method includes: loading web page data, the web page data including at least one web page resource; detecting whether the web page resource is a HyperText Markup Language (HTML) resource and whether it declares an encoding mode; if the web page resource is an HTML resource but does not declare an encoding mode, identifying the encoding mode of the HTML resource; and decoding the HTML resource using the decoding mode corresponding to the identified encoding mode. The disclosure solves the problem in the related art that a browser may display garbled characters when the "charset" field of a web page is omitted, and achieves the effect that a web page resource can be decoded and displayed normally even if it does not declare an encoding mode.

Description

Method for identifying web page coding and device
Technical Field
The present disclosure relates to the field of computer networks, and in particular to a method and device for identifying web page encoding.
Background
With the development of network technology, browsing web pages with the browser on a terminal has become one of the functions users rely on most often.
Since web page data may be encoded in different encoding modes, a browser first needs to identify the encoding mode of the web page data from its "charset" field, then decode the web page data using the decoding mode corresponding to that encoding mode, and finally display the web page data. However, as website building and web page editing become more and more widespread, the "charset" field is omitted or misspelled in the web page data developed by many technicians. In that case the browser decodes with its default decoding mode and may display garbled characters.
Summary
To solve the problem in the related art that a browser displays garbled characters when the "charset" field of a web page is omitted or misspelled, the embodiments of the present disclosure provide a method and device for identifying web page encoding. The technical solutions are as follows:
According to the embodiments of the present disclosure, a method for identifying web page encoding is provided. The method includes:
loading web page data, the web page data including at least one web page resource;
detecting whether the web page resource is an HTML resource and whether it declares an encoding mode;
if the web page resource is an HTML resource but does not declare an encoding mode, identifying the encoding mode of the HTML resource;
decoding the HTML resource using the decoding mode corresponding to the identified encoding mode.
In one embodiment, the method further includes:
if the web page resource is an HTML resource and declares an encoding mode, detecting whether the declared encoding mode is one of the preset encoding modes;
if the declared encoding mode is not one of the preset encoding modes, identifying the encoding mode of the HTML resource; or performing automatic error correction on the declared encoding mode to obtain a corrected encoding mode.
In one embodiment, identifying the encoding mode of the HTML resource includes:
calling a predetermined character encoding identification algorithm to identify the encoding mode of the HTML resource.
In one embodiment, performing automatic error correction on the declared encoding mode to obtain the corrected encoding mode includes:
calculating a spelling similarity between the declared encoding mode and each of the preset encoding modes;
when the highest spelling similarity is greater than a predetermined threshold, determining the preset encoding mode corresponding to the highest spelling similarity as the corrected encoding mode.
In one embodiment, the method further includes:
if the web page resource is a CSS resource, taking the encoding mode used by the HTML resource in the web page data as the encoding mode of the CSS resource, and decoding the CSS resource using the decoding mode corresponding to that encoding mode.
According to a second aspect of the embodiments of the present disclosure, a device for identifying web page encoding is provided. The device includes:
a data loading module configured to load web page data, the web page data including at least one web page resource;
a mode detection module configured to detect whether the web page resource is an HTML resource and whether it declares an encoding mode;
a mode identification module configured to identify the encoding mode of the HTML resource when the web page resource is an HTML resource but does not declare an encoding mode;
a resource decoding module configured to decode the HTML resource using the decoding mode corresponding to the identified encoding mode.
In one embodiment, the device further includes:
an encoding detection module configured to detect, when the web page resource is an HTML resource and declares an encoding mode, whether the declared encoding mode is one of the preset encoding modes;
the mode identification module, configured to identify the encoding mode of the HTML resource when the declared encoding mode is not one of the preset encoding modes; or an automatic error correction module configured to perform automatic error correction on the declared encoding mode, when the declared encoding mode is not one of the preset encoding modes, to obtain a corrected encoding mode.
In one embodiment, the mode identification module is configured to call a predetermined character encoding identification algorithm to identify the encoding mode of the HTML resource.
In one embodiment, the automatic error correction module includes:
a spelling calculation submodule configured to calculate a spelling similarity between the declared encoding mode and each of the preset encoding modes;
an automatic error correction submodule configured to determine, when the highest spelling similarity is greater than a predetermined threshold, the preset encoding mode corresponding to the highest spelling similarity as the corrected encoding mode.
In one embodiment, the device further includes:
an encoding multiplexing module configured to take, when the web page resource is a CSS resource, the encoding mode used by the HTML resource in the web page data as the encoding mode of the CSS resource, and to decode the CSS resource using the decoding mode corresponding to that encoding mode.
According to a third aspect of the present disclosure, a device for identifying web page encoding is provided. The device includes:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
load web page data, the web page data including at least one web page resource;
detect whether the web page resource is a HyperText Markup Language (HTML) resource and whether it declares an encoding mode;
if the web page resource is an HTML resource but does not declare an encoding mode, identify the encoding mode of the HTML resource;
decode the HTML resource using the decoding mode corresponding to the identified encoding mode.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:
When a web page resource does not declare an encoding mode, the encoding mode of the web page resource is identified and the resource is decoded using the decoding mode corresponding to that encoding mode. This solves the problem in the related art that a browser may display garbled characters when the "charset" field of a web page is omitted, and achieves the effect that a web page resource can be decoded and displayed normally even if it does not declare an encoding mode.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the disclosure.
Fig. 1 is a flow chart of a method for identifying web page encoding according to an exemplary embodiment;
Fig. 2 is a flow chart of a method for identifying web page encoding according to another exemplary embodiment;
Fig. 3 is a block diagram of a device for identifying web page encoding according to an exemplary embodiment;
Fig. 4 is a block diagram of a device for identifying web page encoding according to another exemplary embodiment;
Fig. 5 is a block diagram of a device for identifying web page encoding according to an exemplary embodiment.
The above drawings show specific embodiments of the present disclosure, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the concept of the disclosure in any way, but to explain the concept of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the disclosure as detailed in the appended claims.
The terminal involved in the embodiments of the present disclosure may be a mobile phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, a desktop computer, and the like.
Fig. 1 is a flow chart of a method for identifying web page encoding according to an exemplary embodiment. This embodiment is described with the method applied in a terminal. The method may include the following steps:
In step 101, web page data is loaded; the web page data includes at least one web page resource.
Web page resources are generally divided into two types: HTML (HyperText Markup Language) resources and CSS (Cascading Style Sheets) resources.
In step 102, it is detected whether the web page resource is an HTML resource and whether it declares an encoding mode.
In step 103, if the web page resource is an HTML resource but does not declare an encoding mode, the encoding mode of the HTML resource is identified.
In step 104, the HTML resource is decoded using the decoding mode corresponding to the identified encoding mode.
In summary, when a web page resource does not declare an encoding mode, the method provided by this embodiment identifies the encoding mode of the web page resource and decodes the resource using the decoding mode corresponding to that encoding mode. This solves the problem in the related art that a browser displays garbled characters when the "charset" field of a web page is omitted, and achieves the effect that a web page resource can be decoded and displayed normally even if it does not declare an encoding mode.
Fig. 2 is a flow chart of a method for identifying web page encoding according to another exemplary embodiment. This embodiment is described with the method applied in a terminal. The method may include the following steps:
In step 201, web page data is loaded; the web page data includes at least one web page resource.
When the terminal needs to display a web page, it first loads the web page data of that page. The web page data of each web page includes at least one web page resource.
Web page resources can be divided into two types: HTML resources and CSS resources.
In step 202, it is detected whether the web page resource is an HTML resource.
Before decoding each web page resource, the terminal first detects whether the web page resource is an HTML resource.
If the web page resource is an HTML resource, the method proceeds to step 203;
if the web page resource is a CSS resource, the method proceeds to step 210.
In step 203, it is detected whether the HTML resource declares an encoding mode.
Common encoding modes include UTF-8 (8-bit Unicode Transformation Format), Big5, GB2312 (the Chinese coded character set for information interchange), GBK, ISO-8859-1 and ISO-8859-2 (character sets standardized by the International Organization for Standardization), and so on.
An HTML resource generally uses the "charset" field to declare the encoding mode it uses. However, because the skill level of web developers varies, the "charset" field in an HTML resource may be omitted or misspelled.
If the HTML resource does not declare an encoding mode, the method proceeds to step 204;
if the HTML resource declares an encoding mode, the method proceeds to step 206.
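The check in step 203 can be sketched as a search for a "charset" declaration near the start of the HTML bytes. This is a minimal illustration, not the patent's implementation: the function name `find_declared_charset`, the regular expression, and the 1024-byte `scan_limit` are all assumptions, and the pattern only works for ASCII-compatible encodings.

```python
import re
from typing import Optional

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_CHARSET_RE = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:\-]+)",
    re.IGNORECASE,
)


def find_declared_charset(html: bytes, scan_limit: int = 1024) -> Optional[str]:
    """Return the declared charset label, or None if none is declared.

    Only the first `scan_limit` bytes are searched (an assumed limit),
    since the declaration normally sits in the document head.
    """
    match = _CHARSET_RE.search(html[:scan_limit])
    if match is None:
        return None  # no declaration found: the flow continues at step 204
    return match.group(1).decode("ascii")  # declaration found: step 206
```

A `None` result corresponds to the "not declared" branch, and any returned label still has to be validated against the preset list, because it may be misspelled.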
In step 204, if the HTML resource does not declare an encoding mode, the encoding mode of the HTML resource is identified.
The terminal may call a predetermined character encoding identification algorithm to identify the encoding mode of the HTML resource. The predetermined character encoding identification algorithm may be the chardet character encoding identification algorithm.
For example, when the HTML resource does not declare an encoding mode, the terminal calls the chardet character encoding identification algorithm and identifies that the HTML resource is encoded in GB2312.
The chardet character encoding identification algorithm is an algorithm for identifying the encoding format of a character string, and is commonly used to identify the encoding format of text.
To speed up identification, the terminal may extract a character string of predetermined length from the HTML resource and identify the encoding mode of only that string with the predetermined character encoding identification algorithm, without having to examine all the character strings in the entire HTML resource.
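The prefix-based identification in step 204 can be illustrated with a small sketch. Note that it does not use chardet: to stay within the Python standard library it swaps in plain trial decoding against a candidate list, which is a much cruder technique than a statistical detector. The function name `identify_encoding`, the candidate order, and the 1024-byte prefix length are assumptions made for illustration.

```python
import codecs
from typing import Optional, Sequence

# Candidate encodings, tried in order (an illustrative assumption).
CANDIDATES = ("utf-8", "gb2312", "gbk", "big5")


def identify_encoding(
    html: bytes,
    prefix_len: int = 1024,
    candidates: Sequence[str] = CANDIDATES,
) -> Optional[str]:
    """Return the first candidate that decodes the prefix without error."""
    prefix = html[:prefix_len]  # examine only a fixed-length prefix
    for name in candidates:
        decoder = codecs.getincrementaldecoder(name)(errors="strict")
        try:
            # final=False tolerates a multi-byte character that the
            # prefix boundary happens to cut in half.
            decoder.decode(prefix, final=False)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

Trial decoding cannot reliably tell apart encodings whose byte ranges overlap (for example GBK and Big5), which is one reason a dedicated detector is the more realistic choice in practice.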
In step 205, the HTML resource is decoded using the decoding mode corresponding to the identified encoding mode.
After identifying the encoding mode used by the HTML resource, the terminal decodes the HTML resource using the decoding mode corresponding to the identified encoding mode.
In step 206, if the HTML resource declares an encoding mode, it is detected whether the declared encoding mode is one of the preset encoding modes.
When the HTML resource declares an encoding mode, the declared encoding mode may be misspelled, so the terminal needs to detect whether it is one of the preset encoding modes.
The preset encoding modes include but are not limited to UTF-8 (8-bit Unicode Transformation Format), Big5, GB2312 (the Chinese coded character set for information interchange), GBK, ISO-8859-1 and ISO-8859-2 (character sets standardized by the International Organization for Standardization), and so on.
If the declared encoding mode is one of the preset encoding modes, the method proceeds to step 207;
if the declared encoding mode is not one of the preset encoding modes, the method proceeds to step 208.
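The membership test in step 206 can be sketched as follows. The list reuses the six encodings named in the text; comparing case-insensitively and ignoring surrounding whitespace are assumptions of this sketch (the text does not say how labels are compared), and `is_preset_encoding` is a hypothetical name.

```python
# The six preset encodings named in the text.
PRESET_ENCODINGS = ("UTF-8", "Big5", "GB2312", "GBK", "ISO-8859-1", "ISO-8859-2")


def is_preset_encoding(declared: str) -> bool:
    """Case-insensitive membership test against the preset list."""
    normalized = declared.strip().casefold()
    return any(normalized == preset.casefold() for preset in PRESET_ENCODINGS)
```

A `True` result leads to step 207 (decode with the declared encoding), and `False` leads to step 208.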
In step 207, if the declared encoding mode is one of the preset encoding modes, the HTML resource is decoded using the decoding mode corresponding to the declared encoding mode.
When the declared encoding mode is one of the preset encoding modes, the declaration contains no spelling error, and the terminal decodes the HTML resource using the decoding mode corresponding to the declared encoding mode.
In step 208, if the declared encoding mode is not one of the preset encoding modes, the encoding mode of the HTML resource is identified; or automatic error correction is performed on the declared encoding mode to obtain a corrected encoding mode.
When the declared encoding mode is not one of the preset encoding modes, the declaration contains a spelling error. For this case, this embodiment provides two different processing approaches:
First approach: the terminal identifies the encoding mode of the HTML resource.
The identification is the same as in step 204: the terminal may call a predetermined character encoding identification algorithm, which may be the chardet character encoding identification algorithm, to identify the encoding mode of the HTML resource.
Second approach: the terminal performs automatic error correction on the declared encoding mode to obtain a corrected encoding mode.
The automatic error correction proceeds as follows: the terminal calculates a spelling similarity between the declared encoding mode and each of the preset encoding modes; if there are 6 preset encoding modes, 6 spelling similarities are obtained. When the highest spelling similarity is greater than a predetermined threshold, the terminal determines the preset encoding mode corresponding to the highest spelling similarity as the corrected encoding mode.
For example, the declared encoding mode is "GB2812" and there are 6 preset encoding modes, so 6 spelling similarities are calculated. Among them, the spelling similarity with the preset encoding mode "GB2312" is the highest at 83%, which is greater than the predetermined threshold of 60%. The terminal therefore determines the preset encoding mode "GB2312" as the corrected encoding mode.
It should be noted that the first and second approaches may be used alternatively or in combination. One possible combination is to apply the second approach first; if the highest spelling similarity is below the predetermined threshold, or if two or more preset encoding modes share the highest spelling similarity, the terminal then identifies the encoding mode of the HTML resource using the first approach.
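The automatic error correction and its fallback conditions can be sketched as below. The text does not specify how spelling similarity is computed; this sketch assumes the ratio from Python's `difflib.SequenceMatcher`, which gives about 0.83 for "GB2812" against "GB2312". The names `spelling_similarity` and `autocorrect_encoding` are hypothetical, and the 0.60 threshold is taken from the example in the text.

```python
from difflib import SequenceMatcher
from typing import Optional

PRESET_ENCODINGS = ("UTF-8", "Big5", "GB2312", "GBK", "ISO-8859-1", "ISO-8859-2")
THRESHOLD = 0.60  # the 60% threshold used in the example


def spelling_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; the metric itself is an assumption."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def autocorrect_encoding(declared: str) -> Optional[str]:
    """Return the corrected preset encoding, or None if correction fails.

    None means the caller should fall back to identifying the encoding
    from the content (the first approach).
    """
    label = declared.strip()
    scores = {p: spelling_similarity(label, p) for p in PRESET_ENCODINGS}
    best = max(scores.values())
    winners = [p for p, score in scores.items() if score == best]
    # Give up when the best score does not exceed the threshold, or when
    # two or more presets tie for the highest similarity.
    if best <= THRESHOLD or len(winners) > 1:
        return None
    return winners[0]
```

With these presets, a label such as "ISO-8859-3" is equally close to "ISO-8859-1" and "ISO-8859-2", so the sketch returns `None` and leaves the decision to content-based identification, which matches the combined use described above.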
In step 209, the HTML resource is decoded using the decoding mode corresponding to the re-identified or automatically corrected encoding mode.
In step 210, if the web page resource is a CSS resource, the encoding mode used by the HTML resource in the web page data is taken as the encoding mode of the CSS resource, and the CSS resource is decoded using the decoding mode corresponding to that encoding mode.
That is, if the current web page resource is a CSS resource rather than an HTML resource, then because the HTML resources and CSS resources in the same web page data generally use the same encoding mode, the terminal takes the encoding mode used by the HTML resource in that web page data as the encoding mode of the CSS resource. For the process of identifying the encoding mode of the HTML resource, refer to steps 202 to 207 above.
The terminal then decodes the CSS resource using the decoding mode corresponding to the encoding mode of the CSS resource.
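The reuse of the HTML encoding for CSS resources in step 210 can be sketched as follows. The resource representation (a tuple of name, kind, and raw bytes), the function name `decode_page`, the caller-supplied `resolve_html_encoding` callback, and the UTF-8 default for pages without any HTML resource are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# (name, kind, raw bytes); kind is "html" or "css" (hypothetical layout).
Resource = Tuple[str, str, bytes]


def decode_page(
    resources: List[Resource],
    resolve_html_encoding: Callable[[bytes], str],
    default: str = "utf-8",
) -> Dict[str, str]:
    """Decode every resource; CSS reuses the encoding resolved for HTML."""
    page_encoding = default
    for _, kind, raw in resources:
        if kind == "html":
            # Resolve the page encoding from the first HTML resource.
            page_encoding = resolve_html_encoding(raw)
            break

    decoded: Dict[str, str] = {}
    for name, kind, raw in resources:
        if kind == "html":
            encoding = resolve_html_encoding(raw)
        else:
            encoding = page_encoding  # CSS inherits the HTML encoding
        decoded[name] = raw.decode(encoding, errors="replace")
    return decoded
```

Here `resolve_html_encoding` stands in for the whole HTML branch of the flow (declaration check, preset check, correction or identification), so the CSS branch never needs to run detection on its own.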
Finally, after each web page resource has been decoded, the terminal can display the web page according to the decoded web page resources.
In summary, when a web page resource does not declare an encoding mode, the method provided by this embodiment identifies the encoding mode of the web page resource and decodes the resource using the decoding mode corresponding to that encoding mode. This solves the problem in the related art that a browser displays garbled characters when the "charset" field of a web page is omitted, and achieves the effect that a web page resource can be decoded and displayed normally even if it does not declare an encoding mode.
In addition, when a web page resource declares an encoding mode but the declaration contains a spelling error, the method provided by this embodiment decodes the web page resource using the decoding mode corresponding to the re-identified or automatically corrected encoding mode. This solves the problem in the related art that a browser displays garbled characters when the "charset" field of a web page is misspelled, and achieves the effect that a web page resource can be decoded and displayed normally even if its declared encoding mode is misspelled.
The following are device embodiments of the present disclosure, which can be used to perform the method embodiments of the present disclosure. For details not disclosed in the device embodiments, refer to the method embodiments.
Fig. 3 is a block diagram of a device for identifying web page encoding according to an exemplary embodiment. The device may be implemented as part or all of a terminal by software, hardware, or a combination of both. The device may include:
a data loading module 320 configured to load web page data, the web page data including at least one web page resource;
a mode detection module 340 configured to detect whether the web page resource is an HTML resource and whether it declares an encoding mode;
a mode identification module 360 configured to identify the encoding mode of the HTML resource when the web page resource is an HTML resource but does not declare an encoding mode;
a resource decoding module 380 configured to decode the HTML resource using the decoding mode corresponding to the identified encoding mode.
In summary, when a web page resource does not declare an encoding mode, the device provided by this embodiment identifies the encoding mode of the web page resource and decodes the resource using the decoding mode corresponding to that encoding mode. This solves the problem in the related art that a browser displays garbled characters when the "charset" field of a web page is omitted, and achieves the effect that a web page resource can be decoded and displayed normally even if it does not declare an encoding mode.
Fig. 4 is a block diagram of a device for identifying web page encoding according to another exemplary embodiment. The device may be implemented as part or all of a terminal by software, hardware, or a combination of both. The device may include:
a data loading module 320 configured to load web page data, the web page data including at least one web page resource;
a mode detection module 340 configured to detect whether the web page resource is an HTML resource and whether it declares an encoding mode;
a mode identification module 360 configured to identify the encoding mode of the HTML resource when the web page resource is an HTML resource but does not declare an encoding mode;
a resource decoding module 380 configured to decode the HTML resource using the decoding mode corresponding to the identified encoding mode.
Optionally, the device further includes:
an encoding detection module 352 configured to detect, when the web page resource is an HTML resource and declares an encoding mode, whether the declared encoding mode is one of the preset encoding modes;
the mode identification module 360, configured to identify the encoding mode of the HTML resource when the declared encoding mode is not one of the preset encoding modes; or
an automatic error correction module 370 configured to perform automatic error correction on the declared encoding mode, when the declared encoding mode is not one of the preset encoding modes, to obtain a corrected encoding mode.
Optionally, the mode identification module 360 is configured to call a predetermined character encoding identification algorithm to identify the encoding mode of the HTML resource.
Optionally, the automatic error correction module 370 includes:
a spelling calculation submodule 372 configured to calculate a spelling similarity between the declared encoding mode and each of the preset encoding modes;
an automatic error correction submodule 374 configured to determine, when the highest spelling similarity is greater than a predetermined threshold, the preset encoding mode corresponding to the highest spelling similarity as the corrected encoding mode.
Optionally, the device further includes:
an encoding multiplexing module 354 configured to take, when the web page resource is a CSS resource, the encoding mode used by the HTML resource in the web page data as the encoding mode of the CSS resource, and to decode the CSS resource using the decoding mode corresponding to that encoding mode.
In summary, when a web page resource does not declare an encoding mode, the device provided by this embodiment identifies the encoding mode of the web page resource and decodes the resource using the decoding mode corresponding to that encoding mode. This solves the problem in the related art that a browser displays garbled characters when the "charset" field of a web page is omitted, and achieves the effect that a web page resource can be decoded and displayed normally even if it does not declare an encoding mode.
In addition, when a web page resource declares an encoding mode but the declaration contains a spelling error, the device provided by this embodiment decodes the web page resource using the decoding mode corresponding to the re-identified or automatically corrected encoding mode. This solves the problem in the related art that a browser displays garbled characters when the "charset" field of a web page is misspelled, and achieves the effect that a web page resource can be decoded and displayed normally even if its declared encoding mode is misspelled.
With regard to the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and will not be elaborated here.
Fig. 5 is a kind of block diagram for web page coding identification device 500 shown according to an exemplary embodiment.For example, Device 500 can be mobile phone, computer, digital broadcast terminal, messaging devices, game console, tablet device, doctor Treat equipment, body-building equipment, personal digital assistant etc..
With reference to Fig. 5, device 500 may include following one or more components:Processing component 502, memory 504, power supply Component 506, multimedia component 508, audio component 510, the interface 512 of input/output (I/O), sensor module 514, and Communication component 516.
The integrated operation of 502 usual control device 500 of processing component, such as with display, call, data communication, phase Machine operates and record operates associated operation.Processing component 502 may include that one or more processors 520 refer to execute It enables, to perform all or part of the steps of the methods described above.In addition, processing component 502 may include one or more modules, just Interaction between processing component 502 and other assemblies.For example, processing component 502 may include multi-media module, it is more to facilitate Interaction between media component 508 and processing component 502.
Memory 504 is configured as storing various types of data to support the operation in device 500.These data are shown Example includes instruction for any application program or method that operate on device 500, contact data, and telephone book data disappears Breath, picture, video etc..Memory 504 can be by any kind of volatibility or non-volatile memory device or their group It closes and realizes, such as static RAM (SRAM), electrically erasable programmable read-only memory (EEPROM) is erasable to compile Journey read-only memory (EPROM), programmable read only memory (PROM), read-only memory (ROM), magnetic memory, flash Device, disk or CD.
Power supply module 506 provides electric power for the various assemblies of device 500.Power supply module 506 may include power management system System, one or more power supplys and other generated with for device 500, management and the associated component of distribution electric power.
Multimedia component 508 is included in the screen of one output interface of offer between described device 500 and user.One In a little embodiments, screen may include liquid crystal display (LCD) and touch panel (TP).If screen includes touch panel, screen Curtain may be implemented as touch screen, to receive input signal from the user.Touch panel includes one or more touch sensings Device is to sense the gesture on touch, slide, and touch panel.The touch sensor can not only sense touch or sliding action Boundary, but also detect duration and pressure associated with the touch or slide operation.In some embodiments, more matchmakers Body component 508 includes a front camera and/or rear camera.When device 500 is in operation mode, such as screening-mode or When video mode, front camera and/or rear camera can receive external multi-medium data.Each front camera and Rear camera can be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a microphone (MIC) that is configured to receive external audio signals when the device 500 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signal may be further stored in the memory 504 or sent via the communication component 516. In some embodiments, the audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 514 includes one or more sensors that provide status assessments of various aspects of the device 500. For example, the sensor component 514 can detect the open/closed state of the device 500 and the relative positioning of components, such as the display and keypad of the device 500. The sensor component 514 can also detect a change in position of the device 500 or of one of its components, the presence or absence of user contact with the device 500, the orientation or acceleration/deceleration of the device 500, and a change in the temperature of the device 500. The sensor component 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 514 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate wired or wireless communication between the device 500 and other devices. The device 500 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 516 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 500 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 504 including instructions, which can be executed by the processor 520 of the device 500 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided such that, when the instructions in the storage medium are executed by the processor of the device 500, the device 500 is able to perform the web page encoding identification method illustrated in Fig. 1 or Fig. 2.
Other embodiments of the disclosure will readily occur to those skilled in the art upon considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the disclosure is limited only by the appended claims.

Claims (7)

1. A method for identifying web page encoding, characterized in that the method comprises:
loading web page data, the web page data including at least one web page resource;
detecting whether the web page resource is a Hypertext Markup Language (HTML) resource and whether it declares an encoding mode;
if the web page resource is an HTML resource that does not declare an encoding mode, identifying the encoding mode of the HTML resource;
decoding the HTML resource using the decoding mode corresponding to the identified encoding mode;
if the web page resource is an HTML resource that declares an encoding mode, detecting whether the declared encoding mode is one of the preset encoding modes;
if the declared encoding mode is not one of the preset encoding modes, identifying the encoding mode of the HTML resource; or performing automatic error correction on the declared encoding mode to obtain a corrected encoding mode,
wherein performing automatic error correction on the declared encoding mode to obtain the corrected encoding mode comprises:
calculating a spelling similarity between the declared encoding mode and each of the preset encoding modes;
when the highest spelling similarity exceeds a predetermined threshold, determining the preset encoding mode corresponding to the highest spelling similarity to be the corrected encoding mode.
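The automatic error correction recited above can be illustrated with a minimal sketch. The claim does not specify how the spelling similarity is computed, which encoding modes are preset, or what the threshold is; the sketch assumes the ratio from Python's `difflib.SequenceMatcher` as the similarity measure, a hypothetical preset list, and a hypothetical threshold of 0.75.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical preset list; the claims do not enumerate the preset encoding modes.
PRESET_ENCODINGS = ["utf-8", "gbk", "gb2312", "gb18030", "big5", "iso-8859-1", "shift_jis"]


def spelling_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; difflib's ratio is an assumed stand-in."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def correct_declared_encoding(declared: str, threshold: float = 0.75) -> Optional[str]:
    """Return the preset encoding closest in spelling to `declared`,
    or None when even the best match does not exceed the threshold."""
    declared = declared.strip()
    # Compare the declared name against every preset name and keep the best one.
    best = max(PRESET_ENCODINGS, key=lambda preset: spelling_similarity(declared, preset))
    if spelling_similarity(declared, best) > threshold:
        return best
    return None
```

Under these assumptions, a declaration such as `utf8` or `gb-2312` is mapped to `utf-8` or `gb2312`, while an unrelated string yields `None`, in which case the caller would fall back to identifying the encoding from the resource itself.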
2. The method according to claim 1, characterized in that identifying the encoding mode of the HTML resource comprises:
calling a predetermined character encoding identification algorithm to identify the encoding mode of the HTML resource.
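The claim leaves the character encoding identification algorithm unspecified. As one simple illustration only, and not the algorithm of the disclosure, the sketch below checks for a byte order mark and then tries strict decoding against a hypothetical ordered candidate list; the function name and the candidates are assumptions.

```python
import codecs
from typing import Optional

# Byte order marks and the Python codec each one implies.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Hypothetical candidates, tried in order; stricter encodings come first.
CANDIDATES = ["utf-8", "gb18030", "big5"]


def identify_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding of `data` by BOM, then by trial decoding."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in CANDIDATES:
        try:
            data.decode(name)  # strict mode raises on invalid byte sequences
            return name
        except UnicodeDecodeError:
            continue
    return None
```

Trial decoding can misidentify short or ambiguous inputs, so a production implementation would more likely rely on a statistical detector; the sketch only shows where such an algorithm plugs into the method.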
3. The method according to claim 1, characterized in that the method further comprises:
if the web page resource is a Cascading Style Sheets (CSS) resource, identifying the encoding mode used by the HTML resource in the web page data as the encoding mode of the CSS resource, and decoding the CSS resource using the decoding mode corresponding to that encoding mode.
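The reuse of the HTML encoding for CSS resources can be sketched as follows. The function name and the dictionary-based representation of resources are hypothetical, and the HTML encoding is assumed to have been determined already.

```python
from typing import Dict


def decode_page_resources(html_bytes: bytes, html_encoding: str,
                          css_resources: Dict[str, bytes]) -> Dict[str, str]:
    """Decode the HTML resource, then decode every CSS resource with the
    same encoding instead of detecting each CSS encoding separately."""
    decoded = {"html": html_bytes.decode(html_encoding, errors="replace")}
    for name, css_bytes in css_resources.items():
        # The CSS resource inherits the encoding mode used by the HTML resource.
        decoded[name] = css_bytes.decode(html_encoding, errors="replace")
    return decoded
```

Skipping per-stylesheet detection saves work on every CSS resource of the page, at the cost of assuming that the stylesheets were authored in the same encoding as the page that references them.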
4. A device for identifying web page encoding, characterized in that the device comprises:
a data loading module configured to load web page data, the web page data including at least one web page resource;
a mode detection module configured to detect whether the web page resource is a Hypertext Markup Language (HTML) resource and whether it declares an encoding mode;
a mode identification module configured to identify the encoding mode of the HTML resource when the web page resource is an HTML resource that does not declare an encoding mode;
a resource decoding module configured to decode the HTML resource using the decoding mode corresponding to the identified encoding mode;
an encoding detection module configured to detect, when the web page resource is an HTML resource that declares an encoding mode, whether the declared encoding mode is one of the preset encoding modes;
the mode identification module being configured to identify the encoding mode of the HTML resource when the declared encoding mode is not one of the preset encoding modes; or an automatic error correction module configured to perform automatic error correction on the declared encoding mode when the declared encoding mode is not one of the preset encoding modes, to obtain a corrected encoding mode,
wherein the automatic error correction module comprises:
a spelling calculation submodule configured to calculate a spelling similarity between the declared encoding mode and each of the preset encoding modes;
an automatic error correction submodule configured to determine, when the highest spelling similarity exceeds a predetermined threshold, the preset encoding mode corresponding to the highest spelling similarity to be the corrected encoding mode.
5. The device according to claim 4, characterized in that
the mode identification module is configured to call a predetermined character encoding identification algorithm to identify the encoding mode of the HTML resource.
6. The device according to claim 4, characterized in that the device further comprises:
an encoding reuse module configured to identify, when the web page resource is a Cascading Style Sheets (CSS) resource, the encoding mode used by the HTML resource in the web page data as the encoding mode of the CSS resource, and to decode the CSS resource using the decoding mode corresponding to that encoding mode.
7. A device for identifying web page encoding, characterized in that the device comprises:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
load web page data, the web page data including at least one web page resource;
detect whether the web page resource is a Hypertext Markup Language (HTML) resource and whether it declares an encoding mode;
if the web page resource is an HTML resource that does not declare an encoding mode, identify the encoding mode of the HTML resource;
decode the HTML resource using the decoding mode corresponding to the identified encoding mode;
if the web page resource is an HTML resource that declares an encoding mode, detect whether the declared encoding mode is one of the preset encoding modes;
if the declared encoding mode is not one of the preset encoding modes, identify the encoding mode of the HTML resource; or perform automatic error correction on the declared encoding mode to obtain a corrected encoding mode,
wherein performing automatic error correction on the declared encoding mode to obtain the corrected encoding mode comprises:
calculating a spelling similarity between the declared encoding mode and each of the preset encoding modes;
when the highest spelling similarity exceeds a predetermined threshold, determining the preset encoding mode corresponding to the highest spelling similarity to be the corrected encoding mode.
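The decision flow that the processor is configured to carry out can be summarized in a short sketch. The regular expression for locating a declared charset, the preset set, and the strategy labels are all hypothetical; the sketch also assumes that a declared encoding found among the presets is used directly for decoding, which the claims imply but do not recite.

```python
import re
from typing import Optional

# Hypothetical preset set; the claims do not enumerate the preset encoding modes.
PRESET_ENCODINGS = {"utf-8", "gbk", "gb2312", "big5"}

# Matches both <meta charset="..."> and the http-equiv form "...; charset=...".
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?\s*([A-Za-z0-9_\-]+)""", re.IGNORECASE
)


def declared_encoding(html: bytes) -> Optional[str]:
    """Return the encoding declared near the start of the HTML bytes, if any."""
    match = META_CHARSET.search(html[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def choose_strategy(html: bytes) -> str:
    """Select which branch of the claimed method applies to an HTML resource."""
    declared = declared_encoding(html)
    if declared is None:
        return "identify"             # no declaration: identify the encoding
    if declared in PRESET_ENCODINGS:
        return "use-declared"         # valid declaration: decode with it (assumed)
    return "identify-or-correct"      # invalid declaration: identify or auto-correct
```

For example, a page declaring `charset=utf8` falls into the third branch, where either the identification algorithm or the spelling-similarity correction takes over.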
CN201410562477.9A 2014-10-21 2014-10-21 Method for identifying web page coding and device Active CN104361021B (en)

Priority Applications (9)

Application Number Priority Date Filing Date Title
CN201410562477.9A CN104361021B (en) 2014-10-21 2014-10-21 Method for identifying web page coding and device
JP2016554794A JP6130976B2 (en) 2014-10-21 2015-01-22 Web page encoding identification method, web page encoding identification device, program, and recording medium
RU2015110973A RU2610245C2 (en) 2014-10-21 2015-01-22 Method and device for web page encode identification
KR1020157007129A KR20160059455A (en) 2014-10-21 2015-01-22 Method and device for identifying encoding of web page
MX2015003807A MX361564B (en) 2014-10-21 2015-01-22 Web page coding identification method and device.
PCT/CN2015/071308 WO2016061930A1 (en) 2014-10-21 2015-01-22 Web page coding identification method and device
BR112015006725A BR112015006725A2 (en) 2014-10-21 2015-01-22 method and device for identifying web page coding
US14/684,855 US20160112491A1 (en) 2014-10-21 2015-04-13 Method and device for identifying encoding of web page
EP15178533.4A EP3012750A1 (en) 2014-10-21 2015-07-27 Method and device for identifying encoding of web page

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410562477.9A CN104361021B (en) 2014-10-21 2014-10-21 Method for identifying web page coding and device

Publications (2)

Publication Number Publication Date
CN104361021A CN104361021A (en) 2015-02-18
CN104361021B true CN104361021B (en) 2018-07-24

Family

ID=52528283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410562477.9A Active CN104361021B (en) 2014-10-21 2014-10-21 Method for identifying web page coding and device

Country Status (8)

Country Link
EP (1) EP3012750A1 (en)
JP (1) JP6130976B2 (en)
KR (1) KR20160059455A (en)
CN (1) CN104361021B (en)
BR (1) BR112015006725A2 (en)
MX (1) MX361564B (en)
RU (1) RU2610245C2 (en)
WO (1) WO2016061930A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104994128B (en) * 2015-05-15 2019-04-26 北京网康科技有限公司 A kind of identification of data encoding type and code-transferring method and device
CN105468753A (en) * 2015-11-27 2016-04-06 北京金和网络股份有限公司 Multi-coding-format data display system and method
CN106407438A (en) * 2016-09-28 2017-02-15 珠海迈越信息技术有限公司 Data processing method and system
CN110020343B (en) * 2017-09-01 2021-03-30 北京国双科技有限公司 Method and device for determining webpage coding format
CN110674377A (en) * 2019-09-24 2020-01-10 四川长虹电器股份有限公司 Crawler-based news hotspot word acquisition method
CN114024651A (en) * 2020-07-16 2022-02-08 深信服科技股份有限公司 Method, device and equipment for identifying coding type and readable storage medium
CN114415817B (en) * 2020-10-28 2024-05-07 北京小米移动软件有限公司 Display control method, electronic device and storage medium
CN113595683A (en) * 2021-07-07 2021-11-02 西安震有信通科技有限公司 Conversion processing method, device, terminal and medium based on various encoding files

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101526963A (en) * 2009-04-17 2009-09-09 深圳华为通信技术有限公司 Method for identifying web page coding, device and terminal equipment
US7711673B1 (en) * 2005-09-28 2010-05-04 Trend Micro Incorporated Automatic charset detection using SIM algorithm with charset grouping
CN103207877A (en) * 2012-01-17 2013-07-17 阿里巴巴集团控股有限公司 Decoding method and device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3203544B2 (en) * 1996-01-31 2001-08-27 日本電信電話株式会社 Text maximum likelihood decoding method and maximum likelihood decoding device, and data communication network device
JP2000132449A (en) * 1998-10-27 2000-05-12 Nippon Telegr & Teleph Corp <Ntt> Proxy access method, device therefor and record medium recorded with proxy access program
US6701320B1 (en) * 2002-04-24 2004-03-02 Bmc Software, Inc. System and method for determining a character encoding scheme
US7148824B1 (en) * 2005-08-05 2006-12-12 Xerox Corporation Automatic detection of character encoding format using statistical analysis of the text strings
US8271263B2 (en) * 2007-03-30 2012-09-18 Symantec Corporation Multi-language text fragment transcoding and featurization
JP5565197B2 (en) * 2010-08-18 2014-08-06 富士通株式会社 Web application linkage method, linkage apparatus, and linkage program
RU2500024C2 (en) * 2011-12-27 2013-11-27 Общество С Ограниченной Ответственностью "Центр Инноваций Натальи Касперской" Method for automated language detection and (or) text document coding
US8938683B2 (en) * 2012-09-11 2015-01-20 Ebay Inc. Visual state comparator
TWI493365B (en) * 2013-08-16 2015-07-21 Arphic Technology Co Ltd Input and instant displaying method with multiple character-set character codes, system and apparatus

Also Published As

Publication number Publication date
MX361564B (en) 2018-12-11
BR112015006725A2 (en) 2017-07-04
RU2015110973A (en) 2016-10-20
JP6130976B2 (en) 2017-05-17
MX2015003807A (en) 2016-08-02
EP3012750A1 (en) 2016-04-27
KR20160059455A (en) 2016-05-26
JP2016539450A (en) 2016-12-15
RU2610245C2 (en) 2017-02-08
CN104361021A (en) 2015-02-18
WO2016061930A1 (en) 2016-04-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant