CN110020343A - Method and apparatus for determining the encoding format of a web page - Google Patents


Info

Publication number
CN110020343A
Authority
CN
China
Prior art keywords
format
target webpage
url
target
coded format
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710784883.3A
Other languages
Chinese (zh)
Other versions
CN110020343B (en
Inventor
张野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201710784883.3A
Publication of CN110020343A
Application granted
Publication of CN110020343B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

This application discloses a method and an apparatus for determining the encoding format of a web page. The method comprises: obtaining a uniform resource locator (URL), where the web page corresponding to the URL is the target web page; determining the encoding format of the target web page according to the URL and preset field content; determining the encoding format of the target web page according to the URL and a character string conversion method; judging whether the encoding format determined according to the URL and the preset field content is the same as the encoding format determined according to the URL and the character string conversion method; and determining the encoding format of the target web page according to the judgment result. The application addresses the technical problem in the related art that determining the encoding format of a web page is inefficient.

Description

Method and apparatus for determining the encoding format of a web page
Technical field
This application relates to the field of web page technology, and in particular to a method and an apparatus for determining the encoding format of a web page.
Background
In the related art, the encoding format of a web page is generally judged by clicking a plug-in in the web page with the mouse and selecting, through the plug-in, to view the code of the page; the user then has to read through the page code to determine the encoding format used in the page. This approach requires the user to inspect the page code line by line, which takes a long time and is inefficient.
No effective solution has yet been proposed for the low efficiency of determining the encoding format of a web page in the related art.
Summary of the invention
The main purpose of this application is to provide a method for determining the encoding format of a web page, so as to solve the problem in the related art that determining the encoding format of a web page is inefficient.
To achieve the above purpose, according to one aspect of this application, a method for determining the encoding format of a web page is provided. The method comprises: obtaining a uniform resource locator (URL), where the web page corresponding to the URL is the target web page; determining the encoding format of the target web page according to the URL and preset field content; determining the encoding format of the target web page according to the URL and a character string conversion method; judging whether the encoding format determined according to the URL and the preset field content is the same as the encoding format determined according to the URL and the character string conversion method; and determining the encoding format of the target web page according to the judgment result.
Further, determining the encoding format of the target web page according to the URL and the character string conversion method comprises: converting the target web page into a page in string format; converting the page in string format into a byte stream using a first preset encoding format; converting the byte stream into a target string using a second preset encoding format; and determining the encoding format of the target web page according to whether the target string contains characters of a preset type.
Further, the characters of the preset type are Chinese characters, and determining the encoding format of the target web page according to whether the target string contains characters of the preset type comprises: if the target string contains Chinese characters, determining that the encoding format of the target web page is UTF-8; if the target string does not contain Chinese characters, determining that the encoding format of the target web page is GBK or GB2312.
Further, determining the encoding format of the target web page according to the judgment result comprises: if the judgment result is that the two are the same, taking either the encoding format determined according to the URL and the preset field content or the encoding format determined according to the URL and the preset character string conversion method as the encoding format of the target web page; if the judgment result is that the two differ, taking the encoding format determined according to the URL and the preset character string conversion method as the encoding format of the target web page.
Further, determining the encoding format of the target web page according to the URL and the preset field content comprises: extracting a preset target string from the preset field content; and determining the encoding format of the target web page according to the extracted preset target string and the URL.
To achieve the above purpose, according to another aspect of this application, an apparatus for determining the encoding format of a web page is provided. The apparatus comprises: an acquiring unit, configured to obtain a uniform resource locator (URL), where the web page corresponding to the URL is the target web page; a first determining unit, configured to determine the encoding format of the target web page according to the URL and preset field content; a second determining unit, configured to determine the encoding format of the target web page according to the URL and a character string conversion method; a judging unit, configured to judge whether the encoding format determined according to the URL and the preset field content is the same as the encoding format determined according to the URL and the character string conversion method; and a third determining unit, configured to determine the encoding format of the target web page according to the judgment result.
Further, the second determining unit comprises: a transformation module, configured to convert the target web page into a page in string format; a first conversion module, configured to convert the page in string format into a byte stream using a first preset encoding format; a second conversion module, configured to convert the byte stream into a target string using a second preset encoding format; and a determining module, configured to determine the encoding format of the target web page according to whether the target string contains characters of a preset type.
Further, the characters of the preset type are Chinese characters, and the determining module comprises: a first determining submodule, configured to determine that the encoding format of the target web page is UTF-8 if the target string contains Chinese characters; and a second determining submodule, configured to determine that the encoding format of the target web page is GBK or GB2312 if the target string does not contain Chinese characters.
To achieve the above purpose, according to another aspect of this application, a storage medium is provided. The storage medium includes a stored program, and the program executes any one of the above methods for determining the encoding format of a web page.
To achieve the above purpose, according to another aspect of this application, a processor is provided. The processor is configured to run a program, and the program, when run, executes any one of the above methods for determining the encoding format of a web page.
With this application, the encoding format of the target web page can be determined according to the URL and preset field content, and can also be determined according to the URL and a character string conversion method. It is then judged whether the two results are the same, and the encoding format of the target web page is determined according to the judgment result. Using two determination methods allows the encoding format of the target web page to be determined more accurately and correspondingly improves the efficiency of determining it, which solves the problem in the related art that determining the encoding format of a web page is inefficient.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the application. The illustrative embodiments of the application and their description serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for determining the encoding format of a web page according to an embodiment of this application; and
Fig. 2 is a schematic diagram of the apparatus for determining the encoding format of a web page according to an embodiment of this application.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of this application and the features in those embodiments can be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, fall within the scope of protection of this application.
Some terms used in the embodiments of this application are explained below:
URL (Uniform Resource Locator): a concise representation of the location of, and the method of accessing, a resource available on the Internet; it is the address of a standard resource on the Internet. Each file on the Internet has a unique URL, and the information it contains indicates the location of the file.
GB2312 encoding: the Chinese character set coding standard for information interchange, suitable for information exchange between systems such as Chinese character processing and Chinese character communication.
UTF-8 (8-bit Unicode Transformation Format): a variable-length character encoding for Unicode. As originally specified it can encode a Unicode character in 1 to 6 bytes (the current standard, RFC 3629, limits this to 4 bytes). It allows Simplified Chinese, Traditional Chinese, and other languages to be displayed together on the same web page.
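As a quick illustration of the variable-length property described above, the following Python sketch reports how many bytes UTF-8 uses for a few sample characters. The sample characters and the helper name `utf8_length` are chosen here for illustration and do not come from the application.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


# Illustrative samples: ASCII letter, accented Latin letter,
# CJK ideograph (U+4E2D), and an emoji outside the Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
lengths = [utf8_length(ch) for ch in samples]
print(lengths)  # [1, 2, 3, 4]
```

Common Chinese characters fall in the three-byte range, which is why a UTF-8 page and a GBK or GB2312 page carrying the same text have different byte streams.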
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings of this application are used to distinguish similar objects and do not describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the application described here can be implemented. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
The following embodiments can be applied to determining the encoding format of a web page, where the encoding format is usually the encoding scheme adopted when the web page or website is created. In the related art, the set of encoding format types is relatively fixed. After the encoding format of the web page has been determined, the position of each target element of the page on the display screen, and the height and width at which it is displayed, can be determined according to user requirements.
According to an embodiment of this application, a method for determining the encoding format of a web page is provided.
Fig. 1 is a flowchart of the method for determining the encoding format of a web page according to an embodiment of this application. As shown in Fig. 1, the method comprises the following steps:
Step S101: obtain a uniform resource locator (URL), where the web page corresponding to the URL is the target web page.
In this application, the encoding format of a target web page is judged. The target web page may be a page specified by the user, and each target web page contains different file resources. The path pointing to the page can be taken as its uniform resource locator (URL), and the target web page can be reached directly through the URL.
Different target web pages may have different encoding formats, and the embodiments of this application can determine the encoding format of each target web page.
Step S102: determine the encoding format of the target web page according to the URL and preset field content.
This step provides one way of determining the encoding format of the target web page: the target web page is located through the URL, and its encoding format is determined from the file content of the target web page and the preset field content in that file. The preset field content may be the charset field commonly used to declare a page's encoding format. For example, if the code of the page that a URL points to contains <meta http-equiv="content-type" content="text/html; charset=gbk"/>, the encoding format of the target web page can be determined to be GBK from the content of the field (charset=gbk).
Determining the encoding format of the target web page according to the URL and the preset field content comprises: extracting a preset target string from the preset field content; and determining the encoding format of the target web page according to the extracted preset target string and the URL. The preset target string may be "charset", and the encoding format of the page can be extracted through it.
The preset target string in the above embodiment may also be a field written by the user, for example bm=gbk. The user can decide, based on code they have written themselves, which field form is used to identify the encoding format.
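The field-based determination of step S102 can be sketched in Python as follows. This is a minimal sketch, not the implementation in the application: the function name `extract_declared_encoding`, the regular expression, and the choice to lowercase the result are assumptions made for illustration. The `key` parameter stands in for the preset target string, so both `charset` and a user-defined field such as `bm` can be looked up.

```python
import re
from typing import Optional


def extract_declared_encoding(page_source: str, key: str = "charset") -> Optional[str]:
    """Return the value that follows `key=` in the page source, or None.

    `key` plays the role of the preset target string ("charset" by default).
    The pattern is an illustrative assumption, not taken from the application.
    """
    pattern = re.compile(
        r"\b" + re.escape(key) + r"\s*=\s*[\"']?\s*([A-Za-z0-9_-]+)",
        re.IGNORECASE,
    )
    match = pattern.search(page_source)
    return match.group(1).lower() if match else None


meta = '<meta http-equiv="content-type" content="text/html; charset=gbk"/>'
print(extract_declared_encoding(meta))            # gbk
print(extract_declared_encoding("bm=gbk", "bm"))  # gbk
```

A page that declares nothing yields `None`, in which case only the string conversion method of step S103 is available.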
Step S103: determine the encoding format of the target web page according to the URL and a character string conversion method.
In the above embodiment, the character string conversion method may take several forms. Because the page content corresponding to each URL differs, when determining the encoding format of the target web page, the characters of the URL or the encoded file content of the target web page corresponding to the URL can be converted into a corresponding string; for example, the encoded file content of the target web page is converted into binary form.
Optionally, determining the encoding format of the target web page according to the URL and the character string conversion method comprises: converting the target web page into a page in string format; converting the page in string format into a byte stream using a first preset encoding format; converting the byte stream into a target string using a second preset encoding format; and determining the encoding format of the target web page according to whether the target string contains characters of a preset type.
Optionally, the first preset encoding format may be GB2312. GB2312 divides its code space into 94 rows (areas), each with 94 cells (positions), with exactly one character per cell, so the content of the target web page can be encoded by the row and cell in which each character is located. The first preset encoding format is used to convert the page in string format into a byte stream; where the target web page content represented in string format is binary, GB2312 encoding can convert the page in string format into a file in byte-stream format.
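To make the 94 by 94 row and cell layout concrete, the sketch below recovers the row and cell numbers of a character from its GB2312 bytes. The function name `gb2312_row_cell` is hypothetical. The 0xA0 offset is the usual byte layout of GB2312 as implemented by Python's `gb2312` codec; it is not stated in the application.

```python
from typing import Tuple


def gb2312_row_cell(ch: str) -> Tuple[int, int]:
    """Return the (row, cell) position of a character in the GB2312 table.

    Each double-byte GB2312 character is stored as (row + 0xA0, cell + 0xA0).
    """
    data = ch.encode("gb2312")
    if len(data) != 2:
        raise ValueError("not a double-byte GB2312 character")
    return data[0] - 0xA0, data[1] - 0xA0


print(gb2312_row_cell("啊"))  # (16, 1): the first Chinese character in the table
```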
The second preset encoding format may be any of several encoding formats, for example UTF-8. The second preset encoding format is used to convert the file of the target web page, represented as a byte stream, into the target string. The target string may contain specific characters, for example Chinese characters.
In another optional embodiment, the characters of the preset type are Chinese characters, and determining the encoding format of the target web page according to whether the target string contains characters of the preset type comprises: if the target string contains Chinese characters, determining that the encoding format of the target web page is UTF-8; if the target string does not contain Chinese characters, determining that the encoding format of the target web page is GBK or GB2312.
The above embodiment determines the encoding format of the target web page. This encoding format may be the same as the one determined using the preset field content, or it may differ.
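The conversion steps above can be sketched in Python as follows. This is a sketch under stated assumptions rather than the application's implementation: the function name, the `errors="ignore"` handling, and the use of the CJK Unified Ideographs range (U+4E00 to U+9FFF) as the test for "Chinese character" are all illustrative choices. It also assumes that the page string was produced by decoding the raw response bytes with the first preset encoding, so that re-encoding restores the original bytes.

```python
import re

# Assumption: "Chinese character" is approximated by the CJK Unified Ideographs block.
CHINESE_CHAR = re.compile(r"[\u4e00-\u9fff]")


def guess_by_string_conversion(page_text: str,
                               first_encoding: str = "gb2312",
                               second_encoding: str = "utf-8") -> str:
    """Encode the page string with the first preset encoding, decode the bytes
    with the second, and report UTF-8 if Chinese characters survive."""
    byte_stream = page_text.encode(first_encoding, errors="ignore")
    target_string = byte_stream.decode(second_encoding, errors="ignore")
    return "UTF-8" if CHINESE_CHAR.search(target_string) else "GBK/GB2312"


# Text that is genuinely GB2312: its re-encoded bytes are not valid UTF-8.
print(guess_by_string_conversion("你好"))

# A UTF-8 page read with a lossless single-byte codec. latin-1 is an
# illustrative stand-in for the first preset encoding, used here because
# decoding arbitrary UTF-8 bytes as GB2312 is not guaranteed to be lossless.
raw = "中文网页".encode("utf-8")
print(guess_by_string_conversion(raw.decode("latin-1"), first_encoding="latin-1"))
```

The first call reports `GBK/GB2312` and the second reports `UTF-8`. GBK bytes can occasionally form valid UTF-8 sequences by coincidence, so a heuristic of this kind is not exact.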
Step S104: judge whether the encoding format of the target web page determined according to the URL and the preset field content is the same as the encoding format determined according to the URL and the character string conversion method.
Step S105: determine the encoding format of the target web page according to the judgment result.
For the above steps, two ways of judging the encoding format of the target web page can be established. By comparison, when the encoding format determined according to the URL and the preset field content is judged to be the same as the encoding format determined according to the URL and the character string conversion method, the encoding format of the target web page is determined to be the one determined according to the URL and the preset field content.
In the above embodiment, if the encoding format determined according to the URL and the preset field content is judged to differ from the encoding format determined according to the URL and the character string conversion method, the encoding format of the target web page needs to be determined again. Optionally, when the two differ, the encoding format determined according to the URL and the preset field content may also be taken as the encoding format of the target web page.
Optionally, determining the encoding format of the target web page according to the judgment result comprises: if the judgment result is that the two are the same, taking either the encoding format determined according to the URL and the preset field content or the encoding format determined according to the URL and the preset character string conversion method as the encoding format of the target web page; if the judgment result is that the two differ, taking the encoding format determined according to the URL and the preset character string conversion method as the encoding format of the target web page.
In the above embodiment, when the two results differ, the encoding format determined according to the URL and the preset character string conversion method can be taken as the encoding format of the target web page.
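The comparison in steps S104 and S105 can be sketched as a small resolver. The names `normalize` and `resolve_encoding`, the grouping of GBK and GB2312 into one family label, and the `prefer_declared` flag (covering the optional variant in which the field-based result is kept on disagreement) are assumptions introduced for illustration.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Map encoding labels onto the two families the method distinguishes."""
    label = name.strip().lower().replace("_", "-")
    if label in {"utf8", "utf-8"}:
        return "utf-8"
    if label in {"gbk", "gb2312", "gbk/gb2312"}:
        return "gbk/gb2312"
    return label


def resolve_encoding(declared: Optional[str], converted: str,
                     prefer_declared: bool = False) -> str:
    """Compare the field-based and conversion-based results (step S104)
    and pick the final encoding format (step S105)."""
    conv = normalize(converted)
    if declared is None:
        return conv
    decl = normalize(declared)
    if decl == conv:
        return conv  # same result: either one can be used
    return decl if prefer_declared else conv  # different: conversion result by default


print(resolve_encoding("gbk", "GBK/GB2312"))                   # gbk/gb2312
print(resolve_encoding("gbk", "UTF-8"))                        # utf-8
print(resolve_encoding("gbk", "UTF-8", prefer_declared=True))  # gbk/gb2312
```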
Through the above embodiment, the encoding format of the target web page can be determined according to the URL and preset field content, and can also be determined according to the URL and a character string conversion method. It is then judged whether the two results are the same, and the encoding format of the target web page is determined according to the judgment result. Using two determination methods allows the encoding format of the target web page to be determined more accurately and correspondingly improves the efficiency of determining it, which solves the problem in the related art that determining the encoding format of a web page is inefficient.
The following is a specific embodiment of this application.
In this embodiment, UTF-8 is used as the reference encoding format. Optionally, the encoding format of the web page is judged by means of a string: the web page obtained through the URL is converted into string form, and the string is converted into a byte stream using GB2312 encoding; the converted byte stream is then restored with UTF-8 into a file in string format. Whether Chinese characters exist in this string-format file is then checked: if they do, the encoding format of the web page can be determined to be UTF-8; otherwise it is GBK or GB2312.
Optionally, the encoding format of the web page is also determined from the charset field; when the encoding format determined from the charset field is consistent with the encoding format determined above, the encoding format of the web page can be confirmed.
Repeated testing shows that the accuracy of this method is high, reaching roughly 98%. It is a reverse-judgment approach that can be used by a web crawler to judge the encoding format of a page before crawling it, reducing the probability of garbled text.
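For the crawler use case mentioned above, the determined encoding format would then be used to decode the raw page bytes. The sketch below is one way this could look; the function name `decode_page` and the choice to decode the GBK/GB2312 family with Python's `gbk` codec (GBK being a superset of GB2312) are assumptions, not details from the application.

```python
def decode_page(raw: bytes, detected: str) -> str:
    """Decode raw page bytes using the encoding format determined for the page.

    `detected` is expected to be "UTF-8" or "GBK/GB2312", the two outcomes of
    the method. Undecodable bytes are replaced rather than raising an error.
    """
    codec = "utf-8" if detected.strip().upper() == "UTF-8" else "gbk"
    return raw.decode(codec, errors="replace")


gb_bytes = "你好".encode("gb2312")
print(decode_page(gb_bytes, "GBK/GB2312"))  # 你好
```

Decoding the same bytes with the wrong codec is what produces the garbled text the method is meant to avoid.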
In the related art, judging the encoding format of a web page often examines only a small part of the page's byte stream rather than the full text of the file. In this application, the encoding format of the web page can be obtained by judging the full-text file. The above embodiments can improve the accuracy of judging the encoding format of a web page; because the encoding format is generated dynamically during crawling, there is no need to separately configure a CSN file for the encoding format. In this application, the judgment is made on the entire byte stream of the page and confirmed against the page's charset field, which improves the efficiency of judging the web page encoding.
It should be noted that the steps shown in the flowchart of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described can be executed in an order different from the one given here.
An embodiment of this application also provides an apparatus for determining the encoding format of a web page. It should be noted that this apparatus can be used to execute the method for determining the encoding format of a web page provided by the embodiments of this application. The apparatus is described below.
Fig. 2 is a schematic diagram of the apparatus for determining the encoding format of a web page according to an embodiment of this application. As shown in Fig. 2, the apparatus comprises: an acquiring unit 21, configured to obtain a uniform resource locator (URL), where the web page corresponding to the URL is the target web page; a first determining unit 23, configured to determine the encoding format of the target web page according to the URL and preset field content; a second determining unit 25, configured to determine the encoding format of the target web page according to the URL and a character string conversion method; a judging unit 27, configured to judge whether the encoding format determined according to the URL and the preset field content is the same as the encoding format determined according to the URL and the character string conversion method; and a third determining unit 29, configured to determine the encoding format of the target web page according to the judgment result.
Through the above embodiment, the first determining unit 23 can determine the encoding format of the target web page according to the URL and preset field content, and the second determining unit 25 can determine it according to the URL and a character string conversion method. The judging unit 27 then judges whether the two results are the same, and the third determining unit 29 determines the encoding format of the target web page according to the judgment result. Using two determination methods allows the encoding format of the target web page to be determined more accurately and correspondingly improves the efficiency of determining it, which solves the problem in the related art that determining the encoding format of a web page is inefficient.
Optionally, the second determining unit 25 comprises: a transformation module, configured to convert the target web page into a page in string format; a first conversion module, configured to convert the page in string format into a byte stream using a first preset encoding format; a second conversion module, configured to convert the byte stream into a target string using a second preset encoding format; and a determining module, configured to determine the encoding format of the target web page according to whether the target string contains characters of a preset type.
The characters of the preset type are Chinese characters, and the determining module comprises: a first determining submodule, configured to determine that the encoding format of the target web page is UTF-8 if the target string contains Chinese characters; and a second determining submodule, configured to determine that the encoding format of the target web page is GBK or GB2312 if the target string does not contain Chinese characters.
For the above embodiment, the third determining unit 29 comprises: a third determining submodule, configured to, if the judgment result is that the two are the same, take either the encoding format determined according to the URL and the preset field content or the encoding format determined according to the URL and the preset character string conversion method as the encoding format of the target web page; and a fourth determining submodule, configured to, if the judgment result is that the two differ, take the encoding format determined according to the URL and the preset character string conversion method as the encoding format of the target web page.
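The unit structure of Fig. 2 can be sketched as a small class whose fields correspond to the units. The class name `EncodingDeterminer`, the field names, and the use of plain callables are hypothetical and serve only to show how the units might be wired together; the stub functions in the usage lines replace real page fetching and analysis.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EncodingDeterminer:
    acquire: Callable[[str], str]              # acquiring unit 21: URL -> page text
    by_field: Callable[[str], Optional[str]]   # first determining unit 23
    by_conversion: Callable[[str], str]        # second determining unit 25

    def determine(self, url: str) -> str:
        page = self.acquire(url)
        declared = self.by_field(page)
        converted = self.by_conversion(page)
        # Judging unit 27: are the two results the same?
        same = declared is not None and declared == converted
        # Third determining unit 29: on disagreement, use the conversion result.
        return declared if same else converted


# Usage with stubs in place of real network access and analysis.
determiner = EncodingDeterminer(
    acquire=lambda url: "<meta charset=gbk>",
    by_field=lambda page: "gbk",
    by_conversion=lambda page: "utf-8",
)
print(determiner.determine("http://example.com/"))  # utf-8
```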
The determining device of the web page coding format includes processor and memory, and above-mentioned acquiring unit 21, first determines Unit 23, the second determination unit 25, judging unit 27 and third determination unit 29 etc. are stored in memory as program unit In, above procedure unit stored in memory is executed by processor to realize corresponding function.
Include kernel in processor, is gone in memory to transfer corresponding program unit by kernel.Kernel can be set one Or more, the efficiency of the coded format of determining target webpage is improved by adjusting kernel parameter.
Memory may include the non-volatile memory in computer-readable medium, random access memory (RAM) and/ Or the forms such as Nonvolatile memory, if read-only memory (ROM) or flash memory (flash RAM), memory include that at least one is deposited Store up chip.
The embodiment of the invention provides a kind of storage mediums, are stored thereon with program, real when which is executed by processor The determination method of the existing web page coding format.
The embodiment of the invention provides a kind of processor, the processor is for running program, wherein described program operation The determination method of web page coding format described in Shi Zhihang.
An embodiment of the present invention provides a device comprising a processor, a memory, and a program stored in the memory and executable on the processor. When executing the program, the processor performs the following steps: obtaining a uniform resource locator (URL), wherein the web page corresponding to the URL is the target web page; determining the encoding format of the target web page according to the URL and preset field content; determining the encoding format of the target web page according to the URL and a string conversion method; judging whether the encoding format determined according to the URL and the preset field content is the same as the encoding format determined according to the URL and the string conversion method; and determining the encoding format of the target web page according to the judgment result.
Determining the encoding format of the target web page according to the URL and the string conversion method includes: converting the target web page into a page in string format; converting the page in string format into a byte stream using a first preset encoding format; converting the byte stream into a target string using a second preset encoding format; and determining the encoding format of the target web page according to whether the target string includes characters of a preset format type.
The characters of the preset format type are Chinese characters. Determining the encoding format of the target web page according to whether the target string includes characters of the preset format type includes: if the target string includes Chinese characters, determining that the encoding format of the target web page is UTF-8; if the target string does not include Chinese characters, determining that the encoding format of the target web page is GBK or GB2312.
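The string conversion steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not name the two preset encodings, so `latin-1` (which maps every byte to one character and back) and `utf-8` are assumed here, and "Chinese character" is assumed to mean the basic CJK Unified Ideographs range. The function name `detect_by_string_conversion` is hypothetical.

```python
import re

# Assumed definition of "Chinese character": the basic CJK Unified Ideographs block.
CHINESE_CHAR = re.compile(r"[\u4e00-\u9fff]")


def detect_by_string_conversion(page_text: str,
                                first_encoding: str = "latin-1",
                                second_encoding: str = "utf-8") -> str:
    """Guess the page encoding by re-encoding and re-decoding its text."""
    # Convert the string-format page into a byte stream (first preset encoding).
    byte_stream = page_text.encode(first_encoding)
    # Convert the byte stream into the target string (second preset encoding);
    # undecodable bytes become U+FFFD instead of raising an error.
    target_string = byte_stream.decode(second_encoding, errors="replace")
    # Chinese characters in the target string indicate a UTF-8 page.
    if CHINESE_CHAR.search(target_string):
        return "UTF-8"
    # The source says "GBK or GB2312"; GBK is reported here as a stand-in
    # because it is a superset of GB2312.
    return "GBK"


# Usage: pages that were read into strings byte-for-byte via latin-1.
utf8_page = "中文".encode("utf-8").decode("latin-1")
gbk_page = "中文".encode("gbk").decode("latin-1")
print(detect_by_string_conversion(utf8_page))
print(detect_by_string_conversion(gbk_page))
```

Note that under this rule a page containing no Chinese characters at all is also reported as GBK, since the check only looks for their presence.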
Determining the encoding format of the target web page according to the judgment result includes: if the judgment result is that the two are the same, taking either the encoding format determined according to the URL and the preset field content or the encoding format determined according to the URL and the preset string conversion method as the encoding format of the target web page; if the judgment result is that the two differ, taking the encoding format determined according to the URL and the preset string conversion method as the encoding format of the target web page.
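The decision rule above can be written as a small function. This is a sketch only; the name `resolve_encoding` and the case-insensitive comparison are assumptions, not details given in the source.

```python
from typing import Tuple


def resolve_encoding(from_field: str, from_conversion: str) -> Tuple[str, bool]:
    """Return (chosen encoding, whether the two methods agreed)."""
    # Assumption: encoding names are compared ignoring case and whitespace.
    agreed = from_field.strip().lower() == from_conversion.strip().lower()
    if agreed:
        # Same result: either value may be used; the field-based one is kept.
        return from_field.strip().upper(), True
    # Different results: the string-conversion result takes precedence.
    return from_conversion.strip().upper(), False


print(resolve_encoding("utf-8", "UTF-8"))
print(resolve_encoding("gb2312", "UTF-8"))
```

Under this rule the string-conversion result is always an acceptable answer; the comparison mainly tells the caller whether the declared value could be trusted.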
Determining the encoding format of the target web page according to the URL and the preset field content includes: extracting a preset target string from the preset field content; and determining the encoding format of the target web page according to the extracted preset target string and the URL. The device described herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: obtaining a uniform resource locator (URL), wherein the web page corresponding to the URL is the target web page; determining the encoding format of the target web page according to the URL and preset field content; determining the encoding format of the target web page according to the URL and a string conversion method; judging whether the encoding format determined according to the URL and the preset field content is the same as the encoding format determined according to the URL and the string conversion method; and determining the encoding format of the target web page according to the judgment result.
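The five method steps listed above can be sketched end to end as follows. This is an illustration under assumptions rather than the program the application describes: `fetch` is a hypothetical caller-supplied helper that returns the raw page bytes and the preset field content for a URL (no network call is made here), the field content is assumed to carry a `charset=` declaration, and the string conversion is collapsed into a single UTF-8 decode of the raw bytes.

```python
import re
from typing import Callable, Tuple


def determine_page_encoding(url: str,
                            fetch: Callable[[str], Tuple[bytes, str]]) -> str:
    """Determine the encoding of the page at `url` using both methods."""
    # Step 1: obtain the URL's page (raw bytes) and its preset field content.
    body, field_content = fetch(url)

    # Step 2: method one, read the declared charset from the field content.
    declared = re.search(r"charset\s*=\s*[\"']?([\w-]+)", field_content, re.I)
    by_field = declared.group(1).upper() if declared else None

    # Step 3: method two, decode as UTF-8 and look for Chinese characters.
    text = body.decode("utf-8", errors="replace")
    by_conversion = "UTF-8" if re.search(r"[\u4e00-\u9fff]", text) else "GBK"

    # Steps 4 and 5: compare; on disagreement prefer the conversion result.
    return by_field if by_field == by_conversion else by_conversion


# Usage with a stand-in fetcher: a GBK page that wrongly declares UTF-8.
def fake_fetch(url: str) -> Tuple[bytes, str]:
    return "中文".encode("gbk"), "text/html; charset=utf-8"


print(determine_page_encoding("http://example.com/", fake_fetch))
```

Passing the fetcher in as a parameter keeps the sketch testable without network access; a real fetcher would issue the HTTP request for the URL.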
Determining the encoding format of the target web page according to the URL and the string conversion method includes: converting the target web page into a page in string format; converting the page in string format into a byte stream using a first preset encoding format; converting the byte stream into a target string using a second preset encoding format; and determining the encoding format of the target web page according to whether the target string includes characters of a preset format type.
The characters of the preset format type are Chinese characters. Determining the encoding format of the target web page according to whether the target string includes characters of the preset format type includes: if the target string includes Chinese characters, determining that the encoding format of the target web page is UTF-8; if the target string does not include Chinese characters, determining that the encoding format of the target web page is GBK or GB2312.
Determining the encoding format of the target web page according to the judgment result includes: if the judgment result is that the two are the same, taking either the encoding format determined according to the URL and the preset field content or the encoding format determined according to the URL and the preset string conversion method as the encoding format of the target web page; if the judgment result is that the two differ, taking the encoding format determined according to the URL and the preset string conversion method as the encoding format of the target web page.
Determining the encoding format of the target web page according to the URL and the preset field content includes: extracting a preset target string from the preset field content; and determining the encoding format of the target web page according to the extracted preset target string and the URL.
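The field-content step above can be illustrated with a short sketch. The source does not say which field is used or what the preset target string is; this example assumes the field is an HTTP `Content-Type` header or an HTML `meta` tag and that the preset target string is the `charset=` declaration. The name `encoding_from_field` is hypothetical.

```python
import re
from typing import Optional

# Assumed preset target string: a "charset=" declaration, optionally quoted.
CHARSET_PATTERN = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)


def encoding_from_field(field_content: str) -> Optional[str]:
    """Extract the declared encoding from the field content, if any."""
    match = CHARSET_PATTERN.search(field_content)
    # Return None when the field carries no charset declaration.
    return match.group(1).upper() if match else None


print(encoding_from_field("text/html; charset=gb2312"))
print(encoding_from_field('<meta charset="utf-8">'))
print(encoding_from_field("text/html"))
```

A declared value can be missing or wrong, which is why the method cross-checks it against the string-conversion result.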
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity, or device that includes the element.
The above are merely embodiments of the present application and are not intended to limit the present application. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A method for determining a web page encoding format, characterized by comprising:
obtaining a uniform resource locator (URL), wherein the web page corresponding to the URL is a target web page;
determining the encoding format of the target web page according to the URL and preset field content;
determining the encoding format of the target web page according to the URL and a string conversion method, wherein a target string of the target web page is determined through the string conversion method;
judging whether the encoding format of the target web page determined according to the URL and the preset field content is the same as the encoding format of the target web page determined according to the URL and the string conversion method;
determining the encoding format of the target web page according to the judgment result.
2. The method according to claim 1, characterized in that determining the encoding format of the target web page according to the URL and the string conversion method comprises:
converting the target web page into a page in string format;
converting the page in string format into a byte stream using a first preset encoding format;
converting the byte stream into a target string using a second preset encoding format;
determining the encoding format of the target web page according to whether the target string includes characters of a preset format type.
3. The method according to claim 2, characterized in that the characters of the preset format type are Chinese characters, and determining the encoding format of the target web page according to whether the target string includes characters of the preset format type comprises:
if the target string includes Chinese characters, determining that the encoding format of the target web page is UTF-8;
if the target string does not include Chinese characters, determining that the encoding format of the target web page is GBK or GB2312.
4. The method according to claim 1, characterized in that determining the encoding format of the target web page according to the judgment result comprises:
if the judgment result is that the two are the same, taking either the encoding format of the target web page determined according to the URL and the preset field content or the encoding format of the target web page determined according to the URL and the preset string conversion method as the encoding format of the target web page;
if the judgment result is that the two differ, taking the encoding format of the target web page determined according to the URL and the preset string conversion method as the encoding format of the target web page.
5. The method according to claim 1, characterized in that determining the encoding format of the target web page according to the URL and the preset field content comprises:
extracting a preset target string from the preset field content;
determining the encoding format of the target web page according to the extracted preset target string and the URL.
6. An apparatus for determining a web page encoding format, characterized by comprising:
an acquiring unit, configured to obtain a uniform resource locator (URL), wherein the web page corresponding to the URL is a target web page;
a first determination unit, configured to determine the encoding format of the target web page according to the URL and preset field content;
a second determination unit, configured to determine the encoding format of the target web page according to the URL and a string conversion method;
a judging unit, configured to judge whether the encoding format of the target web page determined according to the URL and the preset field content is the same as the encoding format of the target web page determined according to the URL and the string conversion method;
a third determination unit, configured to determine the encoding format of the target web page according to the judgment result.
7. The apparatus according to claim 6, characterized in that the second determination unit comprises:
a conversion module, configured to convert the target web page into a page in string format;
a first conversion module, configured to convert the page in string format into a byte stream using a first preset encoding format;
a second conversion module, configured to convert the byte stream into a target string using a second preset encoding format;
a determination module, configured to determine the encoding format of the target web page according to whether the target string includes characters of a preset format type.
8. The apparatus according to claim 7, characterized in that the characters of the preset format type are Chinese characters, and the determination module comprises:
a first determination submodule, configured to determine that the encoding format of the target web page is UTF-8 if the target string includes Chinese characters;
a second determination submodule, configured to determine that the encoding format of the target web page is GBK or GB2312 if the target string does not include Chinese characters.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein the program executes the method for determining a web page encoding format according to any one of claims 1 to 5.
10. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, executes the method for determining a web page encoding format according to any one of claims 1 to 5.
CN201710784883.3A 2017-09-01 2017-09-01 Method and device for determining webpage coding format Active CN110020343B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710784883.3A CN110020343B (en) 2017-09-01 2017-09-01 Method and device for determining webpage coding format

Publications (2)

Publication Number Publication Date
CN110020343A true CN110020343A (en) 2019-07-16
CN110020343B CN110020343B (en) 2021-03-30

Family

ID=67186195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710784883.3A Active CN110020343B (en) 2017-09-01 2017-09-01 Method and device for determining webpage coding format

Country Status (1)

Country Link
CN (1) CN110020343B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Beijing city Haidian District Shuangyushu Area No. 76 Zhichun Road cuigongfandian 8 layer A

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant