CN106815245B - Method and device for analyzing source information of search engine - Google Patents

Info

Publication number
CN106815245B
Authority
CN
China
Prior art keywords
format
target
address
search engine
webpage address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510860638.7A
Other languages
Chinese (zh)
Other versions
CN106815245A (en
Inventor
储雨知
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201510860638.7A
Publication of CN106815245A
Application granted
Publication of CN106815245B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The application discloses a method and a device for analyzing source information of a search engine. The method comprises the following steps: acquiring a target webpage address of source information of a search engine to be analyzed; judging whether the format of the target webpage address conforms to the format of a search engine page or not; if the format of the target webpage address does not conform to the format of the search engine page, judging whether the format of the target webpage address conforms to a preset format or not, wherein the preset format is a format which is configured in advance according to the jump page address; and if the format of the target webpage address accords with the preset format, analyzing the source information of the search engine corresponding to the target webpage address. By the method and the device, the problem that the accuracy of analyzing the source information of the search engine in the related technology is low is solved.

Description

Method and device for analyzing source information of search engine
Technical Field
The application relates to the field of website analysis, in particular to a method and a device for analyzing source information of a search engine.
Background
Currently, more and more websites study the behavior of their visitors, that is, perform website analysis. Parsing search engine source information is a very important step in website analysis. In the related art, parsing is generally performed on the source web page address (source URL), and if the source URL conforms to a preset format of a search engine page (for example, baidu.com/s. However, to protect the privacy of users' keywords, some search engines apply special processing to their result links: when the user clicks a search result, the user does not jump directly to the target website but passes through a number of jump pages on the way before finally reaching it. As a result, the source URL obtained by the JavaScript code on the target website is a jump page URL rather than a search engine page URL, so the search engine source information is difficult to parse, and the visit may even be judged directly as coming from a non-search-engine source.
Aiming at the problem of lower accuracy of analyzing the source information of the search engine in the related technology, no effective solution is provided at present.
Disclosure of Invention
The present application mainly aims to provide a method and an apparatus for parsing search engine source information, so as to solve the problem of low accuracy in parsing search engine source information in the related art.
In order to achieve the above object, according to one aspect of the present application, a method for parsing search engine source information is provided. The method comprises the following steps: acquiring a target webpage address of source information of a search engine to be analyzed; judging whether the format of the target webpage address conforms to the format of a search engine page or not; if the format of the target webpage address does not conform to the format of the search engine page, judging whether the format of the target webpage address conforms to a preset format or not, wherein the preset format is a format which is configured in advance according to the jump page address; and if the format of the target webpage address accords with the preset format, analyzing the source information of the search engine corresponding to the target webpage address.
Further, analyzing the search engine source information corresponding to the target web page address includes: determining position information corresponding to search engine source information in a preset format; determining a target position corresponding to the position information on the target webpage address; extracting content information on a target position in a target webpage address; and taking the content information at the target position in the target webpage address as the source information of the search engine corresponding to the target webpage address.
Further, after determining whether the format of the target webpage address conforms to the preset format, the method further includes: if the format of the target webpage address does not conform to the preset format, determining that the target webpage address is a webpage address which does not belong to a search engine source; sending the target webpage address to a target address; and analyzing the target webpage address on the target address.
Further, before determining whether the format of the target webpage address conforms to the preset format, the method further includes: acquiring jump page addresses of a target number from historical data; counting the target format of the jump page address according to the jump page addresses with the target number; and taking the target format of the jump page address as a preset format, and storing the preset format into a preset data list.
Further, if the format of the target webpage address does not conform to the format of the search engine page, determining whether the format of the target webpage address conforms to a preset format includes: if the format of the target webpage address does not conform to the format of the search engine page, determining the target webpage address as a jump page address; and matching the format of the target webpage address with the preset format in the preset data list one by one to judge whether the format of the target webpage address conforms to the preset format.
In order to achieve the above object, according to another aspect of the present application, there is provided a search engine source information parsing apparatus. The device includes: the first acquisition unit is used for acquiring a target webpage address of the source information of the search engine to be analyzed; the first judging unit is used for judging whether the format of the target webpage address conforms to the format of a search engine page or not; the second judgment unit is used for judging whether the format of the target webpage address conforms to a preset format or not under the condition that the format of the target webpage address does not conform to the format of the search engine page, wherein the preset format is a format which is configured in advance according to the jump page address; and the first analysis unit is used for analyzing the search engine source information corresponding to the target webpage address under the condition that the format of the target webpage address accords with the preset format.
Further, the first parsing unit includes: the first determining module is used for determining position information corresponding to the search engine source information in a preset format; the second determining module is used for determining a target position corresponding to the position information on the target webpage address; the extraction module is used for extracting content information on a target position in a target webpage address; and the third determining module is used for taking the content information on the target position in the target webpage address as the source information of the search engine corresponding to the target webpage address.
Further, the apparatus further comprises: the determining unit is used for determining the target webpage address as a webpage address which does not belong to a search engine source under the condition that the format of the target webpage address does not conform to the preset format; the sending unit is used for sending the target webpage address to the target address; and the second analysis unit is used for analyzing the target webpage address on the target address.
Further, the apparatus further comprises: the second acquisition unit is used for acquiring jump page addresses of a target number from historical data; the statistical unit is used for counting the target format of the jump page address according to the jump page addresses with the target number; and the storage unit is used for taking the target format of the jump page address as a preset format and storing the preset format to a preset data list.
Further, the second determination unit includes: the fourth determining module is used for determining the target webpage address as a jump page address under the condition that the format of the target webpage address does not conform to the format of the search engine page; and the judging module is used for matching the format of the target webpage address with the preset format in the preset data list one by one so as to judge whether the format of the target webpage address conforms to the preset format.
In the present application, the following steps are adopted: acquiring a target webpage address of source information of a search engine to be analyzed; judging whether the format of the target webpage address conforms to the format of a search engine page; if the format of the target webpage address does not conform to the format of the search engine page, judging whether the format of the target webpage address conforms to a preset format, wherein the preset format is a format configured in advance according to the jump page address; and if the format of the target webpage address conforms to the preset format, analyzing the search engine source information corresponding to the target webpage address. These steps solve the problem of low accuracy in analyzing search engine source information in the related art and thereby improve the accuracy of that analysis.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow chart of a method for parsing search engine source information according to an embodiment of the present application; and
FIG. 2 is a schematic diagram of a parsing apparatus for source information of a search engine according to an embodiment of the present application.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of description, terms referred to in the embodiments of the present application are explained below:
A Uniform Resource Locator (URL), also called a web page address, is the address of a standard resource on the Internet. In general, a URL is a character string that describes an information resource on the Internet, and it is used in various WWW client and server programs, notably the well-known Mosaic browser. A URL describes various information resources, including files, server addresses, and directories, in a uniform format. The format of a URL consists of three parts: the first part is the protocol (also called the service mode); the second part is the host name or IP address of the host where the resource is stored (sometimes including a port number); the third part is the specific address of the resource on the host, such as a directory and file name. The first and second parts are separated by the "://" symbol, and the second and third parts are separated by the "/" symbol. The first and second parts are indispensable, and the third part may sometimes be omitted.
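As a minimal illustration of the three-part structure described above, the following Python sketch splits a URL with the standard library function `urllib.parse.urlsplit`. The example URL and the `url_parts` helper are hypothetical and do not come from the application.

```python
from urllib.parse import urlsplit

def url_parts(url: str) -> dict:
    """Split a URL into the parts described in the text."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # first part: protocol (service mode)
        "host": parts.hostname,     # second part: host name or IP address
        "port": parts.port,         # optional port number (None if absent)
        "resource": parts.path,     # third part: directory and file name
    }

# Hypothetical example URL, used only for illustration.
example = url_parts("http://www.example.com:8080/docs/index.html")
```

When the third part is omitted, as in `http://www.example.com`, the `resource` field is simply an empty string.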
According to the embodiment of the application, a method for analyzing the source information of the search engine is provided.
Fig. 1 is a flowchart of a method for parsing search engine source information according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
step S101, a target webpage address of the source information of the search engine to be analyzed is obtained.
The web page address obtained by the JavaScript code on the target website is used as the target web page address of the search engine source information to be analyzed; in the description of the application it is referred to simply as the source URL. The target web page address is the address of the page from which the visitor jumped directly to the target website. The search engine source information may include the name of the source search engine, whether the traffic is paid, and the like.
Step S102, judging whether the format of the target webpage address accords with the format of the search engine page.
The application does not limit the specific format of the search engine page. A separate page format may be determined for each search engine, or a unified page format may be determined for a plurality of search engines. In either case, this step judges whether the format of the source URL conforms to the format of the search engine page.
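One way to picture step S102 is a table of per-engine regular expressions that the source URL is tested against. The sketch below is only an illustration under stated assumptions: the engine names, the `example` domains, and the function `match_search_engine_page` are hypothetical, and the application does not prescribe any particular pattern.

```python
import re

# Hypothetical search engine page formats, one per engine (not from the application).
SEARCH_PAGE_FORMATS = {
    "ExampleSearch": re.compile(r"^https?://search\.example\.com/results\?"),
    "OtherSearch": re.compile(r"^https?://www\.other-search\.example/s\?"),
}

def match_search_engine_page(source_url: str):
    """Return the engine name if the source URL looks like a search page, else None."""
    for engine, pattern in SEARCH_PAGE_FORMATS.items():
        if pattern.match(source_url):
            return engine
    return None
```

A unified format for several engines could be expressed instead as a single pattern with alternation; the per-engine table is used here because it keeps the engine name available for later reporting.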
Step S103, if the format of the target webpage address does not conform to the format of the search engine page, judging whether the format of the target webpage address conforms to a preset format, wherein the preset format is a format which is configured in advance according to the jump page address.
In the application, the preset format of the jump page address is a regular expression for the jump page address. Judging whether the format of the source URL conforms to the preset format therefore means matching the source URL against that regular expression.
Optionally, in the method for parsing search engine source information provided in this embodiment of the present application, before determining whether a format of a target web address matches a preset format, the method further includes: acquiring jump page addresses of a target number from historical data; counting the target format of the jump page address according to the jump page addresses with the target number; and taking the target format of the jump page address as a preset format, and storing the preset format into a preset data list.
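The application does not say how the target format is counted from the historical jump page addresses. The following Python sketch shows one possible approach, purely as an assumption: count how often each host-plus-path prefix occurs and turn the frequent ones into regular expressions. The function `derive_preset_formats`, the `min_count` threshold, and the sample URLs are all hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

def derive_preset_formats(jump_urls, min_count=2):
    """Count host+path prefixes in historical jump URLs; keep frequent ones as regexes."""
    counts = Counter()
    for url in jump_urls:
        parts = urlsplit(url)
        counts[(parts.hostname or "") + parts.path] += 1
    preset_list = []
    for prefix, n in counts.most_common():
        if n >= min_count:
            # Assumed shape: optional scheme, literal host+path, then a query string.
            preset_list.append(r"^(?:https?://)?" + re.escape(prefix) + r"\?")
    return preset_list

# Hypothetical historical data (target number of jump page addresses = 3).
history = [
    "http://jump.example.com/link?url=aaa",
    "http://jump.example.com/link?url=bbb",
    "http://other.example.com/r?u=ccc",
]
preset_data_list = derive_preset_formats(history)
```

In this sketch, a prefix seen only once is treated as noise and left out of the preset data list; a real deployment would need its own threshold and review step.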
Optionally, in the method for parsing search engine source information provided in the embodiment of the present application, if the format of the target webpage address does not conform to the format of the search engine page, determining whether the format of the target webpage address conforms to a preset format includes: if the format of the target webpage address does not conform to the format of the search engine page, determining the target webpage address as a jump page address; and matching the format of the target webpage address with the preset format in the preset data list one by one to judge whether the format of the target webpage address conforms to the preset format.
For example, for a Baidu organic search entry, the preset format is configured as follows: ("^ www.baidu.com/link. The preset format is stored in the preset data list. When judging whether the format of the source URL conforms to the preset format, the format of the source URL is matched against the preset formats stored in the preset data list one by one. Once the source URL matches one preset format, it is determined to conform to the preset format, and it need not be matched against the remaining preset formats in the preset data list.
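The one-by-one matching with an early stop can be sketched as follows. The patterns in `PRESET_DATA_LIST` are hypothetical placeholders on `example` domains, not the format configured in the application, and `find_matching_preset` is an illustrative name.

```python
import re

# Hypothetical preset data list of jump page regular expressions.
PRESET_DATA_LIST = [
    r"^https?://jump\.example\.com/link\?",
    r"^https?://redirect\.example\.org/go\?",
]

def find_matching_preset(source_url, preset_list=PRESET_DATA_LIST):
    """Try each preset format in order; return the first that matches, else None."""
    for preset in preset_list:
        if re.match(preset, source_url):
            return preset  # stop here: the remaining formats are not tried
    return None
```

Returning the matched pattern rather than a plain boolean lets the later parsing step reuse the same regular expression.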
And step S104, if the format of the target webpage address accords with the preset format, analyzing the source information of the search engine corresponding to the target webpage address.
And if the format of the target webpage address is successfully matched with the regular expression of the jump page address, analyzing the source URL by adopting the regular expression of the jump page address to obtain the source information of the search engine corresponding to the source URL.
Optionally, in the method for parsing search engine source information provided in the embodiment of the present application, parsing search engine source information corresponding to a target web page address includes: determining position information corresponding to search engine source information in a preset format; determining a target position corresponding to the position information on the target webpage address; extracting content information on a target position in a target webpage address; and taking the content information at the target position in the target webpage address as the source information of the search engine corresponding to the target webpage address.
For example, the preset format is: ("^ www.baidu.com/link. If the preset format matches the source URL, key information such as the name of the source search engine and whether the traffic is paid (that is, the search engine source information corresponding to the source URL) can be obtained from the target position in the source URL.
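One way to carry "position information" inside a preset format is to use named capture groups, so that a successful match directly yields the content at the target positions. The sketch below assumes this technique; the URL layout, the `paid` query parameter, and the function `parse_source_info` are hypothetical and are not taken from the application.

```python
import re

# Hypothetical preset format: named groups mark where the source information sits.
PRESET_FORMAT = re.compile(
    r"^https?://(?P<engine>[a-z]+)\.example\.com/link\?.*?\bpaid=(?P<paid>[01])"
)

def parse_source_info(source_url):
    """Extract the content at the marked positions, or return None if there is no match."""
    m = PRESET_FORMAT.match(source_url)
    if m is None:
        return None
    return {"engine": m.group("engine"), "paid": m.group("paid") == "1"}
```

For instance, `parse_source_info("http://searchx.example.com/link?url=abc&paid=0")` yields the engine name `searchx` with the paid flag set to false.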
Optionally, in the method for parsing search engine source information provided in the embodiment of the present application, after determining whether a format of a target web address conforms to a preset format, the method further includes: if the format of the target webpage address does not conform to the preset format, determining that the target webpage address is a webpage address which does not belong to a search engine source; sending the target webpage address to a target address; and analyzing the target webpage address on the target address.
If the format of the source URL does not conform to the preset format, determining that the source URL is a webpage address not belonging to the source of the search engine; and transferring the source URL to other source analysis modules for analysis.
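The hand-off to another source analysis module might look like the sketch below. Everything here is an assumption made for illustration: the application does not describe the other modules, so `parse_referral_site` is a hypothetical stand-in that merely reports the referring host, and `JUMP_PRESETS` holds a placeholder pattern.

```python
import re

JUMP_PRESETS = [r"^https?://jump\.example\.com/link\?"]  # hypothetical preset formats

def parse_referral_site(source_url):
    """Hypothetical non-search-engine module: report the referring host."""
    m = re.match(r"^https?://([^/]+)", source_url)
    return {"type": "referral", "host": m.group(1) if m else None}

def route_source_url(source_url, other_module=parse_referral_site):
    """Treat the URL as a search engine source if a preset matches, else forward it."""
    if any(re.match(p, source_url) for p in JUMP_PRESETS):
        return {"type": "search_engine"}
    # Not a search engine source: hand the URL to another analysis module.
    return other_module(source_url)
```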
In summary, through the above steps, the problem that when the target webpage address does not conform to the format of the search engine page, the accuracy of analyzing the source information of the search engine is low due to the fact that the target webpage address is directly judged to be a webpage address of a non-search source is solved.
According to the method for analyzing the source information of the search engine, the target webpage address of the source information of the search engine to be analyzed is obtained; judging whether the format of the target webpage address conforms to the format of a search engine page or not; if the format of the target webpage address does not conform to the format of the search engine page, judging whether the format of the target webpage address conforms to a preset format or not, wherein the preset format is a format which is configured in advance according to the jump page address; and if the format of the target webpage address accords with the preset format, analyzing the search engine source information corresponding to the target webpage address, solving the problem of lower accuracy of analyzing the search engine source information in the related technology, and further achieving the effect of improving the accuracy of analyzing the search engine source information.
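Steps S101 to S104 can be put together in a short sketch. This is one possible reading under stated assumptions, not the patented implementation: both patterns are hypothetical and use `example` domains, and `parse_search_source` is an illustrative name.

```python
import re

# Hypothetical formats; a named group captures the engine name in both cases.
SEARCH_PAGE_FORMAT = re.compile(r"^https?://(?P<engine>[a-z]+)\.example\.com/search\?")
JUMP_PRESETS = [re.compile(r"^https?://(?P<engine>[a-z]+)\.example\.com/link\?")]

def parse_search_source(source_url):
    # S101: source_url is the target web page address obtained on the target website.
    # S102: does it conform to the search engine page format?
    m = SEARCH_PAGE_FORMAT.match(source_url)
    if m:
        return {"engine": m.group("engine"), "via_jump_page": False}
    # S103: otherwise compare it against each preset jump page format.
    for preset in JUMP_PRESETS:
        m = preset.match(source_url)
        if m:
            # S104: parse the source information from the matched URL.
            return {"engine": m.group("engine"), "via_jump_page": True}
    return None  # not a search engine source
```

The point of the second branch is that a jump page URL, which the first check alone would reject, is still attributed to its search engine.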
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
The embodiment of the present application further provides an analysis device for search engine source information, and it should be noted that the analysis device for search engine source information in the embodiment of the present application may be used to execute the analysis method for search engine source information provided in the embodiment of the present application. The following describes an apparatus for parsing search engine source information according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a parsing apparatus for source information of a search engine according to an embodiment of the present application. As shown in fig. 2, the apparatus includes: a first acquiring unit 10, a first judging unit 20, a second judging unit 30 and a first analyzing unit 40.
The first obtaining unit 10 is configured to obtain a target web page address of source information of a search engine to be analyzed.
The first judging unit 20 is used for judging whether the format of the target webpage address conforms to the format of the search engine page.
The second judging unit 30 is configured to judge whether the format of the target webpage address conforms to a preset format under the condition that the format of the target webpage address does not conform to the format of the search engine page, where the preset format is a format pre-configured according to the jump page address.
The first parsing unit 40 is configured to parse the search engine source information corresponding to the target webpage address when the format of the target webpage address conforms to a preset format.
According to the analysis device for the source information of the search engine provided by the embodiment of the application, the target webpage address of the source information of the search engine to be analyzed is acquired through the first acquisition unit 10; the first judgment unit 20 judges whether the format of the target web page address conforms to the format of the search engine page; the second judging unit 30 judges whether the format of the target webpage address conforms to a preset format under the condition that the format of the target webpage address does not conform to the format of the search engine page, wherein the preset format is a format pre-configured according to the jump page address; and the first parsing unit 40 parses the search engine source information corresponding to the target webpage address under the condition that the format of the target webpage address conforms to the preset format, thereby solving the problem of low accuracy of parsing the search engine source information in the related art and further achieving the effect of improving the accuracy of parsing the search engine source information.
Optionally, in the parsing apparatus for source information of a search engine provided in the embodiment of the present application, the first parsing unit 40 includes: the first determining module is used for determining position information corresponding to the search engine source information in a preset format; the second determining module is used for determining a target position corresponding to the position information on the target webpage address; the extraction module is used for extracting content information on a target position in a target webpage address; and the third determining module is used for taking the content information on the target position in the target webpage address as the source information of the search engine corresponding to the target webpage address.
Optionally, in the apparatus for parsing source information of a search engine provided in the embodiment of the present application, the apparatus further includes: the determining unit is used for determining the target webpage address as a webpage address which does not belong to a search engine source under the condition that the format of the target webpage address does not conform to the preset format; the sending unit is used for sending the target webpage address to the target address; and the second analysis unit is used for analyzing the target webpage address on the target address.
Optionally, in the apparatus for parsing source information of a search engine provided in the embodiment of the present application, the apparatus further includes: the second acquisition unit is used for acquiring jump page addresses of a target number from historical data; the statistical unit is used for counting the target format of the jump page address according to the jump page addresses with the target number; and the storage unit is used for taking the target format of the jump page address as a preset format and storing the preset format to a preset data list.
Optionally, in the apparatus for parsing source information of a search engine provided in the embodiment of the present application, the second determining unit 30 includes: the fourth determining module is used for determining the target webpage address as a jump page address under the condition that the format of the target webpage address does not conform to the format of the search engine page; and the judging module is used for matching the format of the target webpage address with the preset format in the preset data list one by one so as to judge whether the format of the target webpage address conforms to the preset format.
The analysis device for the search engine source information comprises a processor and a memory, wherein the first acquisition unit, the first judgment unit, the second judgment unit, the first analysis unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions. The preset data list and the preset format can be stored in the memory.
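As a rough software analogue of the program units described above, the following sketch models each unit as a method of one class and keeps the preset data list as instance state. The class name, the method bodies, and the patterns are hypothetical; the application only names the units and their responsibilities.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SourceParsingDevice:
    """Units modeled as methods; the preset data list is held in memory as state."""
    search_page_format: str
    preset_data_list: List[str] = field(default_factory=list)

    def first_acquiring_unit(self, referrer: str) -> str:
        return referrer.strip()

    def first_judging_unit(self, url: str) -> bool:
        return re.match(self.search_page_format, url) is not None

    def second_judging_unit(self, url: str) -> Optional[str]:
        for preset in self.preset_data_list:
            if re.match(preset, url):
                return preset
        return None

    def first_parsing_unit(self, url: str, preset: str) -> dict:
        return re.match(preset, url).groupdict()

    def run(self, referrer: str) -> Optional[dict]:
        url = self.first_acquiring_unit(referrer)
        if self.first_judging_unit(url):
            return {"direct": True}
        preset = self.second_judging_unit(url)
        return self.first_parsing_unit(url, preset) if preset else None

# Hypothetical configuration on example domains.
device = SourceParsingDevice(
    search_page_format=r"^https?://search\.example\.com/results\?",
    preset_data_list=[r"^https?://(?P<engine>[a-z]+)\.example\.com/link\?"],
)
```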
The processor includes a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the search engine source information is parsed by adjusting kernel parameters.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
The present application further provides an embodiment of a computer program product, which, when being executed on a data processing device, is adapted to carry out program code for initializing the following method steps: acquiring a target webpage address of source information of a search engine to be analyzed; judging whether the format of the target webpage address conforms to the format of a search engine page or not; if the format of the target webpage address does not conform to the format of the search engine page, judging whether the format of the target webpage address conforms to a preset format or not, wherein the preset format is a format which is configured in advance according to the jump page address; and if the format of the target webpage address accords with the preset format, analyzing the source information of the search engine corresponding to the target webpage address.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (8)

1. A method for analyzing source information of a search engine is characterized by comprising the following steps:
acquiring a target webpage address of source information of a search engine to be analyzed;
judging whether the format of the target webpage address conforms to the format of a search engine page or not;
if the format of the target webpage address does not conform to the format of the search engine page, judging whether the format of the target webpage address conforms to a preset format, wherein the preset format is a format configured in advance according to jump page addresses,
wherein the preset format comprises: a regular expression of the jump page address, a search engine name, and information indicating whether it is paid or not;
if the format of the target webpage address conforms to the preset format, analyzing the source information of the search engine corresponding to the target webpage address;
before judging whether the format of the target webpage address conforms to the preset format, the method further comprises: acquiring a target number of jump page addresses from historical data; deriving, by statistics over the target number of jump page addresses, a target format of the jump page address; and taking the target format of the jump page address as the preset format, and storing the preset format in a preset data list.
2. The method of claim 1, wherein parsing search engine source information corresponding to the target web page address comprises:
determining position information corresponding to search engine source information in the preset format;
determining a target position corresponding to the position information on the target webpage address;
extracting content information at the target position in the target webpage address; and
taking the content information at the target position in the target webpage address as the search engine source information corresponding to the target webpage address.
3. The method of claim 1, wherein after determining whether the format of the target webpage address conforms to a preset format, the method further comprises:
if the format of the target webpage address does not conform to the preset format, determining that the target webpage address is a webpage address which does not belong to a search engine source;
sending the target webpage address to a target address; and
analyzing the target webpage address at the target address.
4. The method of claim 1, wherein if the format of the target web address does not conform to the format of the search engine page, determining whether the format of the target web address conforms to a predetermined format comprises:
if the format of the target webpage address does not conform to the format of the search engine page, determining the target webpage address as a jump page address; and
matching the format of the target webpage address against the preset formats in the preset data list one by one to judge whether the format of the target webpage address conforms to the preset format.
5. A device for parsing source information of a search engine is characterized by comprising:
the first acquisition unit is used for acquiring a target webpage address of the source information of the search engine to be analyzed;
the first judgment unit is used for judging whether the format of the target webpage address conforms to the format of a search engine page or not;
the second judgment unit is used for judging whether the format of the target webpage address conforms to a preset format under the condition that the format of the target webpage address does not conform to the format of the search engine page, wherein the preset format is a format configured in advance according to jump page addresses,
wherein the preset format comprises: a regular expression of the jump page address, a search engine name, and information indicating whether it is paid or not;
the first analysis unit is used for analyzing the search engine source information corresponding to the target webpage address under the condition that the format of the target webpage address conforms to the preset format;
wherein the apparatus further comprises: the second acquisition unit, used for acquiring a target number of jump page addresses from historical data; the counting unit, used for deriving, by statistics over the target number of jump page addresses, a target format of the jump page address; and the storage unit, used for taking the target format of the jump page address as the preset format and storing the preset format in a preset data list.
6. The apparatus of claim 5, wherein the first parsing unit comprises:
the first determining module is used for determining the position information corresponding to the search engine source information in the preset format;
the second determining module is used for determining a target position corresponding to the position information on the target webpage address;
the extraction module is used for extracting content information at the target position in the target webpage address; and
the third determining module is used for taking the content information at the target position in the target webpage address as the search engine source information corresponding to the target webpage address.
7. The apparatus of claim 5, further comprising:
the determining unit is used for determining the target webpage address as a webpage address which does not belong to a search engine source under the condition that the format of the target webpage address does not conform to the preset format;
the sending unit is used for sending the target webpage address to a target address; and
the second analysis unit is used for analyzing the target webpage address at the target address.
8. The apparatus according to claim 5, wherein the second determining unit comprises:
a fourth determining module, configured to determine that the target webpage address is a jump page address when the format of the target webpage address does not conform to the format of the search engine page; and
the judging module, used for matching the format of the target webpage address against the preset formats in the preset data list one by one so as to judge whether the format of the target webpage address conforms to the preset format.
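The last step of claim 1 (building the preset data list from historical jump page addresses) and the one-by-one matching of claim 4 can be sketched as follows. The claims do not specify how the target format is derived statistically, so grouping addresses by host and path and keeping groups above a `min_share` threshold is purely an assumption; `derive_preset_formats`, `matches_any` and the `example` addresses are hypothetical names. The search engine name and the paid flag required by the preset format would have to be attached to each entry separately.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def derive_preset_formats(historical_urls, target_number, min_share=0.2):
    """Derive regular expressions for frequent jump page address shapes.

    Assumption: the "format" of an address is its host plus path, and a
    format is kept when it covers at least min_share of the sample.
    """
    sample = historical_urls[:target_number]
    counts = Counter()
    for url in sample:
        parts = urlsplit(url)
        counts[(parts.netloc.lower(), parts.path)] += 1

    preset_list = []
    for (host, path), n in counts.most_common():
        if n / len(sample) >= min_share:
            # Escape host and path so they match literally in the regex.
            preset_list.append(re.compile(
                r"^https?://" + re.escape(host) + re.escape(path) + r"(\?|$)"
            ))
    return preset_list


def matches_any(url, preset_list):
    """Match url against the preset formats one by one (claim 4)."""
    return any(pattern.search(url) for pattern in preset_list)


history = [
    "http://jump.example.com/link?url=a",
    "http://jump.example.com/link?url=b",
    "http://jump.example.com/link?url=c",
    "http://rare.example.net/x?y=1",
]
presets = derive_preset_formats(history, target_number=4, min_share=0.5)
```

With this sample, only the `jump.example.com/link` shape is frequent enough to be stored, so a new address of that shape conforms to the preset format while the rare one does not.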
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510860638.7A CN106815245B (en) 2015-11-30 2015-11-30 Method and device for analyzing source information of search engine

Publications (2)

Publication Number Publication Date
CN106815245A CN106815245A (en) 2017-06-09
CN106815245B true CN106815245B (en) 2020-05-22

Family

ID=59107192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510860638.7A Active CN106815245B (en) 2015-11-30 2015-11-30 Method and device for analyzing source information of search engine

Country Status (1)

Country Link
CN (1) CN106815245B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101159592A (en) * 2007-08-10 2008-04-09 北大方正集团有限公司 Statistical method and device of internet data information clicking rates
CN101237341A (en) * 2007-02-01 2008-08-06 阿里巴巴公司 User information processing method and system
CN104462182A (en) * 2014-10-10 2015-03-25 北京国双科技有限公司 Webpage skipping processing method and device
CN104679747A (en) * 2013-11-26 2015-06-03 腾讯科技(深圳)有限公司 Detection device and method for website redirection
CN104915347A (en) * 2014-03-11 2015-09-16 腾讯科技(北京)有限公司 Processing method, apparatus and system for web address

Also Published As

Publication number Publication date
CN106815245A (en) 2017-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant