CN110851747B - Information matching method and device - Google Patents

Info

Publication number
CN110851747B
Authority
CN
China
Legal status
Active
Application number
CN201810861161.8A
Other languages
Chinese (zh)
Other versions
CN110851747A
Inventor
梁洪波
Current Assignee
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201810861161.8A
Publication of CN110851747A
Application granted
Publication of CN110851747B

Abstract

The invention discloses an information matching method and device. A phrase input by a user, first URL information, and second URL information related to the phrase are obtained; the symbol formats of the first URL information and the second URL information are unified; the protocol, user name and password contained in the first URL information and the second URL information are removed; the connection characters between the port information and the path information in the remaining information of the first URL information and the second URL information are aligned, and the remaining information is divided into two parts with the connection character as the boundary; and if both parts of the first URL information and the second URL information satisfy a preset condition, the first URL information is determined to match the second URL information. The method removes redundant information from the URL information and adjusts the resulting URL information before matching, which both optimizes URL matching and improves matching accuracy.

Description

Information matching method and device
Technical Field
The present invention relates to the field of network technologies, and in particular, to an information matching method and apparatus.
Background
With the continuous development of society, the internet has become an indispensable part of people's lives, and users place increasingly high demands on the accuracy of the information they obtain through internet search engines.
In keyword ranking analysis for Search Engine Optimization (SEO), a crawler program obtains the link information of a phrase in a designated search engine according to the phrase and a Uniform Resource Locator (URL) input by a user, and the link information is then matched with the URL input by the user.
In the prior art, the general matching process is as follows: the URL input by the user and the crawled URL are first preprocessed, then compared, and the match is complete if they are equal. In other words, the legality of each URL is judged during preprocessing, and after preprocessing the crawled URLs are matched directly against the URL input by the user. URL preprocessing mainly judges legality, generally by using a regular expression to match every part of the URL. Because the prior art requires a URL to be preceded by a specific protocol and forbids double-byte characters and non-link special characters in the address, the regular expression may interpret a URL ambiguously or incorrectly. In addition, search engines rank web pages according to the relevance of characters, words and phrases, so many different matching rules arise during keyword ranking. Matching the URL input by the user directly against the crawled result set is therefore inflexible and can hardly meet business requirements.
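The prior-art flow described above can be sketched as follows. This is a minimal, hypothetical illustration: the legality regular expression `LEGAL_URL_RE` and the function `prior_art_match` are assumptions for demonstration, not the regular expression of any particular system. It shows how a protocol-mandatory legality check followed by exact comparison rejects a user entry such as `gridsum.com:8080/news/sznews/`.

```python
import re

# Hypothetical legality check: a protocol is mandatory and only ASCII
# URL characters are accepted (double-byte characters are rejected).
LEGAL_URL_RE = re.compile(r"(https?|ftp)://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def prior_art_match(user_url: str, crawled_url: str) -> bool:
    """Preprocess by checking legality, then require exact equality."""
    if not (LEGAL_URL_RE.fullmatch(user_url) and LEGAL_URL_RE.fullmatch(crawled_url)):
        return False  # an "illegal" URL never matches
    return user_url == crawled_url


crawled = "http://www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10"
print(prior_art_match("gridsum.com:8080/news/sznews/", crawled))            # False: no protocol
print(prior_art_match("http://www.gridsum.com:8080/news/sznews/", crawled))  # False: not identical
```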
Disclosure of Invention
In view of this, embodiments of the present invention provide an information matching method and apparatus that optimize URL matching and improve its accuracy. The embodiments of the present invention provide the following technical solutions:
a method of information matching, the method comprising:
acquiring a phrase input by a user, first Uniform Resource Locator (URL) information, and second URL information that is related to the phrase and acquired from a search engine by crawler technology;
unifying symbol formats of the first URL information and the second URL information;
removing protocols, user names and passwords contained in the first URL information and the second URL information;
aligning connection characters between port information and path information in the residual information of the first URL information and the second URL information, and dividing the residual information into two parts by taking the connection characters as boundaries, wherein the left side of each connection character is a first part, and the right side of each connection character is a second part;
and matching the first part of the first URL information with the first part of the second URL information, and matching the second part of the first URL information with the second part of the second URL information; if both matches satisfy a preset matching condition, determining that the first URL information matches the second URL information.
Preferably, unifying the symbol formats of the first URL information and the second URL information includes:
uniformly converting the symbol formats in the first URL information and the second URL information to lower case or to upper case.
Preferably, matching the first part of the first URL information with the first part of the second URL information and the second part of the first URL information with the second part of the second URL information, and determining that the first URL information matches the second URL information when both satisfy a preset matching condition, includes:
matching the first part of the first URL information with the first part of the second URL information, and matching the second part of the first URL information with the second part of the second URL information;
and if the first part of the second URL information ends with the first part of the first URL information and the second part of the second URL information begins with the second part of the first URL information, determining that the first URL information matches the second URL information.
Preferably, if the first part of the second URL information does not end with the first part of the first URL information, or the second part of the second URL information does not begin with the second part of the first URL information, it is determined that the first URL information does not match the second URL information.
Preferably, the information matching method further includes:
in the process of removing the protocol, the user name and the password contained in the first URL information and the second URL information, if port 80 is detected in the first URL information and the second URL information, removing port 80 from the first URL information and the second URL information.
An information matching apparatus, the apparatus comprising:
an acquisition unit, configured to acquire a phrase input by a user, first Uniform Resource Locator (URL) information, and second URL information that is related to the phrase and acquired from a search engine by crawler technology;
a uniform format unit for unifying symbol formats of the first URL information and the second URL information;
a removing unit, configured to remove a protocol, a user name, and a password included in the first URL information and the second URL information;
the adjusting unit is used for aligning connection characters between port information and path information in the residual information of the first URL information and the second URL information, dividing the residual information into two parts by taking the connection characters as boundaries, wherein the left side of each connection character is a first part, and the right side of each connection character is a second part;
and a matching unit, configured to match the first part of the first URL information with the first part of the second URL information and the second part of the first URL information with the second part of the second URL information, and to determine that the first URL information matches the second URL information if both matches satisfy a preset matching condition.
Preferably, the uniform format unit is configured to uniformly convert the symbol formats in the first URL information and the second URL information to lower case or to upper case.
Preferably, the matching unit is configured to match the first part of the first URL information with the first part of the second URL information, and the second part of the first URL information with the second part of the second URL information; and, if the first part of the second URL information ends with the first part of the first URL information and the second part of the second URL information begins with the second part of the first URL information, to determine that the first URL information matches the second URL information.
Preferably, the matching unit is further configured to determine that the first URL information does not match the second URL information if the first part of the second URL information does not end with the first part of the first URL information or the second part of the second URL information does not begin with the second part of the first URL information.
Preferably, the removing unit is further configured to remove port 80 from the first URL information and the second URL information if, while removing the protocol, the user name and the password, it detects that the first URL information and the second URL information contain port 80.
A storage medium on which a program is stored, the program implementing the above-described information matching method when executed by a processor.
A processor for running a program, wherein the program runs to execute the information matching method.
In the embodiments of the present invention, a phrase and first URL information input by a user are obtained, and second URL information related to the phrase is obtained from a search engine by crawler technology; the symbol formats of the first URL information and the second URL information are unified; the protocol, user name and password contained in the first URL information and the second URL information are removed, together with any port 80 detected in them during this process; the connection characters between the port information and the path information in the remaining information of the first URL information and the second URL information are aligned, and the remaining information is divided into two parts with the connection character as the boundary, the left side of the connection character being the first part and the right side the second part; and the first part of the first URL information is matched with the first part of the second URL information, and the second part of the first URL information with the second part of the second URL information, the first URL information being determined to match the second URL information when the preset condition is met. This information matching method removes redundant information from the first URL information and the second URL information, and adjusts the resulting first URL information and second URL information before matching them. It both optimizes URL matching and improves matching accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flow chart of an information matching method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an information matching apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As described in the background, obtaining the link information of a phrase in a designated search engine through crawler technology, according to the phrase and a uniform resource locator input by a user, and then matching the link information with the uniform resource locator input by the user, is performed as part of search engine optimization. Search engine optimization improves the ranking of a website in a search engine by making use of that engine's search rules. For matching, the URL input by the user and the crawled URL must be preprocessed; URL preprocessing mainly uses a regular expression to match all parts of the URL, that is, to judge the legality of the URL. A regular expression is used to retrieve and replace text that conforms to a certain pattern. In the prior art, URL preprocessing may cause the regular expression to interpret a URL ambiguously or incorrectly, and keyword ranking gives rise to many different matching rules, so the matching is inflexible and can hardly meet business requirements. The present invention therefore discloses an information matching method and device that optimize URL matching and make it accurate.
Fig. 1 is a schematic flow chart of an information matching method according to an embodiment of the present invention. The information matching method includes the following steps.
Step S101, obtaining a phrase and first URL information input by a user, and obtaining second URL information related to the phrase from a search engine through crawler technology.
Step S102, unifying symbol formats in the first URL information and the second URL information.
It should be noted that unifying the symbol formats of the first URL information and the second URL information includes uniformly converting the symbol formats in the first URL information and the second URL information to lower case or to upper case.
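Step S102 amounts to one case conversion applied to both strings. A minimal Python sketch, assuming lower case is chosen by default; the function name `unify_case` and the sample inputs are illustrative only:

```python
from typing import Tuple


def unify_case(first_url: str, second_url: str, upper: bool = False) -> Tuple[str, str]:
    """Step S102 (sketch): bring both URLs to the same letter case."""
    convert = str.upper if upper else str.lower
    return convert(first_url), convert(second_url)


print(unify_case("GridSum.com:8080/News/sznews/", "http://WWW.gridsum.com:8080/news/sznews/"))
# ('gridsum.com:8080/news/sznews/', 'http://www.gridsum.com:8080/news/sznews/')
```

Host names are case-insensitive, but paths and query strings are case-sensitive on many servers, so converting the whole URL information can make two distinct resources compare equal; the step as described applies the conversion to the whole URL information.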
Step S103, removing the protocol, the user name and the password contained in the first URL information and the second URL information.
In a specific implementation, a complete piece of URL information includes a protocol, a user name, a password, a domain name, a port, a path, query conditions and a fragment. For example, consider the following URL information:
http://lianghonbo:123456@www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10
where http is the protocol; lianghonbo:123456 is the user name and password; www.gridsum.com is the domain name; 8080 is the port; /news/sznews/news.html is the path; ?date=xx is the query condition; and #top=10 is the fragment.
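For a URL that carries an explicit protocol, Python's standard `urllib.parse.urlsplit` exposes the same components listed above, which can help when checking such a decomposition by hand. This is only an illustration (the helper name `describe` is hypothetical); a scheme-less entry such as `gridsum.com:8080/news/sznews/` is not reliably parsed by `urlsplit`, so user input without a protocol may need plain string handling instead.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Return the URL components named in the description."""
    p = urlsplit(url)
    return {
        "protocol": p.scheme,
        "user name": p.username,
        "password": p.password,
        "domain name": p.hostname,  # urlsplit lower-cases the host name
        "port": p.port,             # an int, or None when absent
        "path": p.path,
        "query": p.query,
        "fragment": p.fragment,
    }


url = "http://lianghonbo:123456@www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10"
for name, value in describe(url).items():
    print(f"{name}: {value}")
# protocol: http
# user name: lianghonbo
# password: 123456
# domain name: www.gridsum.com
# port: 8080
# path: /news/sznews/news.html
# query: date=xx
# fragment: top=10
```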
In the process of executing step S103, the protocol, the user name and the password included in the URL information are removed, giving: www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10
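Step S103 can be sketched with plain string operations so that user input without a protocol passes through unchanged. A minimal sketch under stated assumptions: the protocol is taken to be a leading `scheme://`, and the user name and password are whatever precedes the last `@` in the host part (everything before the first `/`, `?` or `#`). The function name is hypothetical.

```python
import re

SCHEME_RE = re.compile(r"[a-z][a-z0-9+.\-]*://", re.IGNORECASE)
AUTHORITY_RE = re.compile(r"[^/?#]*")  # host[:port] ends at the first '/', '?' or '#'


def strip_protocol_and_credentials(url: str) -> str:
    """Step S103 (sketch): remove 'scheme://' and 'user:password@'."""
    scheme = SCHEME_RE.match(url)  # only a protocol at the very start counts
    if scheme:
        url = url[scheme.end():]
    authority = AUTHORITY_RE.match(url).group(0)
    rest = url[len(authority):]
    authority = authority.rsplit("@", 1)[-1]  # keep only what follows the last '@'
    return authority + rest


print(strip_protocol_and_credentials(
    "http://lianghonbo:123456@www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10"))
# www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10
print(strip_protocol_and_credentials("gridsum.com:8080/news/sznews/"))
# gridsum.com:8080/news/sznews/
```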
Optionally, in the process of executing step S103, if it is detected that the first URL information and the second URL information contain port 80, port 80 is removed from the first URL information and the second URL information.
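The optional port-80 removal can be sketched as follows, assuming it is applied to URL information whose protocol and credentials have already been removed. `:80` is dropped only when it is the entire port, so `:8080` is left alone. The function name is hypothetical.

```python
import re

AUTHORITY_RE = re.compile(r"[^/?#]*")  # host[:port] ends at the first '/', '?' or '#'


def drop_port_80(url: str) -> str:
    """Remove an explicit ':80' port from scheme-less URL information."""
    authority = AUTHORITY_RE.match(url).group(0)
    rest = url[len(authority):]
    if authority.endswith(":80"):  # ':8080' ends with '080', so it is kept
        authority = authority[: -len(":80")]
    return authority + rest


print(drop_port_80("www.gridsum.com:80/news/sznews/"))    # www.gridsum.com/news/sznews/
print(drop_port_80("www.gridsum.com:8080/news/sznews/"))  # www.gridsum.com:8080/news/sznews/
```

Port 80 is the default port for HTTP, so for http URLs an explicit `:80` adds no information and would otherwise only get in the way of the comparison.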
Step S104, aligning connection characters between the port information and the path information in the residual information of the first URL information and the second URL information, dividing the residual information into two parts by taking the connection characters as boundaries, wherein the left side of the connection characters is a first part, and the right side of the connection characters is a second part.
The specific process of step S104 is illustrated by the following example.
Suppose that, after the protocol, the user name and the password are removed, the first URL information is:
gridsum.com:8080/news/sznews/
and the second URL information is:
www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10
Step S104 aligns the connection character "/" between the port and the path in the first URL information and the second URL information, and divides the remaining information of each into two parts with "/" as the boundary: the left side of "/" is the first part and the right side is the second part. The first parts of the first URL information and the second URL information may further be right-aligned, and their second parts left-aligned, giving:
www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10
    gridsum.com:8080/news/sznews/
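Step S104 and the alignment described above can be sketched as follows. Assumptions: the URL information has already passed through steps S102 and S103, the connection character is the first `/` after `host[:port]`, and the function names are hypothetical. The `align` helper only reproduces the display; the comparison itself works on the two parts.

```python
import re
from typing import List, Tuple

AUTHORITY_RE = re.compile(r"[^/?#]*")  # host[:port] ends at the first '/', '?' or '#'


def split_at_connection_char(url: str) -> Tuple[str, str]:
    """Split URL information into (first part, second part) around the '/'."""
    first = AUTHORITY_RE.match(url).group(0)
    rest = url[len(first):]
    # Drop the connection character itself; without a path, the second part
    # is whatever follows host[:port] (possibly empty).
    second = rest[1:] if rest.startswith("/") else rest
    return first, second


def align(first_url: str, second_url: str) -> List[str]:
    """Right-align the first parts and left-align the second parts on '/'."""
    a1, a2 = split_at_connection_char(first_url)
    b1, b2 = split_at_connection_char(second_url)
    width = max(len(a1), len(b1))
    return [f"{b1:>{width}}/{b2}", f"{a1:>{width}}/{a2}"]


for line in align("gridsum.com:8080/news/sznews/",
                  "www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10"):
    print(line)
# www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10
#     gridsum.com:8080/news/sznews/
```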
Step S105, matching the first part of the first URL information with the first part of the second URL information.
Step S106, matching the second part of the first URL information with the second part of the second URL information.
Steps S105 and S106 may be performed simultaneously or one after the other.
Step S107, if the results of the steps S105 and S106 both satisfy a preset matching condition, determining that the first URL information and the second URL information are matched.
In step S107, the preset matching condition is: the first part of the second URL information ends with the first part of the first URL information, and the second part of the second URL information begins with the second part of the first URL information.
Therefore, in the process of executing steps S105 to S107, if the first part of the second URL information ends with the first part of the first URL information and the second part of the second URL information begins with the second part of the first URL information, it is determined that the first URL information matches the second URL information.
If the first part of the second URL information does not end with the first part of the first URL information, or the second part of the second URL information does not begin with the second part of the first URL information, it is determined that the first URL information does not match the second URL information.
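Read together with the right alignment of the first parts and the left alignment of the second parts in step S104, the preset matching condition amounts to a suffix test on the first parts and a prefix test on the second parts; this is the reading under which the worked example that follows (www.gridsum.com:8080 against gridsum.com:8080) counts as a match. A minimal sketch operating on parts already produced by step S104; the function name is hypothetical:

```python
from typing import Tuple

Parts = Tuple[str, str]  # (first part, second part) produced by step S104


def parts_match(first: Parts, second: Parts) -> bool:
    """Steps S105-S107 (sketch) for the parts of first/second URL information."""
    first_host, first_path = first
    second_host, second_path = second
    host_ok = second_host.endswith(first_host)    # S105: right-aligned comparison
    path_ok = second_path.startswith(first_path)  # S106: left-aligned comparison
    return host_ok and path_ok                    # S107: both must hold


print(parts_match(("gridsum.com:8080", "news/sznews/"),
                  ("www.gridsum.com:8080", "news/sznews/news.html?date=xx#top=10")))  # True
```

A plain suffix test also accepts a host such as `fakegridsum.com:8080`; requiring the suffix to start at a `.` boundary would be a stricter variant, which is an assumption and not something the description states.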
For example, the first part of the second URL information is www.gridsum.com:8080 and its second part is news/sznews/news.html?date=xx#top=10; the first part of the first URL information is gridsum.com:8080 and its second part is news/sznews/. During matching, the aligned information is:
www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10
    gridsum.com:8080/news/sznews/
As this example shows, the first part of the second URL information ends with the first part of the first URL information, and the second part of the second URL information begins with the second part of the first URL information, so the first URL information is confirmed to match the second URL information.
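Putting steps S102 to S107 together, the following end-to-end sketch reproduces this example. Assumptions: lower case is chosen in step S102, the crawled second URL information is the sample URL used for step S103, the user's first URL information is entered without a protocol, and the matching condition is read as a suffix test on the first parts and a prefix test on the second parts. All function names are hypothetical.

```python
import re
from typing import Tuple

SCHEME_RE = re.compile(r"[a-z][a-z0-9+.\-]*://", re.IGNORECASE)
AUTHORITY_RE = re.compile(r"[^/?#]*")  # host[:port] ends at the first '/', '?' or '#'


def normalize(url: str) -> str:
    """Steps S102-S103 (sketch): lower case, drop protocol, credentials, ':80'."""
    url = url.lower()
    scheme = SCHEME_RE.match(url)
    if scheme:
        url = url[scheme.end():]
    authority = AUTHORITY_RE.match(url).group(0)
    rest = url[len(authority):]
    authority = authority.rsplit("@", 1)[-1]  # drop 'user:password@' if present
    if authority.endswith(":80"):
        authority = authority[: -len(":80")]
    return authority + rest


def split_parts(url: str) -> Tuple[str, str]:
    """Step S104 (sketch): split around the '/' joining host[:port] and path."""
    first = AUTHORITY_RE.match(url).group(0)
    rest = url[len(first):]
    return first, (rest[1:] if rest.startswith("/") else rest)


def urls_match(first_url: str, second_url: str) -> bool:
    """Steps S105-S107 (sketch): suffix test on first parts, prefix test on second parts."""
    f1, f2 = split_parts(normalize(first_url))
    s1, s2 = split_parts(normalize(second_url))
    return s1.endswith(f1) and s2.startswith(f2)


user_url = "gridsum.com:8080/news/sznews/"
crawled_url = "http://lianghonbo:123456@www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10"
print(urls_match(user_url, crawled_url))                               # True
print(urls_match("news.gridsum.com:8080/news/sznews/", crawled_url))  # False
```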
As another example, the first part of the second URL information is www.gridsum.com:8080 and its second part is news/sznews/news.html?date=xx#top=10; the first part of the first URL information is news.gridsum.com:8080 and its second part is news/sznews/. During matching, the aligned information is:
 www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10
news.gridsum.com:8080/news/sznews/
As this example shows, the first part of the second URL information (www.gridsum.com:8080) does not end with the first part of the first URL information (news.gridsum.com:8080), so it is determined that the first URL information and the second URL information do not match.
In the embodiments of the present invention, a phrase and first URL information input by a user are obtained, and second URL information related to the phrase is obtained from a search engine by crawler technology; the symbol formats of the first URL information and the second URL information are unified; the protocol, user name and password contained in the first URL information and the second URL information are removed, together with any port 80 detected in them during this process; the connection characters between the port information and the path information in the remaining information are aligned, and the remaining information is divided into a first part (left of the connection character) and a second part (right of it); and the first parts and the second parts of the first URL information and the second URL information are matched respectively, the first URL information being determined to match the second URL information when the preset condition is met. This information matching method removes redundant information from the first URL information and the second URL information and adjusts them before matching, which both optimizes URL matching and improves matching accuracy.
Based on the information matching method disclosed in the embodiment of the present invention, the embodiment of the present invention also correspondingly discloses an information matching device, as shown in fig. 2, the information matching device 200 mainly includes:
the obtaining unit 201 is configured to obtain a phrase and first URL information input by a user, and second URL information related to the phrase, which is obtained in a search engine through a crawler technology.
A unifying unit 202, configured to unify the symbol formats of the first URL information and the second URL information.
A removing unit 203, configured to remove the protocol, the user name, and the password included in the first URL information and the second URL information.
An adjusting unit 204, configured to align a connection character between port information and path information in remaining information of the first URL information and the second URL information, divide the remaining information into two parts by using the connection character as a boundary, where a left side of the connection character is a first part, and a right side of the connection character is a second part;
the matching unit 205, configured to match the first part of the first URL information with the first part of the second URL information and the second part of the first URL information with the second part of the second URL information, and to determine that the first URL information matches the second URL information if both matches satisfy a preset matching condition.
Further, the unifying unit 202 uniformly converts the symbol formats of the first URL information and the second URL information to lower case or to upper case.
Further, while the removing unit 203 is operating, if it detects that the first URL information and the second URL information contain port 80, it removes port 80 from the first URL information and the second URL information.
Further, the preset matching condition in the matching unit 205 is: the first part of the second URL information ends with the first part of the first URL information, and the second part of the second URL information begins with the second part of the first URL information. If the first part of the second URL information ends with the first part of the first URL information and the second part of the second URL information begins with the second part of the first URL information, the first URL information is determined to match the second URL information; if the first part of the second URL information does not end with the first part of the first URL information, or the second part of the second URL information does not begin with the second part of the first URL information, the first URL information is determined not to match the second URL information.
The specific principle and the implementation process of each module and unit in the information matching device disclosed in the embodiment of the present invention are the same as those of the information matching method disclosed in the embodiment of the present invention, and reference may be made to corresponding parts in the information matching method disclosed in the embodiment of the present invention, which are not described herein again.
The information matching device comprises a processor and a memory, wherein the acquisition unit, the unification unit, the removal unit, the adjustment unit, the matching unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
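The program units listed above can be sketched as one Python class whose methods stand for units 201 to 205. This is a hypothetical composition, not the device's actual code: the crawler behind the obtaining unit is injected as a plain function (`crawl`), lower case is assumed for the unifying unit, and the matching unit uses the suffix/prefix reading of the preset condition.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

SCHEME_RE = re.compile(r"[a-z][a-z0-9+.\-]*://", re.IGNORECASE)
AUTHORITY_RE = re.compile(r"[^/?#]*")  # host[:port] ends at the first '/', '?' or '#'


@dataclass
class InformationMatchingDevice:
    """Sketch of device 200; `crawl` stands in for the crawler behind unit 201."""
    crawl: Callable[[str], List[str]]

    def obtain(self, phrase: str, first_url: str) -> Tuple[str, List[str]]:
        """Obtaining unit 201: the user's URL plus crawled URLs for the phrase."""
        return first_url, self.crawl(phrase)

    def unify(self, url: str) -> str:
        """Unifying unit 202: convert to lower case."""
        return url.lower()

    def remove(self, url: str) -> str:
        """Removing unit 203: protocol, user name, password and port 80."""
        scheme = SCHEME_RE.match(url)
        if scheme:
            url = url[scheme.end():]
        authority = AUTHORITY_RE.match(url).group(0)
        rest = url[len(authority):]
        authority = authority.rsplit("@", 1)[-1]
        if authority.endswith(":80"):
            authority = authority[: -len(":80")]
        return authority + rest

    def adjust(self, url: str) -> Tuple[str, str]:
        """Adjusting unit 204: split around the port/path connection '/'."""
        first = AUTHORITY_RE.match(url).group(0)
        rest = url[len(first):]
        return first, (rest[1:] if rest.startswith("/") else rest)

    def match(self, first: Tuple[str, str], second: Tuple[str, str]) -> bool:
        """Matching unit 205: suffix test on first parts, prefix test on second parts."""
        return second[0].endswith(first[0]) and second[1].startswith(first[1])

    def run(self, phrase: str, first_url: str) -> List[str]:
        """Return the crawled URLs that match the user's URL information."""
        first_url, crawled = self.obtain(phrase, first_url)
        first = self.adjust(self.remove(self.unify(first_url)))
        return [url for url in crawled
                if self.match(first, self.adjust(self.remove(self.unify(url))))]


def fake_crawl(phrase: str) -> List[str]:
    """Hypothetical stand-in for crawling a search engine for `phrase`."""
    return [
        "http://lianghonbo:123456@www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10",
        "http://news.gridsum.com/other/page.html",
    ]


device = InformationMatchingDevice(crawl=fake_crawl)
print(device.run("gridsum news", "gridsum.com:8080/news/sznews/"))
# ['http://lianghonbo:123456@www.gridsum.com:8080/news/sznews/news.html?date=xx#top=10']
```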
In the information matching device provided by the embodiments of the present invention, the acquisition unit first acquires a phrase input by a user, first URL information, and second URL information related to the phrase; the unifying unit then unifies the symbol formats of the first URL information and the second URL information; the removing unit next removes the protocol, user name and password of the first URL information and the second URL information, together with any port 80 detected in the process; the adjusting unit then aligns the first part of the first URL information with the first part of the second URL information, and the second part of the first URL information with the second part of the second URL information; and finally the matching unit matches the first parts with each other and the second parts with each other. Redundant information in the first URL information and the second URL information is removed by the removing unit, and the resulting first URL information and second URL information are adjusted by the adjusting unit and then matched by the matching unit. This both optimizes URL matching and improves matching accuracy.
An embodiment of the present invention provides a storage medium on which a program is stored, the program implementing the information matching method when executed by a processor.
The embodiment of the invention provides a processor, which is used for running a program, wherein the information matching method is executed when the program runs.
The embodiment of the invention provides equipment, which comprises a processor, a memory and a program which is stored on the memory and can run on the processor, wherein the processor executes the program and realizes the following steps:
acquiring a phrase input by a user, first Uniform Resource Locator (URL) information and second URL information which is acquired in a search engine through a crawler technology and is related to the phrase; unifying symbol formats of the first URL information and the second URL information; removing protocols, user names and passwords contained in the first URL information and the second URL information; aligning connection characters between port information and path information in the residual information of the first URL information and the second URL information, and dividing the residual information into two parts by taking the connection characters as boundaries, wherein the left side of each connection character is a first part, and the right side of each connection character is a second part; and matching the first part of the first URL information with the first part of the second URL information, and matching the second part of the first URL information with the second part of the second URL information, wherein the matching conditions are met, and the first URL information is determined to be matched with the second URL information.
Preferably, the unifying the symbolic formats of the first URL information and the second URL information includes: and uniformly adjusting the symbol formats in the first URL information and the second URL information into a lower case format or an upper case format.
Preferably, the matching the first part of the first URL information and the first part of the second URL information, and the matching the second part of the first URL information and the second part of the second URL information, both satisfying a preset matching condition, and determining that the first URL information and the second URL information match includes: matching a first portion of the first URL information and a first portion of the second URL information, and matching a second portion of the first URL information and a second portion of the second URL information; and if the first part of the second URL information begins with the first part of the first URL information and the second part of the second URL information ends with the second part of the first URL information, determining that the first URL information and the second URL information are matched.
Preferably, if the first part of the second URL information does not begin with the first part of the first URL information or the second part of the second URL information does not end with the second part of the first URL information, it is determined that the first URL information and the second URL information do not match.
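The matching condition and its negation can be expressed as a single prefix/suffix test. This is a sketch under the assumption that each URL has already been reduced to a `(first part, second part)` pair; the name `parts_match` is hypothetical.

```python
from __future__ import annotations


def parts_match(first: tuple[str, str], second: tuple[str, str]) -> bool:
    """Return True if the second URL's parts satisfy the condition relative to the first URL's."""
    first_left, first_right = first
    second_left, second_right = second
    # Match: the second URL's first part begins with the first URL's first part AND
    # the second URL's second part ends with the first URL's second part.
    # If either check fails, the URLs do not match.
    return second_left.startswith(first_left) and second_right.endswith(first_right)
```

For instance, `("example.com", "news/1.html")` matches `("example.com:8080", "cn/news/1.html")`, but `("example.com", "a.html")` does not match `("www.example.com", "a.html")`, because `"www.example.com"` does not begin with `"example.com"`.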
Preferably, the method further includes: while removing the protocol, user name and password contained in the first URL information and the second URL information, if port 80 is detected in the first URL information and the second URL information, removing port 80 from the first URL information and the second URL information.
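A minimal sketch of the port-80 removal, assuming the input has already had its protocol and credentials removed (so it has the form `host[:port][/path]`) and that the port is written as a trailing `:80` before the first `/`. The function name is hypothetical.

```python
def drop_port_80(remainder: str) -> str:
    """Remove an explicit ':80' from 'host[:port][/path]'; other ports are kept."""
    host_port, sep, path = remainder.partition("/")
    # Compare against ":80" rather than "80" so that e.g. ":8080" is not truncated.
    if host_port.endswith(":80"):
        host_port = host_port[: -len(":80")]
    return host_port + sep + path
```

Port 80 is the default only for HTTP (HTTPS defaults to 443), so removing it unconditionally mirrors the text rather than a general URL-equivalence rule.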
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
All the embodiments in this specification are described progressively; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are substantially similar to the method embodiments and are therefore described briefly; for relevant details, refer to the corresponding parts of the method embodiments. The device and system embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments. A person of ordinary skill in the art can understand and implement them without creative effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. An information matching method, characterized in that the method comprises:
acquiring a phrase input by a user, first Uniform Resource Locator (URL) information, and second URL information that is related to the phrase and is acquired from a search engine through a crawler technology;
unifying symbol formats of the first URL information and the second URL information;
removing protocols, user names and passwords contained in the first URL information and the second URL information;
aligning connection characters between port information and path information in the residual information of the first URL information and the second URL information, and dividing the residual information into two parts by taking the connection characters as boundaries, wherein the left side of each connection character is a first part, and the right side of each connection character is a second part;
and matching a first portion of the first URL information with a first portion of the second URL information, and matching a second portion of the first URL information with a second portion of the second URL information, wherein if the first portion of the second URL information begins with the first portion of the first URL information and the second portion of the second URL information ends with the second portion of the first URL information, it is determined that the first URL information and the second URL information match.
2. The method of claim 1, wherein unifying the symbolic format of the first URL information and the second URL information comprises:
uniformly adjusting the symbol format of the first URL information and the second URL information to lowercase or to uppercase.
3. The method of claim 1,
determining that the first URL information and the second URL information do not match if the first portion of the second URL information does not begin with the first portion of the first URL information or the second portion of the second URL information does not end with the second portion of the first URL information.
4. The method according to any one of claims 1-3, further comprising:
in the process of removing the protocol, the user name and the password included in the first URL information and the second URL information, if port 80 is detected in the first URL information and the second URL information, removing port 80 from the first URL information and the second URL information.
5. An information matching apparatus, characterized in that the apparatus comprises:
an acquisition unit, configured to acquire a phrase input by a user, first Uniform Resource Locator (URL) information, and second URL information that is related to the phrase and is acquired from a search engine through a crawler technology;
a uniform format unit for unifying symbol formats of the first URL information and the second URL information;
a removing unit, configured to remove a protocol, a user name, and a password included in the first URL information and the second URL information;
an adjusting unit, configured to align the connection characters between the port information and the path information in the remaining information of the first URL information and the second URL information, and to divide the remaining information into two parts with the connection character as the boundary, wherein the part to the left of the connection character is a first part and the part to the right is a second part;
a matching unit, configured to match a first portion of the first URL information with a first portion of the second URL information, and match a second portion of the first URL information with a second portion of the second URL information, and determine that the first URL information matches the second URL information if the first portion of the second URL information begins with the first portion of the first URL information and the second portion of the second URL information ends with the second portion of the first URL information.
6. The apparatus of claim 5,
the uniform format unit is configured to uniformly adjust the symbol format of the first URL information and the second URL information to lowercase or to uppercase.
7. A storage medium, characterized in that a program is stored thereon, which when executed by a processor implements the information matching method according to any one of claims 1 to 4.
8. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the information matching method according to any one of claims 1-4.
CN201810861161.8A 2018-08-01 2018-08-01 Information matching method and device Active CN110851747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810861161.8A CN110851747B (en) 2018-08-01 2018-08-01 Information matching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810861161.8A CN110851747B (en) 2018-08-01 2018-08-01 Information matching method and device

Publications (2)

Publication Number Publication Date
CN110851747A (en) 2020-02-28
CN110851747B (en) 2022-08-02

Family

ID=69594491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810861161.8A Active CN110851747B (en) 2018-08-01 2018-08-01 Information matching method and device

Country Status (1)

Country Link
CN (1) CN110851747B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102123050A (en) * 2011-03-09 2011-07-13 成都勤智数码科技有限公司 Network terminal management method
CN102843271A (en) * 2011-11-14 2012-12-26 哈尔滨安天科技股份有限公司 Formalization detection method and system for malicious URL (uniform resource locator)
CN102594934A (en) * 2011-12-30 2012-07-18 奇智软件(北京)有限公司 Method and device for identifying hijacked website
CN103577447A (en) * 2012-07-30 2014-02-12 百度在线网络技术(北京)有限公司 Method and equipment used for determining page type information of target pages
CN104732384A (en) * 2013-12-24 2015-06-24 中兴通讯股份有限公司 Processing method and system for application software online payment
CN104778164A (en) * 2014-01-09 2015-07-15 中国银联股份有限公司 Method and device for detecting repeated URL (Uniform Resource Locator)
CN105095236A (en) * 2014-04-30 2015-11-25 优视科技有限公司 Advertisement filtering method and device
CN105024989A (en) * 2014-11-26 2015-11-04 哈尔滨安天科技股份有限公司 Malicious URL heuristic detection method and system based on abnormal port
US20160337486A1 (en) * 2014-12-05 2016-11-17 Lg Electronics Inc. Apparatus for transmitting broadcast signal, apparatus for receiving broadcast signal, method for transmitting broadcast signal and method for receiving broadcast signal
CN105138912A (en) * 2015-09-25 2015-12-09 北京奇虎科技有限公司 Method and device for generating phishing website detection rules automatically
CN105187439A (en) * 2015-09-25 2015-12-23 北京奇虎科技有限公司 Phishing website detection method and device
CN107609032A (en) * 2017-08-09 2018-01-19 联动优势科技有限公司 Matching method and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
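A multi-pattern string matching algorithm for large-scale URL filtering; Liu Yanbing et al.; Chinese Journal of Computers (《计算机学报》); 2014-05-15; Vol. 37, No. 5; pp. 1159-1169 *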

Also Published As

Publication number Publication date
CN110851747A (en) 2020-02-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant