CN109284436A - Path planning method for searching an unknown information network, and network piracy discovery system - Google Patents
- Publication number: CN109284436A
- Application number: CN201811285660.3A
- Authority: CN (China)
- Prior art keywords: node, value, network, search, information
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
Abstract
The present invention provides a path planning method for searching an unknown information network, applicable to an information network in which the attributes of all nodes are initially unknown. The method comprises the following steps: S1, if a node is found to have a particular attribute, its associated value is set to a positive value, and the nodes around it are also assigned positive values whose magnitude decreases with distance from the source node; S2, nodes with larger positive values are visited first, and if a visited node has the particular attribute, step S1 is repeated. The method applies to the following scenario: an intelligent system searches an unknown information network for nodes that have a particular attribute. The aim of the invention is to plan the search path sensibly so as to improve search efficiency, and thereby to support the discovery of online piracy.
Description
Technical field
The present invention relates to the field of information technology, and in particular to a path planning method for searching an unknown information network and to a network piracy discovery system that uses this method.
Background
An information network usually consists of nodes and the connections between them. Each node carries two kinds of information: one, content information; two, connection information. Content information may take the form of text, images, sound, video and so on; its meaning depends on the specific application. Connection information points to other nodes, and the system can use it to discover them. Connection information is sometimes called a link, an address, etc.
In general, the "attribute of a node" refers to some characteristic of the node's content information: for example, whether a text is an advertisement or not; whether a sound is speech, music or noise; whether a video contains illegal content; and so on. Determining whether a node has a given attribute usually costs resources (human or machine).
In general, the information network is unknown to the search system. The system comes to understand the network gradually, step by step. During this process, the system's knowledge of a node can be in one of the following states:
One, hidden: the system is completely unaware that the node exists;
Two, discovered but not visited: the system has learned of the node's existence from a neighboring node, but has not yet obtained its data, let alone analyzed its information;
Three, connections known but content unknown: the system knows the node's connection information but does not yet know its content (whether it has a given attribute);
Four, content known but connections unknown: the system knows the node's content but does not yet know its connection information;
Five, fully known.
The search system discovers the network by visiting nodes one at a time, and in the process the information of hidden nodes is gradually revealed. The system stores information about a set of nodes, each of which may be in any of states two to five above. The system must decide where to go next, that is, select one node from the many nodes in states two, three and four, obtain its information or carry out further analysis, and repeat this cycle. The goal of the system is to find the nodes with the particular attribute in the unknown network correctly and as quickly as possible, so the quality of this decision determines the efficiency of the system.
In the prior art, the techniques related to the network discovery described above are of the following kinds: one, depth-first search and breadth-first search; two, methods based on content clustering; three, link analysis methods, represented by PageRank.
The basic search paths are breadth-first and depth-first. The network structure is treated as an undirected graph in the sense of graph theory. Starting from an arbitrary vertex v0 of a connected undirected graph, the breadth-first strategy visits v0, then visits in turn each of v0's not-yet-visited adjacent vertices w1, w2, w3, ..., then visits in turn each not-yet-visited adjacent vertex of w1, each not-yet-visited adjacent vertex of w2, and so on. That is, starting from v0 and moving from near to far, it visits level by level the vertices connected to v0 by a path, in order of path length 1, 2, 3, ..., until every vertex in the graph has been visited once. Depth-first search instead first visits an arbitrary vertex v in the graph, then visits a not-yet-visited vertex w1 adjacent to v, then a not-yet-visited vertex w2 adjacent to w1, then w3, ..., repeating this process. When it cannot go any further, it backs up step by step to the most recently visited vertex; if that vertex still has an unvisited adjacent vertex, the search resumes from there, until every vertex in the graph has been visited. Both methods search the network in a predefined order and do nothing to optimize for the goal of discovering nodes with the particular attribute.
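The two visiting orders described above can be sketched as follows. This is a minimal illustration on a small hypothetical graph (the vertex names and the adjacency-list layout are assumptions made for the example); it is meant only to show that both orders are fixed by the graph structure and ignore node content.

```python
from collections import deque


def bfs_order(graph, start):
    """Return the vertices reachable from `start` in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order


def dfs_order(graph, start):
    """Return the vertices reachable from `start` in depth-first order."""
    visited = set()
    order = []

    def visit(v):
        visited.add(v)
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visit(w)  # go deeper before trying the next neighbor

    visit(start)
    return order


# Hypothetical undirected graph, stored as adjacency lists.
GRAPH = {
    "v0": ["w1", "w2"],
    "w1": ["v0", "x1"],
    "w2": ["v0", "x2"],
    "x1": ["w1"],
    "x2": ["w2"],
}
```

On this graph, breadth-first search reaches both neighbors of v0 before any vertex at distance 2, while depth-first search follows w1 down to x1 before returning to w2.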
Methods based on content clustering require a way of computing the distance between pieces of content information. Nodes that are close to one another are considered to revolve around the same "topic", and the links leaving these nodes are given higher priority and visited earlier. The Fish-Search and Shark-Search methods used by web crawlers are both of this kind. De Bra et al. first proposed the Fish-Search method, in which the system maintains a list of links sorted by priority and selects the next search target from it. During the search, links belonging to more relevant nodes are given higher priority. Hersovici et al. proposed the Shark-Search method on the basis of Fish-Search, using a vector space model to compute the similarity of nodes; this method judges similarity by comparing the distance between vectors and is in essence a form of text clustering.
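As an illustration of judging similarity by comparing vectors, the sketch below computes the cosine similarity of two texts represented as term-frequency vectors. It is a generic vector-space measure written for this example, not the actual Shark-Search scoring function; the whitespace tokenization and the sample strings are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts under a bag-of-words term-frequency model."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the terms the two texts share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical page texts: the first two share a topic, the third does not.
on_topic = cosine_similarity("herbal medicine tea", "herbal medicine")
off_topic = cosine_similarity("herbal medicine tea", "football match report")
```

A measure like this works when the target attribute corresponds to a topic. It gives no signal for an attribute such as piracy, where two positive nodes may share no vocabulary at all.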
Link analysis methods are represented by PageRank, proposed in 1997 by Google founders Larry Page and Sergey Brin. It was first used in the Google search engine, where it computes the importance of web pages from their inbound and outbound link relationships and ranks the pages accordingly. When link analysis is applied to web crawling, access priority is built on page importance, so important pages are visited first.
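To make concrete why link analysis needs the link structure to be known in advance, here is a minimal power-iteration sketch of PageRank. The damping factor 0.85 is the commonly used value and is an assumption here, as are the iteration count and the tiny hypothetical link graph; every link target must itself appear as a key of `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small "teleport" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new_rank[target] += share
            else:
                # A page without out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages point to "c", and "c" points back to "a".
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
RANKS = pagerank(LINKS)
```

The computation iterates over the complete `links` mapping on every pass, which is exactly the information a crawler of an unknown network does not yet have.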
When searching an unknown information network for nodes with a particular attribute, each of the three kinds of method above has a weakness. Breadth-first and depth-first search are basic search modes and do nothing to optimize for finding the target. Methods based on content clustering require a measurable similarity between nodes, as with "nodes about traditional Chinese medicine"; for attributes with no measurable similarity, such as "nodes containing pirated text", they are of no help, because the content covered by the attribute "piracy" is scattered and pirated items need not resemble one another. Link analysis was originally used in the Google engine to compute page importance, on the condition that all node information has already been obtained, that is, all nodes are in state five above, so the system can freely compute a ranking among them. In an unknown-network search application, the system reveals node information gradually, step by step, and in the process a large number of nodes are hidden or have incomplete information, so link analysis can hardly reconstruct node importance accurately.
Summary of the invention
The technical problem solved by the present invention is to provide a method for the following scenario: an intelligent system searches an unknown information network for nodes that have a particular attribute. The aim of the invention is to plan the search path sensibly so as to improve search efficiency, and thereby to support the discovery of online piracy.
To achieve the above aim, the present invention adopts the following technical solution:
A path planning method for searching an unknown information network, applicable to an information network in which the attributes of all nodes are initially unknown, comprising the following steps:
S1, if a node is found to have a particular attribute, its associated value is set to a positive value, and the associated values of the nodes around it are also assigned positive values whose magnitude decreases with distance from the source node;
S2, nodes with larger positive values are visited first; if a visited node has the particular attribute, step S1 is repeated.
Further, S1 specifically comprises:
S11, associating a value P with each node in the information network, where P takes positive values and is initially 0, and P(V) denotes the P value of node V;
S12, setting constants M and L, where M is a positive number representing the increment of P when the particular attribute is found on a node, and L is the influence-capability coefficient, 0 < L < M; when P(V) < L, values are no longer assigned to the associated values of the nodes surrounding V;
S13, when a node V is determined to have the particular attribute, increasing its P value: P(V)' = P(V) + M, and correspondingly increasing the P values of the nodes around it, where the increase for a surrounding node decreases with its distance from V;
and S2 specifically comprises:
S14, accumulating the P value of each node, and determining the visiting order of the nodes by sorting them by P value from high to low.
Further, S13 specifically comprises: taking node V as the root, traversing the n layers of nodes around V in breadth-first fashion, and increasing the P value of each node visited;
specifically, let Vij be the j-th node of layer i and ΔPij the increment of P(Vij); the value of ΔPij in each layer decays from that of the previous layer by a coefficient α, so that ΔP1j = αM, ΔP2j = α²M, ΔP3j = α³M, ...; P(Vij)' = P(Vij) + ΔPij, where 0 < α < 1.
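The update in S13 and its layer-wise decay can be restated compactly as follows. This is only a restatement in one notation; writing the general layer increment as a power of α is inferred from the three listed cases.

```latex
\begin{aligned}
P(V)'      &= P(V) + M, \\
P(V_{ij})' &= P(V_{ij}) + \Delta P_{ij},
\qquad \Delta P_{ij} = \alpha^{i} M,
\qquad 1 \le i \le n, \quad 0 < \alpha < 1 .
\end{aligned}
```

Because 0 < α < 1, the increments form a decreasing geometric sequence, so a node in a nearer layer always receives a larger increment from V than a node in a farther layer.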
Preferably, M takes a value in the range 50 to 500 and L a value in the range 0 to 0.1M.
Further, the particular attribute includes a node involving pirated content, illegal content, or a spreading hot topic of public opinion.
The present invention also provides a network piracy discovery system, comprising an interconnected database server, service server and evidence-collection server. The database server records information about the original works, crawler job information and system operation information; the service server uses a web crawler to crawl data, execute the search strategy and detect infringement; the evidence-collection server carries out evidence collection;
wherein the web crawler comprises a basic crawler unit, a feature management unit and a strategy execution unit. The basic crawler unit crawls data; the feature management unit matches features of the content crawled by the basic crawler unit against the original works and judges whether a node contains pirated content; the strategy execution unit executes the search strategy, based on the feature matching and the judgment result, using the path planning method described above.
Further, data crawling by the basic crawler unit includes downloading web page content and filtering it into text, and downloading the images in the web page; feature matching by the feature management unit includes matching the filtered text against original text works, or matching the downloaded images against original image works.
Further, the crawler job information and system operation information recorded by the database server include: URLs, link relationships and infringement discovery results.
Further, the system comprises a master-slave computer cluster formed by one database server and several service servers; the evidence-collection server and the service servers are deployed on the same hardware or distributed across different locations on the internet, and the service servers and the evidence-collection server connect to the internet through the egress of a local area network.
Beneficial effects of the present invention: the event "a node with a certain attribute has been found" carries information about the network, and the present invention makes full use of this information to guide the subsequent search. The method is suited to cases where the node attribute does not cluster but is still associated in some way, typically pirated content, certain kinds of illegal content, and spreading hot topics of public opinion. Adjusting the parameters of the method, such as the values of M, L and α, adapts the invention to a variety of different settings. In an otherwise aimless search, the method can identify the paths with a high probability of success and so improve the efficiency of searching an unknown information network.
The network piracy discovery system of the present invention can effectively find pirated content on the network and collect and record evidence. It addresses two problems: the network is so large and its content so scattered that rights holders struggle to find infringements, and the legal provisions on electronic evidence lag behind, so that evidence is hard to collect even once an infringement has been found.
Description of the drawings
Fig. 1 is a schematic diagram of the influence of a target node on surrounding nodes in the path planning method of the present invention.
Fig. 2 is a schematic diagram of the composition of an embodiment of the network piracy discovery system of the present invention.
Fig. 3 shows the functional layer structure of the service server in an embodiment of the network piracy discovery system of the present invention.
Detailed description of the embodiments
For a further understanding of the present invention, preferred embodiments are described below. It should be understood, however, that these descriptions serve only to further illustrate the features and advantages of the invention and do not limit its claims.
Embodiment 1
This embodiment provides a path planning method for searching an unknown information network, applicable to an information network in which the attributes of all nodes are initially unknown, comprising the following steps:
S1, if a node is found to have a particular attribute, its associated value is set to a positive value, and the associated values of the nodes around it are also assigned positive values whose magnitude decreases with distance from the source node;
S2, nodes with larger positive values are visited first; if a visited node has the particular attribute, step S1 is repeated.
The above method plans the path when searching an unknown information network for nodes with a particular attribute, so as to improve search efficiency. Its basic principle is probabilistic: initially the attributes of all nodes are unknown; once the attribute of a node is determined to be positive, the probability that the attributes of the surrounding nodes are also positive increases. Put another way, a node with the particular attribute (for example, pirated content) exerts a certain influence on the nodes around it, and this influence decreases with distance. Quantifying this influence and its decay with distance, and visiting the most strongly influenced nodes first, yields an optimized search strategy.
As a further preferred embodiment, S1 specifically comprises:
S11, associating a value P with each node in the information network, where P takes positive values and is initially 0, and P(V) denotes the P value of node V;
S12, setting constants M and L, where M is a positive number representing the increment of P when the particular attribute is found on a node, and L is the influence-capability coefficient, 0 < L < M; when P(V) < L, values are no longer assigned to the associated values of the nodes surrounding V;
S13, when a node V is determined to have the particular attribute, increasing its P value: P(V)' = P(V) + M, and correspondingly increasing the P values of the nodes around it, where the increase for a surrounding node decreases with its distance from V;
and S2 specifically comprises:
S14, accumulating the P value of each node, and determining the visiting order of the nodes by sorting them by P value from high to low.
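Steps S11 to S14 can be sketched as a single search loop. This is a minimal sketch under stated assumptions, not the patented implementation: the function and variable names, the parameter values L = 5 and α = 0.6, and the 5-layer limit are chosen for illustration; L is read here as "stop spreading once the increment would fall below L"; and a node's links are assumed to become known only when the node is visited, so influence spreads only along links revealed so far.

```python
from collections import deque

M = 100.0      # increment of P when the attribute is found (within the 50 to 500 range)
L = 5.0        # influence threshold, 0 < L < M (within the 0 to 0.1 * M range)
ALPHA = 0.6    # per-layer decay coefficient, 0 < ALPHA < 1 (assumed value)
N_LAYERS = 5   # number of layers traversed around a positive node (assumed value)


def pollute(known_links, p, source):
    """S13: add M to P(source) and ALPHA**layer * M to nodes `layer` hops away."""
    p[source] = p.get(source, 0.0) + M
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, layer = frontier.popleft()
        if layer == N_LAYERS:
            continue
        delta = (ALPHA ** (layer + 1)) * M
        if delta < L:  # assumed reading of S12
            continue
        for neighbor in known_links.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                p[neighbor] = p.get(neighbor, 0.0) + delta
                frontier.append((neighbor, layer + 1))


def search(graph, seeds, has_attribute, budget):
    """S2/S14: repeatedly visit the known, unvisited node with the highest P."""
    p = {s: 0.0 for s in seeds}  # S11: every P value starts at 0
    known_links = {}             # links revealed by the visits made so far
    visited, found = set(), []
    for _ in range(budget):
        candidates = [v for v in p if v not in visited]
        if not candidates:
            break
        node = max(candidates, key=lambda v: p[v])
        visited.add(node)
        known_links[node] = graph.get(node, [])
        for neighbor in known_links[node]:
            p.setdefault(neighbor, 0.0)  # a hidden node becomes a known one
        if has_attribute(node):
            found.append(node)
            pollute(known_links, p, node)
    return found, p
```

In a small hypothetical graph where nodes b and d are positive and d is reachable only through b, finding b raises P(d) to 60, so d is visited before other pending nodes that were discovered earlier but have lower P.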
In the present invention, a node whose attribute is positive is assumed to influence the nodes around it; the degree of influence decays with distance and, once it has fallen low enough, the capability to influence is lost. As shown in Fig. 1, the influence of the point with value P1 on the points along the link path varies as an exponential function of the coefficient α, while the intermediate point is influenced by both the point with value P1 and the point with value P2, and the influence values accumulate. A point connected by a dashed line is a hidden node; when it is discovered, if P1 > L (as is the case here), its influence value will be αP1.
On this basis, as a further preferred embodiment, S13 specifically comprises: taking node V as the root, traversing the n layers (for example n = 5) of nodes around V in breadth-first fashion, and increasing the P value of each node visited;
specifically, let Vij be the j-th node of layer i and ΔPij the increment of P(Vij); the value of ΔPij in each layer decays from that of the previous layer by the coefficient α, so that ΔP1j = αM, ΔP2j = α²M, ΔP3j = α³M, ...; preferably, M takes a value in the range 50 to 500 and L a value in the range 0 to 0.1M. For example, with α = 0.6 and M = 100, (ΔP1j, ΔP2j, ΔP3j, ΔP4j, ΔP5j) = (60, 36, 21.6, 12.96, 7.776);
P(Vij)' = P(Vij) + ΔPij.
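The worked figures for α = 0.6 and M = 100 can be reproduced with a few lines. The helper names below are hypothetical, and the threshold check uses L = 0.1M = 10 (the upper end of the preferred range) purely as an example of how L could bound the spread.

```python
def layer_increments(m, alpha, n_layers):
    """Increment of P for layers 1..n_layers: alpha**i * m."""
    return [m * alpha ** i for i in range(1, n_layers + 1)]


def layers_at_or_above(increments, threshold):
    """Number of leading layers whose increment is still at least `threshold`."""
    count = 0
    for delta in increments:
        if delta < threshold:
            break
        count += 1
    return count


INCREMENTS = layer_increments(m=100, alpha=0.6, n_layers=5)
REACH = layers_at_or_above(INCREMENTS, threshold=10)
```

With these values the fifth-layer increment (7.776) is the first to fall below 10, so under that assumed threshold the influence would stop after four layers.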
Preferably, the particular attribute may be that a node involves pirated content, illegal content, or a spreading hot topic of public opinion. Adjusting the values of M, L and α adapts the invention to a variety of different settings.
Embodiment 2
This embodiment provides a network piracy discovery system which, as shown in Fig. 2, comprises an interconnected database server, service server and evidence-collection server. The database server records information about the original works, crawler job information and system operation information; the service server uses a web crawler to crawl data, execute the search strategy and detect infringement; the evidence-collection server carries out evidence collection.
Fig. 3 shows the functional layer structure of the service server. The web crawler it uses comprises a basic crawler unit, a feature management unit and a strategy execution unit. The basic crawler unit crawls data; WebMagic may be chosen for it in this embodiment. The feature management unit matches features of the content crawled by the basic crawler unit against the original works and judges whether a node contains pirated content. The strategy execution unit executes the search strategy, based on the feature matching and the judgment result, using the path planning method of Embodiment 1.
As a further preferred embodiment, data crawling by the basic crawler unit includes downloading web page content and filtering it into text, and downloading the images in the web page; feature matching by the feature management unit includes matching the filtered text against original text works, or matching the downloaded images against original image works, so as to discover piracy of both text content and image content.
Optionally, the crawler job information and system operation information recorded by the database server include URLs, link relationships and infringement discovery results, which can be used in computing the crawling strategy.
As a further preferred embodiment, the system of this embodiment comprises a master-slave computer cluster formed by one database server and several service servers; the evidence-collection server and the service servers are deployed on the same hardware or distributed across different locations on the internet, and the service servers and the evidence-collection server connect to the internet through the egress of a local area network.
In the present invention, a pirate website is regarded as a kind of "pollution source", so the path planning method of Embodiment 1 that it uses may also be called the contamination method. For practical implementation, the path planning method of Embodiment 1 is further described as follows:
One, Embodiment 1 chooses an exponential function for the decay because it is convenient to compute; other embodiments may consider other functions.
Two, a node can be polluted more than once, and its pollution accumulates, so the P value of a node can exceed 100 (if M = 100 is set).
Three, in the program implementation, crawling and infringement detection are both batch processes, so a node encountered on a diffusion path may be in one of several situations:
a) not yet crawled, that is, a node (url) whose web page data has not yet been obtained;
b) a node (url) whose web page has been obtained, which divides into two classes:
i. infringement detection has not been run, so it is not known whether the node contains piracy;
ii. detection has been run, so it is known whether the node contains piracy; the detection result is one of:
1. contains piracy 2. does not contain piracy
Nodes in situation b) can no longer become candidate urls. In situation b)-ii it is already known whether the current node is pirated, so whether it is "polluted" (a matter of probability or likelihood) seems unimportant. One way to implement this is not to distinguish these situations and to apply the above strategy uniformly, that is, not to block the diffusion of pollution because of a node's (url's) processing state. Unsuitable nodes are filtered out naturally when the crawler's candidate urls are chosen.
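The point that pollution spreads to every node regardless of state, while filtering happens only at candidate selection, can be sketched as below. The `State` enum, the `Node` record and the example urls are hypothetical names introduced for this illustration; the source does not specify how node state is stored.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    NOT_CRAWLED = "a"          # web page data not yet obtained
    CRAWLED_UNCHECKED = "b-i"  # page obtained, infringement detection not yet run
    CRAWLED_CHECKED = "b-ii"   # page obtained, detection result known


@dataclass
class Node:
    url: str
    state: State
    p: float = 0.0  # accumulated pollution value


def next_candidates(nodes, limit):
    """Choose up to `limit` not-yet-crawled urls, highest P first.

    Nodes in situation b) keep their P values but are never returned,
    so nothing has to block the diffusion step itself.
    """
    pending = [n for n in nodes if n.state is State.NOT_CRAWLED]
    pending.sort(key=lambda n: n.p, reverse=True)
    return [n.url for n in pending[:limit]]


# Hypothetical seed queue.
QUEUE = [
    Node("http://example.com/1", State.NOT_CRAWLED, p=36.0),
    Node("http://example.com/2", State.CRAWLED_CHECKED, p=160.0),
    Node("http://example.com/3", State.NOT_CRAWLED, p=60.0),
    Node("http://example.com/4", State.CRAWLED_UNCHECKED, p=96.0),
    Node("http://example.com/5", State.NOT_CRAWLED, p=0.0),
]
```

Keeping the state check in one place means the diffusion code needs no special cases, at the cost of updating P values that will never be used for already-crawled nodes.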
Embodiment 3
The network piracy discovery system of the present invention was run in an actual test; the measured results are shown in the table below:
Item | Value |
---|---|
Total number of seeds | 3358304 |
Number of polluted seeds | 20802 |
Pirate points found among polluted seeds | 7 |
Pirate points found among unpolluted seeds | 164 |
Piracy probability of polluted seeds | 3.37×10⁻⁴ |
Piracy probability of unpolluted seeds | 4.91×10⁻⁵ |
Over the whole run of the system, the url seed queue contained elements in the normal state (not polluted), among which 164 pirate points were found, and polluted elements (P ≠ 0), among which 7 pirate points were found. The proportion of pirate points found among polluted points is 6.85 times that in the normal state, which shows that the method used by the system is effective.
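The reported probabilities and the 6.85 ratio can be checked from the counts in the table. One assumption is needed and is made explicit in the code: the number of unpolluted seeds is not listed, and it is taken here to be the total minus the polluted seeds, which is the denominator that reproduces the reported 4.91×10⁻⁵.

```python
total_seeds = 3358304
polluted_seeds = 20802
pirate_in_polluted = 7
pirate_in_unpolluted = 164

# Assumption: unpolluted seeds = total seeds - polluted seeds (not stated in the table).
unpolluted_seeds = total_seeds - polluted_seeds

p_polluted = pirate_in_polluted / polluted_seeds        # about 3.37e-4
p_unpolluted = pirate_in_unpolluted / unpolluted_seeds  # about 4.91e-5
ratio = p_polluted / p_unpolluted                       # about 6.85
```

Note that the polluted-seed figure rests on only 7 hits, so the ratio is sensitive to small changes in that count.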
The above description of the embodiments is intended only to help in understanding the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art may make improvements and modifications to the invention without departing from its principle, and such improvements and modifications also fall within the scope of protection of the claims of the present invention.
Claims (9)
1. A path planning method for searching an unknown information network, applicable to an information network in which the attributes of all nodes are initially unknown, characterized by comprising the following steps:
S1, if a node is found to have a particular attribute, its associated value is set to a positive value, and the associated values of the nodes around it are also assigned positive values whose magnitude decreases with distance from the source node;
S2, nodes with larger positive values are visited first; if a visited node has the particular attribute, step S1 is repeated.
2. The path planning method for searching an unknown information network according to claim 1, characterized in that S1 specifically comprises:
S11, associating a value P with each node in the information network, where P takes positive values and is initially 0, and P(V) denotes the P value of node V;
S12, setting constants M and L, where M is a positive number representing the increment of P when the particular attribute is found on a node, and L is the influence-capability coefficient, 0 < L < M; when P(V) < L, values are no longer assigned to the associated values of the nodes surrounding V;
S13, when a node V is determined to have the particular attribute, increasing its P value: P(V)' = P(V) + M, and correspondingly increasing the P values of the nodes around it, where the increase for a surrounding node decreases with its distance from V;
and S2 specifically comprises:
S14, accumulating the P value of each node, and determining the visiting order of the nodes by sorting them by P value from high to low.
3. The path planning method for searching an unknown information network according to claim 2, characterized in that S13 specifically comprises: taking node V as the root, traversing the n layers of nodes around V in breadth-first fashion, and increasing the P value of each node visited;
specifically, let Vij be the j-th node of layer i and ΔPij the increment of P(Vij); the value of ΔPij in each layer decays from that of the previous layer by a coefficient α, so that ΔP1j = αM, ΔP2j = α²M, ΔP3j = α³M, ...; P(Vij)' = P(Vij) + ΔPij, where 0 < α < 1.
4. The path planning method for searching an unknown information network according to claim 3, characterized in that M takes a value in the range 50 to 500 and L a value in the range 0 to 0.1M.
5. The path planning method for searching an unknown information network according to any one of claims 1 to 4, characterized in that the particular attribute includes a node involving pirated content, illegal content, or a spreading hot topic of public opinion.
6. A network piracy discovery system, characterized by comprising an interconnected database server, service server and evidence-collection server, wherein the database server records information about the original works, crawler job information and system operation information, the service server uses a web crawler to crawl data, execute the search strategy and detect infringement, and the evidence-collection server carries out evidence collection;
wherein the web crawler comprises a basic crawler unit, a feature management unit and a strategy execution unit; the basic crawler unit crawls data; the feature management unit matches features of the content crawled by the basic crawler unit against the original works and judges whether a node contains pirated content; the strategy execution unit executes the search strategy, based on the feature matching and the judgment result, using the path planning method according to any one of claims 1 to 5.
7. The network piracy discovery system according to claim 6, characterized in that data crawling by the basic crawler unit includes downloading web page content and filtering it into text, and downloading the images in the web page; feature matching by the feature management unit includes matching the filtered text against original text works, or matching the downloaded images against original image works.
8. The network piracy discovery system according to claim 6, characterized in that the crawler job information and system operation information recorded by the database server include: URLs, link relationships and infringement discovery results.
9. The network piracy discovery system according to claim 7 or 8, characterized in that the system comprises a master-slave computer cluster formed by one database server and several service servers; the evidence-collection server and the service servers are deployed on the same hardware or distributed across different locations on the internet, and the service servers and the evidence-collection server connect to the internet through the egress of a local area network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811285660.3A CN109284436B (en) | 2018-10-31 | 2018-10-31 | Path planning method and network piracy discovery system during searching unknown information network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109284436A (en) | 2019-01-29 |
CN109284436B (en) | 2020-06-23 |
Family
ID=65174744
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811285660.3A Active CN109284436B (en) | 2018-10-31 | 2018-10-31 | Path planning method and network piracy discovery system during searching unknown information network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109284436B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188117A (en) * | 2019-04-15 | 2019-08-30 | 平安科技(深圳)有限公司 | Right-safeguarding strategy matching method, apparatus, computer equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101630327A (en) * | 2009-08-14 | 2010-01-20 | 昆明理工大学 | Design method of theme network crawler system |
CN102054003A (en) * | 2009-11-04 | 2011-05-11 | 北京搜狗科技发展有限公司 | Methods and systems for recommending network information and creating network resource index |
WO2017152550A1 (en) * | 2016-03-09 | 2017-09-14 | 乐视控股(北京)有限公司 | Webpage capture method and device |
CN107463688A (en) * | 2017-08-10 | 2017-12-12 | 四川长虹电器股份有限公司 | Mixed search algorithm based on web crawlers technology |
Non-Patent Citations (1)
Title |
---|
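Hou Xiancheng: "Research on the Dissemination of Copyrighted Content in P2P Networks Based on Infringing Community Mining", China Master's Theses Full-text Database * |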
Also Published As
Publication number | Publication date |
---|---|
CN109284436B (en) | 2020-06-23 |
Similar Documents
Publication | Title |
---|---|
Sharma et al. | A brief review on search engine optimization |
Dourisboure et al. | Extraction and classification of dense communities in the web |
US7882099B2 (en) | System and method for focused re-crawling of web sites |
Wu et al. | Identifying link farm spam pages |
US8219549B2 (en) | Forum mining for suspicious link spam sites detection |
Henzinger | Algorithmic challenges in web search engines |
CN103605715B (en) | Data Integration treating method and apparatus for multiple data sources |
CN101490685A (en) | A method for increasing the security level of a user machine browsing web pages |
CN107437026B (en) | Malicious webpage advertisement detection method based on advertisement network topology |
CN103853744B (en) | Deceptive junk comment detection method oriented to user generated contents |
CN104202291A (en) | Anti-phishing method based on multi-factor comprehensive assessment method |
CN109086356A (en) | The incorrect link relationship diagnosis of extensive knowledge mapping and modification method |
CN109033203A (en) | A kind of feature extraction method for parallel processing towards big data |
CN109104421A (en) | A kind of web site contents altering detecting method, device, equipment and readable storage medium storing program for executing |
CN109284436A (en) | Paths planning method and network piracy when searching for unknown message network find system |
CN110489636A (en) | A kind of web advertisement screen method based on code analysis and image processing |
CN103605670B (en) | A kind of method and apparatus for determining the crawl frequency of network resource point |
CN103605735B (en) | website data analysis method and device |
CN1495647A (en) | Information storage and research |
KR101508190B1 (en) | Apparatus for collecting of harmful sites and method thereof |
Aoki et al. | Graph visualization of dark web hyperlinks and their feature analysis |
CN110472125B (en) | Multistage page cascading crawling method and equipment based on web crawler |
Sarkar et al. | On rich clubs of path-based centralities in networks |
CN107239704A (en) | Malicious web pages find method and device |
KR101524618B1 (en) | Apparatus for collecting of harmful sites and method thereof |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |