CN110442775A - Method, device, and electronic equipment for acquiring multi-level marketing promotional website addresses - Google Patents

Info

Publication number
CN110442775A
Authority
CN
China
Prior art keywords
network address
target network
suspected target
address
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910743972.2A
Other languages
Chinese (zh)
Inventor
胡招武
范渊
杨勃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dbappsecurity Technology Co Ltd
Original Assignee
Hangzhou Dbappsecurity Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dbappsecurity Technology Co Ltd
Priority to CN201910743972.2A
Publication of CN110442775A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Abstract

The present invention provides a method, a device, and electronic equipment for acquiring multi-level marketing promotional website addresses, and relates to the field of network detection. First, a target search phrase is searched using multiple preset search engines to obtain multiple search results. Multiple suspected target URLs are then determined from the search results, and the page content corresponding to each suspected target URL is obtained. Next, the relevance between each suspected target URL and the target search phrase is determined from the corresponding page content. Finally, each suspected target URL whose relevance reaches a preset threshold is determined to be a multi-level marketing promotional website address. Compared with the existing approach of discovering online multi-level marketing activity through reports, the technical solution provided by the invention alleviates the low efficiency of the prior art and helps improve the efficiency of acquiring multi-level marketing websites.

Description

Method, device, and electronic equipment for acquiring multi-level marketing promotional website addresses
Technical field
The present invention relates to the network field, and in particular to a method, a device, and electronic equipment for acquiring multi-level marketing promotional website addresses.
Background
Multi-level marketing refers to an illegal activity in which an organizer recruits participants and obtains wealth by means such as requiring the recruited participants to pay a fee as a condition for obtaining the qualification to join.
At present, multi-level marketing is conducted mainly offline, and offline pyramid schemes are usually discovered through user reports.
However, with the development and spread of the Internet, the promotion of pyramid schemes is shifting from traditional offline channels to online website promotion, and online websites, which are easily shared, have become an important channel for multi-level marketing promotion. Relying on traditional reporting to discover online pyramid schemes is relatively inefficient.
Summary of the invention
The purposes of the present invention include, for example, providing a method, a device, and electronic equipment for acquiring multi-level marketing promotional website addresses that can alleviate the low acquisition efficiency of the prior art.
The embodiments of the present invention may be implemented as follows:
In a first aspect, an embodiment of the present invention provides a method for acquiring multi-level marketing promotional website addresses, comprising the following steps:
searching a target search phrase using multiple preset search engines to obtain multiple search results;
determining multiple suspected target URLs based on the multiple search results, and obtaining the page content corresponding to the multiple suspected target URLs;
determining, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the multiple suspected target URLs and the target search phrase;
determining each suspected target URL whose relevance reaches a preset threshold to be a multi-level marketing promotional website address.
In an optional embodiment, the target search phrase comprises a multi-level marketing project name, a search separator, and a website keyword arranged according to a preset ordering rule.
In an optional embodiment, determining multiple suspected target URLs based on the multiple search results comprises:
performing primary filtering on the multiple search results to obtain multiple results to be analyzed;
performing secondary screening on the multiple results to be analyzed to obtain multiple suspected target URLs.
In an optional embodiment, obtaining the page content corresponding to the multiple suspected target URLs comprises:
sending a request to the suspected target URL;
receiving the response data returned by the suspected target URL;
parsing the response data to obtain the page content.
In an optional embodiment, determining, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the multiple suspected target URLs and the target search phrase comprises:
determining, based on a tf-idf computation model and the page content corresponding to the multiple suspected target URLs, the relevance between each of the multiple suspected target URLs and the target search phrase; wherein the tf-idf computation model is built on the tf-idf algorithm.
In an optional embodiment, the calculation formulas of the tf-idf algorithm include:
$$w_{i,j} = tf_{i,j} \times idf_i$$
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
$$idf_i = \log \frac{|D|}{1 + |\{d \in D : t_i \in d\}|}$$
In the formulas above, $w_{i,j}$ denotes the relevance between term i and document j; $tf_{i,j}$ denotes the term frequency of term i in document j; $n_{i,j}$ denotes the number of times term i occurs in document j; $\sum_k n_{k,j}$ denotes the total number of terms in that document; $idf_i$ denotes the inverse document frequency of term i; $|D|$ denotes the total number of documents relevant to term i; and $|\{d \in D : t_i \in d\}|$ denotes the number of documents containing term i (the added 1 keeps the denominator from being zero).
In an optional embodiment, the method further comprises:
sorting the suspected target URLs that reach the preset threshold in descending order of relevance and outputting them.
In a second aspect, an embodiment of the present invention provides a device for acquiring multi-level marketing promotional website addresses, comprising:
a retrieval module, configured to search a target search phrase using multiple preset search engines to obtain multiple search results;
an analysis module, configured to determine multiple suspected target URLs based on the multiple search results and to obtain the page content corresponding to the multiple suspected target URLs;
a computing module, configured to determine, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the multiple suspected target URLs and the target search phrase;
a determining module, configured to determine each suspected target URL whose relevance reaches a preset threshold to be a multi-level marketing promotional website address.
In a third aspect, an embodiment of the present invention provides electronic equipment comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of the foregoing embodiments.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the method according to any one of the foregoing embodiments.
The beneficial effects of the embodiments of the present invention include, for example:
The method, device, electronic equipment, and computer-readable storage medium for acquiring multi-level marketing promotional website addresses provided by the embodiments of the present invention first search a target search phrase using multiple preset search engines to obtain multiple search results; then determine multiple suspected target URLs based on the search results and obtain the page content corresponding to the multiple suspected target URLs; then determine, based on that page content, the relevance between each suspected target URL and the target search phrase; and finally determine each suspected target URL whose relevance reaches a preset threshold to be a multi-level marketing promotional website address. Therefore, compared with the existing approach of discovering online pyramid schemes through reports, the technical solution provided by the embodiments of the present invention alleviates the low efficiency of the prior art and helps improve the efficiency of acquiring multi-level marketing websites.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood through practice of the invention.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and therefore should not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a flow chart of a method for acquiring multi-level marketing promotional website addresses provided by an embodiment of the present invention;
Fig. 2 shows a flow chart of another method for acquiring multi-level marketing promotional website addresses provided by an embodiment of the present invention;
Fig. 3 shows a flow chart of a specific example of the method for acquiring multi-level marketing promotional website addresses provided by an embodiment of the present invention;
Fig. 4 shows a schematic diagram of a device for acquiring multi-level marketing promotional website addresses provided by an embodiment of the present invention;
Fig. 5 shows a schematic diagram of electronic equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments of the present invention provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
In the description of the present invention, it should be noted that terms such as "first" and "second", if they appear, are used only to distinguish descriptions and should not be understood as indicating or implying relative importance.
It should be noted that, where no conflict arises, the features in the embodiments of the present invention may be combined with one another.
At present, multi-level marketing is conducted mainly offline, and offline pyramid schemes are usually discovered through user reports. With the development and spread of the Internet, the promotion of pyramid schemes is shifting from traditional offline channels to online website promotion, and online websites, which are easily shared, have become an important channel for multi-level marketing promotion. However, relying on traditional reporting to discover online pyramid schemes is relatively inefficient; how to efficiently find the promotional addresses of online multi-level marketing activity has become a problem in urgent need of a solution.
On this basis, the method, device, and electronic equipment for acquiring multi-level marketing promotional website addresses provided by the embodiments of the present invention can alleviate the low acquisition efficiency of the prior art and improve the efficiency of acquiring online multi-level marketing promotional addresses.
To facilitate understanding of this embodiment, a method for acquiring multi-level marketing promotional website addresses disclosed in an embodiment of the present invention is first described in detail.
Embodiment one:
This embodiment provides a method for acquiring multi-level marketing promotional website addresses. It is applied to the field of multi-level marketing website acquisition and is executed by electronic equipment in that field; the electronic equipment may be, for example, a terminal or the main controller of a terminal.
Referring to Fig. 1, the method for acquiring multi-level marketing promotional website addresses comprises the following steps:
Step S102: search a target search phrase using multiple preset search engines to obtain multiple search results;
Step S104: determine multiple suspected target URLs based on the multiple search results, and obtain the page content corresponding to the multiple suspected target URLs;
Step S106: determine, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the multiple suspected target URLs and the target search phrase;
Step S108: determine each suspected target URL whose relevance reaches a preset threshold to be a multi-level marketing promotional website address.
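Steps S102 to S108 can be sketched as a small pipeline. The following is only an illustrative outline, not the patented implementation: the function name `acquire_promotional_urls` and all of its parameters are hypothetical, and the search engines, screening, page fetching, and relevance scoring are passed in as plain callables so the flow can be followed without any network access.

```python
from typing import Callable, Dict, List


def acquire_promotional_urls(
    phrase: str,
    engines: Dict[str, Callable[[str], List[str]]],
    screen: Callable[[List[str]], List[str]],
    fetch_content: Callable[[str], str],
    relevance: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Hypothetical outline of steps S102-S108."""
    # S102: query every preset search engine with the target search phrase.
    results: List[str] = []
    for search in engines.values():
        results.extend(search(phrase))
    # S104: reduce the results to suspected target URLs and fetch their pages.
    suspects = screen(results)
    pages = {url: fetch_content(url) for url in suspects}
    # S106: score each suspected URL against the search phrase.
    scores = {url: relevance(phrase, text) for url, text in pages.items()}
    # S108: keep the URLs whose relevance reaches the threshold.
    return [url for url, score in scores.items() if score >= threshold]


# Stub data standing in for real engines and pages (all values are made up).
stub_engines = {
    "engine_a": lambda q: ["http://a.test/", "http://b.test/"],
    "engine_b": lambda q: ["http://b.test/", "http://c.test/"],
}
stub_pages = {
    "http://a.test/": "ExampleProject login ExampleProject",
    "http://b.test/": "unrelated page",
    "http://c.test/": "ExampleProject home",
}
found = acquire_promotional_urls(
    "ExampleProject",
    stub_engines,
    screen=lambda urls: list(dict.fromkeys(urls)),  # de-duplicate, keep order
    fetch_content=stub_pages.__getitem__,
    # Placeholder score: plain term frequency rather than full tf-idf.
    relevance=lambda phrase, text: text.split().count(phrase) / len(text.split()),
    threshold=0.5,
)
print(found)
```

Passing the stages in as callables is purely a design choice for the sketch; it keeps each step replaceable with a real search client, whitelist filter, or tf-idf scorer.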
In step S102 above, the preset search engines may be, for example, the Baidu, Sogou, Google, Bing, and 360 search engines; multiple search engines are preset here.
In an optional embodiment, the target search phrase comprises a multi-level marketing project name, a search separator, and a website keyword entered according to a preset ordering rule.
The preset ordering rule may be forward order or reverse order. In this embodiment, the preset ordering rule is forward order; that is, the target search phrase comprises the multi-level marketing project name, the search separator, and the website keyword entered in that sequence. The multi-level marketing project name is the name, determined in advance by the user, of the multi-level marketing project to be queried. The website keyword is a website-identifying word determined in advance by the user, such as "log in", "official website", "back end", "home page", or "registration" (including variant spellings of "log in"); to keep the search results from being incomplete, multiple website keywords are used. The search separator prevents the project name and the website keyword from being searched as a single word, which would make the search results fewer and more one-sided; for example, the search separator may be one or more spaces. In summary, a search phrase may be expressed as, for example, "multi-level marketing project name + space + website keyword".
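The phrase construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `build_search_phrases`, the keyword list `SITE_KEYWORDS`, and the project name `ExampleProject` are all hypothetical, and only the forward ordering (project name, separator, keyword) is shown.

```python
# Hypothetical keyword list modelled on the examples given in the text.
SITE_KEYWORDS = ["log in", "official website", "back end", "home page", "registration"]


def build_search_phrases(project_name, keywords=SITE_KEYWORDS, separator=" "):
    """Return one phrase per keyword: project name + separator + keyword."""
    name = project_name.strip()
    if not name:
        raise ValueError("project_name must not be empty")
    return [f"{name}{separator}{keyword}" for keyword in keywords]


phrases = build_search_phrases("ExampleProject")
for phrase in phrases:
    print(phrase)
```

Keeping the separator as a parameter reflects the statement that one or more spaces may be used.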
Of course, in other embodiments, the target search phrase may also comprise only the multi-level marketing project name.
By using multiple search engines, this embodiment on the one hand improves retrieval efficiency and the comprehensiveness of the search results, and on the other hand reduces the influence on the search results of a single search engine's demonstration effect or of loopholes and limitations in its own algorithm. This improves the accuracy of the search results and thereby the search efficiency.
In an optional embodiment, for step S104, determining multiple suspected target URLs based on the multiple search results may be implemented through the following steps:
1. Perform primary filtering on the multiple search results to obtain multiple results to be analyzed.
The combined search results of multiple preset search engines amount to a very large volume of data, and the lower a result ranks within each search engine, the less relevant it tends to be. Primary filtering is therefore performed on the search results first to obtain the results to be analyzed. In this embodiment, the search results within a preset number of entries or pages are usually selected as the results to be analyzed; the number of entries and the number of pages can be set according to actual needs.
2. Perform secondary screening on the multiple results to be analyzed to obtain multiple suspected target URLs.
The results to be analyzed contain addresses that obviously are not multi-level marketing addresses. Secondary screening is therefore performed on the results to be analyzed to exclude common or well-known websites (referred to as whitelisted websites, which are entered in a whitelist to facilitate screening), thereby obtaining the suspected target URLs.
Through primary filtering and secondary screening, step S104 improves the accuracy of the search results, which helps improve acquisition efficiency.
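The two-stage screening above might look like the following sketch. The names `primary_filter`, `secondary_screen`, and `WHITELIST` are hypothetical, the whitelist holds only two sample hosts, and de-duplicating by host name is an added assumption rather than something the text requires.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist of common or well-known hosts to exclude.
WHITELIST = {"tieba.baidu.com", "zhidao.baidu.com"}


def primary_filter(results_per_strategy, top_n=10):
    """Keep only the first top_n result URLs returned by each search."""
    kept = []
    for urls in results_per_strategy:
        kept.extend(urls[:top_n])
    return kept


def secondary_screen(urls, whitelist=WHITELIST):
    """Drop whitelisted hosts and repeated hosts; keep first-seen order."""
    seen_hosts = set()
    suspects = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        if not host or host in whitelist or host in seen_hosts:
            continue
        seen_hosts.add(host)
        suspects.append(url)
    return suspects


raw = [
    ["https://tieba.baidu.com/p/1", "http://example-project.test/login",
     "http://example-project.test/register"],
    ["http://other.test/"],
]
to_analyze = primary_filter(raw, top_n=2)
suspected = secondary_screen(to_analyze)
print(suspected)
```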
Of course, in other embodiments, the search results may also be screened in other ways, for example by whether they contain a preset website string such as "advertisement".
In an optional embodiment, for step S104, obtaining the page content corresponding to the multiple suspected target URLs may be performed through the following steps:
1) Send a request to the suspected target URL;
2) Receive the response data returned by the suspected target URL;
3) Parse the response data to obtain the page content.
Here, the page content includes identification data and business data. The identification data includes but is not limited to the response body, the title, and the ICP filing number; the business data includes but is not limited to data such as the website snapshot and website keywords.
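The request, receive, and parse steps can be sketched with the Python standard library alone. Everything named here is an assumption made for illustration: `fetch`, `parse_page`, `PageContentParser`, the sample HTML, and in particular the `ICP_PATTERN` regular expression, which only guesses at one common shape of an ICP filing number. The example parses a literal string, so `fetch` is defined but not called.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumed shape of an ICP filing number; real pages may differ.
ICP_PATTERN = re.compile(r"[\u4e00-\u9fff]ICP备\d+号(?:-\d+)?")


def fetch(url, timeout=10):
    """Send a request and return the decoded response body (not run below)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class PageContentParser(HTMLParser):
    """Collect the title, meta keywords, and visible text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = ""
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            self.keywords = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif not self._skip_depth:
            self.text_parts.append(text)


def parse_page(html):
    """Parse response data into a small page-content dictionary."""
    parser = PageContentParser()
    parser.feed(html)
    parser.close()
    icp = ICP_PATTERN.search(html)
    return {
        "title": parser.title,
        "keywords": parser.keywords,
        "body": " ".join(parser.text_parts),
        "icp": icp.group(0) if icp else None,
    }


SAMPLE_HTML = (
    "<html><head><title>Example Project</title>"
    '<meta name="keywords" content="example, login">'
    "<script>var x = 1;</script></head>"
    "<body><p>Welcome</p><footer>京ICP备12345678号</footer></body></html>"
)
page = parse_page(SAMPLE_HTML)
print(page)
```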
In an optional embodiment, step S106, determining, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the multiple suspected target URLs and the target search phrase, comprises:
determining, based on a tf-idf computation model and the page content corresponding to the multiple suspected target URLs, the relevance between each of the multiple suspected target URLs and the target search phrase; wherein the tf-idf computation model is built on the tf-idf algorithm.
The calculation formulas of the tf-idf algorithm include:
$$w_{i,j} = tf_{i,j} \times idf_i$$
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
$$idf_i = \log \frac{|D|}{1 + |\{d \in D : t_i \in d\}|}$$
In the formulas above, $w_{i,j}$ denotes the relevance between term i and document j; $tf_{i,j}$ denotes the term frequency of term i in document j; $n_{i,j}$ denotes the number of times term i occurs in document j; $\sum_k n_{k,j}$ denotes the total number of terms in that document; $idf_i$ denotes the inverse document frequency of term i; $|D|$ denotes the total number of documents relevant to term i; and $|\{d \in D : t_i \in d\}|$ denotes the number of documents containing term i (the added 1 keeps the denominator from being zero).
When building the tf-idf computation model, the multi-level marketing project name is regarded as term i, and each website address related to it (that is, to the project name) is the corresponding document j. The page content of a website address (business data, identification data, and so on) can be converted by a word segmentation tool into the individual terms of document j.
The tf-idf value between the multi-level marketing project name and each suspected target address, that is, the relevance, can be calculated with the formulas above.
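A direct transcription of the tf-idf scoring might look like the sketch below. It assumes each page has already been split into tokens by some word segmentation tool (the tool itself is not shown), uses the natural logarithm because the text does not state a base, and places a 1 in the idf denominator to avoid division by zero. The function names and sample data are hypothetical.

```python
import math
from collections import Counter


def term_frequency(term, tokens):
    """tf: occurrences of the term divided by the total number of tokens."""
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens)


def inverse_document_frequency(term, documents):
    """idf: log of total documents over (1 + documents containing the term)."""
    containing = sum(1 for tokens in documents if term in tokens)
    return math.log(len(documents) / (1 + containing))


def relevance_scores(term, tokens_by_url):
    """Return the tf-idf weight of the term for every URL's token list."""
    idf = inverse_document_frequency(term, list(tokens_by_url.values()))
    return {url: term_frequency(term, tokens) * idf
            for url, tokens in tokens_by_url.items()}


# Made-up tokenized pages; "x" stands in for the project name.
tokens_by_url = {
    "http://a.test/": ["x", "x", "y", "z"],
    "http://b.test/": ["y", "z"],
    "http://c.test/": ["z"],
    "http://d.test/": ["y"],
}
scores = relevance_scores("x", tokens_by_url)
print(scores)
```

One property of this form is worth keeping in mind: when the term occurs in nearly every document of the corpus, the idf becomes zero or negative, so the choice of the document set D has a large effect on the scores.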
In step S108, determining each suspected target URL whose relevance reaches the preset threshold to be a multi-level marketing promotional website address mainly comprises the following steps:
1. Determine in turn whether the relevance between each suspected target address and the target search phrase is greater than the preset threshold;
2. If the relevance between a suspected target address and the target search phrase is greater than the preset threshold, determine that suspected target address to be a multi-level marketing promotional website address.
Because step S108 determines multi-level marketing website addresses according to relevance, identification is fast and the results are relatively accurate.
The method for acquiring multi-level marketing promotional website addresses provided by the embodiment of the present invention searches a target search phrase using multiple preset search engines to obtain multiple search results; determines multiple suspected target URLs based on the search results and obtains the page content corresponding to the multiple suspected target URLs; determines, based on that page content, the relevance between each suspected target URL and the target search phrase; and determines each suspected target URL whose relevance reaches a preset threshold to be a multi-level marketing promotional website address. Compared with the existing approach of discovering online multi-level marketing activity through reports, this method alleviates the low efficiency of the prior art and helps improve the efficiency of acquiring multi-level marketing websites.
Further, on the basis of the foregoing scheme, as shown in Fig. 2, this method differs from the preceding method in that it further comprises:
Step S202: sort the suspected target URLs that reach the preset threshold in descending order of relevance and output them.
Specifically, the website addresses corresponding to the determined suspected target URLs are output in descending order of relevance. The output may be presented as voice, text, icons, or the like, for example on a display screen, so that the user can conveniently view the results.
The output results may of course also be stored in a local database, a remote server, or the like, so that the user can later verify or tally the results.
Through step S202 above, the results can be sorted and presented to the user more intuitively, so that the user can review them at a glance; it also makes subsequent review and verification easier for the user, which improves the user experience. In addition, it helps supplement and refine the set of online multi-level marketing promotional website addresses.
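Thresholding followed by descending sort is a short operation. The sketch below uses a hypothetical function `select_and_rank` and made-up scores; it treats "reaching" the threshold as greater than or equal, which is one reading of the text (a strict comparison would only change `>=` to `>`).

```python
def select_and_rank(scores, threshold):
    """Keep URLs whose relevance reaches the threshold, highest first."""
    hits = [(url, score) for url, score in scores.items() if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Made-up relevance values for illustration only.
example_scores = {
    "http://a.test/": 0.12,
    "http://b.test/": 0.40,
    "http://c.test/": 0.03,
}
ranked = select_and_rank(example_scores, threshold=0.10)
for url, score in ranked:
    print(f"{score:.2f}\t{url}")
```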
Embodiment two:
As shown in Fig. 3, an embodiment of the present application further provides a specific example of the method for acquiring multi-level marketing promotional website addresses, comprising the following steps:
Step S302: search for and collect the page content of the suspected target addresses.
First, the possible website addresses of the multi-level marketing project (the suspected target addresses) are collected.
Specifically, a search phrase is searched using multiple search engines. The search phrase is "project name + space + website keyword", where the project name is the name of a given multi-level marketing project, the space is the search separator, and the website keyword is a predetermined word such as "log in", "login", "official website", "back end", "home page", or "registration". It should be noted that the website keywords can be supplemented and adjusted according to actual conditions.
The search engines used here are Baidu, Sogou, and 360; data is retrieved from each of these three search engines and combined to obtain a large number of search results. For each multi-level marketing project name, taking the six website keywords and three search engines above as an example, 6 × 3 = 18 search strategies can be constructed, and the search results of these 18 search strategies are obtained. Primary filtering is performed on the search results of each search strategy: the 10 results on the first page are taken (the number can be adjusted according to the final effect), giving 18 × 10 search results in total as the results to be analyzed. The home address is then obtained from each result to be analyzed as a website address related to the project, because each result to be analyzed leads to a website address, and the home address is the entry address of that website. The home addresses obtained are then screened to filter out common or well-known websites, such as tieba.baidu.com, zhidao.baidu.com, chinapic.people.com.cn, and i.ifeng.com, thereby obtaining the multiple suspected target addresses associated with the project name (that is, the home addresses of the suspected target websites).
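The count of 18 search strategies is simply the Cartesian product of engines and keywords, which the following sketch makes explicit. The engine labels and the project name `ExampleProject` are placeholders, and no search is actually performed.

```python
from itertools import product

ENGINES = ["Baidu", "Sogou", "360"]
KEYWORDS = ["log in", "login", "official website",
            "back end", "home page", "registration"]
RESULTS_PER_STRATEGY = 10  # first page of each search; adjustable


def build_strategies(project_name, engines=ENGINES, keywords=KEYWORDS):
    """Pair every engine with every 'project name + space + keyword' phrase."""
    return [(engine, f"{project_name} {keyword}")
            for engine, keyword in product(engines, keywords)]


strategies = build_strategies("ExampleProject")
max_results = len(strategies) * RESULTS_PER_STRATEGY
print(len(strategies), max_results)
```

The total of 180 is an upper bound on the results to be analyzed, since the same page can be returned by several strategies.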
Finally, each suspected target address obtained above (the home address of a suspected target website) is requested, and the data returned by the request is obtained. An XPath tool is used to parse out identification data such as the response body, the title, and the ICP filing number; together with business data also obtained through the XPath tool, such as the source, keywords, and snapshot, these make up the detailed data of the website (that is, the page content).
Step S304: analyze and compute the tf-idf relevance between each suspected target address and the search phrase.
The principle of the tf-idf algorithm is as follows: if a term occurs with a high frequency tf in one document and seldom occurs in other documents, the term has good class-discriminating ability, which is taken as the relevance between the term and that document. The formula is:
$$w_{i,j} = tf_{i,j} \times idf_i$$
Here, tf is the term frequency, which indicates how often a term occurs in a document. The formula is:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where the numerator is the number of times term i occurs in document j, and the denominator is the total number of terms in that document.
The idf is the inverse document frequency, obtained by dividing the total number of documents by the number of documents containing the term and taking the logarithm of the quotient. The formula is:
$$idf_i = \log \frac{|D|}{1 + |\{d \in D : t_i \in d\}|}$$
where the "+1" prevents the denominator from being 0, $|D|$ is the total number of documents, and $|\{d \in D : t_i \in d\}|$ is the number of documents containing term i.
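As a worked illustration of these formulas, consider made-up numbers that are not taken from the patent: a project name that occurs 3 times on a page of 100 terms, in a corpus of 1000 documents of which 9 contain the name. The logarithm base is not stated in the text, so base 10 is assumed here purely to keep the arithmetic simple.

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} = \frac{3}{100} = 0.03$$
$$idf_i = \log_{10} \frac{|D|}{1 + |\{d \in D : t_i \in d\}|} = \log_{10} \frac{1000}{1 + 9} = 2$$
$$w_{i,j} = tf_{i,j} \times idf_i = 0.03 \times 2 = 0.06$$

With a different base the weights are scaled by a constant factor, so the ranking of pages is unchanged, although a fixed threshold would need to be scaled accordingly.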
In this application, entry name can regard entry as, and relative station address is corresponding document, website it is detailed Each entry data can be obtained according to (content of pages) by participle tool by counting accurately.
The tf-idf value between entry name and each suspected target address can be calculated by above-mentioned formula, i.e., it is related Degree.
Step S306, statistics exports result according to the degree of correlation.
After obtaining the degree of correlation of entry name and each suspected target address by above-mentioned algorithm, sieved further according to the threshold value of setting Choosing fall a collection of lesser station address of the degree of correlation, by the remaining suspected target address greater than threshold value further according to the degree of correlation from greatly to Small sequence output.
Combined with existing multi-level marketing project information, the data provided by the present invention can serve as a reference for the relevant authorities so that they can carry out targeted investigations, and can to some extent curb the promotion of pyramid schemes.
The method for acquiring multi-level marketing website publicity addresses provided by the embodiments of the present invention obtains the websites that promote a multi-level marketing project by calculating the correlation between the project and websites. Its purpose is to solve the problem in the prior art that the online publicity website addresses of multi-level marketing projects cannot be obtained.
Starting from the multi-level marketing project name, the method uses crawler technology (an efficient search-engine technique for obtaining Internet data) to crawl related websites from the Internet through multiple search engines, and crawls the detailed data of each website. It then calculates the relevance between the multi-level marketing project and each related website with the tf-idf algorithm and analyzes the websites with higher relevance, thereby obtaining the suspected publicity addresses of the project. The method is relatively efficient and the identified addresses are relatively accurate; by providing valuable information to the relevant authorities or organizations, it can to some extent effectively combat and prevent the promotion of pyramid schemes.
Embodiment three:
Based on the same inventive concept, the embodiments of the present application further provide an acquisition device for multi-level marketing website publicity addresses corresponding to the acquisition method above. Because the device solves the problem on a principle similar to that of the acquisition method in the embodiments of the present application, the implementation of the device may refer to the implementation of the method, and repeated description is omitted.
Fig. 4 is a schematic diagram of the acquisition device for multi-level marketing website publicity addresses provided by the embodiments of the present application.
Referring to Fig. 4, the acquisition device for multi-level marketing website publicity addresses includes a retrieval module 401, an analysis module 402, a computing module 403, and a determining module 404.
The retrieval module 401 is configured to retrieve a target search phrase based on multiple preset search engines to obtain multiple search results;
The analysis module 402 is configured to determine multiple suspected target network addresses based on the multiple search results, and to obtain the page content corresponding to the multiple suspected target network addresses;
The computing module 403 is configured to determine, based on the page content corresponding to the multiple suspected target network addresses, the relevance between each of the multiple suspected target network addresses and the target search phrase;
The determining module 404 is configured to determine the suspected target network addresses whose relevance reaches a preset threshold as multi-level marketing website publicity addresses.
In an optional embodiment, the target search phrase includes a multi-level marketing project name, a search separator, and a website keyword arranged according to a preset arrangement rule.
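The composition of the target search phrase can be illustrated with a short sketch. The arrangement rule is not spelled out in this passage, so the order used here (project name, then separator, then keyword), the default space separator, and the function name `build_search_phrases` are all assumptions made for illustration.

```python
def build_search_phrases(project_name, website_keywords, separator=" "):
    """Build one search phrase per website keyword.

    Assumed arrangement rule: project name + search separator + website keyword.
    """
    return [f"{project_name}{separator}{keyword}" for keyword in website_keywords]
```

For example, `build_search_phrases("ProjectX", ["official site", "registration"])` yields one phrase per keyword, each of which would then be submitted to every preset search engine.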
In an optional embodiment, when determining multiple suspected target network addresses based on the multiple search results, the analysis module 402 is specifically configured to: perform primary filtering on the multiple search results to obtain multiple results to be analyzed; and perform secondary screening on the multiple results to be analyzed to obtain multiple suspected target network addresses.
In an optional embodiment, when obtaining the page content corresponding to the multiple suspected target network addresses, the analysis module 402 is specifically configured to: send a request to the suspected target network address; receive the response data returned by the suspected target network address; and parse the response data to obtain the page content.
In an optional embodiment, when determining, based on the page content corresponding to the multiple suspected target network addresses, the relevance between each of the multiple suspected target network addresses and the target search phrase, the computing module 403 is specifically configured to: determine, based on a tf-idf computation model and the page content corresponding to the multiple suspected target network addresses, the relevance between each of the multiple suspected target network addresses and the target search phrase; wherein the tf-idf computation model is constructed based on the tf-idf algorithm.
In an optional embodiment, the calculation formulas of the tf-idf algorithm include:
$$w_{i,j} = tf_{i,j} \times idf_i$$
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
$$idf_i = \log \frac{|D|}{1 + |\{d \in D : t_i \in d\}|}$$
In the above formulas, $w_{i,j}$ is the relevance between term i and document j; $tf_{i,j}$ is the term frequency of term i in document j; $n_{i,j}$ is the number of times term i occurs in document j of a given class; $\sum_k n_{k,j}$ is the total number of terms in that document; $idf_i$ is the inverse document frequency of term i; $|D|$ is the total number of documents relevant to term i; and $|\{d \in D : t_i \in d\}|$ is the number of documents containing term i.
In an optional embodiment, the device may further include:
An output module 405, configured to sort the suspected target network addresses that reach the preset threshold in descending order of relevance and output them.
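How the five modules fit together can be sketched as a small pipeline. This is only one possible arrangement, assuming each module can be modeled as a callable; the class name `AcquisitionDevice` and its field names are hypothetical. The comparison uses "reaches the threshold" (greater than or equal), following the wording for the determining module.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class AcquisitionDevice:
    # Each field stands in for one module of Fig. 4; the callables are hypothetical.
    retrieve: Callable[[str], List[str]]                      # retrieval module 401
    analyze: Callable[[List[str]], Dict[str, str]]            # analysis module 402
    score: Callable[[str, Dict[str, str]], Dict[str, float]]  # computing module 403
    threshold: float                                          # preset threshold (module 404)

    def run(self, phrase: str) -> List[Tuple[str, float]]:
        results = self.retrieve(phrase)     # search results from the engines
        pages = self.analyze(results)       # suspected address -> page content
        scores = self.score(phrase, pages)  # suspected address -> relevance
        # Determining module 404: keep addresses whose relevance reaches the threshold.
        flagged = [(url, s) for url, s in scores.items() if s >= self.threshold]
        # Output module 405: sort by relevance from largest to smallest.
        return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

Passing the modules in as callables keeps the sketch independent of any particular search engine, crawler, or segmentation tool.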
The acquisition device for multi-level marketing website publicity addresses provided by the embodiments of the present application has the same technical features as the acquisition method provided by the above embodiments, so it can solve the same technical problem and achieve the same technical effect.
Referring to Fig. 5, the embodiments of the present invention further provide an electronic device 100, comprising:
A processor 41, a memory 42, and a bus 43. The memory 42 is used to store execution instructions and includes an internal memory 421 and an external storage 422. The internal memory 421 temporarily stores the operational data of the processor 41 and the data exchanged with the external storage 422, such as a hard disk; the processor 41 exchanges data with the external storage 422 through the internal memory 421. When the electronic device 100 runs, the processor 41 and the memory 42 communicate over the bus 43, so that the processor 41 executes the following instructions in user mode:
Retrieving a target search phrase based on multiple preset search engines to obtain multiple search results;
Determining multiple suspected target network addresses based on the multiple search results, and obtaining the page content corresponding to the multiple suspected target network addresses;
Determining, based on the page content corresponding to the multiple suspected target network addresses, the relevance between each of the multiple suspected target network addresses and the target search phrase;
Determining the suspected target network addresses whose relevance reaches a preset threshold as multi-level marketing website publicity addresses.
Optionally, in the instructions executed by the processor 41, the target search phrase includes a multi-level marketing project name, a search separator, and a website keyword arranged according to a preset arrangement rule.
Optionally, in the instructions executed by the processor 41, determining multiple suspected target network addresses based on the multiple search results and obtaining the page content corresponding to the multiple suspected target network addresses includes:
Performing primary filtering on the multiple search results to obtain multiple results to be analyzed;
Performing secondary screening on the multiple results to be analyzed to obtain multiple suspected target network addresses.
Optionally, in the instructions executed by the processor 41, obtaining the page content corresponding to the multiple suspected target network addresses further includes:
Sending a request to the suspected target network address;
Receiving the response data returned by the suspected target network address;
Parsing the response data to obtain the page content, wherein the page content includes identification data and business data.
Optionally, in the instructions executed by the processor 41, determining, based on the page content corresponding to the multiple suspected target network addresses, the relevance between each of the multiple suspected target network addresses and the target search phrase includes:
Determining, based on a tf-idf computation model and the page content corresponding to the multiple suspected target network addresses, the relevance between each of the multiple suspected target network addresses and the target search phrase; wherein the tf-idf computation model is constructed based on the tf-idf algorithm.
Optionally, in the instructions executed by the processor 41, the calculation formulas of the tf-idf algorithm include:
$$w_{i,j} = tf_{i,j} \times idf_i$$
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
$$idf_i = \log \frac{|D|}{1 + |\{d \in D : t_i \in d\}|}$$
In the above formulas, $w_{i,j}$ is the relevance between term i and document j; $tf_{i,j}$ is the term frequency of term i in document j; $n_{i,j}$ is the number of times term i occurs in document j of a given class; $\sum_k n_{k,j}$ is the total number of terms in that document; $idf_i$ is the inverse document frequency of term i; $|D|$ is the total number of documents relevant to term i; and $|\{d \in D : t_i \in d\}|$ is the number of documents containing term i.
Optionally, the instructions executed by the processor 41 further include:
Sorting the suspected target network addresses that reach the preset threshold in descending order of relevance and outputting them.
The embodiments of the present invention further provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, it executes the steps of the method for acquiring multi-level marketing website publicity addresses provided by the above embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely exemplary. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions, and operations of devices, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and each combination of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules or units in the embodiments of the present invention may be integrated together to form one independent part, each module may exist separately, or two or more modules may be integrated to form one independent part.
If the functions are implemented in the form of software function modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a smartphone, a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for acquiring multi-level marketing website publicity addresses, characterized by comprising the following steps:
Retrieving a target search phrase based on multiple preset search engines to obtain multiple search results;
Determining multiple suspected target network addresses based on the multiple search results, and obtaining the page content corresponding to the multiple suspected target network addresses;
Determining, based on the page content corresponding to the multiple suspected target network addresses, the relevance between each of the multiple suspected target network addresses and the target search phrase;
Determining the suspected target network addresses whose relevance reaches a preset threshold as multi-level marketing website publicity addresses.
2. The method according to claim 1, characterized in that the target search phrase includes a multi-level marketing project name, a search separator, and a website keyword arranged according to a preset arrangement rule.
3. The method according to claim 1, characterized in that determining multiple suspected target network addresses based on the multiple search results comprises:
Performing primary filtering on the multiple search results to obtain multiple results to be analyzed;
Performing secondary screening on the multiple results to be analyzed to obtain multiple suspected target network addresses.
4. The method according to claim 3, characterized in that obtaining the page content corresponding to the multiple suspected target network addresses comprises:
Sending a request to the suspected target network address;
Receiving the response data returned by the suspected target network address;
Parsing the response data to obtain the page content.
5. The method according to claim 1, characterized in that determining, based on the page content corresponding to the multiple suspected target network addresses, the relevance between each of the multiple suspected target network addresses and the target search phrase comprises:
Determining, based on a tf-idf computation model and the page content corresponding to the multiple suspected target network addresses, the relevance between each of the multiple suspected target network addresses and the target search phrase; wherein the tf-idf computation model is constructed based on the tf-idf algorithm.
6. The method according to claim 5, characterized in that the calculation formulas of the tf-idf algorithm include:
$$w_{i,j} = tf_{i,j} \times idf_i$$
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
$$idf_i = \log \frac{|D|}{1 + |\{d \in D : t_i \in d\}|}$$
In the above formulas, $w_{i,j}$ is the relevance between term i and document j; $tf_{i,j}$ is the term frequency of term i in document j; $n_{i,j}$ is the number of times term i occurs in document j of a given class; $\sum_k n_{k,j}$ is the total number of terms in that document; $idf_i$ is the inverse document frequency of term i; $|D|$ is the total number of documents relevant to term i; and $|\{d \in D : t_i \in d\}|$ is the number of documents containing term i.
7. The method according to claim 1, characterized in that the method further comprises:
Sorting the suspected target network addresses that reach the preset threshold in descending order of relevance and outputting them.
8. An acquisition device for multi-level marketing website publicity addresses, characterized by comprising:
A retrieval module, configured to retrieve a target search phrase based on multiple preset search engines to obtain multiple search results;
An analysis module, configured to determine multiple suspected target network addresses based on the multiple search results, and to obtain the page content corresponding to the multiple suspected target network addresses;
A computing module, configured to determine, based on the page content corresponding to the multiple suspected target network addresses, the relevance between each of the multiple suspected target network addresses and the target search phrase;
A determining module, configured to determine the suspected target network addresses whose relevance reaches a preset threshold as multi-level marketing website publicity addresses.
9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the steps of the method according to any one of claims 1 to 7 are executed when the computer program is run by a processor.
CN201910743972.2A 2019-08-13 2019-08-13 Acquisition methods, device and the electronic equipment of multiple level marketing Website publicity address Pending CN110442775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910743972.2A CN110442775A (en) 2019-08-13 2019-08-13 Acquisition methods, device and the electronic equipment of multiple level marketing Website publicity address

Publications (1)

Publication Number Publication Date
CN110442775A true CN110442775A (en) 2019-11-12

Family

ID=68435080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910743972.2A Pending CN110442775A (en) 2019-08-13 2019-08-13 Acquisition methods, device and the electronic equipment of multiple level marketing Website publicity address

Country Status (1)

Country Link
CN (1) CN110442775A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378027A (en) * 2021-07-13 2021-09-10 杭州安恒信息技术股份有限公司 Cable excavation method, device, equipment and computer readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020123A (en) * 2012-11-16 2013-04-03 中国科学技术大学 Method for searching bad video website
WO2014196362A1 (en) * 2013-06-02 2014-12-11 データ・サイエンティスト株式会社 Evaluation method, evaluation device, and program
CN105824822A (en) * 2015-01-05 2016-08-03 任子行网络技术股份有限公司 Method clustering phishing page to locate target page
US20160364485A1 (en) * 2015-06-12 2016-12-15 Smugmug, Inc. Advanced keyword search application
CN108234392A (en) * 2016-12-14 2018-06-29 北京国双科技有限公司 The monitoring method and device of a kind of website
CN108647225A (en) * 2018-03-23 2018-10-12 浙江大学 A kind of electric business grey black production public sentiment automatic mining method and system
CN109145117A (en) * 2018-09-05 2019-01-04 杭州安恒信息技术股份有限公司 Bonus system recognition methods, device and the electronic equipment of multiple level marketing project
CN109446409A (en) * 2018-09-19 2019-03-08 杭州安恒信息技术股份有限公司 A kind of recognition methods of the target object of doubtful multiple level marketing behavior

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨秀璋 等: "《Python网络数据爬取及分析从入门到精通(分析篇)》", 30 June 2018 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191112
