CN110442775A - Method, device, and electronic equipment for acquiring pyramid-scheme (multi-level marketing) promotional website addresses - Google Patents
- Publication number
- CN110442775A CN201910743972.2A
- Authority
- CN
- China
- Prior art keywords
- network address
- target network
- suspected target
- address
- search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
Abstract
The present invention provides a method, a device, and electronic equipment for acquiring pyramid-scheme (multi-level marketing) promotional website addresses, and relates to the field of network detection. First, a target search phrase is searched using a plurality of preset search engines to obtain a plurality of search results. Next, a plurality of suspected target URLs (network addresses) are determined from the search results, and the page content corresponding to those URLs is obtained. Then, based on that page content, the relevance between each suspected target URL and the target search phrase is determined. Finally, each suspected target URL whose relevance reaches a preset threshold is determined to be a pyramid-scheme promotional website address. Compared with the existing approach of discovering online pyramid-scheme activity through reports, the technical solution provided by the invention can alleviate the low efficiency of the prior art and helps improve the efficiency of acquiring pyramid-scheme websites.
Description
Technical field
The present invention relates to the field of networks, and in particular to a method, a device, and electronic equipment for acquiring pyramid-scheme promotional website addresses.
Background
A pyramid scheme (multi-level marketing) refers to illegal activity in which an organizer recruits members and seeks profit through those recruits, for example by requiring them to pay a fee as a condition of obtaining membership.
At present, pyramid schemes are promoted mainly offline, and offline pyramid-scheme activity is usually discovered through user reports.
However, with the development and spread of the internet, pyramid-scheme promotion is shifting from traditional offline promotion to online website promotion, and easily shared online websites have become an important channel for it. Relying on traditional reporting to discover online pyramid-scheme activity is relatively inefficient.
Summary of the invention
The objects of the present invention include, for example, providing a method, a device, and electronic equipment for acquiring pyramid-scheme promotional website addresses, which can alleviate the low acquisition efficiency of the prior art.
Embodiments of the present invention may be implemented as follows:
In a first aspect, an embodiment of the present invention provides a method for acquiring pyramid-scheme promotional website addresses, comprising the following steps:
searching for a target search phrase using a plurality of preset search engines to obtain a plurality of search results;
determining a plurality of suspected target URLs based on the plurality of search results, and obtaining the page content corresponding to the plurality of suspected target URLs;
determining, based on the page content corresponding to the plurality of suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase;
determining each suspected target URL whose relevance reaches a preset threshold to be a pyramid-scheme promotional website address.
In an optional embodiment, the target search phrase comprises a pyramid-scheme project name, a search separator, and a website keyword arranged according to a preset ordering rule.
In an optional embodiment, determining a plurality of suspected target URLs based on the plurality of search results comprises:
performing preliminary screening on the plurality of search results to obtain a plurality of results to be analyzed;
performing secondary screening on the plurality of results to be analyzed to obtain the plurality of suspected target URLs.
In an optional embodiment, obtaining the page content corresponding to the plurality of suspected target URLs comprises:
sending a request to the suspected target URL;
receiving the response data returned by the suspected target URL;
parsing the response data to obtain the page content.
In an optional embodiment, determining, based on the page content corresponding to the plurality of suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase comprises:
determining the relevance between each of the suspected target URLs and the target search phrase based on a tf-idf computation model and the page content corresponding to the plurality of suspected target URLs, wherein the tf-idf computation model is constructed based on the tf-idf algorithm.
In an optional embodiment, the calculation formulas of the tf-idf algorithm include:
$$w_{i,j} = tf_{i,j} \times idf_i$$
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
$$idf_i = \log \frac{|D|}{1 + |\{d \in D : t_i \in d\}|}$$
In the above formulas, $w_{i,j}$ denotes the relevance between term $i$ and document $j$; $tf_{i,j}$ denotes the term frequency of term $i$ in document $j$; $n_{i,j}$ denotes the number of times term $i$ occurs in document $j$ of a given class; $\sum_k n_{k,j}$ denotes the total number of terms in that document; $idf_i$ denotes the inverse document frequency of term $i$; $|D|$ denotes the total number of documents related to term $i$; and $|\{d \in D : t_i \in d\}|$ denotes the number of documents containing term $i$. The 1 added to the denominator prevents it from being 0.
In an optional embodiment, the method further comprises:
sorting the suspected target URLs that reach the preset threshold in descending order of relevance and outputting them.
In a second aspect, an embodiment of the present invention provides a device for acquiring pyramid-scheme promotional website addresses, comprising:
a retrieval module, configured to search for a target search phrase using a plurality of preset search engines to obtain a plurality of search results;
an analysis module, configured to determine a plurality of suspected target URLs based on the plurality of search results, and to obtain the page content corresponding to the plurality of suspected target URLs;
a computing module, configured to determine, based on the page content corresponding to the plurality of suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase;
a determining module, configured to determine each suspected target URL whose relevance reaches a preset threshold to be a pyramid-scheme promotional website address.
In a third aspect, an embodiment of the present invention provides electronic equipment comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of any one of the foregoing embodiments.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the method of any one of the foregoing embodiments.
The beneficial effects of the embodiments of the present invention include, for example:
The method, device, electronic equipment, and computer-readable storage medium for acquiring pyramid-scheme promotional website addresses provided by the embodiments of the present invention first search for a target search phrase using a plurality of preset search engines to obtain a plurality of search results; then determine a plurality of suspected target URLs based on the search results and obtain the page content corresponding to those URLs; then determine, based on that page content, the relevance between each suspected target URL and the target search phrase; and finally determine each suspected target URL whose relevance reaches a preset threshold to be a pyramid-scheme promotional website address. Therefore, compared with the existing approach of discovering online pyramid-scheme activity through reports, the technical solution provided by the embodiments of the present invention can alleviate the low efficiency of the prior art and helps improve the efficiency of acquiring pyramid-scheme websites.
Other features and advantages of the present invention will be set forth in the following description, and in part will become apparent from the description or be understood by practicing the invention.
To make the above objects, features, and advantages of the present invention more apparent and comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and therefore should not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
Fig. 1 shows a flowchart of a method for acquiring pyramid-scheme promotional website addresses provided by an embodiment of the present invention;
Fig. 2 shows a flowchart of another method for acquiring pyramid-scheme promotional website addresses provided by an embodiment of the present invention;
Fig. 3 shows a flowchart of a specific example of the method for acquiring pyramid-scheme promotional website addresses provided by an embodiment of the present invention;
Fig. 4 shows a schematic diagram of a device for acquiring pyramid-scheme promotional website addresses provided by an embodiment of the present invention;
Fig. 5 shows a schematic diagram of electronic equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
In the description of the present invention, it should be noted that terms such as "first" and "second", if they appear, are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
It should be noted that, where no conflict arises, the features in the embodiments of the present invention may be combined with each other.
At present, pyramid schemes are promoted mainly offline, and offline pyramid-scheme activity is usually discovered through user reports. With the development and spread of the internet, pyramid-scheme promotion is shifting from traditional offline promotion to online website promotion, and easily shared online websites have become an important channel for it. However, relying on traditional reporting to discover online pyramid-scheme activity is relatively inefficient; how to efficiently discover the promotional addresses of online pyramid-scheme activity has become a problem in urgent need of a solution.
On this basis, the method, device, and electronic equipment for acquiring pyramid-scheme promotional website addresses provided by the embodiments of the present invention can alleviate the low acquisition efficiency of the prior art and improve the efficiency of acquiring the addresses of online pyramid-scheme promotion.
To facilitate understanding of this embodiment, a method for acquiring pyramid-scheme promotional website addresses disclosed in an embodiment of the present invention is first described in detail.
Embodiment one:
This embodiment provides a method for acquiring pyramid-scheme promotional website addresses. The method is applied to the field of pyramid-scheme website acquisition and is executed by an electronic device in that field; the electronic device may be, for example, a terminal or the main controller of a terminal.
Referring to Fig. 1, the method for acquiring pyramid-scheme promotional website addresses comprises the following steps:
Step S102: search for a target search phrase using a plurality of preset search engines to obtain a plurality of search results;
Step S104: determine a plurality of suspected target URLs based on the plurality of search results, and obtain the page content corresponding to the plurality of suspected target URLs;
Step S106: determine, based on the page content corresponding to the plurality of suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase;
Step S108: determine each suspected target URL whose relevance reaches a preset threshold to be a pyramid-scheme promotional website address.
In step S102, the preset search engines may be, for example, the Baidu, Sogou, Google, Bing, and 360 search engines; a plurality of search engines are preset here.
In an optional embodiment, the target search phrase comprises a pyramid-scheme project name, a search separator, and a website keyword entered according to a preset ordering rule.
The preset ordering rule may be sequential order or reverse order. In this embodiment, the preset ordering rule is sequential order; that is, the target search phrase comprises the pyramid-scheme project name, the search separator, and the website keyword, entered in that order. The pyramid-scheme project name is the name, determined in advance by the user, of the pyramid-scheme project to be queried. The website keyword is a website-identifying word determined in advance by the user, for example "log in", "sign in", "official website", "back end", "home page", "register", or "login"; to keep the search results from being incomplete, a plurality of website keywords are used. The search separator prevents the pyramid-scheme project name and the website keyword from being searched as a single word, which would make the search results few and one-sided; for example, one or more spaces may be used as the search separator. In summary, the search phrase may be expressed, for example, as "pyramid-scheme project name + space + website keyword".
Of course, in other embodiments, the target search phrase may include only the pyramid-scheme project name.
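The construction of target search phrases described above can be sketched as follows. This is a minimal illustration only: the function name `build_search_phrases`, the project name `ExampleProject`, and the English keyword list are hypothetical stand-ins, not values given in this document.

```python
from itertools import product


def build_search_phrases(project_names, site_keywords, separator=" "):
    """Combine each project name with each website keyword in sequential order.

    The separator (a space by default) keeps the search engine from treating
    the project name and the keyword as one word.
    """
    return [
        f"{name}{separator}{keyword}"
        for name, keyword in product(project_names, site_keywords)
    ]


# Hypothetical inputs, for illustration only.
phrases = build_search_phrases(
    ["ExampleProject"], ["login", "official website", "register"]
)
for phrase in phrases:
    print(phrase)
```

Passing an empty keyword list yields no phrases, so the variant in which the phrase consists of the project name alone would need to be handled separately.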
By using a plurality of search engines, this embodiment can, on the one hand, improve retrieval efficiency and the comprehensiveness of the search results; on the other hand, it helps reduce the influence on the search results of any single search engine, whether owing to its large demonstration effect or to flaws and limitations in its own algorithm. This helps improve the accuracy of the search results and thereby the search efficiency.
In an optional embodiment, for step S104, determining a plurality of suspected target URLs based on the plurality of search results may be implemented by the following steps:
1. Perform preliminary screening on the plurality of search results to obtain a plurality of results to be analyzed.
The combined search results of the plurality of preset search engines amount to a very large volume of data, and the results ranked lower by each search engine are less relevant. Preliminary screening is therefore first performed on the search results to obtain the results to be analyzed. In this embodiment, the search results within a preset number of entries or pages are usually selected as the results to be analyzed; the number of entries and the number of pages can be set according to actual needs.
2. Perform secondary screening on the plurality of results to be analyzed to obtain the plurality of suspected target URLs.
The results to be analyzed contain addresses that obviously are not pyramid-scheme addresses. Secondary screening is therefore performed on the results to be analyzed to exclude common or well-known websites (referred to as whitelisted websites, which are recorded in a whitelist to facilitate screening), thereby obtaining the suspected target URLs.
Through preliminary screening and secondary screening, step S104 improves the accuracy of the search results, which helps improve acquisition efficiency.
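The two screening stages can be sketched as below, assuming each search result has already been reduced to a URL string. The names `primary_filter`, `secondary_screen`, and `WHITELIST`, the cut-off of 10 entries, and the `.test` URL are illustrative assumptions; the document only states that the entry count is configurable and that whitelisted sites are excluded.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist of common or well-known hosts.
WHITELIST = {"tieba.baidu.com", "zhidao.baidu.com"}


def primary_filter(results, max_items=10):
    """Preliminary screening: keep only the top-ranked results."""
    return results[:max_items]


def secondary_screen(urls, whitelist=WHITELIST):
    """Secondary screening: drop URLs whose host is on the whitelist."""
    suspects = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        if host not in whitelist:
            suspects.append(url)
    return suspects


# Hypothetical search results, in ranking order.
results = [
    "http://tieba.baidu.com/p/1",
    "http://example-scheme.test/login",
    "http://zhidao.baidu.com/question/2",
]
suspected = secondary_screen(primary_filter(results, max_items=2))
print(suspected)
```

Matching on the exact host name is the simplest choice; a real whitelist might also need to match parent domains, which this sketch does not attempt.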
Of course, in other embodiments, the search results may also be screened in other ways, for example according to whether a result contains a preset website character string, such as a preset advertisement string.
In an optional embodiment, for step S104, obtaining the page content corresponding to the plurality of suspected target URLs may be performed by the following steps:
1) send a request to the suspected target URL;
2) receive the response data returned by the suspected target URL;
3) parse the response data to obtain the page content.
Here the page content includes identification data and business data. The identification data includes, but is not limited to, the response body, the title, and the ICP filing number; the business data includes, but is not limited to, data such as the website snapshot and the website keywords.
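The request, response, and parsing steps can be sketched with the Python standard library as follows. This is an assumption-laden illustration: the class and function names are hypothetical, only the title and the `keywords` meta tag are extracted, and the User-Agent header and UTF-8 decoding are choices made for the sketch rather than details from this document.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class PageContentParser(HTMLParser):
    """Collect the <title> text and the content of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            self.keywords = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(html_text):
    """Step 3: parse response data into a small page-content record."""
    parser = PageContentParser()
    parser.feed(html_text)
    return {
        "title": parser.title.strip(),
        "keywords": parser.keywords,
        "body": html_text,
    }


def fetch_page(url, timeout=10):
    """Steps 1 and 2: send a request and read the response data."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})  # assumed header
    with urlopen(request, timeout=timeout) as response:
        raw = response.read()
    return parse_page(raw.decode("utf-8", errors="replace"))


sample = (
    "<html><head><title> Demo </title>"
    '<meta name="keywords" content="login,register"></head>'
    "<body>hello</body></html>"
)
page = parse_page(sample)
print(page["title"], page["keywords"])
```

`fetch_page` is defined but not called here, since it needs network access; the parsing step is exercised on a static HTML string instead.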
In an optional embodiment, step S106, determining, based on the page content corresponding to the plurality of suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase, comprises:
determining the relevance between each of the suspected target URLs and the target search phrase based on a tf-idf computation model and the page content corresponding to the plurality of suspected target URLs, wherein the tf-idf computation model is constructed based on the tf-idf algorithm.
The calculation formulas of the tf-idf algorithm include:
$$w_{i,j} = tf_{i,j} \times idf_i$$
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
$$idf_i = \log \frac{|D|}{1 + |\{d \in D : t_i \in d\}|}$$
In the above formulas, $w_{i,j}$ denotes the relevance between term $i$ and document $j$; $tf_{i,j}$ denotes the term frequency of term $i$ in document $j$; $n_{i,j}$ denotes the number of times term $i$ occurs in document $j$ of a given class; $\sum_k n_{k,j}$ denotes the total number of terms in that document; $idf_i$ denotes the inverse document frequency of term $i$; $|D|$ denotes the total number of documents related to term $i$; and $|\{d \in D : t_i \in d\}|$ denotes the number of documents containing term $i$. The 1 added to the denominator prevents it from being 0.
When constructing the tf-idf computation model, the pyramid-scheme project name is treated as term i, and the website addresses related to it (that is, to the project name) are the corresponding documents j. The page content of a website address (business data, identification data, and so on) can be converted by a word-segmentation tool into the term data of document j.
The tf-idf value between the pyramid-scheme project name and each suspected target address, that is, the relevance, can be calculated with the above formulas.
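The relevance calculation can be sketched as below, under several assumptions: each page has already been segmented into a list of terms, the project name survives segmentation as a single term, the natural logarithm is used (the document does not state the base), and 1 is added to the idf denominator as in Embodiment two. The function names and sample data are hypothetical.

```python
import math


def tf_idf(term, doc_terms, corpus):
    """Return tf(term, doc) * idf(term) for one document in a corpus.

    tf  = occurrences of the term / total terms in the document
    idf = log(|D| / (1 + number of documents containing the term))
    """
    if not doc_terms:
        return 0.0
    tf = doc_terms.count(term) / len(doc_terms)
    containing = sum(1 for terms in corpus if term in terms)
    idf = math.log(len(corpus) / (1 + containing))
    return tf * idf


def relevance_scores(project_name, pages):
    """Map each suspected URL to the tf-idf of the project name in its page."""
    corpus = list(pages.values())
    return {url: tf_idf(project_name, terms, corpus) for url, terms in pages.items()}


# Hypothetical segmented page content, keyed by suspected target URL.
pages = {
    "http://a.test/": ["exampleproject", "login", "exampleproject", "register"],
    "http://b.test/": ["news", "login"],
    "http://c.test/": ["weather"],
    "http://d.test/": ["sports"],
}
scores = relevance_scores("exampleproject", pages)
print(scores)
```

With the added 1 in the denominator, a term that appears in most documents can receive a zero or negative idf, which is worth keeping in mind when choosing the threshold.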
In step S108, determining a suspected target URL whose relevance reaches the preset threshold to be a pyramid-scheme promotional website address mainly comprises the following steps:
1. Judge, in turn, whether the relevance between each suspected target address and the target search phrase is greater than the preset threshold.
2. If the relevance between a suspected target address and the target search phrase is greater than the preset threshold, determine that suspected target address to be a pyramid-scheme promotional website address.
Step S108 determines pyramid-scheme website addresses according to relevance, which makes identification fast and the results relatively accurate.
The method for acquiring pyramid-scheme promotional website addresses provided by this embodiment of the present invention searches for a target search phrase using a plurality of preset search engines to obtain a plurality of search results; determines a plurality of suspected target URLs based on the search results and obtains the page content corresponding to those URLs; determines, based on that page content, the relevance between each suspected target URL and the target search phrase; and determines each suspected target URL whose relevance reaches a preset threshold to be a pyramid-scheme promotional website address. Compared with the existing approach of discovering online pyramid-scheme activity through reports, this method can alleviate the low efficiency of the prior art and helps improve the efficiency of acquiring pyramid-scheme websites.
Further, on the basis of the foregoing scheme, as shown in Fig. 2, this method differs from the preceding method in that it further comprises:
Step S202: sort the suspected target URLs that reach the preset threshold in descending order of relevance and output them.
Specifically, for the determined suspected target URLs, the corresponding website addresses are output in descending order of relevance. They may be output by voice, as text, as icons, and so on, for example to a display screen, making it convenient for the user to view the results.
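Thresholding followed by descending-order output can be sketched as follows. The function name, the threshold value, and the scores are hypothetical; the strict "greater than" comparison follows the description of step S108, whereas other passages say the relevance "reaches" the threshold, so the boundary case is an open choice.

```python
def rank_promotional_addresses(relevance_by_url, threshold):
    """Keep URLs whose relevance exceeds the threshold, highest relevance first."""
    kept = [
        (url, score)
        for url, score in relevance_by_url.items()
        if score > threshold
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical relevance values, for illustration only.
relevance = {
    "http://a.test/": 0.12,
    "http://b.test/": 0.40,
    "http://c.test/": 0.03,
}
ranked = rank_promotional_addresses(relevance, threshold=0.1)
for url, score in ranked:
    print(f"{score:.2f}  {url}")
```

Text output is used here for simplicity; the same ranked list could be written to a database or sent to a remote server.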
Of course, the output results may also be stored in a local database, on a remote server, and so on, so that the user can later verify or compile statistics on them.
Step S202 sorts the results and presents them to the user, so that the user can view them more intuitively. It also facilitates subsequent review and verification by the user, which helps improve the user experience, and it helps supplement and improve the collection of online pyramid-scheme promotional website addresses.
Embodiment two:
As shown in Fig. 3, an embodiment of the present application further provides a specific example of the method for acquiring pyramid-scheme promotional website addresses, comprising the following steps:
Step S302: search for and collect the page content of the suspected target addresses.
First, collect the possible website addresses of the pyramid-scheme project (the suspected target addresses).
Specifically, a search phrase is searched using a plurality of search engines. The search phrase is "project name + space + website keyword", where the project name is the name of a particular identified pyramid-scheme project (the pyramid-scheme project name), the space is the search separator, and the website keyword is a predetermined word such as "log in", "sign in", "official website", "back end", "home page", or "register". It should be noted that the website keywords can be supplemented and adjusted according to actual conditions.
Three search engines are used here: Baidu, Sogou, and 360. Data is obtained by searching with each of the three and is then consolidated to give a large number of search results. For each pyramid-scheme project name, taking the six website keywords above and the three search engines as an example, 6 × 3 = 18 search strategies can be constructed, and the search results of these 18 search strategies are obtained. Preliminary screening is performed on the search results of each search strategy by taking the 10 results on the first page (adjustable according to the final effect), giving 18 × 10 search results in total as the results to be analyzed. The home address is then obtained from each result to be analyzed as a website address related to the project, because each result to be analyzed can be followed to a website address, and the home address is the entry address of that website. The home addresses obtained are screened to filter out common or well-known websites, such as tieba.baidu.com, zhidao.baidu.com, chinapic.people.com.cn, and i.ifeng.com, thereby obtaining the plurality of suspected target addresses associated with the project name (that is, the home addresses of the suspected target websites).
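The 6 × 3 search strategies and the reduction of a result URL to a home address can be sketched as below. The engine identifiers, the English keyword stand-ins, and the assumption that the home address is simply the scheme plus host of the result URL are all illustrative; the document only states that the home address is the entry address of the website.

```python
from itertools import product
from urllib.parse import urlsplit

# Hypothetical identifiers for the three engines and six keywords in the text.
ENGINES = ["baidu", "sogou", "360"]
KEYWORDS = ["log in", "sign in", "official website", "back end", "home page", "register"]


def build_strategies(project_name, engines=ENGINES, keywords=KEYWORDS):
    """Return one (engine, search phrase) pair per engine/keyword combination."""
    return [
        (engine, f"{project_name} {keyword}")
        for engine, keyword in product(engines, keywords)
    ]


def home_address(url):
    """Reduce a result URL to scheme://host/ (an assumed notion of home address)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


strategies = build_strategies("ExampleProject")
print(len(strategies))
print(home_address("http://example-scheme.test/user/login?from=search"))
```

Taking 10 results from each of the 18 strategies gives at most 180 results to be analyzed, matching the 18 × 10 figure in the text; many of them would typically share a home address, so deduplication may be worthwhile before fetching pages.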
Finally, each suspected target address obtained above (the home address of a suspected target website) is requested and the data returned for the request is obtained. An XPath tool is used to parse out identification data such as the response body, the title, and the ICP filing number; together with business data also obtained through the XPath tool, such as the source, keywords, and snapshot, these make up the detailed data of the website (that is, the page content).
Step S304: analyze and compute the relevance (tf-idf) between the suspected target addresses and the search phrase.
The principle of the tf-idf algorithm is as follows: if a term occurs with a high frequency tf in one document and seldom occurs in other documents, the term has good class-discriminating ability, which is taken as the relevance between the term and that document. The formula is:
$$w_{i,j} = tf_{i,j} \times idf_i$$
Here tf is the term frequency, which indicates how frequently a term occurs in a document. The formula is:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where the numerator is the number of times term $i$ occurs in document $j$ of a given class, and the denominator is the total number of terms in that document.
idf is the inverse document frequency, obtained by dividing the total number of documents by the number of documents containing the term and taking the logarithm of the quotient. The formula is:
$$idf_i = \log \frac{|D|}{1 + |\{d \in D : t_i \in d\}|}$$
where the "+1" prevents the denominator from being 0, $|D|$ denotes the total number of documents, and $|\{d \in D : t_i \in d\}|$ denotes the number of documents containing term $i$.
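As a worked illustration of these formulas, consider hypothetical numbers that do not come from this document: a term occurs 3 times in a document of 100 terms, the collection holds 1000 documents, and 9 of them contain the term. Assuming the natural logarithm (the base is not specified in the text):

```latex
tf_{i,j} = \frac{3}{100} = 0.03
\qquad
idf_i = \ln\frac{1000}{1 + 9} = \ln 100 \approx 4.605
\qquad
w_{i,j} = tf_{i,j} \times idf_i \approx 0.03 \times 4.605 \approx 0.138
```

A different logarithm base rescales every score by the same constant factor, so it changes the numerical threshold that is appropriate but not the ranking of addresses.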
In this application, the project name can be treated as the term, and the website addresses related to it are the corresponding documents; the detailed data of a website (the page content) can be converted into term data by a word-segmentation tool.
The tf-idf value between the project name and each suspected target address, that is, the relevance, can be calculated with the above formulas.
Step S306: compile and output the results according to relevance.
After the relevance between the project name and each suspected target address is obtained by the above algorithm, the website addresses with low relevance are filtered out according to a set threshold, and the remaining suspected target addresses above the threshold are output in descending order of relevance.
Combined with known pyramid-scheme projects, the data provided by the present invention can serve as a reference for the relevant authorities, enabling targeted investigation, and can to some extent combat the promotion of pyramid schemes.
The method for acquiring pyramid-scheme promotional website addresses provided by this embodiment of the present invention obtains the promotional websites of a pyramid-scheme project by computing the correlation between the project and websites. Its purpose is to solve the prior-art problem that the online promotional website addresses of pyramid-scheme projects cannot be obtained.
Starting from the pyramid-scheme project name, the method uses crawler technology (a search-engine technique for efficiently obtaining internet data) with a plurality of search engines to crawl related websites from the internet and to crawl the detailed data of those websites. It then computes the relevance between the pyramid-scheme project and each related website with the tf-idf algorithm and analyzes the websites with higher relevance, thereby obtaining the suspected promotional addresses of the pyramid-scheme project. This approach has relatively high acquisition efficiency, and the identified pyramid-scheme addresses are relatively accurate. It further provides valuable information to the relevant authorities or organizations, so that the promotion of pyramid schemes can be combated and prevented effectively to some extent.
Embodiment three:
Based on the same inventive concept, the embodiment of the present application further provides a device for acquiring MLM promotional website addresses that corresponds to the acquisition method described above. Because the principle by which the device solves the problem is similar to that of the acquisition method in the above embodiments, the implementation of the device may refer to the implementation of the method, and repeated parts are not described again.
Fig. 4 is a schematic diagram of the device for acquiring MLM promotional website addresses provided by the embodiment of the present application.
Referring to Fig. 4, the device for acquiring MLM promotional website addresses includes: a retrieval module 401, an analysis module 402, a computing module 403 and a determining module 404;
The retrieval module 401 is configured to retrieve a target search phrase based on multiple preset search engines to obtain multiple search results;
The analysis module 402 is configured to determine multiple suspected target URLs based on the multiple search results and to obtain the page content corresponding to the multiple suspected target URLs;
The computing module 403 is configured to determine, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase;
The determining module 404 is configured to determine the suspected target URLs whose relevance reaches a preset threshold as MLM promotional website addresses.
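The division of work among the four modules can be sketched as a small pipeline. This is only an illustrative skeleton under stated assumptions: each search engine, the candidate selection, the page fetching and the scoring are passed in as plain callables, and every name (`acquire_addresses`, `pick_candidates`, `fetch_content`, `score`) is hypothetical. The stand-in scoring used in the demo is a simple word ratio, not tf-idf.

```python
def acquire_addresses(phrase, search_engines, pick_candidates, fetch_content, score, threshold):
    """Run retrieval, analysis, computation and determination in sequence."""
    # Retrieval (module 401): query every engine with the target search phrase.
    results = [hit for engine in search_engines for hit in engine(phrase)]
    # Analysis (module 402): choose suspected target URLs and fetch their page content.
    candidates = pick_candidates(results)
    contents = {url: fetch_content(url) for url in candidates}
    # Computation (module 403): relevance of each URL to the search phrase.
    scores = {url: score(phrase, text) for url, text in contents.items()}
    # Determination (module 404): keep URLs whose relevance reaches the threshold.
    return [url for url, value in scores.items() if value >= threshold]

# Stand-ins for real engines, pages and scoring, used only to exercise the skeleton.
fake_pages = {
    "http://a.example": "alpha alpha",
    "http://b.example": "alpha news",
    "http://c.example": "weather",
}
engines = [
    lambda phrase: ["http://a.example", "http://b.example"],
    lambda phrase: ["http://b.example", "http://c.example"],
]
found = acquire_addresses(
    "alpha",
    engines,
    pick_candidates=lambda results: list(dict.fromkeys(results)),  # de-duplicate, keep order
    fetch_content=fake_pages.get,
    score=lambda phrase, text: text.split().count(phrase) / max(len(text.split()), 1),
    threshold=0.5,
)
```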
In an optional embodiment, the target search phrase includes an MLM project name, a search separator and a website keyword arranged according to a preset arrangement rule.
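One way to assemble such a phrase and turn it into query URLs for several engines is sketched below. The arrangement (project name, then separator, then keyword), the space separator, the engine URL templates and the names `build_search_phrase` and `build_query_urls` are all assumptions; the text above does not specify the actual arrangement rule or the engines used.

```python
from urllib.parse import quote_plus

def build_search_phrase(project_name, keyword, separator=" "):
    """Arrange project name, search separator and website keyword in a fixed order."""
    return f"{project_name}{separator}{keyword}"

def build_query_urls(phrase, engine_templates):
    """Fill the URL-encoded phrase into each engine's (hypothetical) query template."""
    return [template.format(query=quote_plus(phrase)) for template in engine_templates]

# Placeholder templates; real search engines have their own query formats.
ENGINE_TEMPLATES = [
    "https://search-one.example/search?q={query}",
    "https://search-two.example/s?wd={query}",
]
phrase = build_search_phrase("AlphaPlan", "official website")
query_urls = build_query_urls(phrase, ENGINE_TEMPLATES)
```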
In an optional embodiment, when determining multiple suspected target URLs based on the multiple search results, the analysis module 402 is specifically configured to: perform preliminary filtering on the multiple search results to obtain multiple results to be analyzed; and perform secondary screening on the multiple results to be analyzed to obtain multiple suspected target URLs.
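The two-stage selection might look like the sketch below. The concrete criteria are assumptions chosen only for illustration and are not taken from the text: here the preliminary filtering keeps well-formed http(s) links and removes exact duplicates, and the secondary screening reduces each link to its host and drops hosts on an exclusion list. The names `primary_filter` and `secondary_screen` are hypothetical.

```python
from urllib.parse import urlsplit

def primary_filter(result_urls):
    """Assumed preliminary filtering: keep unique, well-formed http(s) URLs."""
    seen, kept = set(), []
    for url in result_urls:
        parts = urlsplit(url)
        if parts.scheme in ("http", "https") and parts.netloc and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept

def secondary_screen(urls, excluded_hosts):
    """Assumed secondary screening: one entry per host, skipping excluded hosts."""
    hosts = []
    for url in urls:
        host = urlsplit(url).hostname
        if host and host not in excluded_hosts and host not in hosts:
            hosts.append(host)
    return hosts

raw_results = [
    "http://a.example/x",
    "http://a.example/x",
    "ftp://b.example",
    "javascript:void(0)",
    "https://news.example/1",
    "http://a.example/y",
]
to_analyze = primary_filter(raw_results)
suspected = secondary_screen(to_analyze, excluded_hosts={"news.example"})
```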
In an optional embodiment, when obtaining the page content corresponding to the multiple suspected target URLs, the analysis module 402 is specifically configured to: send a request to the suspected target URL; receive the response data returned by the suspected target URL; and parse the response data to obtain the page content.
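The request, receive and parse steps can be sketched with the Python standard library. This is a minimal example under assumptions: the page is HTML, only visible text is wanted, and the helper names `fetch_page` and `parse_page` and the User-Agent string are hypothetical. A production crawler would also need error handling, rate limiting and respect for site policies.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def parse_page(html):
    """Parse response data (an HTML string) into plain page text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

def fetch_page(url, timeout=10):
    """Send a request to the URL and return the decoded response body."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A typical call chain would be `parse_page(fetch_page(url))` for each suspected target URL; the resulting text can then be passed to a word segmentation tool.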
In an optional embodiment, when determining, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase, the computing module 403 is specifically configured to: determine the relevance between each of the suspected target URLs and the target search phrase based on a tf-idf computation model and the page content corresponding to the multiple suspected target URLs; wherein the tf-idf computation model is constructed based on the tf-idf algorithm.
In an optional embodiment, the calculation formula of the tf-idf algorithm includes:
$w_{i,j} = tf_{i,j} \times idf_i$
In the above formula, $w_{i,j}$ denotes the relevance between term $i$ and document $j$; $tf_{i,j}$ denotes the term frequency of term $i$ in document $j$; $n_{i,j}$ denotes the number of times term $i$ occurs in document $j$; $\sum_k n_{k,j}$ denotes the total number of terms in that document; $idf_i$ denotes the inverse document frequency of term $i$; $|D|$ denotes the total number of documents related to term $i$; and $|\{d \in D : t_i \in d\}|$ denotes the number of documents containing term $i$.
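A worked instance may help make the symbols concrete. It assumes the usual forms $tf_{i,j} = n_{i,j} / \sum_k n_{k,j}$ and $idf_i = \ln\bigl(|D| / (1 + |\{d \in D : t_i \in d\}|)\bigr)$, which match the symbol definitions and the "+1" remark in the description but are not printed in this passage; the natural logarithm and all numbers are hypothetical. Suppose term $i$ occurs 3 times in a 100-term document $j$, and 1 of 10 documents contains it:

```latex
\begin{aligned}
tf_{i,j} &= \frac{n_{i,j}}{\sum_k n_{k,j}} = \frac{3}{100} = 0.03 \\
idf_i &= \ln\frac{|D|}{1 + |\{d \in D : t_i \in d\}|} = \ln\frac{10}{1 + 1} = \ln 5 \approx 1.609 \\
w_{i,j} &= tf_{i,j} \times idf_i \approx 0.03 \times 1.609 \approx 0.048
\end{aligned}
```

Under a preset threshold of, say, 0.04, this document's address would be retained; under 0.05 it would be filtered out.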
In an optional embodiment, the device may further include:
An output module 405, configured to sort the suspected target URLs whose relevance reaches the preset threshold in descending order of relevance and output them.
The device for acquiring MLM promotional website addresses provided by the embodiment of the present application has the same technical features as the acquisition method provided by the above embodiments, so it can solve the same technical problem and achieve the same technical effect.
Referring to Fig. 5, an embodiment of the present invention further provides an electronic device 100, comprising:
A processor 41, a memory 42 and a bus 43. The memory 42 is used to store execution instructions and includes an internal memory 421 and an external memory 422. The internal memory 421 temporarily stores operational data of the processor 41 and data exchanged with the external memory 422, such as a hard disk; the processor 41 exchanges data with the external memory 422 through the internal memory 421. When the computer device 400 runs, the processor 41 and the memory 42 communicate through the bus 43, so that the processor 41 executes the following instructions in user mode:
Retrieving a target search phrase based on multiple preset search engines to obtain multiple search results;
Determining multiple suspected target URLs based on the multiple search results, and obtaining the page content corresponding to the multiple suspected target URLs;
Determining, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase;
Determining the suspected target URLs whose relevance reaches a preset threshold as MLM promotional website addresses.
Optionally, in the instructions executed by the processor 41, the target search phrase includes an MLM project name, a search separator and a website keyword arranged according to a preset arrangement rule.
Optionally, in the instructions executed by the processor 41, determining multiple suspected target URLs based on the multiple search results and obtaining the page content corresponding to the multiple suspected target URLs comprises:
Performing preliminary filtering on the multiple search results to obtain multiple results to be analyzed;
Performing secondary screening on the multiple results to be analyzed to obtain multiple suspected target URLs.
Optionally, in the instructions executed by the processor 41, obtaining the page content corresponding to the multiple suspected target URLs further comprises:
Sending a request to the suspected target URL;
Receiving the response data returned by the suspected target URL;
Parsing the response data to obtain the page content, wherein the page content includes identification data and business data.
Optionally, in the instructions executed by the processor 41, determining, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase comprises:
Determining the relevance between each of the suspected target URLs and the target search phrase based on a tf-idf computation model and the page content corresponding to the multiple suspected target URLs; wherein the tf-idf computation model is constructed based on the tf-idf algorithm.
Optionally, in the instructions executed by the processor 41, the calculation formula of the tf-idf algorithm includes:
$w_{i,j} = tf_{i,j} \times idf_i$
In the above formula, $w_{i,j}$ denotes the relevance between term $i$ and document $j$; $tf_{i,j}$ denotes the term frequency of term $i$ in document $j$; $n_{i,j}$ denotes the number of times term $i$ occurs in document $j$; $\sum_k n_{k,j}$ denotes the total number of terms in that document; $idf_i$ denotes the inverse document frequency of term $i$; $|D|$ denotes the total number of documents related to term $i$; and $|\{d \in D : t_i \in d\}|$ denotes the number of documents containing term $i$.
Optionally, the instructions executed by the processor 41 further include:
Sorting the suspected target URLs whose relevance reaches the preset threshold in descending order of relevance and outputting them.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, it executes the steps of the method for acquiring MLM promotional website addresses provided by the above embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely exemplary. For example, the flowcharts and block diagrams in the drawings show the possible system architecture, functions and operations of devices, methods and computer program products according to multiple embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram and/or flowchart, and any combination of such boxes, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules or units in the embodiments of the present invention may be integrated together to form one independent part, each module may exist separately, or two or more modules may be integrated to form one independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a smartphone, a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention.
Claims (10)
1. A method for acquiring MLM promotional website addresses, characterized by comprising the following steps:
Retrieving a target search phrase based on multiple preset search engines to obtain multiple search results;
Determining multiple suspected target URLs based on the multiple search results, and obtaining the page content corresponding to the multiple suspected target URLs;
Determining, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase;
Determining the suspected target URLs whose relevance reaches a preset threshold as MLM promotional website addresses.
2. The method according to claim 1, characterized in that the target search phrase includes an MLM project name, a search separator and a website keyword arranged according to a preset arrangement rule.
3. The method according to claim 1, characterized in that determining multiple suspected target URLs based on the multiple search results comprises:
Performing preliminary filtering on the multiple search results to obtain multiple results to be analyzed;
Performing secondary screening on the multiple results to be analyzed to obtain multiple suspected target URLs.
4. The method according to claim 3, characterized in that obtaining the page content corresponding to the multiple suspected target URLs comprises:
Sending a request to the suspected target URL;
Receiving the response data returned by the suspected target URL;
Parsing the response data to obtain the page content.
5. The method according to claim 1, characterized in that determining, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase comprises:
Determining the relevance between each of the suspected target URLs and the target search phrase based on a tf-idf computation model and the page content corresponding to the multiple suspected target URLs; wherein the tf-idf computation model is constructed based on the tf-idf algorithm.
6. The method according to claim 5, characterized in that the calculation formula of the tf-idf algorithm includes:
$w_{i,j} = tf_{i,j} \times idf_i$
In the above formula, $w_{i,j}$ denotes the relevance between term $i$ and document $j$; $tf_{i,j}$ denotes the term frequency of term $i$ in document $j$; $n_{i,j}$ denotes the number of times term $i$ occurs in document $j$; $\sum_k n_{k,j}$ denotes the total number of terms in that document; $idf_i$ denotes the inverse document frequency of term $i$; $|D|$ denotes the total number of documents related to term $i$; and $|\{d \in D : t_i \in d\}|$ denotes the number of documents containing term $i$.
7. The method according to claim 1, characterized in that the method further comprises:
Sorting the suspected target URLs whose relevance reaches the preset threshold in descending order of relevance and outputting them.
8. A device for acquiring MLM promotional website addresses, characterized by comprising:
A retrieval module, configured to retrieve a target search phrase based on multiple preset search engines to obtain multiple search results;
An analysis module, configured to determine multiple suspected target URLs based on the multiple search results and to obtain the page content corresponding to the multiple suspected target URLs;
A computing module, configured to determine, based on the page content corresponding to the multiple suspected target URLs, the relevance between each of the suspected target URLs and the target search phrase;
A determining module, configured to determine the suspected target URLs whose relevance reaches a preset threshold as MLM promotional website addresses.
9. An electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when run by a processor, executes the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910743972.2A CN110442775A (en) | 2019-08-13 | 2019-08-13 | Acquisition methods, device and the electronic equipment of multiple level marketing Website publicity address |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910743972.2A CN110442775A (en) | 2019-08-13 | 2019-08-13 | Acquisition methods, device and the electronic equipment of multiple level marketing Website publicity address |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110442775A true CN110442775A (en) | 2019-11-12 |
Family
ID=68435080
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910743972.2A Pending CN110442775A (en) | 2019-08-13 | 2019-08-13 | Acquisition methods, device and the electronic equipment of multiple level marketing Website publicity address |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110442775A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113378027A (en) * | 2021-07-13 | 2021-09-10 | 杭州安恒信息技术股份有限公司 | Cable excavation method, device, equipment and computer readable storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103020123A (en) * | 2012-11-16 | 2013-04-03 | 中国科学技术大学 | Method for searching bad video website |
WO2014196362A1 (en) * | 2013-06-02 | 2014-12-11 | データ・サイエンティスト株式会社 | Evaluation method, evaluation device, and program |
CN105824822A (en) * | 2015-01-05 | 2016-08-03 | 任子行网络技术股份有限公司 | Method clustering phishing page to locate target page |
US20160364485A1 (en) * | 2015-06-12 | 2016-12-15 | Smugmug, Inc. | Advanced keyword search application |
CN108234392A (en) * | 2016-12-14 | 2018-06-29 | 北京国双科技有限公司 | The monitoring method and device of a kind of website |
CN108647225A (en) * | 2018-03-23 | 2018-10-12 | 浙江大学 | A kind of electric business grey black production public sentiment automatic mining method and system |
CN109145117A (en) * | 2018-09-05 | 2019-01-04 | 杭州安恒信息技术股份有限公司 | Bonus system recognition methods, device and the electronic equipment of multiple level marketing project |
CN109446409A (en) * | 2018-09-19 | 2019-03-08 | 杭州安恒信息技术股份有限公司 | A kind of recognition methods of the target object of doubtful multiple level marketing behavior |
Non-Patent Citations (1)
Title |
---|
Yang Xiuzhang et al.: "《Python网络数据爬取及分析从入门到精通(分析篇)》" (Python Web Data Crawling and Analysis from Beginner to Mastery, Analysis Volume), 30 June 2018 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7624102B2 (en) | System and method for grouping by attribute | |
US7752314B2 (en) | Automated tagging of syndication data feeds | |
CN103064956B (en) | For searching for the method for digital content, calculating system and computer-readable medium | |
TWI437452B (en) | Web spam page classification using query-dependent data | |
CN102402604B (en) | Effective forward ordering of search engine | |
US8719308B2 (en) | Method and system to process unstructured data | |
US8135709B2 (en) | Relevance ranked faceted metadata search method | |
US8190556B2 (en) | Intellegent data search engine | |
US8856129B2 (en) | Flexible and scalable structured web data extraction | |
US20070022085A1 (en) | Techniques for unsupervised web content discovery and automated query generation for crawling the hidden web | |
US20060253423A1 (en) | Information retrieval system and method | |
US8682881B1 (en) | System and method for extracting structured data from classified websites | |
US8515986B2 (en) | Query pattern generation for answers coverage expansion | |
WO2010076785A1 (en) | System and method for aggregating data from a plurality of web sites | |
Im et al. | Linked tag: image annotation using semantic relationships between image tags | |
US8732165B1 (en) | Automatic determination of whether a document includes an image gallery | |
CN103136228A (en) | Image search method and image search device | |
CN102456016B (en) | Method and device for sequencing search results | |
US20080281827A1 (en) | Using structured database for webpage information extraction | |
CN102486791A (en) | Method and server for intelligently classifying bookmarks | |
US20090319481A1 (en) | Framework for aggregating information of web pages from a website | |
JP5989170B2 (en) | Search result ranking apparatus and method using reliability of representative | |
CN110442775A (en) | Acquisition methods, device and the electronic equipment of multiple level marketing Website publicity address | |
US20070271245A1 (en) | System and method for searching a database | |
CN104572720A (en) | Webpage information duplicate eliminating method and device and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191112 |