CN112860667B - Correlation model building method, correlation model judging method, site discovery method and site discovery device - Google Patents

Publication number
CN112860667B
Authority
CN
China
Prior art keywords
site
industry
correlation
module
website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110193713.4A
Other languages
Chinese (zh)
Other versions
CN112860667A (en
Inventor
曹咪
徐雷
陶冶
边林
刘伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN202110193713.4A
Publication of CN112860667A
Application granted
Publication of CN112860667B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for establishing a site-industry relevance model: a relevance model between site sections and an industry is constructed first, and a relevance model between sites and the industry is then constructed from it. A device for establishing the site-industry relevance model, a method and a device for judging site-industry relevance, and an industry-oriented site discovery method and device are also provided. Using the site-industry relevance model, deep network sites can be evaluated and analyzed for industry users, so that comprehensive and suitable deep network sites can be obtained.

Description

Correlation model building method, correlation model judging method, site discovery method and site discovery device
Technical Field
The invention relates to the technical field of communications, and in particular to a method and a device for establishing a site-industry relevance model, a method and a device for judging site-industry relevance, and an industry-oriented site discovery method and device.
Background
The deep network (deep web) refers to web content outside the surface web that standard search engines cannot index; compared with the surface web, it holds a larger volume of data of higher quality. As Web (World Wide Web) technology matures, the amount of data in the deep network is growing rapidly, and research on the deep network is becoming increasingly important.
Because the deep network holds so much data and a reasonable model for evaluating the relevance of deep network sites to an industry is lacking, industry users find it difficult to obtain comprehensive and suitable deep network site information.
Disclosure of Invention
The technical problem to be solved by the invention is, in view of the above shortcomings of the prior art, to provide a method and a device for establishing a site-industry relevance model, a method and a device for judging site-industry relevance, and an industry-oriented site discovery method and device, so as to provide a reasonable model for evaluating site-industry relevance and enable industry users to obtain comprehensive and suitable deep network sites according to that model.
In a first aspect, an embodiment of the present invention provides a method for establishing a site-industry relevance model, including: building a relevance model between site sections and an industry:
$$\mathrm{Relevance}(\mathrm{Module}) = \sum_{i}\sum_{j} H_j \cdot \mathrm{Number}(A_i \cap B_j)$$
where Relevance(Module) is the relevance of a site section to the industry, $A_i$ is the i-th industry-related judgment criterion, $B_j$ is the j-th time period before the current time, $\mathrm{Number}(A_i \cap B_j)$ is the number of items of information within $B_j$ that satisfy criterion $A_i$, and $H_j$ is the weight coefficient corresponding to $B_j$, with $0 \le H_j \le 1$; and building a relevance model between sites and the industry from the section-industry relevance model:
$$\mathrm{Relevance}(\mathrm{Website}) = \sum_{k} \mathrm{Relevance}(\mathrm{Module}_k)$$
where Relevance(Website) is the relevance of the site to the industry, and $\mathrm{Relevance}(\mathrm{Module}_k)$ is the relevance of the k-th site section of the site to the industry.
Preferably, the i-th industry-related judgment criterion includes industry keywords or industry-related enterprise information, where there are one or more industry keywords and the industry-related enterprise information includes enterprise names and enterprise products. The time granularity of the time periods includes months and years. The weight coefficient $H_j$ corresponding to $B_j$ is negatively correlated with the value of $B_j$; that is, the longer the time period, the smaller the weight.
In a second aspect, an embodiment of the present invention further provides a method for determining a correlation degree between a site and an industry, including: according to the site and industry correlation model established by the site and industry correlation model establishing method in the first aspect, calculating the industry correlation of the site; and when the calculated industry correlation of the site is not 0, judging that the site is correlated with the industry.
In a third aspect, an embodiment of the present invention further provides an industry-oriented site discovery method, including: crawling site information of each site to construct a first site list, wherein the first site list comprises mapping relations between site addresses and site sections; content crawling is carried out on all the website sections corresponding to the websites of each website in the first website list, and corresponding website section content is obtained; constructing a second site list, wherein the second site list comprises site addresses, site sections, site section contents and mapping relations among the site addresses, the site sections and the site section contents; according to the method for judging the correlation degree between the website and the industry and the second website list, acquiring the industry correlation degree of each website which is related to the industry as a judgment result in the second website list; and constructing a third site list, wherein the third site list comprises site websites, related site sections, site section content, site industry relevance and mapping relations among the sites.
Preferably, after the third site list is constructed, the industry-oriented site discovery method further includes: the third site list is arranged in descending order according to the industry relevance of the sites, and a fourth site list is obtained; and acquiring site websites, related site sections, site section content and industry correlation of the sites of the first N sites in the fourth site list, and storing the sites to the blockchain.
Preferably, the industry-oriented site discovery method further comprises: storing website addresses, related website sections, website section content and industry relevance of the website corresponding to different industries into different blockchains so as to provide the website addresses, the related website section content and the website relevance to industry users.
In a fourth aspect, an embodiment of the present invention further provides a device for establishing a site-industry relevance model, including a first construction module and a second construction module. The first construction module is used for building a relevance model between site sections and an industry:
$$\mathrm{Relevance}(\mathrm{Module}) = \sum_{i}\sum_{j} H_j \cdot \mathrm{Number}(A_i \cap B_j)$$
where Relevance(Module) is the relevance of a site section to the industry, $A_i$ is the i-th industry-related judgment criterion, $B_j$ is the j-th time period before the current time, $\mathrm{Number}(A_i \cap B_j)$ is the number of items of information within $B_j$ that satisfy criterion $A_i$, and $H_j$ is the weight coefficient corresponding to $B_j$, with $0 \le H_j \le 1$. The second construction module is connected with the first construction module and is used for building a relevance model between sites and the industry from the section-industry relevance model:
$$\mathrm{Relevance}(\mathrm{Website}) = \sum_{k} \mathrm{Relevance}(\mathrm{Module}_k)$$
where Relevance(Website) is the relevance of the site to the industry, and $\mathrm{Relevance}(\mathrm{Module}_k)$ is the relevance of the k-th site section of the site to the industry.
In a fifth aspect, an embodiment of the present invention further provides a device for determining a correlation degree between a site and an industry, including a calculation module and a determination module. The calculation module is used for storing the correlation model of the site and the industry, which is established by the establishment device of the correlation model of the site and the industry in the fourth aspect, and calculating the industry correlation of the site according to the correlation model of the site and the industry; and the judging module is connected with the calculating module and is used for judging that the site is related to the industry when the calculated industry relevance of the site is not 0.
In a sixth aspect, an embodiment of the present invention further provides an industry-oriented site discovery apparatus, where the apparatus includes a crawling module, a third building module, an obtaining module, and a device for determining a correlation between a site and an industry described in the fifth aspect. And the crawling module is used for crawling site information of each site. The third construction module is connected with the crawling module and used for constructing a first site list according to the site information of each crawled site, wherein the first site list comprises the mapping relation between site addresses and site sections. And the crawling module is also used for crawling the content of all the website sections corresponding to the websites of each website in the first website list to obtain the content of the corresponding website sections. The third construction module is further configured to construct a second site list, where the second site list includes site addresses, site sections, site section contents, and mapping relationships between the three. The acquiring module is connected with the station and industry correlation judging device in the fifth aspect, and is configured to acquire, according to the judging result of the station and industry correlation judging device and the second station list, the industry correlation of each station whose judging result is related to the industry in the second station list. The third construction module is connected with the acquisition module and is also used for constructing a third site list, wherein the third site list comprises site websites, related site sections, site section content, site industry correlation degree and mapping relations among the sites.
Preferably, the industry-oriented site discovery apparatus further comprises a storage module. And the third construction module is also used for arranging the third site list in descending order according to the industry relevance of the sites to obtain a fourth site list. The storage module is connected with the third construction module and used for acquiring site websites of the first N sites in the fourth site list, related site sections, site section content and industry correlation of the sites and storing the sites to the blockchain.
With the method and device for establishing the site-industry relevance model, the method and device for judging site-industry relevance, and the industry-oriented site discovery method and device, a section-industry relevance model with comprehensive judgment criteria is built, and the site-industry relevance model is built on top of it. Comprehensive and suitable deep network sites can then be obtained and provided to industry users according to the resulting site-industry relevance.
Drawings
Fig. 1 is a flow chart of the method for establishing a site-industry relevance model in embodiment 1 of the invention;
Fig. 2 is a structural diagram of the device for establishing a site-industry relevance model in embodiment 4 of the invention;
Fig. 3 is a structural diagram of the device for judging site-industry relevance in embodiment 5 of the invention;
Fig. 4 is a structural diagram of the industry-oriented site discovery device in embodiment 6 of the invention.
Detailed Description
In order to enable the technical scheme of the invention to be better understood by the person skilled in the art, the method and the device for establishing the site and industry relevance model, the method and the device for judging the site and industry relevance and the method and the device for discovering the site facing the industry are described in further detail below with reference to the accompanying drawings and the embodiment.
Example 1:
As shown in fig. 1, this embodiment provides a method for establishing a site-industry relevance model, including:
Step 101, constructing a relevance model between site sections and an industry:
$$\mathrm{Relevance}(\mathrm{Module}) = \sum_{i}\sum_{j} H_j \cdot \mathrm{Number}(A_i \cap B_j)$$
where Relevance(Module) is the relevance of a site section to the industry, $A_i$ is the i-th industry-related judgment criterion, $B_j$ is the j-th time period before the current time, $\mathrm{Number}(A_i \cap B_j)$ is the number of items of information within $B_j$ that satisfy criterion $A_i$, and $H_j$ is the weight coefficient corresponding to $B_j$, with $0 \le H_j \le 1$.
In this embodiment, a site generally includes a plurality of site sections; the content of each site section is acquired, the relevance of each section's content to the industry is determined, and the relevance of the site to the industry is finally determined from these. In the section-industry relevance model constructed in this embodiment, because a given industry has several different judgment criteria, $A_i$ denotes the different judgment criteria related to the current industry, where i = 1, 2, 3, ..., m. Specifically, the i-th industry-related judgment criterion includes industry keywords or industry-related enterprise information, where there are one or more industry keywords and the industry-related enterprise information includes enterprise names and enterprise products. $B_j$ denotes the different time periods before the current time, for example the 2 months, 3 months, or 1 year before the current time; the time granularity of the time periods includes months and years, and j = 1, 2, 3, .... $H_j$ is the weight coefficient corresponding to $B_j$, with $0 \le H_j \le 1$, and $H_j$ is negatively correlated with the value of $B_j$. For example, if the current industry is the communications industry, $A_1$ is industry keywords, such as SIM card, mobile phone number, and telephone bill, and $A_2$ is industry-related enterprise information, including enterprise names (such as China Unicom, China Telecom, and China Mobile) and enterprise products (such as data packages and broadband services). $B_j$ is the 1 month, 3 months, 6 months, 9 months, 1 year, 2 years, and so on before the current time.
The value of $H_j$ can be set reasonably according to the actual situation. Because information is updated quickly, older information has less research value than newer information, so $H_j$ is set to be negatively correlated with the value of $B_j$. In this embodiment, when $B_1$ is 1 month, the corresponding $H_1$ is 1; when $B_2$ is 3 months, the corresponding $H_2$ is 0.8; and when $B_3$ is 1 year, the corresponding $H_3$ is 0.2. The relevance of the current site section to the industry is the weighted sum of the numbers of items of information that satisfy each judgment criterion $A_i$ within each $B_j$.
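The weighted count in step 101 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Item` record, the keyword-matching criteria, the treatment of the periods B_j as disjoint age buckets (each item is counted in the shortest period that contains it), and the day counts used for a month and a year are all assumptions. The weights 1, 0.8, and 0.2 are the example values given in this embodiment.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Item:
    """One piece of information published in a site section (hypothetical record)."""
    text: str
    published: date


# Time periods B_j as (maximum age in days, weight H_j), shortest period first.
# Weights follow the example values in the text; the day counts are assumptions.
PERIODS = [(30, 1.0), (90, 0.8), (365, 0.2)]

# Judgment criteria A_i, modeled here as keyword sets (an assumption).
CRITERIA = [
    {"SIM card", "mobile phone number", "telephone bill"},  # A_1: industry keywords
    {"China Unicom", "China Telecom", "China Mobile"},      # A_2: enterprise names
]


def weight_for(item: Item, today: date) -> float:
    """Return H_j for the shortest period B_j that contains the item, else 0."""
    age = (today - item.published).days
    if age < 0:
        return 0.0
    for max_days, weight in PERIODS:
        if age <= max_days:
            return weight
    return 0.0


def section_relevance(items: list[Item], today: date) -> float:
    """Sum, over criteria A_i and periods B_j, of H_j * Number(A_i ∩ B_j)."""
    total = 0.0
    for keywords in CRITERIA:
        for item in items:
            # An item counts once per criterion, however many keywords it matches.
            if any(k.lower() in item.text.lower() for k in keywords):
                total += weight_for(item, today)
    return total
```

Under these assumptions, an item that satisfies two criteria contributes its period weight twice, once for each criterion, and an item older than the longest period contributes nothing.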
Step 102, constructing a relevance model between sites and the industry from the section-industry relevance model:
$$\mathrm{Relevance}(\mathrm{Website}) = \sum_{k} \mathrm{Relevance}(\mathrm{Module}_k)$$
where Relevance(Website) is the relevance of the site to the industry, and $\mathrm{Relevance}(\mathrm{Module}_k)$ is the relevance of the k-th site section of the site to the industry.
In this embodiment, the site-industry relevance model integrates the relevance data of each site section of the site with the industry, where k = 1, 2, 3, ....
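As a small illustration of step 102, the sketch below combines per-section scores into a site-level score. It assumes that the combination is a plain sum over the site's sections and that the section scores have already been computed; the function name and the example section names and values are hypothetical.

```python
from typing import Mapping


def site_relevance(section_scores: Mapping[str, float]) -> float:
    """Site-level relevance, assumed here to be the sum of the section-level values."""
    return float(sum(section_scores.values()))


# Hypothetical section scores for one site.
example_scores = {"news": 2.8, "forum": 0.0, "products": 1.2}
example_total = site_relevance(example_scores)
```

Because every term is a non-negative weighted count, a site-level value of 0 under this assumption means that no section contained any matching information.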
In this method for establishing the site-industry relevance model, because a site comprises a plurality of site sections, the model evaluates the relevance of the site to the industry from the relevance data of all of its sections, which makes the site-industry relevance model more reasonable. Further, the section-industry relevance model divides the timeliness of section content at a fine granularity and sets the weight coefficients according to how much timeliness contributes to the value of the information, which also makes the model more reasonable. In addition, the section-industry relevance model uses several judgment criteria, which is more comprehensive than the prior-art practice of judging by industry keywords alone. The site-industry relevance evaluated with the model of this embodiment is therefore more reasonable, comprehensive, and accurate, and evaluating sites with this model can provide industry users with comprehensive and suitable deep network sites for in-depth research.
Example 2:
the embodiment provides a method for judging the correlation degree between a site and industry, which comprises the following steps:
step 201, calculating the industry relevance of the site according to the site-industry relevance model established by the site-industry relevance model establishing method described in embodiment 1.
Step 202, when the calculated industry correlation of the site is not 0, determining that the site is related to the industry. And when the calculated industry correlation of the site is 0, judging that the site is irrelevant to the industry.
Example 3:
the embodiment provides an industry-oriented site discovery method, which is used for providing a relevant deep network site according to requirements of industry users, and comprises the following steps:
step 301, crawling site information of each site by adopting a crawler technology to construct a first site list, wherein the first site list comprises mapping relations between site addresses and site sections. The first site list may also include crawled site information such as site domain names, site names, etc.
Step 302, crawling the content of all site sections corresponding to each site address in the first site list to obtain the corresponding site section content.
In this embodiment, a website address corresponds to a plurality of website sections, and a crawler technology is adopted for each website address to sequentially crawl the content of each website section corresponding to the website address, so as to obtain the content of each website section corresponding to each website address in the first website list.
Step 303, a second site list is constructed, wherein the second site list includes site addresses, site sections, site section contents, and mapping relations among the three.
In this embodiment, according to the site layout content corresponding to the site layout acquired in step 302, a second site list is constructed, compared with the first site list, the site layout content is added to the second site list, and the site layout content has a mapping relationship with the site layout.
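Steps 301 to 303 can be sketched as two mapping-building functions. This is an illustrative outline only: the crawler is abstracted behind two caller-supplied functions (`list_sections` and `fetch_section`, both hypothetical names), and an in-memory dictionary stands in for the web so that the example runs without network access.

```python
from typing import Callable, Iterable

FirstList = dict[str, list[str]]        # site address -> section names
SecondList = dict[str, dict[str, str]]  # site address -> {section name: content}


def build_first_list(
    urls: Iterable[str], list_sections: Callable[[str], list[str]]
) -> FirstList:
    """Step 301: map each site address to the sections found on that site."""
    return {url: list(list_sections(url)) for url in urls}


def build_second_list(
    first_list: FirstList, fetch_section: Callable[[str, str], str]
) -> SecondList:
    """Steps 302-303: crawl every section and attach its content to the mapping."""
    return {
        url: {name: fetch_section(url, name) for name in sections}
        for url, sections in first_list.items()
    }


# In-memory stand-in for crawled sites (hypothetical data).
FAKE_WEB = {
    "https://example.com": {"news": "SIM card offers", "about": "Company history"},
    "https://example.org": {"blog": "Gardening tips"},
}

first = build_first_list(FAKE_WEB, lambda url: list(FAKE_WEB[url]))
second = build_second_list(first, lambda url, name: FAKE_WEB[url][name])
```

Passing the crawl functions in as parameters keeps the list-building logic independent of any particular crawler library.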
Step 304, according to the method for determining the correlation between a website and an industry and the second website list described in embodiment 2, the industry correlation of each website related to the industry as the determination result in the second website list is obtained.
In this embodiment, among the sites in the second site list that are judged to be related to the industry, some site sections of a site may have a relevance of 0 to the industry while the remaining sections have a non-zero relevance; the site sections whose industry relevance is not 0 are defined as related site sections.
Step 305, constructing a third site list, where the third site list includes site addresses, related site sections, site section contents, the industry relevance of the sites, and the mapping relationships among the four.
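Steps 304 and 305 can be illustrated as a filter over the second site list. The sketch assumes a caller-supplied `score_section` function that returns the relevance of one section's content, takes a site's relevance to be the sum of its section scores, and uses hypothetical dictionary field names (`url`, `related_sections`, `relevance`); the toy keyword scorer is for demonstration only.

```python
from typing import Callable


def build_third_list(
    second_list: dict[str, dict[str, str]],
    score_section: Callable[[str], float],
) -> list[dict]:
    """Keep sites with non-zero relevance, together with their related sections."""
    third = []
    for url, sections in second_list.items():
        scores = {name: score_section(content) for name, content in sections.items()}
        total = sum(scores.values())
        if total != 0:  # a non-zero relevance means the site is judged related
            third.append({
                "url": url,
                # Related sections are those whose own relevance is not 0.
                "related_sections": {
                    name: sections[name] for name, s in scores.items() if s != 0
                },
                "relevance": total,
            })
    return third


def toy_score(content: str) -> float:
    """Toy scorer (assumption): 1 if the text mentions a sample keyword, else 0."""
    return 1.0 if "sim card" in content.lower() else 0.0


second_list = {
    "https://example.com": {"news": "SIM card offers", "about": "Company history"},
    "https://example.org": {"blog": "Gardening tips"},
}
third_list = build_third_list(second_list, toy_score)
```

In this example only the first site survives the filter, and its "about" section is dropped from the related sections because its score is 0.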
Optionally, after building the third site list, the industry-oriented site discovery method further includes: the third site list is arranged in descending order according to the industry relevance of the sites, and a fourth site list is obtained; and acquiring site websites, related site sections, site section content and industry correlation of the sites of the first N sites in the fourth site list, and storing the sites to the blockchain.
In this embodiment, the value of N may be set according to the requirement, for example, the value of N is 40, and the website addresses, related website sections, website section contents and the industry relevance of the website of the top 40 websites in the fourth website list arranged in descending order according to the correlation between the website and the industry are stored in the blockchain for the current industry user to query.
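The ranking step can be sketched with Python's built-in `sorted`. The record layout (a dictionary with `url` and `relevance` keys) and the function name are assumptions made for illustration; writing the result to a blockchain is outside the scope of this sketch.

```python
def top_n_sites(third_list: list[dict], n: int) -> list[dict]:
    """Sort sites by industry relevance in descending order and keep the first n."""
    # sorted() is stable, so sites with equal relevance keep their original order.
    fourth_list = sorted(third_list, key=lambda site: site["relevance"], reverse=True)
    return fourth_list[:n]


# Hypothetical third site list.
sites = [
    {"url": "https://a.example", "relevance": 1.0},
    {"url": "https://b.example", "relevance": 3.5},
    {"url": "https://c.example", "relevance": 2.0},
]
best_two = top_n_sites(sites, 2)
```

If the list holds fewer than N sites, slicing simply returns all of them, so no special case is needed.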
Optionally, the industry-oriented site discovery method further includes: storing the site addresses, related site sections, site section contents, and industry relevance of the sites corresponding to different industries in different blockchains, so as to provide them to the users of each industry. For example, users of industry A can view the site addresses, related site sections, site section contents, and industry relevance stored in blockchain A, and users of industry B can view those stored in blockchain B.
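The per-industry storage idea can be illustrated with a simple hash-chained, append-only log kept separately for each industry. The text does not name a blockchain platform, so this is only a stand-in and not a real blockchain: there is no network, consensus, or access control, and the class, function, and field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first block


class HashChain:
    """Append-only log in which each block stores the hash of the previous one."""

    def __init__(self) -> None:
        self.blocks: list[dict] = []

    @staticmethod
    def _digest(prev: str, record: dict) -> str:
        payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()

    def append(self, record: dict) -> str:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        digest = self._digest(prev, record)
        self.blocks.append({"prev": prev, "record": record, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any edited record breaks the chain."""
        prev = GENESIS
        for block in self.blocks:
            if block["prev"] != prev or self._digest(prev, block["record"]) != block["hash"]:
                return False
            prev = block["hash"]
        return True


chains: dict[str, HashChain] = {}  # one chain per industry


def store(industry: str, record: dict) -> str:
    """Append a site record to the chain that belongs to the given industry."""
    return chains.setdefault(industry, HashChain()).append(record)
```

Keeping one chain per industry mirrors the separation described above: a reader for one industry only needs to load and verify that industry's chain.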
Example 4:
as shown in fig. 2, the embodiment provides a device 4 for establishing a site and industry relevance model, which includes a first construction module 41 and a second construction module 42.
The first construction module 41 is configured to build a relevance model between site sections and an industry:
$$\mathrm{Relevance}(\mathrm{Module}) = \sum_{i}\sum_{j} H_j \cdot \mathrm{Number}(A_i \cap B_j)$$
where Relevance(Module) is the relevance of a site section to the industry, $A_i$ is the i-th industry-related judgment criterion, $B_j$ is the j-th time period before the current time, $\mathrm{Number}(A_i \cap B_j)$ is the number of items of information within $B_j$ that satisfy criterion $A_i$, and $H_j$ is the weight coefficient corresponding to $B_j$, with $0 \le H_j \le 1$.
The second construction module 42 is connected to the first construction module 41 and is configured to build a relevance model between sites and the industry from the section-industry relevance model:
$$\mathrm{Relevance}(\mathrm{Website}) = \sum_{k} \mathrm{Relevance}(\mathrm{Module}_k)$$
where Relevance(Website) is the relevance of the site to the industry, and $\mathrm{Relevance}(\mathrm{Module}_k)$ is the relevance of the k-th site section of the site to the industry.
Example 5:
as shown in fig. 3, the embodiment provides a device 5 for determining the correlation degree between a site and an industry, which includes a calculating module 51 and a determining module 52.
The calculation module 51 is connected to the site and industry correlation model building device 4 described in embodiment 4, and stores the site and industry correlation model built by the site and industry correlation model building device 4 described in embodiment 4 therein, so as to calculate the site's industry correlation according to the site and industry correlation model.
And the judging module 52 is connected with the calculating module 51 and is used for judging that the site is related to the industry when the calculated industry relevance of the site is not 0.
Example 6:
as shown in fig. 4, this embodiment provides an industry-oriented site discovery apparatus, which includes a crawling module 61, a third building module 62, an obtaining module 63, and the site and industry relevance determining apparatus 5 described in embodiment 5.
And a crawling module 61, configured to crawl site information of each site.
And a third construction module 62, connected to the crawling module 61, configured to construct a first site list according to the crawled site information of each site, where the first site list includes a mapping relationship between a site address and a site layout.
The crawling module 61 is further configured to crawl content of all site sections corresponding to the website addresses of the sites in the first site list, so as to obtain content of the corresponding site sections.
The third construction module 62 is further configured to construct a second site list, where the second site list includes site addresses, site sections, site section contents, and mapping relationships between the three.
The obtaining module 63 is connected to the device for determining the correlation between the website and the industry 5, and is configured to obtain, according to the determination result of the device for determining the correlation between the website and the industry 5 and the second website list, the industry correlation of each website in which the determination result in the second website list is related to the industry.
The third construction module 62 is connected to the obtaining module 63, and is further configured to construct a third site list, where the third site list includes site addresses, related site sections, site section contents, industry relevance of the sites, and mapping relationships between the sites.
Optionally, the industry-oriented site discovery apparatus further includes a storage module 64. The third construction module 62 is further configured to arrange the third site list in descending order according to the industry relevance of the sites, to obtain a fourth site list. The storage module 64 is connected to the third construction module 62, and is configured to obtain website addresses, related website sections, website section contents, and industry relevance of the websites of the first N sites in the fourth site list, and store the website addresses, the related website section contents, and the industry relevance of the websites in the blockchain.
It is to be understood that the above embodiments are merely illustrative of the application of the principles of the present invention, but not in limitation thereof. Various modifications and improvements may be made by those skilled in the art without departing from the spirit and substance of the invention, and are also considered to be within the scope of the invention.

Claims (10)

1. A method for establishing a site-industry correlation model, characterized by comprising the following steps:
building a correlation model between site sections and the industry:
Figure FDA0004199110420000011
wherein Relevance(Module) is the degree of correlation between a site section and the industry, A_i is the i-th industry-related decision basis, B_j is one of the different time periods before the current time, Number(A_i ∩ B_j) is the quantity of information within B_j that satisfies decision basis A_i, and H_j is the weight coefficient corresponding to B_j, with 0 ≤ H_j ≤ 1,
building a correlation model between the site and the industry according to the correlation model between site sections and the industry:
Figure FDA0004199110420000012
wherein Relevance(Website) is the degree of correlation between the site and the industry, and Relevance(Module_k) is the degree of correlation between the k-th site section of the site and the industry.
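The two formulas of this claim survive only as figure references, so their exact form cannot be read from the text. The sketch below assumes one plausible reading of the symbol definitions: the section relevance is a sum of Number(A_i ∩ B_j) weighted by H_j over all i and j, and the site relevance is the sum of the section relevances over k. Both the assumed form and the function names are illustrative, not a reproduction of the claimed formulas.

```python
from typing import Sequence


def section_relevance(counts: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> float:
    """Assumed form: sum over i and j of H_j * Number(A_i ∩ B_j).

    counts[i][j] holds Number(A_i ∩ B_j); weights[j] holds H_j.
    """
    for h in weights:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each H_j must satisfy 0 <= H_j <= 1")
    total = 0.0
    for row in counts:
        if len(row) != len(weights):
            raise ValueError("each row needs one count per time period")
        total += sum(h * c for h, c in zip(weights, row))
    return total


def site_relevance(section_scores: Sequence[float]) -> float:
    # Assumed form: the site score is the sum of its section scores.
    return sum(section_scores)
```

Under this reading, a site has relevance 0 only when no section contains any information matching a decision basis in a period with non-zero weight.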
2. The method for establishing a site-industry correlation model according to claim 1, wherein
the i-th industry-related decision basis comprises industry keywords or industry-related business information, wherein there are one or more industry keywords, and the industry-related business information comprises business names and business products,
the time granularity of the time periods includes month and year,
H_j being the weight coefficient corresponding to B_j specifically means that H_j is inversely related to the value of B_j.
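The claim fixes only two properties of the weight: it lies between 0 and 1, and it is inversely related to the value of B_j. A minimal sketch of one weight function with those properties follows, assuming that the value of B_j is the number of periods (months or years) by which the period precedes the current one. That interpretation, the reciprocal form, and the name `period_weight` are all assumptions made for illustration.

```python
def period_weight(periods_ago: int) -> float:
    """Hypothetical H_j: 1 for the current period, decaying for older ones."""
    if periods_ago < 0:
        raise ValueError("periods_ago must be non-negative")
    # The reciprocal keeps the weight in (0, 1] and strictly decreasing.
    return 1.0 / (1.0 + periods_ago)
```

Any other monotonically decreasing function bounded by 0 and 1 would satisfy the stated constraints equally well.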
3. A method for judging the correlation between a site and an industry, characterized by comprising the following steps:
calculating the industry relevance of the site according to the site-industry correlation model established by the method for establishing a site-industry correlation model according to claim 1 or 2;
and when the calculated industry relevance of the site is not 0, judging that the site is related to the industry.
4. An industry-oriented site discovery method, comprising:
crawling site information of each site to construct a first site list, wherein the first site list comprises the mapping relations between site addresses and site sections;
crawling the content of all site sections corresponding to the site address of each site in the first site list to obtain the corresponding site section content;
constructing a second site list, wherein the second site list comprises site addresses, site sections, site section content, and the mapping relations among the site addresses, the site sections and the site section content;
obtaining, according to the method for judging the correlation between a site and an industry according to claim 3 and the second site list, the industry relevance of each site in the second site list that is judged to be related to the industry;
and constructing a third site list, wherein the third site list comprises site addresses, related site sections, site section content, the industry relevance of the sites, and the mapping relations among them.
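The last two steps of this claim, going from the second site list to the third, can be sketched as below. The sketch starts from already-crawled data and takes the section scoring function as a parameter, so it does not depend on the exact relevance formula. The tuple layout, the dictionary keys, the summing of section scores, and the treatment of "related site sections" as those with a non-zero score are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

SecondListRow = Tuple[str, str, str]  # (site address, site section, section content)


def build_third_list(second_list: List[SecondListRow],
                     score_section: Callable[[str], float]) -> List[dict]:
    # Group the scored sections by site address.
    per_site: Dict[str, List[Tuple[str, str, float]]] = {}
    for address, section, content in second_list:
        per_site.setdefault(address, []).append(
            (section, content, score_section(content)))

    third_list = []
    for address, rows in per_site.items():
        # Assumption: the site score is the sum of its section scores.
        site_score = sum(score for _, _, score in rows)
        if site_score != 0:  # a site is judged related when its relevance is not 0
            related = [(sec, text) for sec, text, score in rows if score != 0]
            third_list.append({
                "address": address,
                "related_sections": related,
                "industry_relevance": site_score,
            })
    return third_list
```

For example, passing `lambda text: text.count("steel")` as `score_section` keeps only the sites whose crawled sections mention that keyword; a simple keyword count is a stand-in here, not the claimed model.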
5. The industry-oriented site discovery method of claim 4, further comprising, after constructing the third site list:
sorting the third site list in descending order of the industry relevance of the sites to obtain a fourth site list;
and acquiring the site addresses, related site sections, site section content, and industry relevance of the first P sites in the fourth site list, and storing them in the blockchain.
6. The industry-oriented site discovery method of claim 5, further comprising:
storing the site addresses, related site sections, site section content, and industry relevance of sites corresponding to different industries in different blockchains, so as to provide the site addresses, the related site section content, and the industry relevance of the sites to industry users.
7. A device for establishing a site-industry correlation model, characterized by comprising a first construction module and a second construction module, wherein
the first construction module is used for building a correlation model between site sections and the industry:
Figure FDA0004199110420000031
wherein Relevance(Module) is the degree of correlation between a site section and the industry, A_i is the i-th industry-related decision basis, B_j is one of the different time periods before the current time, Number(A_i ∩ B_j) is the quantity of information within B_j that satisfies decision basis A_i, and H_j is the weight coefficient corresponding to B_j, with 0 ≤ H_j ≤ 1,
the second construction module is connected with the first construction module and is used for building a correlation model between the site and the industry according to the correlation model between site sections and the industry:
Figure FDA0004199110420000032
wherein Relevance(Website) is the degree of correlation between the site and the industry, and Relevance(Module_k) is the degree of correlation between the k-th site section of the site and the industry.
8. A device for judging the correlation between a site and an industry, characterized by comprising a calculation module and a judgment module, wherein
the calculation module stores the site-industry correlation model established by the device for establishing a site-industry correlation model according to claim 7, and is used for calculating the industry relevance of the site according to the site-industry correlation model;
and the judgment module is connected with the calculation module and is used for judging that the site is related to the industry when the calculated industry relevance of the site is not 0.
9. An industry-oriented site discovery device, characterized by comprising a crawling module, a third construction module, an acquisition module, and the device for judging the correlation between a site and an industry according to claim 8, wherein
the crawling module is used for crawling the site information of each site,
the third construction module is connected with the crawling module and is used for constructing a first site list according to the crawled site information of each site, wherein the first site list comprises the mapping relations between site addresses and site sections;
the crawling module is further used for crawling the content of all site sections corresponding to the site address of each site in the first site list to obtain the corresponding site section content;
the third construction module is further used for constructing a second site list, wherein the second site list comprises site addresses, site sections, site section content, and the mapping relations among the site addresses, the site sections and the site section content;
the acquisition module is connected with the device for judging the correlation between a site and an industry, and is used for obtaining, according to the judgment result of that device and the second site list, the industry relevance of each site in the second site list that is judged to be related to the industry;
the third construction module is connected with the acquisition module and is further used for constructing a third site list, wherein the third site list comprises site addresses, related site sections, site section content, the industry relevance of the sites, and the mapping relations among them.
10. The industry-oriented site discovery device of claim 9, further comprising a storage module, wherein
the third construction module is further used for sorting the third site list in descending order of the industry relevance of the sites to obtain a fourth site list;
the storage module is connected with the third construction module and is used for acquiring the site addresses, related site sections, site section content, and industry relevance of the first P sites in the fourth site list, and storing them in the blockchain.
CN202110193713.4A 2021-02-20 2021-02-20 Correlation model building method, correlation model judging method, site discovery method and site discovery device Active CN112860667B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110193713.4A CN112860667B (en) 2021-02-20 2021-02-20 Correlation model building method, correlation model judging method, site discovery method and site discovery device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110193713.4A CN112860667B (en) 2021-02-20 2021-02-20 Correlation model building method, correlation model judging method, site discovery method and site discovery device

Publications (2)

Publication Number Publication Date
CN112860667A CN112860667A (en) 2021-05-28
CN112860667B true CN112860667B (en) 2023-06-20

Family

ID=75988394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110193713.4A Active CN112860667B (en) 2021-02-20 2021-02-20 Correlation model building method, correlation model judging method, site discovery method and site discovery device

Country Status (1)

Country Link
CN (1) CN112860667B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779120A (en) * 2011-05-09 2012-11-14 北京百度网讯科技有限公司 Method, system and device for determining field information of station and judging correlation
CN103186574A (en) * 2011-12-29 2013-07-03 北京百度网讯科技有限公司 Method and device for generating searching result
CN104331443A (en) * 2014-10-27 2015-02-04 安徽华贞信息科技有限公司 Industry data source detection method
CN105631007A (en) * 2015-12-29 2016-06-01 云南电网有限责任公司电力科学研究院 Industry technical information collecting method and system
CN105653651A (en) * 2015-12-29 2016-06-08 云南电网有限责任公司电力科学研究院 Discovery and arrangement method and apparatus for industry website
CN106980677A (en) * 2017-03-30 2017-07-25 电子科技大学 The subject search method of Industry-oriented
CN112256379A (en) * 2020-10-30 2021-01-22 广东耐思智慧科技有限公司 Method for realizing mass production of multi-industry templates

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150019233A1 (en) * 2013-07-10 2015-01-15 Forte Research Systems, Inc. Site-specific clinical trial performance metric system
US10445377B2 (en) * 2015-10-15 2019-10-15 Go Daddy Operating Company, LLC Automatically generating a website specific to an industry

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
行业信息网站的策划设计;周育忠 等;广东电力(第06期);63-66 *
领域相关的Web网站抓取方法;李刚 等;计算机科学(第02期);137-140+148 *

Also Published As

Publication number Publication date
CN112860667A (en) 2021-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant