CN112860667A - Method for establishing relevance model, method for judging relevance model, and method and device for discovering site - Google Patents

Publication number
CN112860667A
Authority
CN
China
Prior art keywords
site
industry
module
relevance
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110193713.4A
Other languages
Chinese (zh)
Other versions
CN112860667B (en)
Inventor
曹咪
徐雷
陶冶
边林
刘伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd
Priority to CN202110193713.4A
Publication of CN112860667A
Application granted
Publication of CN112860667B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for establishing a site-industry relevance model, comprising: constructing a relevance model of a site section and the industry, and constructing the relevance model of the site and the industry from the relevance model of the site section and the industry. Also provided are an apparatus for establishing the site-industry relevance model, a method and an apparatus for judging site-industry relevance, and an industry-oriented site discovery method and apparatus. Industry users can evaluate and analyze deep network sites using the site-industry relevance model, and thereby obtain comprehensive and appropriate deep network sites.

Description

Method for establishing relevance model, method for judging relevance model, and method and device for discovering site
Technical Field
The invention relates to the technical field of communication, in particular to a method and a device for establishing a site and industry relevancy model, a method and a device for judging site and industry relevancy, and a site discovery method and a site discovery device for industries.
Background
The deep network (deep web) refers to web content outside the surface web that standard search engines cannot index; it holds a larger volume of data, of higher quality, than the surface web. As Web (World Wide Web) technology matures, the volume of data in the deep network is growing rapidly, so research on the deep network is becoming increasingly important.
Because the deep network holds a large volume of data and there is no reasonable model for evaluating the relevance between deep network sites and an industry, industry users cannot easily obtain comprehensive and appropriate deep network site information.
Disclosure of Invention
The invention aims to solve the technical problems of the prior art, and provides a method and a device for establishing a site-industry relevance model, a method and a device for judging the site-industry relevance, and a method and a device for discovering an industry-oriented site, so as to provide a reasonable evaluation model of the site-industry relevance, and enable an industry user to obtain a comprehensive and appropriate deep network site according to the evaluation model.
In a first aspect, an embodiment of the present invention provides a method for establishing a site-industry relevancy model, including: constructing a correlation model of the site section and the industry:
$$\mathrm{Relevance}(\mathrm{Module}) = \sum_{i} \sum_{j} H_j \cdot \mathrm{Number}(A_i \cap B_j)$$
where $\mathrm{Relevance}(\mathrm{Module})$ is the relevance of a site section to the industry, $A_i$ is the $i$-th judgment basis related to the industry, $B_j$ denotes the different time periods before the current time, $\mathrm{Number}(A_i \cap B_j)$ is the number of pieces of information within $B_j$ that satisfy judgment basis $A_i$, $H_j$ is the weight coefficient corresponding to $B_j$, and $0 \le H_j \le 1$; and constructing a relevance model of the site and the industry according to the relevance model of the site section and the industry:
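$$\mathrm{Relevance}(\mathrm{Website}) = \sum_{k} \mathrm{Relevance}(\mathrm{Module}_k)$$
where $\mathrm{Relevance}(\mathrm{Website})$ is the relevance of the site to the industry, and $\mathrm{Relevance}(\mathrm{Module}_k)$ is the relevance of the $k$-th site section in the site to the industry.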
Preferably, the $i$-th judgment basis related to the industry comprises industry keywords or industry-related enterprise information, wherein there are one or more industry keywords and the industry-related enterprise information comprises enterprise names and enterprise products. The time granularity of the time periods includes months and years. The weight coefficient $H_j$ corresponding to $B_j$ is negatively correlated with $B_j$: the longer the time period, the smaller the weight.
In a second aspect, an embodiment of the present invention further provides a method for judging the relevance between a site and an industry, comprising: calculating the industry relevance of the site according to the site-industry relevance model established by the method of the first aspect; and when the calculated industry relevance of the site is not 0, determining that the site is relevant to the industry.
In a third aspect, an embodiment of the present invention further provides an industry-oriented site discovery method, comprising: crawling site information of each site to construct a first site list, wherein the first site list comprises a mapping between site URLs and site sections; crawling the content of every site section corresponding to each site URL in the first site list to obtain the corresponding site section content; constructing a second site list, wherein the second site list comprises site URLs, site sections, site section content, and the mappings among them; obtaining, according to the judgment method of the second aspect and the second site list, the industry relevance of each site in the second site list that is judged to be relevant to the industry; and constructing a third site list, wherein the third site list comprises site URLs, related site sections, site section content, the industry relevance of the sites, and the mappings among them.
Preferably, after the third site list is constructed, the industry-oriented site discovery method further comprises: sorting the third site list in descending order of the industry relevance of the sites to obtain a fourth site list; and obtaining the site URLs, related site sections, site section content, and industry relevance of the first N sites in the fourth site list and storing them in a blockchain.
Preferably, the industry-oriented site discovery method further comprises: storing the site URLs, related site sections, site section content, and industry relevance of the sites corresponding to different industries in different blockchains, so that they can be provided to users in the respective industries.
In a fourth aspect, an embodiment of the present invention further provides an apparatus for building a site-industry relevancy model, which includes a first building module and a second building module. The first building module is used for building a relevance model of the site section and the industry:
$$\mathrm{Relevance}(\mathrm{Module}) = \sum_{i} \sum_{j} H_j \cdot \mathrm{Number}(A_i \cap B_j)$$
where $\mathrm{Relevance}(\mathrm{Module})$ is the relevance of a site section to the industry, $A_i$ is the $i$-th judgment basis related to the industry, $B_j$ denotes the different time periods before the current time, $\mathrm{Number}(A_i \cap B_j)$ is the number of pieces of information within $B_j$ that satisfy judgment basis $A_i$, $H_j$ is the weight coefficient corresponding to $B_j$, and $0 \le H_j \le 1$. The second building module is connected with the first building module and is used for constructing a relevance model of the site and the industry according to the relevance model of the site section and the industry:
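$$\mathrm{Relevance}(\mathrm{Website}) = \sum_{k} \mathrm{Relevance}(\mathrm{Module}_k)$$
where $\mathrm{Relevance}(\mathrm{Website})$ is the relevance of the site to the industry, and $\mathrm{Relevance}(\mathrm{Module}_k)$ is the relevance of the $k$-th site section in the site to the industry.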
In a fifth aspect, an embodiment of the present invention further provides a device for judging the relevance between a site and an industry, including a calculating module and a judging module. The calculating module is used for calculating the industry relevance of a site according to the site-industry relevance model established by the device for establishing a site-industry relevance model of the fourth aspect. The judging module is connected with the calculating module and is used for determining that the site is relevant to the industry when the calculated industry relevance of the site is not 0.
In a sixth aspect, an embodiment of the present invention further provides an industry-oriented site discovery apparatus, including a crawling module, a third building module, an obtaining module, and the apparatus for judging site-industry relevance of the fifth aspect. The crawling module is used for crawling the site information of each site. The third building module is connected with the crawling module and is used for constructing a first site list from the crawled site information, wherein the first site list comprises a mapping between site URLs and site sections. The crawling module is further used for crawling the content of every site section corresponding to each site URL in the first site list to obtain the corresponding site section content. The third building module is further used for constructing a second site list, wherein the second site list comprises site URLs, site sections, site section content, and the mappings among them. The obtaining module is connected with the apparatus for judging site-industry relevance of the fifth aspect and is used for obtaining, according to the judgment result of that apparatus and the second site list, the industry relevance of each site in the second site list that is judged to be relevant to the industry. The third building module is connected with the obtaining module and is further used for constructing a third site list, wherein the third site list comprises site URLs, related site sections, site section content, the industry relevance of the sites, and the mappings among them.
Preferably, the industry-oriented site discovery apparatus further comprises a storage module. The third building module is further used for sorting the third site list in descending order of the industry relevance of the sites to obtain a fourth site list. The storage module is connected with the third building module and is used for obtaining the site URLs, related site sections, site section content, and industry relevance of the first N sites in the fourth site list and storing them in a blockchain.
According to the methods and devices provided by the embodiments of the invention, a relevance model of the site section and the industry with comprehensive judgment bases is constructed first, and the relevance model of the site and the industry is then constructed on that basis. The relevance between a site and an industry can therefore be evaluated with the constructed site-industry relevance model, and comprehensive and appropriate deep network sites can be obtained and provided to industry users.
Drawings
FIG. 1 is a flow chart of the method for establishing a site-industry relevance model in embodiment 1 of the invention;
FIG. 2 is a structural diagram of the device for establishing a site-industry relevance model in embodiment 4 of the invention;
FIG. 3 is a structural diagram of the device for judging the relevance between a site and an industry in embodiment 5 of the invention;
FIG. 4 is a structural diagram of the industry-oriented site discovery apparatus in embodiment 6 of the invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following describes in detail a method and an apparatus for establishing a site-industry relevance model, a method and an apparatus for determining site-industry relevance, and a method and an apparatus for discovering an industry-oriented site according to the present invention with reference to the accompanying drawings and embodiments.
Example 1:
As shown in FIG. 1, this embodiment provides a method for establishing a site-industry relevance model, including:
Step 101, constructing a relevance model of the site section and the industry:
$$\mathrm{Relevance}(\mathrm{Module}) = \sum_{i} \sum_{j} H_j \cdot \mathrm{Number}(A_i \cap B_j)$$
where $\mathrm{Relevance}(\mathrm{Module})$ is the relevance of a site section to the industry, $A_i$ is the $i$-th judgment basis related to the industry, $B_j$ denotes the different time periods before the current time, $\mathrm{Number}(A_i \cap B_j)$ is the number of pieces of information within $B_j$ that satisfy judgment basis $A_i$, $H_j$ is the weight coefficient corresponding to $B_j$, and $0 \le H_j \le 1$.
In this embodiment, a site generally includes a plurality of site sections. The content of each site section is obtained, the relevance between that content and the industry is determined, and from these the relevance between the site and the industry is finally determined. In the relevance model of the site section and the industry constructed in this embodiment, a given industry has several different judgment bases, so $A_i$ denotes the different judgment bases related to the current industry, where $i = 1, 2, 3, \ldots$. Specifically, the $i$-th judgment basis related to the industry includes industry keywords or industry-related enterprise information, where there are one or more industry keywords and the industry-related enterprise information includes enterprise names and enterprise products. $B_j$ denotes the different time periods before the current time, such as the 2 months, 3 months, or 1 year before the current time, and the time granularity of the time periods includes months and years, where $j = 1, 2, 3, \ldots$. $H_j$ is the weight coefficient corresponding to $B_j$, $0 \le H_j \le 1$, and $H_j$ is negatively correlated with $B_j$. For example, if the current industry is the communications industry, $A_1$ is industry keywords, which include SIM card, mobile phone number, phone bill, and the like. $A_2$ is industry-related enterprise information, which includes enterprise names (such as China Unicom, China Telecom, and China Mobile), enterprise products (such as data packages and broadband services), and the like. $B_j$ is the 1 month, 3 months, 6 months, 9 months, 1 year, 2 years, and so on before the current time.
The value of $H_j$ can be set as appropriate for the actual situation. Because information is updated quickly, older information has less research value than more recent information, so $H_j$ is set to be negatively correlated with $B_j$. In this example, when $B_1$ is 1 month, the corresponding $H_1$ is 1; when $B_2$ is 3 months, the corresponding $H_2$ is 0.8; and when $B_3$ is 1 year, the corresponding $H_3$ is 0.2. The relevance of the current site section to the industry is the weighted sum, over every $A_i$ and every $B_j$, of the number of pieces of information within $B_j$ that satisfy judgment basis $A_i$.
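The weighted sum described above can be illustrated with a minimal Python sketch. The function name `section_relevance`, the period labels, and the sample counts are hypothetical; only the weights 1, 0.8, and 0.2 come from the example in this embodiment. The sketch also assumes that the count for each time period is supplied already tallied, since the text does not state whether the periods overlap.

```python
from typing import Dict

# Weights H_j from the example in this embodiment; the dictionary keys are
# hypothetical labels for the time periods B_j.
WEIGHTS: Dict[str, float] = {"1 month": 1.0, "3 months": 0.8, "1 year": 0.2}


def section_relevance(
    counts: Dict[str, Dict[str, int]], weights: Dict[str, float]
) -> float:
    """Sum H_j * Number(A_i ∩ B_j) over every judgment basis i and period j."""
    total = 0.0
    for per_period in counts.values():      # one entry per judgment basis A_i
        for period, number in per_period.items():
            total += weights[period] * number
    return total


# Hypothetical counts of matching items in one site section.
counts = {
    "industry_keywords": {"1 month": 3, "3 months": 5, "1 year": 10},
    "enterprise_info": {"1 month": 1, "3 months": 0, "1 year": 4},
}
score = section_relevance(counts, WEIGHTS)
```

With these sample counts the keyword basis contributes 3 + 4 + 2 and the enterprise basis contributes 1 + 0 + 0.8, so older matches add less to the score than recent ones.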
Step 102, constructing a relevance model of the site and the industry according to the relevance model of the site section and the industry:
$$\mathrm{Relevance}(\mathrm{Website}) = \sum_{k} \mathrm{Relevance}(\mathrm{Module}_k)$$
where $\mathrm{Relevance}(\mathrm{Website})$ is the relevance of the site to the industry, and $\mathrm{Relevance}(\mathrm{Module}_k)$ is the relevance of the $k$-th site section in the site to the industry.
In this embodiment, the constructed relevance model of the site and the industry integrates the relevance of every site section in the site to the industry, where $k = 1, 2, 3, \ldots$.
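As a rough illustration of how the per-section values might be combined, the sketch below aggregates them by plain summation. Summation is an assumption made for this sketch, since the text only says the site-level model integrates the section-level data; the function name `site_relevance` and the sample section names and scores are likewise hypothetical.

```python
from typing import Mapping


def site_relevance(section_scores: Mapping[str, float]) -> float:
    """Combine per-section relevance into a site-level value.

    Assumption: the combination is a plain sum over all site sections k.
    """
    return float(sum(section_scores.values()))


# Hypothetical relevance values for three sections of one site.
scores = {"news": 10.8, "forum": 0.0, "products": 2.5}
total = site_relevance(scores)
```

Under this assumption a site scores 0 only when every one of its sections scores 0, which matches the "not 0 means relevant" rule used in embodiment 2.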
In the method for establishing the site-industry relevance model provided by this embodiment, because a site includes a plurality of site sections, the established model evaluates the relevance of the site to the industry from the relevance of all of its site sections, which makes the site-industry relevance model more reasonable. Furthermore, in the relevance model of the site section and the industry, the timeliness of the site section content is divided at a fine granularity, and the weight coefficients are set according to how much the timeliness of information contributes to its value, which makes the relevance model of the site section and the industry more reasonable. In addition, the relevance model of the site section and the industry uses multiple judgment bases, which is more comprehensive than the prior-art approach of judging by industry keywords alone. The relevance between a site and an industry evaluated with the model of this embodiment is therefore more reasonable, comprehensive, and accurate, and site evaluation based on the model can provide industry users with comprehensive and appropriate deep network sites for in-depth research.
Example 2:
This embodiment provides a method for judging the relevance between a site and an industry, which includes the following steps:
Step 201, calculating the industry relevance of the site according to the site-industry relevance model established by the method of embodiment 1.
Step 202, when the calculated industry relevance of the site is not 0, the judgment result is that the site is relevant to the industry; when the calculated industry relevance of the site is 0, the judgment result is that the site is irrelevant to the industry.
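The judgment in step 202 amounts to a zero test on the computed value. A minimal sketch, in which the function name `judge_site` and the example URLs and scores are hypothetical:

```python
def judge_site(industry_relevance: float) -> bool:
    """Return True when the site is judged relevant, i.e. its relevance is not 0."""
    return industry_relevance != 0


# Hypothetical site URLs mapped to already-computed industry relevance values.
computed = {"https://a.example": 13.3, "https://b.example": 0.0}
results = {url: judge_site(value) for url, value in computed.items()}
```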
Example 3:
This embodiment provides an industry-oriented site discovery method for supplying relevant deep network sites according to the needs of industry users. The method includes the following steps:
Step 301, crawling the site information of each site using crawler technology to construct a first site list, where the first site list includes a mapping between site URLs and site sections. The first site list may also include other crawled site information, such as the domain name and the site name.
Step 302, crawling the content of every site section corresponding to each site URL in the first site list to obtain the corresponding site section content.
In this embodiment, one site URL corresponds to a plurality of site sections. For each site URL, crawler technology is used to crawl the content of each of its site sections in turn, so that the site section content is obtained for every site section of every site URL in the first site list.
Step 303, constructing a second site list, where the second site list includes site URLs, site sections, site section content, and the mappings among them.
In this embodiment, the second site list is constructed from the site section content obtained in step 302. Compared with the first site list, the second site list adds the site section content, and each item of site section content is mapped to its site section.
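One way to picture the first and second site lists is as nested dictionaries. The sketch below is illustrative only: the dictionary layout, the function name `build_second_site_list`, and the stand-in `fake_fetch` crawler are assumptions, and no real crawling is performed.

```python
from typing import Callable, Dict, List

FirstSiteList = Dict[str, List[str]]        # site URL -> site sections
SecondSiteList = Dict[str, Dict[str, str]]  # site URL -> section -> content


def build_second_site_list(
    first_list: FirstSiteList, fetch_section: Callable[[str, str], str]
) -> SecondSiteList:
    """Attach crawled content to every section of every site URL."""
    second: SecondSiteList = {}
    for url, sections in first_list.items():
        second[url] = {
            section: fetch_section(url, section) for section in sections
        }
    return second


def fake_fetch(url: str, section: str) -> str:
    """Stand-in for a crawler; returns placeholder text instead of real content."""
    return f"content of {section} at {url}"


first_list = {"https://a.example": ["news", "forum"]}
second_list = build_second_site_list(first_list, fake_fetch)
```

Passing the fetch function as a parameter keeps the list-building logic separate from whatever crawler is actually used.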
Step 304, obtaining, according to the judgment method of embodiment 2 and the second site list, the industry relevance of each site in the second site list that is judged to be relevant to the industry.
In this embodiment, for a site in the second site list that is judged to be relevant to the industry, some of its site sections may have a relevance of 0 to the industry while the remaining site sections have a non-zero relevance. The site sections whose industry relevance is not 0 are defined as related site sections.
Step 305, constructing a third site list, where the third site list includes site URLs, related site sections, site section content, the industry relevance of the sites, and the mappings among them.
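Steps 304 and 305 can be sketched as a filter over the second site list. Everything named here is hypothetical: the function `build_third_site_list`, the record layout, and the toy scorer `count_keyword`, which simply counts one keyword and stands in for the section-level relevance model. The site-level value is assumed to be the sum of the section values.

```python
from typing import Callable, Dict, List


def build_third_site_list(
    second_list: Dict[str, Dict[str, str]],
    score_section: Callable[[str], float],
) -> List[dict]:
    """Keep sites with non-zero relevance and only their related sections."""
    third = []
    for url, sections in second_list.items():
        scores = {name: score_section(text) for name, text in sections.items()}
        site_score = sum(scores.values())  # assumed aggregation: plain sum
        if site_score == 0:
            continue  # judged irrelevant to the industry
        related = {
            name: sections[name] for name, value in scores.items() if value != 0
        }
        third.append(
            {
                "url": url,
                "related_sections": related,
                "industry_relevance": site_score,
            }
        )
    return third


def count_keyword(text: str) -> float:
    """Toy scorer: number of occurrences of one industry keyword."""
    return float(text.count("SIM card"))


second_list = {
    "https://a.example": {
        "news": "SIM card offers; new SIM card plans",
        "forum": "weather",
    },
    "https://b.example": {"home": "recipes"},
}
third_list = build_third_site_list(second_list, count_keyword)
```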
Optionally, after the third site list is constructed, the industry-oriented site discovery method further includes: sorting the third site list in descending order of the industry relevance of the sites to obtain a fourth site list; and obtaining the site URLs, related site sections, site section content, and industry relevance of the first N sites in the fourth site list and storing them in a blockchain.
In this embodiment, the value of N may be set as required. For example, with N set to 40, the site URLs, related site sections, site section content, and industry relevance of the first 40 sites in the fourth site list, which is sorted in descending order of site-industry relevance, are stored in the blockchain so that users in the current industry can query and analyze them.
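The descending sort and top-N selection can be sketched as follows. The function name `top_n_sites`, the record layout, and the sample values are hypothetical, and writing the result to a blockchain is outside the scope of the sketch.

```python
from typing import List


def top_n_sites(third_list: List[dict], n: int) -> List[dict]:
    """Sort by industry relevance, highest first, and keep the first n entries."""
    ranked = sorted(
        third_list, key=lambda entry: entry["industry_relevance"], reverse=True
    )
    return ranked[:n]


# Hypothetical third site list with only the fields needed for ranking.
third_list = [
    {"url": "https://a.example", "industry_relevance": 2.0},
    {"url": "https://b.example", "industry_relevance": 13.3},
    {"url": "https://c.example", "industry_relevance": 5.1},
]
top_two = top_n_sites(third_list, 2)
```

Because `sorted` returns a new list, the third site list itself is left unchanged and the ranked copy plays the role of the fourth site list.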
Optionally, the industry-oriented site discovery method further includes: storing the site URLs, related site sections, site section content, and industry relevance of the sites corresponding to different industries in different blockchains, so that they can be provided to users in the respective industries. For example, a user in industry A may consult the site URLs, related site sections, site section content, and industry relevance of the sites stored in blockchain A, and a user in industry B may consult those stored in blockchain B.
Example 4:
As shown in FIG. 2, this embodiment provides a device 4 for establishing a site-industry relevance model, which includes a first building module 41 and a second building module 42.
The first building module 41 is configured to construct a relevance model of the site section and the industry:
$$\mathrm{Relevance}(\mathrm{Module}) = \sum_{i} \sum_{j} H_j \cdot \mathrm{Number}(A_i \cap B_j)$$
where $\mathrm{Relevance}(\mathrm{Module})$ is the relevance of a site section to the industry, $A_i$ is the $i$-th judgment basis related to the industry, $B_j$ denotes the different time periods before the current time, $\mathrm{Number}(A_i \cap B_j)$ is the number of pieces of information within $B_j$ that satisfy judgment basis $A_i$, $H_j$ is the weight coefficient corresponding to $B_j$, and $0 \le H_j \le 1$.
The second building module 42 is connected to the first building module 41 and is configured to construct a relevance model of the site and the industry according to the relevance model of the site section and the industry:
$$\mathrm{Relevance}(\mathrm{Website}) = \sum_{k} \mathrm{Relevance}(\mathrm{Module}_k)$$
where $\mathrm{Relevance}(\mathrm{Website})$ is the relevance of the site to the industry, and $\mathrm{Relevance}(\mathrm{Module}_k)$ is the relevance of the $k$-th site section in the site to the industry.
Example 5:
As shown in FIG. 3, this embodiment provides a device 5 for judging the relevance between a site and an industry, which includes a calculating module 51 and a judging module 52.
The calculating module 51 is connected to the device 4 for establishing a site-industry relevance model described in embodiment 4, and is configured to store the site-industry relevance model established by the device 4 and to calculate the industry relevance of a site according to that model.
The judging module 52 is connected to the calculating module 51 and is configured to determine that the site is relevant to the industry when the calculated industry relevance of the site is not 0.
Example 6:
As shown in FIG. 4, this embodiment provides an industry-oriented site discovery apparatus, which includes a crawling module 61, a third building module 62, an obtaining module 63, and the device 5 for judging the relevance between a site and an industry described in embodiment 5.
The crawling module 61 is configured to crawl the site information of each site.
The third building module 62 is connected to the crawling module 61 and is configured to construct a first site list from the crawled site information of each site, where the first site list includes a mapping between site URLs and site sections.
The crawling module 61 is further configured to crawl the content of every site section corresponding to each site URL in the first site list to obtain the corresponding site section content.
The third building module 62 is further configured to construct a second site list, where the second site list includes site URLs, site sections, site section content, and the mappings among them.
The obtaining module 63 is connected to the device 5 for judging the relevance between a site and an industry, and is configured to obtain, according to the judgment result of the device 5 and the second site list, the industry relevance of each site in the second site list that is judged to be relevant to the industry.
The third building module 62 is connected to the obtaining module 63 and is further configured to construct a third site list, where the third site list includes site URLs, related site sections, site section content, the industry relevance of the sites, and the mappings among them.
Optionally, the industry-oriented site discovery apparatus further comprises a storage module 64. The third building module 62 is further configured to sort the third site list in descending order of the industry relevance of the sites to obtain a fourth site list. The storage module 64 is connected to the third building module 62 and is configured to obtain the site URLs, related site sections, site section content, and industry relevance of the first N sites in the fourth site list and store them in a blockchain.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (10)

1. A method for establishing a site-industry relevance model, characterized by comprising the following steps:
constructing a relevance model of the site section and the industry:
Figure FDA0002945640200000011
wherein Relevance(Module) is the relevance between the site section and the industry, A_i is the i-th judgment basis related to the industry, B_j denotes the different time periods before the current time, Number(A_i ∩ B_j) is the number of pieces of information within B_j that satisfy judgment basis A_i, H_j is the weight coefficient corresponding to B_j, and 0 ≤ H_j ≤ 1; and
constructing a relevance model of the site and the industry according to the relevance model of the site section and the industry:
Figure FDA0002945640200000012
wherein Relevance(Website) is the relevance between the site and the industry, and Relevance(Module_k) is the relevance between the k-th site section in the site and the industry.
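The two formulas in claim 1 are images that are not reproduced in this text, so their exact form cannot be read off here. As a rough illustration only, the sketch below assumes that the section relevance is the weighted sum of Number(A_i ∩ B_j) over all judgment bases i and time periods j, with weights H_j, and that the site relevance is the sum of its section relevances. Both aggregation rules, and the function names `module_relevance` and `website_relevance`, are assumptions made for illustration and are not taken from the patent.

```python
from typing import Sequence


def module_relevance(counts: Sequence[Sequence[int]],
                     weights: Sequence[float]) -> float:
    """Assumed form: sum over i and j of H_j * Number(A_i ∩ B_j).

    counts[i][j] plays the role of Number(A_i ∩ B_j);
    weights[j] plays the role of H_j and must satisfy 0 <= H_j <= 1.
    """
    for h in weights:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each weight H_j must lie in [0, 1]")
    total = 0.0
    for row in counts:
        if len(row) != len(weights):
            raise ValueError("each row needs one count per time period")
        total += sum(h * n for h, n in zip(weights, row))
    return total


def website_relevance(module_scores: Sequence[float]) -> float:
    """Assumed form: the site relevance is the sum of its section relevances."""
    return float(sum(module_scores))


# Two judgment bases (rows) counted over two time periods (columns).
news_section = module_relevance([[3, 1], [0, 2]], weights=[1.0, 0.5])
about_section = module_relevance([[0, 0], [0, 0]], weights=[1.0, 0.5])
site_score = website_relevance([news_section, about_section])
```

Under these assumptions a section with no matching information in any period scores 0, and a site scores 0 only when every one of its sections does.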
2. The method for establishing a site-industry relevance model according to claim 1, wherein
the i-th judgment basis related to the industry comprises industry keywords or industry-related enterprise information, wherein there are one or more industry keywords, and the industry-related enterprise information comprises enterprise names and enterprise products;
the time granularity of the time periods includes months and years; and
the weight coefficient H_j corresponding to B_j is inversely related to B_j.
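One way to read claim 2 is sketched below. It assumes monthly granularity, a single industry keyword as the judgment basis, and a weight of 1 / (1 + j) for the period lying j months back. The claim only states that H_j is inversely related to B_j, so this particular weight function is a hypothetical choice, as are the names `period_index`, `weight`, and `count_matches`.

```python
from datetime import date
from typing import Iterable, List, Tuple


def period_index(published: date, today: date) -> int:
    """Number of calendar months between the publication date and today."""
    return (today.year - published.year) * 12 + (today.month - published.month)


def weight(j: int) -> float:
    """Hypothetical H_j: 1 for the current month, decreasing for older months."""
    return 1.0 / (1 + j)


def count_matches(items: Iterable[Tuple[date, str]], keyword: str,
                  today: date, n_periods: int) -> List[int]:
    """counts[j] = number of items published j months ago that contain the keyword."""
    counts = [0] * n_periods
    for published, text in items:
        j = period_index(published, today)
        if 0 <= j < n_periods and keyword in text:
            counts[j] += 1
    return counts
```

Any other monotonically decreasing function with values in [0, 1] would satisfy the claim equally well; yearly granularity would only change `period_index`.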
3. A method for judging the relevance between a site and an industry, characterized by comprising:
calculating the industry relevance of the site using the site-industry relevance model established by the method for establishing a site-industry relevance model according to claim 1 or 2; and
when the calculated industry relevance of the site is not 0, judging that the site is related to the industry.
4. An industry-oriented site discovery method, characterized by comprising:
crawling the site information of each site to construct a first site list, wherein the first site list comprises the mapping relationship between site addresses and site sections;
crawling the contents of all site sections corresponding to the address of each site in the first site list to obtain the corresponding site section contents;
constructing a second site list, wherein the second site list comprises site addresses, site sections, site section contents, and the mapping relationships among the site addresses, the site sections, and the site section contents;
obtaining, according to the method for judging the relevance between a site and an industry of claim 3 and the second site list, the industry relevance of each site in the second site list that is judged to be related to the industry; and
constructing a third site list, wherein the third site list comprises site addresses, related site sections, site section contents, the industry relevance of the sites, and the mapping relationships among the site addresses, the related site sections, the site section contents, and the industry relevance of the sites.
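The list-building steps of claim 4 can be sketched as follows, with crawling replaced by an injected `fetch` callable so the example stays self-contained. The tuple layout, the `section_score` callable, the rule that site relevance is the sum of section scores, and the rule that a "related" section is one with a non-zero score are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

SecondRow = Tuple[str, str, str]         # (site address, section, section content)
ThirdRow = Tuple[str, str, str, float]   # ... plus the site's industry relevance


def build_second_list(first_list: Dict[str, List[str]],
                      fetch: Callable[[str, str], str]) -> List[SecondRow]:
    """first_list maps each site address to its sections; fetch returns section content."""
    return [(addr, sec, fetch(addr, sec))
            for addr, sections in first_list.items()
            for sec in sections]


def build_third_list(second_list: List[SecondRow],
                     section_score: Callable[[str], float]) -> List[ThirdRow]:
    """Keep related sections of sites whose industry relevance is not 0."""
    site_scores: Dict[str, float] = {}
    for addr, _sec, content in second_list:
        # Assumption: site relevance is the sum of its section scores.
        site_scores[addr] = site_scores.get(addr, 0.0) + section_score(content)
    third: List[ThirdRow] = []
    for addr, sec, content in second_list:
        if site_scores[addr] != 0 and section_score(content) != 0:
            third.append((addr, sec, content, site_scores[addr]))
    return third
```

Passing the scorer in as a callable keeps the list construction independent of whichever relevance model is actually used.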
5. The industry-oriented site discovery method of claim 4, further comprising, after constructing the third site list:
sorting the third site list in descending order of industry relevance to obtain a fourth site list; and
obtaining the site addresses, related site sections, site section contents, and industry relevance of the first N sites in the fourth site list, and storing them in the blockchain.
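A minimal sketch of the sorting and storage step follows. The claims do not specify which blockchain is used or how records are written to it, so the in-memory hash-linked list below is only a toy stand-in; the record fields and the names `top_n` and `append_block` are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def top_n(third_list: List[Dict], n: int) -> List[Dict]:
    """Sort by industry relevance in descending order and keep the first n records."""
    return sorted(third_list, key=lambda r: r["relevance"], reverse=True)[:n]


def append_block(chain: List[Dict], record: Dict) -> Dict:
    """Append a record to a toy hash-linked chain (not a real blockchain client)."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    block_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    block = {"prev_hash": prev_hash, "record": record, "hash": block_hash}
    chain.append(block)
    return block


third_list = [
    {"address": "a.example", "relevance": 1.0},
    {"address": "b.example", "relevance": 3.0},
    {"address": "c.example", "relevance": 2.0},
]
chain: List[Dict] = []
for record in top_n(third_list, 2):
    append_block(chain, record)
```

Each block's hash covers the previous block's hash, which is the property that makes later tampering with a stored record detectable.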
6. The industry-oriented site discovery method of claim 5, further comprising:
storing the site addresses, related site sections, site section contents, and industry relevance of the sites corresponding to different industries in different blockchains, so as to provide them to the users of the respective industries.
7. A device for establishing a site-industry relevance model, characterized by comprising a first building module and a second building module, wherein
the first building module is configured to construct a relevance model of the site section and the industry:
Figure FDA0002945640200000031
wherein Relevance(Module) is the relevance between the site section and the industry, A_i is the i-th judgment basis related to the industry, B_j denotes the different time periods before the current time, Number(A_i ∩ B_j) is the number of pieces of information within B_j that satisfy judgment basis A_i, H_j is the weight coefficient corresponding to B_j, and 0 ≤ H_j ≤ 1; and
the second building module is connected to the first building module and is configured to construct a relevance model of the site and the industry according to the relevance model of the site section and the industry:
Figure FDA0002945640200000032
wherein Relevance(Website) is the relevance between the site and the industry, and Relevance(Module_k) is the relevance between the k-th site section in the site and the industry.
8. A device for judging the relevance between a site and an industry, characterized by comprising a calculating module and a judging module, wherein
the calculating module stores the site-industry relevance model established by the device for establishing a site-industry relevance model according to claim 7, and is configured to calculate the industry relevance of the site according to the site-industry relevance model; and
the judging module is connected to the calculating module and is configured to judge that the site is related to the industry when the calculated industry relevance of the site is not 0.
9. An industry-oriented site discovery device, characterized by comprising a crawling module, a third building module, an obtaining module, and the device for judging the relevance between a site and an industry of claim 8, wherein
the crawling module is configured to crawl the site information of each site;
the third building module is connected to the crawling module and is configured to build a first site list according to the crawled site information of each site, wherein the first site list comprises the mapping relationship between site addresses and site sections;
the crawling module is further configured to crawl the contents of all site sections corresponding to the address of each site in the first site list to obtain the corresponding site section contents;
the third building module is further configured to build a second site list, wherein the second site list comprises site addresses, site sections, site section contents, and the mapping relationships among the site addresses, the site sections, and the site section contents;
the obtaining module is connected to the device for judging the relevance between a site and an industry and is configured to obtain, according to the judgment result of that device and the second site list, the industry relevance of each site in the second site list that is judged to be related to the industry; and
the third building module is connected to the obtaining module and is further configured to build a third site list, wherein the third site list comprises site addresses, related site sections, site section contents, the industry relevance of the sites, and the mapping relationships among the site addresses, the related site sections, the site section contents, and the industry relevance of the sites.
10. The industry-oriented site discovery device of claim 9, further comprising a storage module, wherein
the third building module is further configured to sort the third site list in descending order of industry relevance to obtain a fourth site list; and
the storage module is connected to the third building module and is configured to obtain the site addresses, related site sections, site section contents, and industry relevance of the first N sites in the fourth site list and to store them in the blockchain.
CN202110193713.4A 2021-02-20 2021-02-20 Correlation model building method, correlation model judging method, site discovery method and site discovery device Active CN112860667B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110193713.4A CN112860667B (en) 2021-02-20 2021-02-20 Correlation model building method, correlation model judging method, site discovery method and site discovery device

Publications (2)

Publication Number Publication Date
CN112860667A (en) 2021-05-28
CN112860667B (en) 2023-06-20

Family

ID=75988394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110193713.4A Active CN112860667B (en) 2021-02-20 2021-02-20 Correlation model building method, correlation model judging method, site discovery method and site discovery device

Country Status (1)

Country Link
CN (1) CN112860667B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779120A (en) * 2011-05-09 2012-11-14 北京百度网讯科技有限公司 Method, system and device for determining field information of station and judging correlation
CN103186574A (en) * 2011-12-29 2013-07-03 北京百度网讯科技有限公司 Method and device for generating searching result
US20150019233A1 (en) * 2013-07-10 2015-01-15 Forte Research Systems, Inc. Site-specific clinical trial performance metric system
CN104331443A (en) * 2014-10-27 2015-02-04 安徽华贞信息科技有限公司 Industry data source detection method
CN105631007A (en) * 2015-12-29 2016-06-01 云南电网有限责任公司电力科学研究院 Industry technical information collecting method and system
CN105653651A (en) * 2015-12-29 2016-06-08 云南电网有限责任公司电力科学研究院 Discovery and arrangement method and apparatus for industry website
US20170109441A1 (en) * 2015-10-15 2017-04-20 Go Daddy Operating Company, LLC Automatically generating a website specific to an industry
CN106980677A (en) * 2017-03-30 2017-07-25 电子科技大学 The subject search method of Industry-oriented
CN112256379A (en) * 2020-10-30 2021-01-22 广东耐思智慧科技有限公司 Method for realizing mass production of multi-industry templates

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
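周育忠 (Zhou Yuzhong) et al.: "Planning and Design of Industry Information Websites", 广东电力 (Guangdong Electric Power), no. 06, pages 63-66 *
李刚 (Li Gang) et al.: "A Domain-Related Web Site Crawling Method", 计算机科学 (Computer Science), no. 02, pages 137-140 *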

Also Published As

Publication number Publication date
CN112860667B (en) 2023-06-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant