WO2018161719A1 - Method and apparatus for recommending articles to users based on regional features - Google Patents

Method and apparatus for recommending articles to users based on regional features

Info

Publication number
WO2018161719A1
WO2018161719A1 PCT/CN2018/071961 CN2018071961W WO2018161719A1 WO 2018161719 A1 WO2018161719 A1 WO 2018161719A1 CN 2018071961 W CN2018071961 W CN 2018071961W WO 2018161719 A1 WO2018161719 A1 WO 2018161719A1
Authority
WO
WIPO (PCT)
Prior art keywords
library
region
article
keyword
regional
Prior art date
Application number
PCT/CN2018/071961
Other languages
English (en)
French (fr)
Inventor
潘岸腾
Original Assignee
广州优视网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州优视网络科技有限公司
Publication of WO2018161719A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/337 Profile generation, learning or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Definitions

  • the present invention relates to the field of information processing technologies, and in particular, to a method, an apparatus, a computing device, and a storage medium for recommending articles to users based on geographical features.
  • Existing region-based recommendation is passive: articles from a regional column are recommended only while the user is reading that column, and the recommendation simply picks the articles with the most page views or the highest recommendation rate.
  • This approach does not serve users well. For example, a user who has worked in Shanghai for a long time but whose hometown is Guangzhou, and who wants news about Guangzhou, can only visit a website about Guangzhou and browse its Guangzhou information; the articles recommended there are merely so-called popular articles, which are not necessarily the information the user cares about.
  • An embodiment of the present invention provides a method for recommending an article to a user based on a regional feature, including:
  • the degree of matching between the article and the region is determined according to the geographical feature degree of the article, the pre-established regional library and the preset regional keyword library;
  • the pre-established regional library includes: the name of a country, the names of the regions at each level under that country's jurisdiction, the affiliation relationships between the region names at the respective levels, and the weights of those affiliation relationships.
  • the method for establishing the regional library includes: following each country's own administrative division scheme, recording the region names and affiliation relationships from the country name down to the smallest administrative region, and using the regional average weight method to determine the weights of the affiliation relationships. That is, the weight of a direct affiliation is the ratio of one lower-level region to the number of all lower-level regions directly subordinate to the same upper-level region; the weight between two regions separated by several levels is the product of the weights of the direct affiliations along the chain between them.
  • the pre-established regional keyword library includes: one or more keywords indicating each region name, and the association between those keywords and the corresponding region name. The rules for choosing the keywords for each region name include, but are not limited to: 1. the official name of the region; 2. a recognized nickname that can represent the region; 3. a representative landmark or scenic area of the region.
  • the geographical feature degree of the article in the existing article library can be extracted by the following formula:
  • p_{a,t} represents the geographical feature degree of article a in the existing article library for keyword t in the preset regional keyword library;
  • n_{a,t} represents the number of times keyword t in the preset regional keyword library appears in article a in the existing article library;
  • the matching degree can be determined by the following formula:
  • s_{a,i} indicates the degree of matching between article a in the existing article library and region i in the regional library;
  • R represents the set of all regions in the preset regional library;
  • T represents the set of all keywords in the preset regional keyword library;
  • p_{a,t} represents the geographical feature degree of article a in the existing article library for keyword t in the preset regional keyword library;
  • f_{t,i} indicates whether keyword t in the preset regional keyword library is associated with region i in the preset regional library; it takes the value 1 when they are associated and 0 otherwise;
  • f_{t,j} indicates whether keyword t in the preset regional keyword library is associated with region j in the preset regional library; it takes the value 1 when they are associated and 0 otherwise;
  • w_{j,i} denotes the weight with which region i in the preset regional library belongs to region j; w_{j,i} is 0 when regions i and j have no affiliation relationship.
  • the step of acquiring the regional information associated with the user includes: acquiring it from the IP address of the user's network, from the positioning function of a smart mobile terminal, or from the permanent address the user provided at registration.
  • selecting a certain number of corresponding articles to recommend to the user in a preset manner includes: randomly selecting a certain number of articles from those whose matching degree is greater than or equal to a preset threshold; or selecting a certain number of articles in descending order of matching degree.
  • the selected articles are first sorted according to certain conditions, and the top-ranked articles are then preferentially recommended to the user.
  • An embodiment of the present invention further provides an apparatus for recommending an article to a user based on a regional feature, including:
  • the article geographic feature degree extracting unit is configured to extract the geographic feature degree of the article in the existing article library
  • a matching degree determining unit configured to determine a matching degree between the article and the region according to the geographical feature degree of the article, the pre-established regional library, and the preset regional keyword library;
  • the recommendation unit is configured to obtain the geographical information associated with the user, and use the matching degree between the region and the article in the article library to select a certain number of corresponding articles to recommend to the user according to a preset manner.
  • the device further comprises:
  • a regional library establishing unit configured to pre-establish a regional library
  • the regional library includes: the name of a country, the names of the regions at each level under that country's jurisdiction, the affiliation relationships between the region names at the respective levels, and the weights of those affiliation relationships;
  • a regional keyword library establishing unit configured to pre-establish a regional keyword library, the regional keyword library comprising: one or more keywords indicating each region name, and the association between those keywords and the corresponding region name.
  • the article geographic feature degree extracting unit may extract the geographical feature degree of the article in the existing article library by using the following formula:
  • p_{a,t} represents the geographical feature degree of article a in the existing article library for keyword t in the preset regional keyword library;
  • n_{a,t} represents the number of times keyword t in the preset regional keyword library appears in article a in the existing article library;
  • the matching degree can be determined by the following formula:
  • s_{a,i} indicates the degree of matching between article a in the existing article library and region i in the regional library;
  • R represents the set of all regions in the preset regional library;
  • T represents the set of all keywords in the preset regional keyword library;
  • p_{a,t} represents the geographical feature degree of article a in the existing article library for keyword t in the preset regional keyword library;
  • f_{t,i} indicates whether keyword t in the preset regional keyword library is associated with region i in the preset regional library; it takes the value 1 when they are associated and 0 otherwise;
  • f_{t,j} indicates whether keyword t in the preset regional keyword library is associated with region j in the preset regional library; it takes the value 1 when they are associated and 0 otherwise;
  • w_{j,i} denotes the weight with which region i in the preset regional library belongs to region j; w_{j,i} is 0 when regions i and j have no affiliation relationship.
  • the recommendation unit is configured to obtain the regional information associated with the user from the IP address of the user's network, from the positioning function of a smart mobile terminal, or from the permanent address the user provided at registration.
  • the recommendation unit is configured to randomly select a certain number of articles from those whose matching degree is greater than or equal to the preset threshold and recommend them to the user; or to select a certain number of articles in descending order of matching degree and recommend them to the user.
  • the recommendation unit is further configured to first sort the selected number of corresponding articles by certain conditions, and then preferentially recommend the plurality of articles ranked in the front to the user.
  • The method and apparatus for recommending an article to a user based on a geographical feature can, by determining the degree of matching between articles and regions, recommend relevant articles based on the user's geographical feature without the user entering a regional column, and can even recommend articles that both match the user's regional characteristics and are popular, greatly improving the user experience.
  • An embodiment of the present invention provides a computing device, including: at least one processor, at least one memory, and computer program instructions stored in the memory; when the computer program instructions are executed by the processor, the method of recommending an article to a user based on a regional feature according to the foregoing embodiments is implemented.
  • Embodiments of the present invention provide a computer readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement a method of recommending an article to a user based on a geographic feature as in the above-described embodiments.
  • FIG. 1 is a flowchart of a method for recommending an article to a user based on a regional feature according to an embodiment of the present invention.
  • FIG. 2 is a schematic block diagram of an apparatus for recommending an article to a user based on a regional feature according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a method for recommending an article to a user based on a regional feature according to an embodiment of the present invention. As shown in FIG. 1, the method for recommending an article to a user based on a geographical feature of the present invention includes the following steps:
  • p_{a,t} represents the geographical feature degree of article a in the existing article library for keyword t in the preset regional keyword library;
  • n_{a,t} represents the number of times keyword t in the preset regional keyword library appears in article a in the existing article library;
  • each keyword t in the pre-established regional keyword library is searched for in article a, and the number of times each keyword t appears in article a is counted;
  • a word segmentation technique is applied to article a to obtain its total number of words.
  • The pre-established regional library and regional keyword library described herein can be established in advance from geographic knowledge and administrative division systems.
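The counting step above can be sketched as follows. The formula itself did not survive in this text, so this sketch assumes that the feature degree is the keyword count n_{a,t} divided by the total number of words in the article; that normalization, the function name `geographic_feature_degree`, and the pre-segmented `tokens` input are assumptions for illustration, not the patent's stated formula.

```python
from collections import Counter


def geographic_feature_degree(tokens, keywords):
    """Return {t: n_at / total_words} for one already-segmented article.

    tokens   -- list of words produced by some word segmentation step (assumed)
    keywords -- iterable of keywords t from the regional keyword library
    """
    total = len(tokens)
    if total == 0:
        # An empty article has no geographic signal for any keyword.
        return {t: 0.0 for t in keywords}
    counts = Counter(tokens)  # n_at for every word in the article
    return {t: counts[t] / total for t in keywords}


# Hypothetical article, already split into 8 words.
tokens = ["Guangzhou", "opens", "new", "metro", "line", "near", "Guangzhou", "Tower"]
p = geographic_feature_degree(tokens, ["Guangzhou", "Guangdong"])
print(p)
```

Keywords that never occur in the article simply get a degree of 0, so they contribute nothing to later scoring.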
  • a regional library may be pre-established, that is, a database of geographical information, the regional library including: the name of a country, the names of the regions at each level under that country's jurisdiction, the affiliation relationships between the region names at the respective levels, and the weights of those affiliation relationships.
  • the region names and their affiliation relationships, from a country's name down to its smallest administrative region, may be recorded according to each country's own administrative division scheme.
  • the regional average weight method is used to determine the weights of the affiliation relationships; that is, the weight of a direct affiliation is the ratio of one lower-level region to the number of all lower-level regions directly subordinate to the same upper-level region;
  • the weight between two regions separated by several levels is the product of the weights of the direct affiliations along the chain between them.
  • For example, China's smallest administrative regions are townships, towns, sub-districts, and the like.
  • the regional library then includes the country name, China, the region names at each level under its jurisdiction, and the affiliation relationships between them, such as China → Guangdong Province → Guangzhou City → Baiyun District → Renhe Town.
  • The weights follow from the regional average weight method. Guangdong Province governs 21 prefecture-level cities (Guangzhou, Shenzhen, Foshan, etc.), so the affiliation weight of each prefecture-level city to Guangdong Province is 1/21 ≈ 0.048. Guangzhou governs 11 municipal districts, so the affiliation weight of each district to Guangzhou City is 1/11 ≈ 0.091. Baiyun District has 22 sub-districts and towns (18 sub-districts and 4 towns), so the affiliation weight of each sub-district or town (smallest administrative region) to Baiyun District is 1/22 ≈ 0.045.
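The regional average weight method and the multi-level product rule can be illustrated with a small tree structure. This is a minimal sketch: the class name `RegionLibrary`, its methods, and the placeholder sibling names are hypothetical, and only the counts (21 cities, 11 districts) come from the example above.

```python
class RegionLibrary:
    """Tree of regions with average affiliation weights (illustrative only)."""

    def __init__(self):
        self.parent = {}    # region -> its direct upper-level region
        self.children = {}  # region -> list of direct lower-level regions

    def add(self, region, parent=None):
        # Sketch assumption: each region is added exactly once.
        self.children.setdefault(region, [])
        if parent is not None:
            self.parent[region] = parent
            self.children.setdefault(parent, []).append(region)

    def weight(self, upper, lower):
        """Weight with which `lower` belongs to `upper`.

        Each direct step contributes 1 / (number of siblings under the same
        upper-level region); multi-level weights are the product of the steps.
        Returns 0.0 when `lower` is not subordinate to `upper`.
        """
        w = 1.0
        node = lower
        while node in self.parent:
            up = self.parent[node]
            w *= 1.0 / len(self.children[up])
            if up == upper:
                return w
            node = up
        return 0.0


lib = RegionLibrary()
lib.add("Guangdong Province")
lib.add("Guangzhou", parent="Guangdong Province")
for k in range(20):  # placeholders for the other 20 prefecture-level cities
    lib.add(f"other_city_{k}", parent="Guangdong Province")
lib.add("Baiyun District", parent="Guangzhou")
for k in range(10):  # placeholders for the other 10 municipal districts
    lib.add(f"other_district_{k}", parent="Guangzhou")

print(round(lib.weight("Guangdong Province", "Guangzhou"), 3))
print(lib.weight("Guangdong Province", "Baiyun District"))
```

Under this reading, the weight of Baiyun District with respect to Guangdong Province is (1/11) × (1/21) = 1/231, and the weight in the opposite direction is 0 because a province does not belong to one of its districts.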
  • the regional keyword library includes: one or more keywords indicating each region name, and the association between those keywords and the corresponding region name.
  • the rules for choosing one or more keywords for each region name include, but are not limited to: 1. the official name of each region, for example the country name, province name, city name, district name, county name, township or sub-district name, etc.; 2. a recognized alternative name that can represent a region; 3. a representative landmark or scenic area of a region.
  • each keyword can be associated with only one region; multiple keywords can be associated with one region, but one keyword must not be associated with multiple regions.
  • the regional keyword library can thus be established to include: region names, keywords, and the associations between keywords and region names.
  • For example, region name 1: Guangdong Province; keyword 1: Guangdong; keyword 2: Yue (the abbreviation for Guangdong Province); the keywords "Guangdong" and "Yue" are associated with the region Guangdong Province.
  • Region name 2: Guangzhou; keyword 1: Guangzhou; keyword 2: Wuyangcheng (a recognized alternative name for Guangzhou); keyword 3: Xiaomanyao ("small waist", the nickname of Guangzhou's representative landmark, the Guangzhou New TV Tower); the keywords "Guangzhou", "Wuyangcheng", and "Xiaomanyao" are associated with the region Guangzhou.
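The one-keyword-one-region rule can be enforced at insertion time. The following is a minimal sketch; the class name `RegionKeywordLibrary` and its methods are hypothetical, and the romanized keywords stand in for the original Chinese terms.

```python
class RegionKeywordLibrary:
    """Keyword -> region associations, with each keyword tied to one region."""

    def __init__(self):
        self._region_of = {}  # keyword -> the single region it indicates

    def add(self, keyword, region):
        existing = self._region_of.get(keyword)
        if existing is not None and existing != region:
            # A keyword must not be associated with multiple regions.
            raise ValueError(
                f"keyword {keyword!r} is already associated with {existing!r}"
            )
        self._region_of[keyword] = region

    def f(self, keyword, region):
        """Indicator f_{t,i}: 1 if the keyword is associated with the region, else 0."""
        return 1 if self._region_of.get(keyword) == region else 0

    def keywords(self):
        return set(self._region_of)


kw = RegionKeywordLibrary()
kw.add("Guangdong", "Guangdong Province")
kw.add("Yue", "Guangdong Province")
kw.add("Guangzhou", "Guangzhou")
kw.add("Wuyangcheng", "Guangzhou")
kw.add("Xiaomanyao", "Guangzhou")

print(kw.f("Wuyangcheng", "Guangzhou"), kw.f("Wuyangcheng", "Guangdong Province"))
```

Storing the mapping as keyword to region, rather than region to keyword list, makes the uniqueness rule a simple dictionary lookup while still allowing many keywords per region.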
  • S2 Determine the matching degree between the article and the region according to the geographical feature degree of the article, the pre-established regional library and the regional keyword library.
  • the match between the article and the region can be determined by the following formula:
  • s_{a,i} indicates the degree of matching between article a in the existing article library and region i in the regional library;
  • R represents the set of all regions in the preset regional library;
  • T represents the set of all keywords in the preset regional keyword library;
  • p_{a,t} represents the geographical feature degree of article a in the existing article library for keyword t in the preset regional keyword library;
  • f_{t,i} indicates whether keyword t in the preset regional keyword library is associated with region i in the preset regional library; it takes the value 1 when they are associated and 0 otherwise;
  • f_{t,j} indicates whether keyword t in the preset regional keyword library is associated with region j in the preset regional library; it takes the value 1 when they are associated and 0 otherwise;
  • w_{j,i} denotes the weight with which region i in the preset regional library belongs to region j; w_{j,i} is 0 when regions i and j have no affiliation relationship.
  • f_{t,i} indicates whether any keyword t is associated with any region i, where i ∈ R and t ∈ T; that is, region i belongs to the set of all regions in the preset regional library, and keyword t belongs to the set of all keywords in the preset regional keyword library.
  • For example, if i represents the region name Guangzhou and the keyword t is "Guangzhou" or "Wuyangcheng", then f_{t,i} is 1.
  • the first part of the calculation yields the similarity between an article and each region in the preset regional library; the second part accounts for the influence of regional affiliation on that similarity, using the affiliation weights; the two parts are added to obtain the matching degree between the article and the region.
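The two-part calculation described above can be sketched in code. The formula is missing from this text, so the sketch assumes the form s_{a,i} = Σ_{t∈T} p_{a,t}·f_{t,i} + Σ_{j∈R} Σ_{t∈T} p_{a,t}·f_{t,j}·w_{j,i}, reconstructed from the symbol definitions; treat it as one plausible reading rather than the patent's exact formula. The function name and the dictionary-based inputs are hypothetical.

```python
def matching_degree(p, keyword_region, regions, weight):
    """Return {i: s_ai} for one article.

    p              -- {keyword t: p_at} for the article
    keyword_region -- {keyword t: the single region it is associated with}
    regions        -- iterable of all regions i in R
    weight         -- {(j, i): w_ji}; missing pairs mean no affiliation (0)
    """
    scores = {}
    for i in regions:
        # Part 1: keywords associated directly with region i (f_ti = 1).
        direct = sum(p_t for t, p_t in p.items() if keyword_region.get(t) == i)
        # Part 2: keywords associated with some region j, scaled by w_ji.
        # Because each keyword has exactly one region, the double sum over
        # j and t collapses to a single sum over keywords.
        inherited = sum(
            p_t * weight.get((keyword_region[t], i), 0.0)
            for t, p_t in p.items()
            if t in keyword_region
        )
        scores[i] = direct + inherited
    return scores


# Hypothetical inputs: feature degrees for two keywords of one article.
p = {"Guangzhou": 0.25, "Yue": 0.125}
keyword_region = {"Guangzhou": "Guangzhou", "Yue": "Guangdong Province"}
regions = ["Guangdong Province", "Guangzhou"]
weight = {("Guangdong Province", "Guangzhou"): 1 / 21}

s = matching_degree(p, keyword_region, regions, weight)
print(s)
```

In this example the article's mention of "Yue" counts fully toward Guangdong Province and contributes 1/21 of its feature degree to Guangzhou, reflecting that Guangzhou is one of 21 prefecture-level cities in the province.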
  • S3 Obtain the regional information associated with the user and, using the matching degree between that region and the articles in the article library, select a certain number of corresponding articles to recommend to the user in a preset manner.
  • a certain number of corresponding articles may be randomly selected from the plurality of articles corresponding to the matching degree greater than or equal to the preset threshold to be recommended to the user.
  • the degree of matching between the article and the region may be utilized and a certain number of corresponding articles may be selected and recommended to the user according to the degree of matching from large to small.
  • Specifically, the regional information associated with the user is first obtained according to the actual application scenario, for example from the IP address of the user's network, from the positioning function of a smart mobile terminal, or from the resident address the user provided at registration. Then, using the obtained matching degrees between that region and the articles in the article library, a certain number of articles are either randomly selected from those whose matching degree is greater than or equal to the preset threshold, or selected in descending order of matching degree, and recommended to the user; for example, the top 1-5 articles, 5-20 articles, or more.
  • the preset threshold can be arbitrarily set as needed in practice.
  • the selected articles may be further prioritized: they are first sorted according to certain conditions, and the top-ranked articles are then preferentially recommended to the user. For example, if more than 50, 100, or more articles are selected, they can be sorted further so that articles that are both regionally relevant and popular are recommended first.
  • Therefore, in a preferred embodiment, a certain number of articles, such as 100-500 or more, are randomly selected from those whose matching degree is greater than or equal to the preset threshold, or selected in descending order of matching degree. These articles are then sorted according to certain conditions, for example in descending order of page views, click-through rate, or rating, or by other similar methods, and the top 1-5, 5-20, or more articles are recommended to the user.
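The two selection modes and the secondary popularity sort can be sketched as below. The function names, parameters, and the choice of page views as the sorting condition are assumptions for illustration; the text allows click-through rate, rating, or other conditions as well.

```python
import random


def select_pool(scores, pool_size, threshold=None, rng=None):
    """Pick candidate articles for one region.

    With a threshold: random sample from articles whose score >= threshold.
    Without one: the pool_size articles with the highest matching degree.
    """
    if threshold is not None:
        candidates = [a for a, s in scores.items() if s >= threshold]
        rng = rng or random.Random()
        return rng.sample(candidates, min(pool_size, len(candidates)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:pool_size]


def recommend(scores, page_views, pool_size, top_n, threshold=None, rng=None):
    """Select a pool, re-sort it by page views (descending), return the top_n."""
    pool = select_pool(scores, pool_size, threshold=threshold, rng=rng)
    pool.sort(key=lambda a: page_views.get(a, 0), reverse=True)
    return pool[:top_n]


# Hypothetical matching degrees of four articles for the user's region.
scores = {"a1": 0.30, "a2": 0.05, "a3": 0.20, "a4": 0.25}
page_views = {"a1": 100, "a2": 9000, "a3": 500, "a4": 300}

by_rank = recommend(scores, page_views, pool_size=3, top_n=2)
by_threshold = recommend(scores, page_views, pool_size=3, top_n=2, threshold=0.2)
print(by_rank, by_threshold)
```

Article a2 has by far the most page views but a low matching degree, so neither mode lets it into the pool; popularity only reorders articles that already match the region.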
  • by determining the matching degree between articles and regions, related articles can be recommended to the user based on the user's geographical feature, including articles that both match the user's geographical feature and are popular, which greatly improves the user experience.
  • FIG. 2 is a schematic block diagram of an apparatus for recommending an article to a user based on a regional feature according to an embodiment of the present invention.
  • the apparatus for recommending an article to a user based on a geographical feature of the present invention includes:
  • the article geographic feature degree extracting unit is configured to extract the geographic feature degree of the article in the existing article library
  • a matching degree determining unit configured to determine a matching degree between the article and the region according to the geographical feature degree of the article, the pre-established regional library, and the preset regional keyword library;
  • the recommendation unit is configured to obtain the geographical information associated with the user, and select a certain number of corresponding articles to recommend to the user by using a matching degree between the region and the article in the article library.
  • the device for recommending an article to a user based on the geographical feature of the present invention further includes:
  • a regional library establishing unit configured to pre-establish a regional library
  • the regional library includes: the name of a country, the names of the regions at each level under that country's jurisdiction, the affiliation relationships between the region names at the respective levels, and the weights of those affiliation relationships;
  • a regional keyword library establishing unit configured to pre-establish a regional keyword library, the regional keyword library comprising: one or more keywords indicating each region name, and the association between those keywords and the corresponding region name.
  • the method by which the regional library establishing unit establishes the regional library includes: following each country's own administrative division scheme, recording the region names and affiliation relationships from the country name down to the smallest administrative region, and using the regional average weight method to determine the weights of the affiliation relationships. That is, the weight of a direct affiliation is the ratio of one lower-level region to the number of all lower-level regions directly subordinate to the same upper-level region; the weight between two regions separated by several levels is the product of the weights of the direct affiliations along the chain between them.
  • the rules by which the regional keyword library establishing unit chooses one or more keywords for each region name include, but are not limited to: 1. the official name of each region; 2. a recognized alternative name that can represent a region; 3. a representative landmark or scenic spot of a region.
  • the geographical feature degree of the article in the existing article library is extracted by the following formula:
  • p_{a,t} represents the geographical feature degree of article a in the existing article library for keyword t in the preset regional keyword library;
  • n_{a,t} represents the number of times keyword t in the preset regional keyword library appears in article a in the existing article library;
  • the degree of matching is determined by the following formula:
  • s_{a,i} indicates the degree of matching between article a in the existing article library and region i in the regional library;
  • R represents the set of all regions in the preset regional library;
  • T represents the set of all keywords in the preset regional keyword library;
  • p_{a,t} represents the geographical feature degree of article a in the existing article library for keyword t in the preset regional keyword library;
  • f_{t,i} indicates whether keyword t in the preset regional keyword library is associated with region i in the preset regional library; it takes the value 1 when they are associated and 0 otherwise;
  • f_{t,j} indicates whether keyword t in the preset regional keyword library is associated with region j in the preset regional library; it takes the value 1 when they are associated and 0 otherwise;
  • w_{j,i} denotes the weight with which region i in the preset regional library belongs to region j; w_{j,i} is 0 when regions i and j have no affiliation relationship.
  • the recommendation unit is configured to obtain the regional information associated with the user from the IP address of the user's network, from the positioning function of a smart mobile terminal, or from the resident address the user provided at registration.
  • when the recommendation unit uses the matching degree between the region and the articles in the article library to select a certain number of corresponding articles in a preset manner, it either randomly selects a certain number of articles from those whose matching degree is greater than or equal to the preset threshold, or selects a certain number of articles in descending order of matching degree, and recommends them to the user.
  • in the course of that selection, the recommendation unit may further prioritize the selected articles: they are first sorted according to certain conditions, and the top-ranked articles are then preferentially recommended to the user. For example, the top 1-5 articles, 5-20 articles, or more are recommended first.
  • by determining the matching degree between articles and regions, related articles can be recommended to the user based on the user's geographical feature, including articles that both match the user's geographical feature and are popular, which greatly improves the user experience.
  • FIG. 3 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
  • the computing device can include a processor 301 and a memory 302 that stores computer program instructions.
  • the processor 301 may include a central processing unit (CPU) or an application specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present invention.
  • Memory 302 can include mass storage for data or instructions.
  • the processor 301 implements a method of recommending an article to a user based on a geographical feature by reading and executing computer program instructions stored in the memory 302.
  • the computing device can also include a communication interface 303 and a bus 310.
  • the processor 301, the memory 302, and the communication interface 303 are connected by the bus 310 and complete communication with each other.
  • a computer program product for the method for recommending articles to a user based on regional features comprises a computer readable storage medium storing program code, the program code comprising instructions for executing the foregoing method embodiment.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
  • the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, smart tablet, smartphone, server, or network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a removable hard disk, a read only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

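A method and apparatus for recommending articles to a user based on regional features. The method comprises: extracting the regional feature degree of articles in an existing article library (S1); determining the matching degree between articles and regions according to the regional feature degree of the articles, a pre-established region library and a preset regional keyword library (S2); and acquiring region information associated with the user and, using the matching degree between that region and the articles in the article library, selecting a certain number of corresponding articles in a preset manner and recommending them to the user (S3).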

Description

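Method and apparatus for recommending articles to a user based on regional features
Technical Field
The present invention relates to the technical field of information processing, and in particular to a method, an apparatus, a computing device and a storage medium for recommending articles to a user based on regional features.
Background
With the spread of communication networks and the popularity of smart terminals, people are increasingly used to reading on electronic devices. For example, they log in to news or novel websites on a computer to read news or novels, or log in to online libraries to read books. They also read through third-party applications installed on smart mobile terminals such as smartphones and tablets, for example the news app "今日头条" (Toutiao), the novel app "书旗小说" (Shuqi Novel), and various periodical apps.
To meet the market demand for personalized products, many reading products need to provide good search and recommendation functions. On smart mobile terminals in particular, screen size and hardware performance limit the search function of reading products, which is less powerful than the search offered on a computer. To compensate for this, and to spare users from spending much time looking for content themselves, many third-party applications offer a recommendation function that pushes popular articles to users, such as popular news featured on 今日头条. Among the scenarios in which recommendation is applied, one is recommending articles based on the user's region: for example, some news applications and travel-information applications have a regional column that provides news or travel information for each region.
However, existing regional recommendation is passive: information from a regional column is recommended only when the user is reading that column, and the recommendation method merely pushes the articles with the most views or the highest like rate. Such methods do not give users a good experience. For example, a user who has long worked in Shanghai but whose hometown is Guangzhou, and who wants to learn about Guangzhou, can only visit websites about Guangzhou and search for information there, and the articles recommended to that user are merely so-called popular articles, not necessarily the information the user cares about.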
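Summary of the Invention
The object of the present invention is to provide a method, an apparatus, a computing device and a storage medium for recommending articles to a user based on regional features, so as to mitigate the above problems.
An embodiment of the present invention provides a method for recommending articles to a user based on regional features, comprising:
extracting the regional feature degree of articles in an existing article library;
determining the matching degree between articles and regions according to the regional feature degree of the articles, a pre-established region library and a preset regional keyword library;
acquiring region information associated with the user and, using the matching degree between that region and the articles in the article library, selecting a certain number of corresponding articles in a preset manner and recommending them to the user.
The pre-established region library includes: the name of a country, the place names at every level administered by that country, the membership relations between the place names at the various levels, and the weights of those membership relations.
The method of building the region library includes: recording the place names from the country name down to the smallest administrative division, together with their membership relations, according to each country's own scheme of administrative divisions, and determining the weights of the membership relations with the regional average weight method. That is, the weight of a direct parent-child relation is the ratio of one lower-level region to the total number of lower-level regions directly under the same higher-level region, and the weight between two regions separated by several levels is the product of the weights of the direct parent-child relations between them.
The pre-established regional keyword library includes: one or more keywords representing each place name, and the association between those keywords and the corresponding place name. The rules for choosing the keywords that represent each place name include, but are not limited to: 1. the official name of each region; 2. a generally recognized alternative name that can represent a region; 3. a representative landmark building or scenic area of a region.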
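The regional feature degree of an article in the existing article library can be extracted with the following formula:
Figure PCTCN2018071961-appb-000001
where:
p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
n_{a,t} is the number of times keyword t from the preset regional keyword library appears in article a of the existing article library;
l_a is the number of tokens obtained by word segmentation of article a in the existing article library.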
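The formula survives here only as an image reference. A reading consistent with the three definitions above, and with the later statement that the feature degree is the proportion of keyword occurrences in the article, would be the following (a reconstruction, not the published figure):

```latex
p_{a,t} = \frac{n_{a,t}}{l_{a}}
```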
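The matching degree can be determined with the following formula:
Figure PCTCN2018071961-appb-000002
where:
s_{a,i} is the matching degree between article a in the existing article library and region i in the region library;
R is the set of all regions in the preset region library;
T is the set of all keywords in the preset regional keyword library;
p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
f_{t,i} indicates whether keyword t in the preset regional keyword library is associated with region i in the preset region library; it takes the value 1 when keyword t is associated with region i and 0 otherwise;
f_{t,j} indicates whether keyword t in the preset regional keyword library is associated with region j in the preset region library; it takes the value 1 when keyword t is associated with region j and 0 otherwise;
w_{j,i} is the weight with which region i in the preset region library belongs to region j; w_{j,i} is 0 when regions i and j have no membership relation.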
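The matching-degree formula is likewise available only as an image reference. Based on the symbols defined above and on the later description of the formula as the sum of a direct similarity part and a membership-weighted part, one plausible form is sketched below. The exact arrangement of the sums is an assumption and may differ from the published figure.

```latex
s_{a,i} = \sum_{t \in T} p_{a,t}\, f_{t,i}
        + \sum_{j \in R} \sum_{t \in T} p_{a,t}\, f_{t,j}\, w_{j,i}
```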
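The step of acquiring region information associated with the user includes: acquiring it from the IP address with which the user is connected to the network, or from the positioning function of a smart mobile terminal, or from the permanent address the user provided at registration.
In the step of using the matching degree between that region and the articles in the article library to select a certain number of corresponding articles in a preset manner and recommend them to the user, a certain number of articles are randomly selected from the articles whose matching degree is greater than or equal to a preset threshold, or a certain number of articles are selected in descending order of matching degree.
Preferably, the selected articles are first sorted according to certain conditions, and the articles ranked at the front are then recommended to the user first.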
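An embodiment of the present invention further provides an apparatus for recommending articles to a user based on regional features, comprising:
an article regional feature degree extraction unit, configured to extract the regional feature degree of articles in an existing article library;
a matching degree determination unit, configured to determine the matching degree between articles and regions according to the regional feature degree of the articles, a pre-established region library and a preset regional keyword library;
a recommendation unit, configured to acquire region information associated with the user and, using the matching degree between that region and the articles in the article library, select a certain number of corresponding articles in a preset manner and recommend them to the user.
The apparatus further comprises:
a region library building unit, configured to build a region library in advance, the region library including: the name of a country, the place names at every level administered by that country, the membership relations between the place names at the various levels, and the weights of those membership relations; and
a regional keyword library building unit, configured to build a regional keyword library in advance, the regional keyword library including: one or more keywords representing each place name, and the association between those keywords and the corresponding place name.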
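The article regional feature degree extraction unit can extract the regional feature degree of an article in the existing article library with the following formula:
Figure PCTCN2018071961-appb-000003
where:
p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
n_{a,t} is the number of times keyword t from the preset regional keyword library appears in article a of the existing article library;
l_a is the number of tokens obtained by word segmentation of article a in the existing article library.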
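The matching degree can be determined with the following formula:
Figure PCTCN2018071961-appb-000004
where:
s_{a,i} is the matching degree between article a in the existing article library and region i in the region library;
R is the set of all regions in the preset region library;
T is the set of all keywords in the preset regional keyword library;
p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
f_{t,i} indicates whether keyword t in the preset regional keyword library is associated with region i in the preset region library; it takes the value 1 when keyword t is associated with region i and 0 otherwise;
f_{t,j} indicates whether keyword t in the preset regional keyword library is associated with region j in the preset region library; it takes the value 1 when keyword t is associated with region j and 0 otherwise;
w_{j,i} is the weight with which region i in the preset region library belongs to region j; w_{j,i} is 0 when regions i and j have no membership relation.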
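The recommendation unit is configured to acquire the region information associated with the user from the IP address with which the user is connected to the network, or from the positioning function of a smart mobile terminal, or from the permanent address the user provided at registration.
Preferably, the recommendation unit is configured to randomly select a certain number of articles from the articles whose matching degree is greater than or equal to a preset threshold and recommend them to the user, or to select a certain number of articles in descending order of matching degree and recommend them to the user.
Preferably, the recommendation unit is further configured to first sort the selected articles according to certain conditions and then recommend the articles ranked at the front to the user first.
With the method and apparatus for recommending articles to a user based on regional features according to the present invention, the matching degree found between articles and regions makes it possible, even when the user has not entered a regional column, to recommend relevant articles based on the user's regional features, including articles that both match the user's regional features and are widely popular, which greatly improves the user experience.
An embodiment of the present invention provides a computing device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory; when executed by the processor, the computer program instructions implement the method for recommending articles to a user based on regional features described in the above embodiments.
An embodiment of the present invention provides a computer readable storage medium on which computer program instructions are stored; when executed by a processor, the computer program instructions implement the method for recommending articles to a user based on regional features described in the above embodiments.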
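Brief Description of the Drawings
FIG. 1 is a flowchart of a method for recommending articles to a user based on regional features according to an embodiment of the present invention.
FIG. 2 is a schematic block diagram of an apparatus for recommending articles to a user based on regional features according to an embodiment of the present invention.
FIG. 3 is a schematic structural diagram of a computing device according to an embodiment of the present invention.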
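Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the embodiments and the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments described and shown in the drawings can generally be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
FIG. 1 is a flowchart of a method for recommending articles to a user based on regional features according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps: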
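S1: Extract the regional feature degree of the articles in the existing article library.
The regional feature degree of an article can be extracted with the following formula:
Figure PCTCN2018071961-appb-000005
where:
p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
n_{a,t} is the number of times keyword t from the preset regional keyword library appears in article a of the existing article library;
l_a is the number of tokens obtained by word segmentation of article a in the existing article library.
That is, each keyword t in the pre-established regional keyword library is searched for in article a of the existing article library, and the number of occurrences of each keyword t in article a is counted; article a is also segmented with any well-known word segmentation technique to obtain the total number of tokens. The proportion of occurrences of each regional keyword in an article is taken as the article's regional feature degree, which yields the article's regional feature degree for every region recorded in the pre-established region library. The region library and the regional keyword library mentioned here can be built in advance from geographic knowledge and the administrative system.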
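Step S1 can be sketched as follows. This is a minimal illustration that assumes the feature degree is the occurrence count divided by the token count, as the paragraph above describes, and that the article has already been segmented into tokens. The function name `regional_feature_degree` and the sample tokens are hypothetical, and counting whole tokens (rather than substring matches) is a simplification.

```python
from collections import Counter


def regional_feature_degree(tokens, keywords):
    """Return {keyword t: n_{a,t} / l_a} for one segmented article a."""
    total = len(tokens)  # l_a: number of tokens after word segmentation
    if total == 0:
        return {t: 0.0 for t in keywords}
    counts = Counter(tokens)  # n_{a,t} for every token; missing keys count as 0
    return {t: counts[t] / total for t in keywords}


# Hypothetical, already-segmented article; a real system would run a
# word segmenter over the raw text first.
tokens = ["Guangzhou", "Xiaomanyao", "night", "view", "is",
          "beautiful", "Guangzhou", "welcome"]
keywords = ["Guangzhou", "Xiaomanyao", "Mount Tai"]
degrees = regional_feature_degree(tokens, keywords)
```

Here "Guangzhou" accounts for 2 of 8 tokens and "Mount Tai" for none, so the article leans toward Guangzhou-related keywords.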
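A region library, that is, a database of region information, can be built in advance. The region library includes: the name of a country, the place names at every level administered by that country, the membership relations between the place names at the various levels, and the weights of those membership relations.
A region library can be built for China alone, or for any number of countries worldwide. Place names from the country name down to the smallest administrative division, together with their membership relations, can be recorded according to each country's own scheme of administrative divisions. The regional average weight method is used to determine the weights of the membership relations: the weight of a direct parent-child relation is the ratio of one lower-level region to the total number of lower-level regions directly under the same higher-level region, and the weight between two regions separated by several levels is the product of the weights of the direct parent-child relations between them. Taking China as an example, the smallest administrative divisions are townships, towns and subdistricts, so a region library for China contains the country name (China), the place names at every level under it, and the membership relations between them, for example China → Guangdong Province → Guangzhou City → Baiyun District → Renhe Town. Under the regional average weight method, Guangdong Province administers 21 prefecture-level cities (Guangzhou, Shenzhen, Foshan, ...), so each prefecture-level city belongs to Guangdong Province with weight 1/21 ≈ 0.048; Guangzhou has 11 municipal districts, so each district belongs to Guangzhou with weight 1/11 ≈ 0.091; and Baiyun District has 22 subdistricts and towns (18 subdistricts and 4 towns), so each of these smallest administrative divisions belongs to Baiyun District with weight 1/22 ≈ 0.045. For regions separated by several levels, again using China → Guangdong Province → Guangzhou City → Baiyun District → Renhe Town: Renhe Town belongs to Guangzhou with weight 0.091 × 0.045 ≈ 0.0041; Baiyun District belongs to Guangdong Province with weight 0.048 × 0.091 ≈ 0.0044; and Renhe Town belongs to Guangdong Province with weight 0.048 × 0.091 × 0.045 ≈ 0.0002.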
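The regional average weight method lends itself to a short sketch. The code below is illustrative only: the dictionaries `PARENT` and `CHILD_COUNT` and the function `membership_weight` are hypothetical names, the hierarchy is cut off at the provincial level, and the child counts are the ones quoted in the example above. Because it multiplies exact fractions rather than the rounded factors used in the text, results can differ in the last digit (for instance 1/231 ≈ 0.0043 for Baiyun District within Guangdong Province).

```python
# Direct parent of each region; None marks the top of this small hierarchy.
PARENT = {
    "Guangdong Province": None,
    "Guangzhou City": "Guangdong Province",
    "Baiyun District": "Guangzhou City",
    "Renhe Town": "Baiyun District",
}

# Number of direct sub-regions of each parent, as quoted in the text.
CHILD_COUNT = {
    "Guangdong Province": 21,
    "Guangzhou City": 11,
    "Baiyun District": 22,
}


def membership_weight(child, ancestor):
    """Weight with which `child` belongs to `ancestor`; 0.0 if unrelated."""
    weight = 1.0
    node = child
    while node != ancestor:
        parent = PARENT.get(node)
        if parent is None:  # reached the top without meeting `ancestor`
            return 0.0
        weight /= CHILD_COUNT[parent]  # one direct relation: 1 / sibling count
        node = parent
    # A region has no membership relation with itself.
    return weight if child != ancestor else 0.0


renhe_in_guangzhou = membership_weight("Renhe Town", "Guangzhou City")
```

Walking up the parent chain and multiplying the per-level ratios mirrors the rule that a multi-level weight is the product of the direct weights along the path.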
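The weights of the membership relations can also be determined by population ratio. For example, Guangzhou has a permanent population of 13.5 million and belongs to Guangdong Province, which has a permanent population of 108 million, so Guangzhou belongs to Guangdong Province with weight 13.5/108 = 0.125. However, in large cities such as Guangzhou, Shanghai and Beijing, people move in and out frequently and the permanent population changes every year. If the population ratio method is used, the population of each major city must be re-surveyed at least every year or every few years, which incurs extra cost. The regional average weight method is therefore preferred.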
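A regional keyword library, that is, a database of keywords associated with regions, can also be built in advance. The regional keyword library includes: one or more keywords representing each place name, and the association between those keywords and the corresponding place name.
The rules for choosing the keywords that represent each place name include, but are not limited to: 1. the official name of each region, for example the name of a country, province, city, district, county, township, town or subdistrict; 2. a generally recognized alternative name that can represent a region; 3. a representative landmark building or scenic area of a region. Using the official name of each region as a keyword is mandatory, whereas using a recognized alternative name or the name of a representative landmark building or scenic area is optional. Other names that uniquely represent a region can of course also be used as keywords and are not listed here. In addition, each keyword may be associated with only one region: several keywords may be associated with one region, but one keyword must not be associated with several regions. Once a place name and the keywords that represent it are available, the regional keyword library can be built; it contains the place names, the keywords, and the associations between keywords and place names.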
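A concrete example follows.
Place name 1: Guangdong Province; keyword 1: Guangdong (广东); keyword 2: Yue (粤, the geographic abbreviation of Guangdong Province); the keywords "广东" and "粤" are associated with the region Guangdong Province. Place name 2: Guangzhou City; keyword 1: Guangzhou (广州); keyword 2: Five Rams City (五羊城, generally understood to mean Guangzhou); keyword 3: Xiaomanyao (小蛮腰, the nickname of the Guangzhou New TV Tower, a representative landmark of the city); the keywords "广州", "五羊城" and "小蛮腰" are associated with the region Guangzhou City. Place name 3: Shandong Province; keyword 1: Shandong (山东); keyword 2: Lu (鲁, the geographic abbreviation of Shandong Province); keyword 3: Mount Tai (泰山, a landmark scenic area of Shandong Province); the keywords "山东", "鲁" and "泰山" are associated with the region Shandong Province.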
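The keyword library and its one-keyword-one-region rule could be represented as below. This is a sketch under assumptions: the class `KeywordLibrary` and its methods are hypothetical, and the keywords are written in romanized form for readability. The method `f` plays the role of the 0/1 association indicator used later in the matching-degree formula.

```python
class KeywordLibrary:
    """Maps each keyword to exactly one region (illustrative structure)."""

    def __init__(self):
        self._region_of = {}  # keyword -> region

    def add(self, region, keyword):
        existing = self._region_of.get(keyword)
        if existing is not None and existing != region:
            # The text forbids one keyword being tied to several regions.
            raise ValueError(
                f"{keyword!r} is already associated with {existing!r}"
            )
        self._region_of[keyword] = region

    def f(self, keyword, region):
        """Return 1 if the keyword is associated with the region, else 0."""
        return 1 if self._region_of.get(keyword) == region else 0


lib = KeywordLibrary()
for kw in ("Guangdong", "Yue"):
    lib.add("Guangdong Province", kw)
for kw in ("Guangzhou", "Five Rams City", "Xiaomanyao"):
    lib.add("Guangzhou City", kw)
for kw in ("Shandong", "Lu", "Mount Tai"):
    lib.add("Shandong Province", kw)
```

Storing the mapping as keyword to region makes the uniqueness rule a simple dictionary check, while still allowing many keywords per region.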
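S2: Determine the matching degree between articles and regions according to the regional feature degree of the articles, the pre-established region library and the regional keyword library.
The matching degree between an article and a region can be determined with the following formula:
Figure PCTCN2018071961-appb-000006
where:
s_{a,i} is the matching degree between article a in the existing article library and region i in the region library;
R is the set of all regions in the preset region library;
T is the set of all keywords in the preset regional keyword library;
p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
f_{t,i} indicates whether keyword t in the preset regional keyword library is associated with region i in the preset region library; it takes the value 1 when keyword t is associated with region i and 0 otherwise;
f_{t,j} indicates whether keyword t in the preset regional keyword library is associated with region j in the preset region library; it takes the value 1 when keyword t is associated with region j and 0 otherwise;
w_{j,i} is the weight with which region i in the preset region library belongs to region j; w_{j,i} is 0 when regions i and j have no membership relation.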
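Here f_{t,i} indicates whether an arbitrary keyword t is associated with an arbitrary region i, with i ∈ R and t ∈ T; that is, region i is any member of the set of all regions in the preset region library, and keyword t is any member of the set of all keywords in the preset regional keyword library. For example, when i is the place name Guangzhou City and keyword t is Guangzhou or Five Rams City, the pre-established associations between keywords and regions mean that keyword t is associated with region i, so f_{t,i} = 1; if keyword t is Mount Tai, keyword t has no association with region i, so f_{t,i} = 0.
The formula has two parts. The first part gives the similarity between an article and each region in the preset region library. The second part accounts for the influence of regional membership on that similarity, computed through the membership weights. Adding the two parts gives the matching degree between the article and the region.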
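Since the formula is only present as an image reference, the sketch below assumes one reading of the two parts just described: a direct sum of the feature degrees of the keywords tied to region i, plus the feature degrees of keywords tied to every region j scaled by w_{j,i}. The function `matching_degree`, the helper `weight`, and all sample data are hypothetical, and the published formula may differ in detail.

```python
def matching_degree(p, region_of, weight, regions, i):
    """Matching degree of one article for region i (assumed two-part sum).

    p         -- {keyword t: p_{a,t}} for the article
    region_of -- {keyword t: region}, so f_{t,r} = 1 iff region_of[t] == r
    weight    -- weight(j, i) returns w_{j,i}, or 0.0 if there is no relation
    regions   -- iterable of all regions (the set R)
    """
    # Part 1: keywords associated with region i itself.
    direct = sum(p_t for t, p_t in p.items() if region_of.get(t) == i)
    # Part 2: keywords of each region j, scaled by the membership weight.
    inherited = sum(
        p_t * weight(j, i)
        for j in regions
        for t, p_t in p.items()
        if region_of.get(t) == j
    )
    return direct + inherited


REGION_OF = {
    "Guangdong": "Guangdong Province",
    "Guangzhou": "Guangzhou City",
    "Mount Tai": "Shandong Province",
}
REGIONS = ["Guangdong Province", "Guangzhou City", "Shandong Province"]
# w_{j,i}: region i (second item) belongs to region j (first item).
WEIGHTS = {("Guangdong Province", "Guangzhou City"): 1 / 21}


def weight(j, i):
    return WEIGHTS.get((j, i), 0.0)


p = {"Guangzhou": 0.25, "Guangdong": 0.10, "Mount Tai": 0.0}
score = matching_degree(p, REGION_OF, weight, REGIONS, "Guangzhou City")
```

In this reading, an article that mentions the province still contributes a small amount to the matching degree of a city inside that province, in proportion to the membership weight.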
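S3: Acquire region information associated with the user and, using the matching degree between that region and the articles in the article library, select a certain number of corresponding articles in a preset manner and recommend them to the user.
Preferably, a certain number of articles can be randomly selected from the articles whose matching degree is greater than or equal to a preset threshold and recommended to the user.
Preferably, a certain number of articles can be selected in descending order of their matching degree with the region and recommended to the user.
After the matching degree between each article and each region has been obtained, the region information associated with the user is acquired. Depending on the application scenario, it can be acquired from the IP address with which the user is connected to the network, from the positioning function of a smart mobile terminal, or from the permanent address the user provided at registration. Using the matching degree between that region and the articles in the article library, a certain number of articles are then either randomly selected from the articles whose matching degree is greater than or equal to the preset threshold, or selected in descending order of matching degree, and recommended to the user, for example the top 1-5 articles, or 5-20 articles, or more. The preset threshold can be set as needed in practice.
In addition, the selected articles can be ranked further: they are first sorted according to certain conditions, and the articles ranked at the front are then recommended to the user first. For example, if many articles have been selected, say more than 50 or 100, they can be ranked further so that articles that both match the regional features and are widely popular are recommended first. In a preferred embodiment, a certain number of articles, for example 100-500 or more, are randomly selected from the articles whose matching degree is greater than or equal to the preset threshold, or selected in descending order of matching degree. These articles are then sorted according to certain conditions, for example in descending order of views, of click-through rate, or of like rate, or by a similar method, and the top 1-5 articles, or 5-20 articles, or more are recommended to the user first.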
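The selection and re-ranking in step S3 can be sketched as follows. The function `recommend`, its parameters and the sample data are hypothetical. Both selection modes from the text are shown, with view count standing in for the "certain conditions" used in the second sort.

```python
import random


def recommend(scores, views, threshold=None, pool_size=100, top_n=5, rng=None):
    """Pick articles for one region, then re-rank the pool by popularity.

    scores -- {article_id: matching degree with the user's region}
    views  -- {article_id: view count}, used here as the popularity signal
    """
    if threshold is not None:
        # Mode 1: random sample among articles at or above the threshold.
        eligible = [a for a, s in scores.items() if s >= threshold]
        rng = rng or random.Random()
        pool = rng.sample(eligible, min(pool_size, len(eligible)))
    else:
        # Mode 2: highest matching degree first.
        pool = sorted(scores, key=scores.get, reverse=True)[:pool_size]
    # Second pass: most viewed first, then keep the leading articles.
    pool.sort(key=lambda a: views.get(a, 0), reverse=True)
    return pool[:top_n]


scores = {"a1": 0.30, "a2": 0.05, "a3": 0.20, "a4": 0.00}
views = {"a1": 10, "a2": 500, "a3": 90, "a4": 1000}
by_threshold = recommend(scores, views, threshold=0.1, pool_size=10, top_n=2)
by_rank = recommend(scores, views, pool_size=3, top_n=2)
```

With the threshold mode, only a1 and a3 qualify and the more viewed a3 comes first. With the ranking mode, the top three by matching degree are re-ordered by views, which is how a widely read but weakly matching article such as a2 can still surface.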
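With the method for recommending articles to a user based on regional features according to the present invention, the matching degree found between articles and regions makes it possible to recommend relevant articles based on the user's regional features, including articles that both match the user's regional features and are widely popular, which greatly improves the user experience.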
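FIG. 2 is a schematic block diagram of an apparatus for recommending articles to a user based on regional features according to an embodiment of the present invention. As shown in FIG. 2, the apparatus comprises:
an article regional feature degree extraction unit, configured to extract the regional feature degree of articles in an existing article library;
a matching degree determination unit, configured to determine the matching degree between articles and regions according to the regional feature degree of the articles, a pre-established region library and a preset regional keyword library;
a recommendation unit, configured to acquire region information associated with the user and, using the matching degree between that region and the articles in the article library, select a certain number of corresponding articles in a preset manner and recommend them to the user.
In addition, the apparatus further comprises:
a region library building unit, configured to build a region library in advance, the region library including: the name of a country, the place names at every level administered by that country, the membership relations between the place names at the various levels, and the weights of those membership relations; and
a regional keyword library building unit, configured to build a regional keyword library in advance, the regional keyword library including: one or more keywords representing each place name, and the association between those keywords and the corresponding place name.
The method used by the region library building unit to build the region library includes: recording the place names from the country name down to the smallest administrative division, together with their membership relations, according to each country's own scheme of administrative divisions, and determining the weights of the membership relations with the regional average weight method. That is, the weight of a direct parent-child relation is the ratio of one lower-level region to the total number of lower-level regions directly under the same higher-level region, and the weight between two regions separated by several levels is the product of the weights of the direct parent-child relations between them.
The rules applied by the regional keyword library building unit for choosing the keywords that represent each place name include, but are not limited to: 1. the official name of each region; 2. a generally recognized alternative name that can represent a region; 3. a representative landmark building or scenic area of a region.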
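Preferably, the regional feature degree of an article in the existing article library is extracted with the following formula:
Figure PCTCN2018071961-appb-000007
where:
p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
n_{a,t} is the number of times keyword t from the preset regional keyword library appears in article a of the existing article library;
l_a is the number of tokens obtained by word segmentation of article a in the existing article library.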
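Preferably, the matching degree is determined with the following formula:
Figure PCTCN2018071961-appb-000008
where:
s_{a,i} is the matching degree between article a in the existing article library and region i in the region library;
R is the set of all regions in the preset region library;
T is the set of all keywords in the preset regional keyword library;
p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
f_{t,i} indicates whether keyword t in the preset regional keyword library is associated with region i in the preset region library; it takes the value 1 when keyword t is associated with region i and 0 otherwise;
f_{t,j} indicates whether keyword t in the preset regional keyword library is associated with region j in the preset region library; it takes the value 1 when keyword t is associated with region j and 0 otherwise;
w_{j,i} is the weight with which region i in the preset region library belongs to region j; w_{j,i} is 0 when regions i and j have no membership relation.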
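Preferably, the recommendation unit is configured to acquire the region information associated with the user from the IP address with which the user is connected to the network, or from the positioning function of a smart mobile terminal, or from the permanent address the user provided at registration.
Preferably, when using the matching degree between that region and the articles in the article library to select a certain number of corresponding articles in a preset manner and recommend them to the user, the recommendation unit randomly selects a certain number of articles from the articles whose matching degree is greater than or equal to the preset threshold, or selects a certain number of articles in descending order of matching degree.
Preferably, in the same process, the recommendation unit can also rank the selected articles further: the selected articles are first sorted according to certain conditions, and the articles ranked at the front are then recommended to the user first, for example the top 1-5 articles, or 5-20 articles, or more. Those skilled in the art will understand that, for convenience and brevity, the specific working process of the apparatus described above can be found in the corresponding process of the foregoing method embodiment. The examples and descriptions given there apply equally to the working process of the apparatus and are not repeated here.
With the apparatus for recommending articles to a user based on regional features according to the present invention, the matching degree found between articles and regions makes it possible to recommend relevant articles based on the user's regional features, including articles that both match the user's regional features and are widely popular, which greatly improves the user experience.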
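FIG. 3 is a schematic structural diagram of a computing device according to an embodiment of the present invention. As shown in FIG. 3, the computing device can include a processor 301 and a memory 302 that stores computer program instructions.
Specifically, the processor 301 may include a central processing unit (CPU) or an application specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present invention.
The memory 302 can include mass storage for data or instructions. The processor 301 reads and executes the computer program instructions stored in the memory 302 to implement any of the methods for recommending articles to a user based on regional features in the above embodiments.
In one example, the computing device can also include a communication interface 303 and a bus 310. The processor 301, the memory 302 and the communication interface 303 are connected by the bus 310 and communicate with each other through it.
A computer program product for the method for recommending articles to a user based on regional features provided by an embodiment of the present invention comprises a computer readable storage medium storing program code. The instructions in the program code can be used to execute the method described in the foregoing method embodiment; for the specific implementation, refer to the method embodiment, which is not repeated here.
If the functions are implemented as software functional units and sold or used as a standalone product, they can be stored in a computer readable storage medium. On this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, smart tablet, smartphone, server, or network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above are only specific embodiments of the present invention, and the scope of protection of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of the claims.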

Claims (18)

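  1. A method for recommending articles to a user based on regional features, comprising:
    extracting the regional feature degree of articles in an existing article library;
    determining the matching degree between articles and regions according to the regional feature degree of the articles, a pre-established region library and a preset regional keyword library;
    acquiring region information associated with the user and, based on the matching degree between the user's region and the articles in the article library, selecting a certain number of corresponding articles in a preset manner and recommending them to the user.
  2. The method according to claim 1, characterized in that the pre-established region library includes: the name of a country, the place names at every level administered by that country, the membership relations between the place names at the various levels, and the weights of those membership relations.
  3. The method according to claim 2, characterized in that the method of building the region library includes:
    recording the place names from the country name down to the smallest administrative division, together with their membership relations, according to each country's own scheme of administrative divisions;
    determining the weights of the membership relations with the regional average weight method, wherein
    the weight of a direct parent-child relation is the ratio of one lower-level region to the total number of lower-level regions directly under the same higher-level region;
    the weight between two regions separated by several levels is the product of the weights of the direct parent-child relations between them.
  4. The method according to claim 1, characterized in that the pre-established regional keyword library includes: one or more keywords representing each place name, and the association between the one or more keywords and the corresponding place name, wherein the rules for choosing the one or more keywords that represent each place name include, but are not limited to:
    the official name of each region;
    a generally recognized alternative name that can represent a region;
    a representative landmark building or scenic area of a region.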
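  5. The method according to claim 1, characterized in that the regional feature degree of an article in the existing article library is extracted with the following formula:
    Figure PCTCN2018071961-appb-100001
    where:
    p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
    n_{a,t} is the number of times keyword t from the preset regional keyword library appears in article a of the existing article library;
    l_a is the number of tokens obtained by word segmentation of article a in the existing article library.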
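  6. The method according to claim 1, characterized in that, in the step of determining the matching degree between articles and regions according to the regional feature degree of the articles, the pre-established region library and the preset regional keyword library, the matching degree is determined with the following formula:
    Figure PCTCN2018071961-appb-100002
    where:
    S_{a,i} is the matching degree between article a in the existing article library and region i in the region library;
    R is the set of all regions in the preset region library;
    T is the set of all keywords in the preset regional keyword library;
    p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
    f_{t,i} indicates whether keyword t in the preset regional keyword library is associated with region i in the preset region library; it takes the value 1 when keyword t is associated with region i and 0 otherwise;
    f_{t,j} indicates whether keyword t in the preset regional keyword library is associated with region j in the preset region library; it takes the value 1 when keyword t is associated with region j and 0 otherwise;
    w_{j,i} is the weight with which region i in the preset region library belongs to region j; w_{j,i} is 0 when regions i and j have no membership relation.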
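  7. The method according to claim 1, characterized in that the step of acquiring region information associated with the user includes:
    acquiring the region information associated with the user from the IP address with which the user is connected to the network; or
    acquiring the region information associated with the user from the positioning function of a smart mobile terminal; or
    acquiring the region information associated with the user from the permanent address the user provided at registration.
  8. The method according to claim 1, characterized in that, in the step of selecting a certain number of corresponding articles in a preset manner based on the matching degree between the region and the articles in the article library and recommending them to the user,
    a certain number of articles are randomly selected from the articles whose matching degree is greater than or equal to a preset threshold and recommended to the user; or
    a certain number of articles are selected in descending order of matching degree and recommended to the user.
  9. The method according to any one of claims 1-8, characterized by further comprising:
    sorting the selected articles according to certain conditions, and recommending the articles ranked at the front to the user first.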
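  10. An apparatus for recommending articles to a user based on regional features, comprising:
    an article regional feature degree extraction unit, configured to extract the regional feature degree of articles in an existing article library;
    a matching degree determination unit, configured to determine the matching degree between articles and regions according to the regional feature degree of the articles, a pre-established region library and a preset regional keyword library;
    a recommendation unit, configured to acquire region information associated with the user and, using the matching degree between that region and the articles in the article library, select a certain number of corresponding articles in a preset manner and recommend them to the user.
  11. The apparatus according to claim 10, characterized by further comprising:
    a region library building unit, configured to build a region library in advance, the region library including: the name of a country, the place names at every level administered by that country, the membership relations between the place names at the various levels, and the weights of those membership relations; and
    a regional keyword library building unit, configured to build a regional keyword library in advance, the regional keyword library including: one or more keywords representing each place name, and the association between the one or more keywords and the corresponding place name.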
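  12. The apparatus according to claim 10, characterized in that the article regional feature degree extraction unit extracts the regional feature degree of an article in the existing article library with the following formula:
    Figure PCTCN2018071961-appb-100003
    where:
    p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
    n_{a,t} is the number of times keyword t from the preset regional keyword library appears in article a of the existing article library;
    l_a is the number of tokens obtained by word segmentation of article a in the existing article library.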
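  13. The apparatus according to claim 10, characterized in that the matching degree determination unit determines the matching degree with the following formula:
    Figure PCTCN2018071961-appb-100004
    where:
    S_{a,i} is the matching degree between article a in the existing article library and region i in the region library;
    R is the set of all regions in the preset region library;
    T is the set of all keywords in the preset regional keyword library;
    p_{a,t} is the regional feature degree of article a in the existing article library with respect to keyword t in the preset regional keyword library;
    f_{t,i} indicates whether keyword t in the preset regional keyword library is associated with region i in the preset region library; it takes the value 1 when keyword t is associated with region i and 0 otherwise;
    f_{t,j} indicates whether keyword t in the preset regional keyword library is associated with region j in the preset region library; it takes the value 1 when keyword t is associated with region j and 0 otherwise;
    w_{j,i} is the weight with which region i in the preset region library belongs to region j; w_{j,i} is 0 when regions i and j have no membership relation.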
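  14. The apparatus according to claim 10, characterized in that the recommendation unit is configured to acquire the region information associated with the user from the IP address with which the user is connected to the network; or from the positioning function of a smart mobile terminal; or from the permanent address the user provided at registration.
  15. The apparatus according to claim 10, characterized in that the recommendation unit is configured to randomly select a certain number of articles from the articles whose matching degree is greater than or equal to a preset threshold and recommend them to the user; or to select a certain number of articles in descending order of matching degree and recommend them to the user.
  16. The apparatus according to any one of claims 10-15, characterized in that the recommendation unit is further configured to sort the selected articles according to certain conditions and recommend the articles ranked at the front to the user first.
  17. A computing device, characterized by comprising: at least one processor, at least one memory, and computer program instructions stored in the memory, wherein the method according to any one of claims 1-9 is implemented when the computer program instructions are executed by the processor.
  18. A computer readable storage medium on which computer program instructions are stored, characterized in that the method according to any one of claims 1-9 is implemented when the computer program instructions are executed by a processor.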
PCT/CN2018/071961 2017-03-07 2018-01-09 Method and apparatus for recommending articles to a user based on regional features WO2018161719A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710130703.X 2017-03-07
CN201710130703.XA CN106934004A (zh) 2017-03-07 2017-03-07 Method and apparatus for recommending articles to a user based on regional features

Publications (1)

Publication Number Publication Date
WO2018161719A1 true WO2018161719A1 (zh) 2018-09-13

Family

ID=59424456

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/071961 WO2018161719A1 (zh) 2017-03-07 2018-01-09 Method and apparatus for recommending articles to a user based on regional features

Country Status (2)

Country Link
CN (1) CN106934004A (zh)
WO (1) WO2018161719A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
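CN106934004A (zh) * 2017-03-07 2017-07-07 广州优视网络科技有限公司 Method and apparatus for recommending articles to a user based on regional features
CN112837106A (zh) * 2019-11-22 2021-05-25 上海哔哩哔哩科技有限公司 Commodity recommendation method and apparatus, and computer device
CN113379481A (zh) * 2021-05-25 2021-09-10 北京大米科技有限公司 Data processing method and apparatus
CN115049327B (zh) * 2022-08-17 2022-11-15 阿里巴巴(中国)有限公司 Data processing method and apparatus, electronic device, and storage medium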

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
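CN101651634A (zh) * 2008-08-13 2010-02-17 阿里巴巴集团控股有限公司 Method and system for providing regionalized information
CN102611785A (zh) * 2011-01-20 2012-07-25 北京邮电大学 Mobile-phone-oriented personalized news active recommendation service system and method for mobile users
US20130110985A1 (en) * 2011-11-01 2013-05-02 Rahul Shekher Systems and Methods for Geographical Location Based Cloud Storage
CN104077322A (zh) * 2013-03-30 2014-10-01 百度在线网络技术(北京)有限公司 Question-based geographic information mining method and system
CN104951543A (zh) * 2015-06-19 2015-09-30 百度在线网络技术(北京)有限公司 Computer-implemented information processing method and apparatus
CN106934004A (zh) * 2017-03-07 2017-07-07 广州优视网络科技有限公司 Method and apparatus for recommending articles to a user based on regional features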

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
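CN103136300B (zh) * 2011-12-05 2017-02-01 北京百度网讯科技有限公司 Method and apparatus for recommending text-related topics
CN103678669B (zh) * 2013-12-25 2017-02-08 福州大学 System and method for evaluating community influence in a social network
CN104462578A (zh) * 2014-12-29 2015-03-25 北京邮电大学 News push method
CN106033445B (zh) * 2015-03-16 2019-10-25 北京国双科技有限公司 Method and apparatus for obtaining article relevance data
CN104915426B (zh) * 2015-06-12 2019-03-26 百度在线网络技术(北京)有限公司 Information ranking method, and method and apparatus for generating an information ranking model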

Also Published As

Publication number Publication date
CN106934004A (zh) 2017-07-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18764462

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18764462

Country of ref document: EP

Kind code of ref document: A1