WO2016188347A1 - Network quality evaluation method and apparatus, network content ranking method and system, computing device, and non-transitory machine-readable storage medium

Publication number
WO2016188347A1
WO2016188347A1 PCT/CN2016/082376 CN2016082376W WO2016188347A1 WO 2016188347 A1 WO2016188347 A1 WO 2016188347A1 CN 2016082376 W CN2016082376 W CN 2016082376W WO 2016188347 A1 WO2016188347 A1 WO 2016188347A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
score
quality
content
network content
Prior art date
Application number
PCT/CN2016/082376
Other languages
English (en)
French (fr)
Inventor
黄胤人
陈萌辉
李媛媛
陈一宁
Original Assignee
广州神马移动信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州神马移动信息科技有限公司
Publication of WO2016188347A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • The present invention relates to network content, and in particular to a network content quality evaluation method and apparatus, a network content ranking method and system, a computing device, and a non-transitory machine-readable storage medium.
  • Existing search engines usually rank search results mainly by relevance and popularity.
  • This approach works well when, for example, simply searching for information. For certain kinds of searches, however (for example, books, and especially serialized online novels), a novel may be reprinted by a large number of different websites, the same novel may carry different titles on different websites, and the copies differ in quality. Ranking novel search results by these two features alone may therefore surface a large number of low-quality duplicate books.
  • A technical problem to be solved by the present invention is to provide a network content quality evaluation method and apparatus, a network content ranking method and system, a computing device, and a non-transitory machine-readable storage medium that can evaluate the quality of the network content itself, thereby making it easier for people to select network content.
  • According to one aspect of the invention, a network content quality evaluation method is disclosed, including: acquiring content quality features of a network content, the content quality features including at least one of a directory feature, a source quality feature, a meta information feature, and a subject quality feature; calculating a feature score for each of the at least one feature; and calculating a quality score of the network content based on the feature scores.
  • In this way, the quality of the network content itself can be evaluated according to at least one aspect such as directory features, source, meta information, and subject quality, which provides a basis for quality-based selection of network content.
  • Preferably, the network content may be any of the following: books, music, apps, internet radio.
  • For a book (for example, a serialized online novel), quality can be evaluated based on the book's directory features, its source, its meta information, and the quality of its text (i.e., the subject quality of the book).
  • For music, quality can be evaluated based on at least one of: directory features such as track or disc number; source features such as whether it comes from QQ Music or Baidu Music; meta information features such as album, singer, and song name; and subject quality features such as the song's star rating.
  • For apps and internet radio, quality can likewise be evaluated based on at least one of their specific directory, source, meta information, and subject quality.
  • Preferably, the content quality features may include at least two of the directory feature, the source quality feature, the meta information feature, and the subject quality feature.
  • The network content quality evaluation method disclosed by the present invention may further include: assigning a feature weight to each of the at least two features, wherein the quality score of the network content is calculated as a weighted sum of the feature scores of the at least two features.
  • Preferably, the directory features may include one or more of the following: update timeliness rate; empty chapter rate; useless chapter rate; chapter length; a master station authority score assigned to the master station of the network content; and the actual chapter rate of that master station.
  • In this way, the directory features can be evaluated based on finer-grained parameters, further improving the comprehensiveness and accuracy of the content quality evaluation.
  • Empty chapter rate = number of empty chapters / total number of chapters; and/or
  • Useless chapter rate = number of useless chapters / total number of chapters; and/or
  • Chapter length = number of chapters / 1000, where the number of chapters is an integer between 1 and 1000; when the number of chapters is greater than 1000, the chapter length is 1; and/or
  • The master station authority score is determined as follows: if the master station's authority value is above a certain threshold, or the number of reprinting sites exceeds a certain number, the master station's own authority score is used directly; otherwise, the authority score is raised according to the reprint count to obtain the final master station authority score; and/or
  • Actual chapter rate = number of chapters at the master station / average number of chapters, where the average number of chapters is the average over all sources of the network content; when the master station's chapter count is not less than the average, the actual chapter rate is 1.
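  • As a minimal sketch, the ratio-based directory features listed above can be written as small Python functions. The function and parameter names are illustrative assumptions, not taken from the patent, and rejecting a zero chapter count is likewise an assumption.

```python
from typing import Sequence


def chapter_rate(matching_chapters: int, total_chapters: int) -> float:
    """Share of chapters that are empty (or useless): count / total."""
    if total_chapters <= 0:
        # Assumption: the text does not say how a book with no chapters is handled.
        raise ValueError("total_chapters must be positive")
    return matching_chapters / total_chapters


def chapter_length(chapter_count: int) -> float:
    """Chapter count / 1000, saturating at 1 above 1000 chapters."""
    return min(chapter_count, 1000) / 1000


def actual_chapter_rate(master_chapters: int,
                        source_chapter_counts: Sequence[int]) -> float:
    """Master-station chapter count relative to the all-source average, capped at 1."""
    average = sum(source_chapter_counts) / len(source_chapter_counts)
    return min(master_chapters / average, 1.0)
```

  • The same `chapter_rate` helper serves both the empty chapter rate and the useless chapter rate, since the two formulas differ only in which chapters are counted.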
  • The directory feature score may be obtained as a weighted combination of the update score, the empty chapter rate, the useless chapter rate, and the chapter length, wherein each feature is multiplied by the master station authority score and the actual chapter rate, and the directory feature score has a value range of [0, 1].
  • In this way, directory feature scores can be calculated more conveniently and accurately, which provides a further basis for accurate and fast calculation of the quality score.
  • The calculation of the quality score may involve one or more of the following values:
  • Meta information feature score = first-level directory score + second-level directory score + picture information score + label score + introduction score, where each of these five scores is 0.2 if the corresponding item (first-level directory, second-level directory, picture information, label, or introduction) exists, and 0 otherwise; and/or
  • Subject quality score = total score of all chapters / number of chapters, and the subject quality score has a value range of [0, 1].
  • In this way, the source feature score, the meta information feature score, and the text quality score can be calculated conveniently and accurately, providing a further basis for accurate and fast calculation of the quality score.
  • The quality score of the network content may be obtained by weighting and adding the directory feature score, the meta information feature score, the source feature score, and the text quality feature score in a ratio of 6:1:3:5.
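  • The 6:1:3:5 weighting can be sketched in Python as below. The dictionary keys and the function name are hypothetical. Dividing by the sum of the weights, so that the result stays in [0, 1] when every feature score is in [0, 1], is an assumption: the text only states the ratio.

```python
from typing import Mapping

# Ratio from the text: directory : meta information : source : text quality.
DEFAULT_RATIO = {"directory": 6, "meta": 1, "source": 3, "text": 5}


def quality_score(feature_scores: Mapping[str, float],
                  ratio: Mapping[str, float] = DEFAULT_RATIO) -> float:
    """Weighted sum of the supplied feature scores, normalised by total weight."""
    total_weight = sum(ratio[name] for name in feature_scores)
    weighted_sum = sum(ratio[name] * score
                       for name, score in feature_scores.items())
    # Assumption: normalise so the quality score shares the [0, 1] range
    # of the individual feature scores.
    return weighted_sum / total_weight
```

  • Because only the weights of the supplied features enter the normalisation, the same function also covers the case where fewer than four features are available.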
  • A method of sorting a plurality of network contents, comprising: evaluating a quality score for each of the plurality of network contents using any one or more of the methods described above; and sorting the plurality of network contents with the quality score as one of the sorting bases.
  • The step of sorting the plurality of network contents may include: sorting the plurality of network contents obtained by a search in response to a user's network content query request; or sorting the plurality of network contents in a network content category list; or sorting the plurality of network contents in a leaderboard.
  • In this way, the user can obtain an ordering that takes the quality of the network content itself into account, whether through keyword search, category lists, or leaderboards, which makes it easier for the user to select network content.
  • A network content quality evaluation apparatus includes: a feature acquisition unit for acquiring content quality features of a network content, the content quality features including at least one of a directory feature, a source quality feature, a meta information feature, and a subject quality feature; a feature score calculation unit for calculating a feature score of each of the at least one feature; and a quality score calculation unit for calculating a quality score of the network content based on the feature scores.
  • The content quality features may include at least two of a directory feature, a source quality feature, a meta information feature, and a text quality feature, the apparatus further comprising a weight assignment unit for assigning a feature weight to each of the at least two features, wherein the quality score calculation unit calculates the quality score of the network content by weighting and summing the feature scores of the at least two features.
  • In this way, apparatus support is provided for calculating the quality score of the network content.
  • A system for sorting a plurality of network contents, comprising: the network content quality evaluation apparatus described above, which evaluates a quality score for each of the plurality of network contents; and a sorting apparatus for sorting the plurality of network contents with the quality score as one of the sorting bases.
  • The sorting apparatus may include: a search sorting unit for sorting the plurality of network contents obtained by a search in response to a user's network content query request; and/or a category list sorting unit for sorting the plurality of network contents in a network content category list; and/or a leaderboard sorting unit for sorting the plurality of network contents in a leaderboard.
  • A computing device, including: a memory; and a processor, connected to the memory, configured to acquire content quality features of a network content, the content quality features including at least one of a directory feature, a source quality feature, a meta information feature, and a subject quality feature, to calculate a feature score of each of the at least one feature, to calculate a quality score of the network content according to the feature scores, and to store the quality score in the memory.
  • A non-transitory machine-readable storage medium is also provided, having executable code stored thereon which, when executed by a processor, causes the processor to perform the network content quality evaluation method described above or the method of sorting a plurality of network contents.
  • FIG. 1 is a schematic flow chart of a method for evaluating a network content quality according to an embodiment of the present invention.
  • FIG. 2 shows an alternative to the method of FIG. 1.
  • FIG. 3 is a schematic flow diagram of a method of sorting multiple network contents, in accordance with one embodiment of the present invention.
  • FIG. 4 is a flow chart of sub-steps that can be included in step S40 of FIG. 3.
  • FIG. 5 is a schematic block diagram of a network content quality evaluation apparatus in accordance with one embodiment of the present invention.
  • FIG. 6 is a schematic block diagram of a system for sorting multiple network contents in accordance with one embodiment of the present invention.
  • FIG. 7 is a schematic block diagram of a sorting device in accordance with one embodiment of the present invention.
  • FIG. 8 is a schematic block diagram of a computing device in accordance with one embodiment of the present invention.
  • FIG. 1 is a schematic flow chart of a method for evaluating a network content quality according to an embodiment of the present invention.
  • In step S10, content quality features of the network content are acquired.
  • The content quality features include at least one of the following: a directory feature, a source quality feature, a meta information feature, and a subject quality feature.
  • In step S20, a feature score is calculated for each of the at least one feature.
  • In step S30, the quality score of the network content is calculated based on the feature scores.
  • In other words, content quality features capable of representing the quality of the network content are selected, and the quality score of the network content is calculated by quantifying one or more of these features (i.e., calculating a feature score for each feature).
  • FIG. 2 shows an alternative to the method of FIG. 1.
  • In step S10', at least two of the above content quality features of the network content are acquired. That is, at least two of the directory feature, the source quality feature, the meta information feature, and the subject quality feature are acquired.
  • In step S20', the feature score of each of the at least two features is calculated.
  • In step S21', a weight is assigned to each of the at least two features.
  • In step S30', the quality score of the network content is obtained based on the feature scores and the weights.
  • The order of steps S20' and S21' can be reversed, that is, the weights can be assigned first and the feature score of each feature calculated afterwards; this does not affect the final quality score.
  • The network content can be a book, such as a serialized online novel.
  • The network content may also be any other network content, such as music, internet radio, or apps, for which features representing quality can be selected, defined, and quantified.
  • The following takes a novel as an example to give a specific method for acquiring the content quality features and calculating the feature scores.
  • The directory features referred to in the present disclosure should be understood as content that is related to the network content and closely tied to its quality, but that excludes the subject quality (for example, the text quality of a novel) and the meta information the content carries.
  • The directory features may include, but are not limited to, update speed (i.e., the update score, relevant especially for serialized online novels), empty chapter rate, useless chapter rate, chapter length, actual chapter rate, and master station authority score.
  • The timeliness of updates can serve as a criterion for judging the quality of a work. Therefore, the directory feature indicating whether a book is updated in a timely manner (the update score) can contribute to the calculation of the book's quality score.
  • Calculating a novel's update score may include first determining whether the novel has been completed. If it is completed, the update score is a predetermined fixed value. If it is not completed, the latest update time is obtained; the closer the latest update time is to the current time, the higher the update score. When the novel has not been updated for a specified number of days (for example, 30 days), the update score is 0.
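  • A minimal Python sketch of the update score follows, using the formula stated in the description: update score = Time_gap^[1/(Time_gap+1)], with Time_gap = 1 - (current time - last update time)/30 days, limited to [0, 1]. The function name, the parameter names, and the default fixed value for completed novels are assumptions; the text says only that completed novels receive a predetermined fixed value.

```python
from datetime import datetime

WINDOW_DAYS = 30.0  # update window named in the text


def update_score(last_update: datetime, now: datetime,
                 completed: bool = False,
                 completed_score: float = 1.0) -> float:
    """Update score of a serialized novel, in [0, 1]."""
    if completed:
        # Assumption: the actual fixed value is not given in the text.
        return completed_score
    days_since_update = (now - last_update).total_seconds() / 86400.0
    # Time_gap = 1 - elapsed / 30 days, clamped to [0, 1].
    time_gap = min(max(1.0 - days_since_update / WINDOW_DAYS, 0.0), 1.0)
    return time_gap ** (1.0 / (time_gap + 1.0))
```

  • The exponent 1/(Time_gap+1) lies between 0.5 and 1, so the score decays more slowly than Time_gap itself while still reaching 0 once the 30-day window has elapsed.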
  • When the number of chapters exceeds 1000, the chapter length can be considered a perfect score, that is, the chapter length value is 1.
  • chapterLength = number of chapters / 1000.
  • The number of chapters is an integer between 1 and 1000. When the number of chapters is greater than 1000, chapterLength is 1.
  • A novel is usually available from more than one source website.
  • A directory source is selected for each novel; this source is called the master station of the novel.
  • The authority of the master station can be used as an evaluation criterion for the quality of the novel: the higher the authority of the master station, the better the quality of the novel is likely to be. However, because hosting on an authoritative novel site may raise a novel's score unreasonably, the weight of the novel site can be smoothed. If the site's authority value is low, the master station's authority score can be raised according to the reprint count (for example, by multiplying the reprint count by a coefficient and combining the result with the master station's own authority score to obtain the final master station authority score). But if the master station's own authority value is high, or the novel has already been reprinted by more than, for example, 30 sites, the master station's own authority score can be used directly.
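  • The smoothing rule above leaves the threshold, the coefficient, and the way the reprint term is combined with the authority score unspecified. The Python sketch below is therefore only one possible reading: the threshold of 0.8, the coefficient of 0.01 per reprinting site, the additive combination, and the cap at 1.0 are all assumptions, as are the names.

```python
def master_authority_score(authority: float, reprint_sites: int,
                           authority_threshold: float = 0.8,
                           site_limit: int = 30,
                           boost_per_site: float = 0.01) -> float:
    """Final master station authority score, in [0, 1]."""
    if authority >= authority_threshold or reprint_sites > site_limit:
        # High authority, or already widely reprinted: use the score as is.
        return authority
    # Assumption: reprint count times a coefficient is added to the
    # station's own authority, and the result is capped at 1.0.
    return min(authority + boost_per_site * reprint_sites, 1.0)
```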
  • The actual chapter rate may be calculated as follows: first calculate the average chapter count over all sources of the novel. If the number of chapters serialized by the novel's master station exceeds the average, the serialization is considered complete and the actual chapter rate is 1. If the number of chapters serialized by the master station is less than the average, the serialization is considered incomplete and the master station is considered less reliable.
  • The directory feature score of the novel may be calculated from the above six features as follows:
  • The update score, empty chapter rate, useless chapter rate, and chapter length may each be assigned a weight according to importance or empirical values, and then multiplied by the novel's master station authority score and actual chapter rate, thereby yielding the final directory feature score.
  • By multiplying each of the four features (update score, empty chapter rate, useless chapter rate, and chapter length) by the master station authority score and the actual chapter rate, the credibility of the novel's master station and the credibility of the novel on that master station are used to smooth the four values, so that the quality of the novel is evaluated more objectively in light of both.
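  • The combination described above can be sketched in Python as follows. The text gives neither the weights nor the way the two rates (where lower is better) enter the sum, so the equal weights and the use of 1 - rate are assumptions, as are all names.

```python
from typing import Sequence


def directory_feature_score(update: float, empty_rate: float,
                            useless_rate: float, chapter_len: float,
                            authority: float, actual_rate: float,
                            weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)
                            ) -> float:
    """Weighted directory features, smoothed by the two credibility factors."""
    # Assumption: rates are inverted so that a higher value is always better.
    parts = (update, 1.0 - empty_rate, 1.0 - useless_rate, chapter_len)
    # Each weighted feature is multiplied by authority * actual_rate,
    # which is equivalent to scaling the weighted sum once.
    return sum(w * p * authority * actual_rate for w, p in zip(weights, parts))
```

  • With weights that sum to 1 and all inputs in [0, 1], the result stays within the [0, 1] range stated for the directory feature score.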
  • A specific method for obtaining the directory feature score of a novel has been given above.
  • Although the above discloses a method for obtaining the directory feature score of a novel based on the update score, empty chapter rate, useless chapter rate, chapter length, master station authority score, and actual chapter rate, these items are clearly only examples of features for evaluating a novel's directory.
  • The calculation method of each score is likewise given by way of example only, and those skilled in the art can contemplate other evaluation methods in light of the present disclosure.
  • Online novels usually have more than one source station. Often, the more popular a novel is, the more times it is reprinted (i.e., the more source stations it has). The number of source stations can therefore reflect the quality of the book to some extent. If the number of sources exceeds a certain number, for example 30 stations, the source feature score can be considered a full score. In addition, the more of a novel's source stations are original or major sites, the higher the quality of the book is likely to be. The final source feature score of a novel therefore needs to take into account both the number of source stations and the authority of each source station.
  • The final source feature score of each novel may be calculated, for example, as follows: first assign a weight to each source station of the novel and calculate the average weight over all of its source stations; then calculate a weight for the number of sources; finally, calculate the source quality feature score from the average source station weight and the source count weight.
  • The weight assigned to each source station indicates that station's importance and authority.
  • The source quality feature score is used to judge source quality comprehensively.
  • In this way, the credibility of the novel's quality score can be further improved, which helps users choose novels.
  • The meta information of a novel usually comprises five items: first-level directory, second-level directory, picture information, label, and introduction. Whether the meta information of a novel is complete can serve as a reference for evaluating its quality. That is, the more meta information a novel contains, the higher its meta information completeness score and the better the quality of the corresponding book.
  • The meta information completeness score (the meta information feature score) is calculated by checking whether the novel contains each of the five items above: for each item present, the meta information score MetaScore increases by 0.2. When all five items are present, MetaScore is 1. MetaScore has a value range of [0, 1].
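  • MetaScore can be sketched in Python as a count of the items present. Representing a novel as a dictionary and the key names used for the five items are assumptions made for illustration.

```python
from typing import Any, Mapping

# Hypothetical key names for the five meta information items in the text.
META_ITEMS = ("first_level_directory", "second_level_directory",
              "picture", "label", "introduction")


def meta_score(novel: Mapping[str, Any]) -> float:
    """0.2 for each meta information item that is present and non-empty."""
    present = sum(1 for item in META_ITEMS if novel.get(item))
    return present / len(META_ITEMS)
```

  • Dividing the count by five is equivalent to adding 0.2 per item and avoids accumulating floating-point error from repeated addition.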
  • The text quality feature of a novel refers to whether the body text contains excessive punctuation or watermarks, and whether the body content is complete.
  • The text quality feature is calculated by calling another interface inside the system; since this is not the main subject of the present solution, it is not described in detail here.
  • The text quality feature score aveChapterScore = total score of all chapters / number of chapters, and the value range of aveChapterScore is [0, 1].
  • The above discloses exemplary methods for obtaining the scores of four features: the directory feature, the source quality feature, the meta information feature, and the text quality feature.
  • The final quality score of the novel is calculated from these four features as follows.
  • The quality score of the novel may be calculated by directly adding the scores of the directory feature, the source quality feature, the meta information feature, and the text quality feature, or by assigning appropriate weights and performing a weighted sum.
  • Experiments assigning different weights show that weighting and adding the directory feature, meta information feature, source feature, and text quality feature in a ratio of 6:1:3:5 yields a final book quality score that works well.
  • Similarly, for music, quality can be evaluated based on at least one of: directory features such as track or disc number; source features such as QQ Music or Baidu Music; meta information features such as album, singer, and song title; and subject quality features such as song star rating.
  • For apps and internet radio, quality can likewise be evaluated based on their specific directory features, source, meta information, and subject quality; this is not described further here.
  • FIG. 3 is a schematic flow diagram of a method of sorting multiple network contents, in accordance with one embodiment of the present invention.
  • In step S40 of FIG. 3, the plurality of network contents are sorted using the quality score calculated by the method of FIG. 1 or FIG. 2 as one of the sorting bases (other bases may include search popularity, relevance, and so on).
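  • One way to use the quality score as one of several sorting bases is a linear blend with relevance, sketched below in Python. The text does not say how the bases are combined, so the blend, the `quality_weight` parameter, and the `Item` type are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Item:
    title: str
    relevance: float  # in [0, 1], e.g. from the search engine
    quality: float    # in [0, 1], the quality score of the content


def rank(items: Iterable[Item], quality_weight: float = 0.5) -> List[Item]:
    """Sort items by a blend of relevance and quality, best first."""
    def combined(item: Item) -> float:
        # Assumption: a simple linear blend of the two sorting bases.
        return (1.0 - quality_weight) * item.relevance + quality_weight * item.quality
    return sorted(items, key=combined, reverse=True)
```

  • Setting `quality_weight` to 0 reproduces a relevance-only ordering, which makes the effect of the quality score easy to compare.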
  • FIG. 4 is a flow chart of sub-steps that can be included in step S40 of FIG. 3.
  • In step S401, the plurality of network contents obtained by a search are sorted in response to the user's network content query request.
  • In step S402, a plurality of network contents are sorted in a network content category list.
  • In step S403, a plurality of network contents are sorted in a leaderboard.
  • In this way, the ranking of network content is further specified as the ranking of query results, category lists, and leaderboards.
  • These three are also the most common means by which users select network content.
  • By basing the rankings of query results, category lists, and leaderboards at least in part on the quality scores of the network content, the user's everyday selection of network content can be informed by the content's actual quality, helping the user ultimately select higher-quality, more satisfying network content.
  • The quality evaluation and ranking methods for network content have been described in detail above with reference to the drawings.
  • The apparatus for network content quality evaluation is described below with reference to FIG. 5, and the network content sorting system and the sorting apparatus it contains are described with reference to FIGS. 6-7.
  • FIG. 5 is a schematic block diagram of a network content quality evaluation apparatus 500 in accordance with one embodiment of the present invention.
  • The network content quality evaluation apparatus 500 includes a feature acquisition unit 510, a feature score calculation unit 520, and a quality score calculation unit 530, and optionally includes a weight assignment unit 521 (shown by a broken line).
  • The feature acquisition unit 510 is configured to acquire content quality features of the network content, where the content quality features include at least one of a directory feature, a source quality feature, a meta information feature, and a subject quality feature.
  • The feature score calculation unit 520 is configured to calculate a feature score of each of the at least one feature.
  • The quality score calculation unit 530 is configured to calculate a quality score of the network content according to the feature scores.
  • The optional weight assignment unit 521 is configured to assign a feature weight to each of the at least two features. The quality score calculation unit 530 then calculates the quality score of the network content by weighting and summing the feature scores of the at least two features.
  • FIG. 6 is a schematic block diagram of a network content sorting system 1000 that sorts a plurality of network contents in accordance with one embodiment of the present invention.
  • The network content sorting system 1000 includes a network content quality evaluation device 600 and a sorting device 640.
  • The network content quality evaluation device 600 may be the same as or different from the network content quality evaluation apparatus 500 disclosed in FIG. 5, and is configured to evaluate a quality score for each of the plurality of network contents.
  • The sorting device 640 is configured to sort the plurality of network contents using the quality score as one of the sorting bases.
  • FIG. 7 is a schematic block diagram of a sorting device 740, in accordance with one embodiment of the present invention.
  • The sorting device 740 can be the same as or different from the sorting device 640 described above.
  • The sorting device 740 may include any one, two, or all of the search sorting unit 741, the category list sorting unit 742, and the leaderboard sorting unit 743.
  • The search sorting unit 741 can be configured to sort the plurality of network contents obtained by a search in response to the user's network content query request.
  • The category list sorting unit 742 can be used to sort a plurality of network contents in a network content category list.
  • The leaderboard sorting unit 743 can be used to sort a plurality of network contents in a leaderboard.
  • FIG. 8 is a schematic block diagram of a computing device 800 in accordance with one embodiment of the present invention.
  • The computing device 800 can be a server-side device, or a client device such as a desktop computer, a notebook computer, a tablet computer, or a smartphone.
  • The computing device 800 includes a memory 810 and a processor 820 that is coupled to the memory 810.
  • The processor 820 is configured to acquire content quality features of the network content, where the content quality features include at least one of a directory feature, a source quality feature, a meta information feature, and a subject quality feature; to calculate a feature score of each of the at least one feature; to calculate the quality score of the network content according to the feature scores; and to save the quality score in the memory 810.
  • The processor 820 may also perform other steps of the network content quality evaluation method described above, or the relevant steps of the method of sorting multiple network contents described above; the details are not repeated here.
  • The technology in the embodiments of the present invention can be implemented by means of software plus the necessary general-purpose hardware, including general-purpose integrated circuits, general-purpose CPUs, general-purpose memories, general-purpose components, and the like. It can also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like, but in many cases the former is the better implementation. Based on this understanding, the technical solution in the embodiments of the present invention may be embodied, in essence, in the form of a software product, which may be stored in a storage medium such as a read-only memory (ROM), a random access memory (RAM), or a compact disc (CD).
  • The above technical concept of the present invention can also be embodied as a non-transitory machine-readable storage medium having executable code stored thereon.
  • When the executable code is executed by a processor, the processor is caused to perform the network content quality evaluation method described above, or to perform the method of sorting a plurality of network contents as described above.
  • The above technical concept of the present invention can also be embodied as a computing device including a processor and a non-transitory machine-readable storage medium.
  • The non-transitory machine-readable storage medium has executable code stored thereon.
  • When the executable code is executed by the processor, the processor is caused to perform the method described above.
  • The method according to the invention may also be embodied as a computer program product comprising a computer-readable medium on which is stored a computer program for performing the functions defined in the method of the invention.
  • The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
  • Each block of a flowchart or block diagram can represent a module, a program segment, or a portion of code that includes one or more executable instructions.
  • The functions noted in the blocks may also occur in a different order from that illustrated in the drawings. For example, two consecutive blocks may be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

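A network content quality evaluation method and apparatus, a network content ranking method and system, a computing device, and a non-transitory machine-readable storage medium. Content quality features of network content are acquired (S10), the content quality features including at least one of a catalog feature, a source quality feature, a meta-information feature, and a main-body quality feature; a feature score is calculated for each of the at least one feature (S20); and a quality score of the network content is calculated from the feature scores (S30). The quality of network content can thereby be evaluated, so that network content can be recommended to users more reasonably.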

Description

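Network quality evaluation method and apparatus, network content ranking method and system, computing device, and non-transitory machine-readable storage medium
Technical Field
The present invention relates to network content, and in particular to a network content quality evaluation method and apparatus, a network content ranking method and system, a computing device, and a non-transitory machine-readable storage medium.
Background
With the development of networks and related technologies, people now spend more and more time on online activities. For example, people read books online (for example, serialized novels), listen to music and Internet radio online, and choose and download the apps they like online.
Because such network content (for example, books, music, Internet radio, and apps) exists in large quantities, how to screen it more sensibly has become an issue. For example, a user can run a keyword search in a search engine, or choose from leaderboards and category lists.
Existing search engines usually rank search results mainly by relevance and popularity. This works well when, for example, the user is simply searching for information. For certain kinds of searches, however (for example, books, and especially serialized online novels), a novel may be reposted by a large number of different websites, the same novel may carry different titles on different websites, and the copies may differ in quality. Ranking novel search results by these two characteristics alone may therefore surface a large number of low-quality duplicate books.
The method disclosed in the applicant's other pending application, "Same-book identification based on simhash and chapter matching", can identify copies of the "same book", but it cannot determine which of those copies is of better quality or which should be shown first in a ranking.
In addition, for network content whose quality can itself be evaluated from certain parameters, such as music, Internet radio, and apps, there is likewise a need to evaluate quality so that users can choose more easily.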
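Summary of the Invention
A technical problem to be solved by the present invention is to provide a network content quality evaluation method and apparatus, a network content ranking method and system, a computing device, and a non-transitory machine-readable storage medium that can evaluate the quality of network content itself and thereby make it easier for people to choose network content.
According to one aspect of the present invention, a network content quality evaluation method is disclosed, including: acquiring content quality features of network content, the content quality features including at least one of a catalog feature, a source quality feature, a meta-information feature, and a main-body quality feature; calculating a feature score for each of the at least one feature; and calculating a quality score of the network content from the feature scores.
In this way, the quality of the network content itself can be evaluated from at least one of its catalog feature, source, meta information, and main-body quality, which provides a basis for quality-based recommendation of network content.
Preferably, the network content may be any one of the following: a book, music, an app, or an Internet radio station.
For a book (for example, a serialized online novel), the quality of the book itself can be evaluated from its catalog feature, its source, its meta information, and the quality of its body text (that is, the main-body quality of the book).
For music, quality can be evaluated from at least one of a catalog feature such as the track or disc number, a source feature such as whether it comes from QQ Music or Baidu Music, a meta-information feature such as the album, singer, and song title, and a main-body quality feature such as the song's star rating.
For apps and Internet radio stations, quality can likewise be evaluated from at least one of their specific catalog, source, meta information, and main-body quality.
Preferably, the content quality features may include at least two of the catalog feature, the source quality feature, the meta-information feature, and the main-body quality feature. The disclosed network content quality evaluation method may further include: assigning a feature weight to each of the at least two features, wherein the quality score of the network content is calculated as a weighted sum of the feature scores of the at least two features.
In this way, the quality of the network content itself can be evaluated from at least two of its catalog feature, source, meta information, and main-body quality, taking into account how important each of these aspects is, which makes the quality evaluation more comprehensive, flexible, and accurate.
Preferably, the catalog feature may include one or more of the following: update timeliness (update score); empty chapter rate; useless chapter rate; chapter length; a main-site authority score assigned to the main site of the network content; and the actual chapter rate of that main site.
In this way, the catalog feature can be evaluated from finer-grained parameters, which further improves the comprehensiveness and accuracy of the content quality evaluation.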
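Preferably, update score = Time_gap^[1/(Time_gap+1)], where Time_gap = 1 - (current time - last update time)/30 days and lies in the interval [0, 1], and Time_gap is 0 when the last update was more than 30 days ago; and/or
empty chapter rate = number of empty chapters / total number of chapters; and/or
useless chapter rate = number of useless chapters / total number of chapters; and/or
chapter length = number of chapters / 1000, where the number of chapters is an integer between 1 and 1000, and chapter length takes the value 1 when the number of chapters is greater than 1000; and/or
the main-site authority score takes the following value:
if the main site's authority score is above a certain threshold, or the content has been reposted by more than a certain number of sites, the main site's own authority score is used directly; otherwise the main site's own authority score is increased according to the repost count to obtain the final main-site authority score; and/or
actual chapter rate = number of chapters on the main site / average number of chapters, where the average number of chapters is the average over all sources of the network content, and the actual chapter rate is 1 when the number of chapters on the main site is not less than the average.
In this way, the update score, empty chapter rate, useless chapter rate, chapter length, main-site authority score, and actual chapter rate can be calculated conveniently and accurately, which provides a further basis for calculating the quality score accurately and quickly.
Preferably, the catalog feature score may be obtained as follows: the update score, empty chapter rate, useless chapter rate, and chapter length are added together, each of them being multiplied by the main-site authority score and the actual chapter rate, and the catalog feature score takes values in the range [0, 1].
In this way, the catalog feature score can be calculated more conveniently and accurately, which provides a still further basis for calculating the quality score accurately and quickly.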
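Preferably, the calculation of the quality score may include one or more of the following values:
source feature score = Ave_host_score*(1+Host_factor), with the source feature score taking values in the range [0, 1], where the average weight of all source sites of the novel is Ave_host_score = (∑host_score)/host_num, host_num is the number of source sites and host_score is the weight of each source site, and the source-count weight is Host_factor = host_num/30, where host_num is an integer between 1 and 30 and the source-count weight takes the value 1 when host_num is greater than 30; and/or
meta-information feature score = first-level catalog score + second-level catalog score + image information score + tag score + synopsis score, where each of these five scores is 0.2 when the first-level catalog, second-level catalog, image information, tags, or synopsis, respectively, is present, and 0 otherwise; and/or
main-body quality score = total score of all chapters / number of chapters, with the main-body quality score taking values in the range [0, 1].
In this way, in addition to the catalog feature, the source feature score, the meta-information feature score, and the body-text quality score can also be calculated conveniently and accurately, which provides a yet further basis for calculating the quality score accurately and quickly.
Preferably, the quality score of the network content may be obtained as follows: the catalog feature score, the meta-information feature score, the source feature score, and the body-text quality feature score are added with weights in the ratio 6:1:3:5 to obtain the final quality score.
This further refines the calculation of the final quality score and provides a basis for making selections with reference to the quality score.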
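According to another aspect of the present invention, a method of ranking a plurality of network contents is disclosed, including: assigning a quality score to each of the plurality of network contents using any of the above methods or preferred methods; and ranking the plurality of network contents using the quality score as one of the ranking criteria.
In this way, a plurality of network contents can be ranked according to the quality score of the network content itself, which improves the accuracy of the ranking and makes it easier for users to choose network content.
Preferably, the step of ranking the plurality of network contents may include: ranking the network contents found by a search in response to a user's network content query request; or ranking the network contents as a network content category list; or ranking the network contents as a leaderboard.
In this way, users can obtain rankings that take the quality of the network content itself into account through keyword search, category lists, and leaderboards, which makes concrete the ways in which users choose network content.
According to yet another aspect of the present invention, a network content quality evaluation apparatus is provided, including: a feature acquisition unit for acquiring content quality features of network content, the content quality features including at least one of a catalog feature, a source quality feature, a meta-information feature, and a main-body quality feature; a feature score calculation unit for calculating a feature score for each of the at least one feature; and a quality score calculation unit for calculating a quality score of the network content from the feature scores.
Preferably, the content quality features may include at least two of the catalog feature, the source quality feature, the meta-information feature, and the main-body quality feature, and the apparatus further includes a weight assignment unit for assigning a feature weight to each of the at least two features, wherein the quality score calculation unit calculates the quality score of the network content as a weighted sum of the feature scores of the at least two features.
This provides apparatus support for calculating the quality score of network content.
According to yet another aspect of the present invention, a system for ranking a plurality of network contents is provided, including: the network content quality evaluation apparatus described above, which assigns a quality score to each of the plurality of network contents; and a ranking apparatus for ranking the plurality of network contents using the quality score as one of the ranking criteria.
Preferably, the ranking apparatus may include: a search ranking unit for ranking the network contents found by a search in response to a user's network content query request; or a category list ranking unit for ranking the network contents as a network content category list; or a leaderboard ranking unit for ranking the network contents as a leaderboard.
This provides system support for ranking according to the quality score of network content.
According to yet another aspect of the present invention, a computing device is also provided, including: a memory; and a processor connected to the memory and configured to acquire content quality features of network content, the content quality features including at least one of a catalog feature, a source quality feature, a meta-information feature, and a main-body quality feature, to calculate a feature score for each of the at least one feature, to calculate a quality score of the network content from the feature scores, and to store the quality score in the memory.
According to yet another aspect of the present invention, a non-transitory machine-readable storage medium is also provided. Executable code is stored on it, and when the executable code is executed by a processor, the processor is caused to perform the network content quality evaluation method or the method of ranking a plurality of network contents described above.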
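Brief Description of the Drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the more detailed description of exemplary embodiments of the disclosure taken in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same components.
Fig. 1 is a schematic flowchart of a network content quality evaluation method according to an embodiment of the present invention.
Fig. 2 is an alternative to the method shown in Fig. 1.
Fig. 3 is a schematic flowchart of a method of ranking a plurality of network contents according to an embodiment of the present invention.
Fig. 4 is a flowchart of the sub-steps that step S40 shown in Fig. 3 may include.
Fig. 5 is a schematic block diagram of a network content quality evaluation apparatus according to an embodiment of the present invention.
Fig. 6 is a schematic block diagram of a system for ranking a plurality of network contents according to an embodiment of the present invention.
Fig. 7 is a schematic block diagram of a ranking apparatus according to an embodiment of the present invention.
Fig. 8 is a schematic block diagram of a computing device according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided to make the disclosure more thorough and complete and to convey the scope of the disclosure fully to those skilled in the art.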
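I. Network content quality evaluation method
Fig. 1 is a schematic flowchart of a network content quality evaluation method according to an embodiment of the present invention.
In step S10, content quality features of the network content are acquired. The content quality features include at least one of the following: a catalog feature, a source quality feature, a meta-information feature, and a main-body quality feature.
In step S20, a feature score is calculated for each of the at least one feature.
In step S30, a quality score of the network content is calculated from the feature scores.
Here, content quality features that can represent the quality of the network content are selected, and the quality score of the network content is calculated by quantifying one or more of them (that is, by calculating the feature score of each feature).
Fig. 2 is an alternative to the method shown in Fig. 1.
In step S10', at least two of the above content quality features of the network content are acquired, that is, at least two of the catalog feature, the source quality feature, the meta-information feature, and the main-body quality feature.
In step S20', a feature score is calculated for each of the at least two features.
In step S21', a weight is assigned to each of the at least two features.
In step S30', the quality score of the network content is obtained as a weighted combination of the feature scores and the weights.
The order of steps S20' and S21' may be swapped, that is, a weight may be assigned to each feature before its feature score is calculated; this does not affect the final quality score.
By acquiring at least two features and assigning the same or different weights to them, the different influence of each feature on quality can be taken into account, which makes the quantification of the network content more comprehensive and accurate.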
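The weighted variant of Fig. 2 (steps S20', S21', and S30') can be sketched roughly as follows. This is only an illustration: the function name, the dictionary layout, and the feature keys are assumptions made for this example and do not appear in the source.

```python
def weighted_quality_score(feature_scores, feature_weights):
    """Weighted sum of feature scores, in the spirit of steps S20'-S30'.

    feature_scores:  dict mapping a feature name to its score (hypothetical layout)
    feature_weights: dict mapping the same feature names to their weights
    """
    # The Fig. 2 variant is described for at least two features.
    if len(feature_scores) < 2:
        raise ValueError("the weighted variant expects at least two features")
    return sum(feature_weights[name] * score
               for name, score in feature_scores.items())
```

Because the weights are looked up only when the sum is formed, it makes no difference whether they are assigned before or after the feature scores are computed, which matches the remark that S20' and S21' can be swapped.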
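Specifically, the network content may be a book, such as a serialized online novel. The network content may also be any network content for which features representing quality can be selected, defined, and quantified, such as music, Internet radio, or apps.
It should be understood that although the following description of the content quality features is given for books, the disclosed method also applies to other network content whose quality can be evaluated.
Taking a novel as an example, specific methods for acquiring the content quality features and calculating the feature scores are given below.
1. Catalog feature of a novel
The catalog feature referred to in this disclosure should be understood as content related to the network content that is distinct from the main-body quality (for example, the body-text quality of a novel) and from the meta information carried by the file, yet is closely related to the quality of that network content.
For a novel, the catalog feature may include, but is not limited to, update speed (that is, the update score, especially for serialized online novels), empty chapter rate, useless chapter rate, chapter length, actual chapter rate, and main-site authority score.
1.1 Update speed (update score)
For a work that is being serialized (especially an online novel), the timeliness of updates can serve as one criterion for judging the work. Acquiring the catalog feature of whether a book is updated in time (the update score) can therefore contribute to calculating the book's quality score.
Specifically, calculating the update score of a novel may include first determining whether the novel is finished. If it is finished, the update score takes a preset fixed value. If it is unfinished, its latest update time is obtained; the closer the latest update time is to the current time, the higher the update score, and the update score is 0 when the novel has not been updated for more than a certain number of days (for example, 30 days).
The update score may be calculated as: update_score = Time_gap^[1/(Time_gap+1)], where Time_gap = 1 - (current time - latest update time)/30 days. Time_gap lies in the interval [0, 1]; when the latest update was more than 30 days ago, Time_gap is 0.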
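The function y = x^[1/(1+x)] is smooth and strictly increasing for x in [0, 1], and inside that interval it lies above the line y = x. Using it for the update score means that the more timely the update, the higher the score, with a nonlinear spread that is intended to bring out the strengths and weaknesses of a book so that high-quality and low-quality books differ more clearly in score.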
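A minimal sketch of the update score, assuming timestamps are available as `datetime` objects. The default of 0.5 for finished novels is a placeholder chosen for this example; the source only says that a preset fixed value is used. The function and parameter names are likewise illustrative.

```python
from datetime import datetime, timedelta

def update_score(last_update, now, finished=False,
                 finished_score=0.5, window_days=30):
    """update_score = Time_gap ** (1 / (Time_gap + 1)).

    finished_score is a hypothetical preset; the source does not give its value.
    """
    if finished:
        return finished_score
    elapsed_days = (now - last_update) / timedelta(days=1)
    # Time_gap = 1 - elapsed/30 days, kept inside [0, 1]
    time_gap = min(1.0, max(0.0, 1.0 - elapsed_days / window_days))
    return time_gap ** (1.0 / (time_gap + 1.0))
```

With these definitions a novel updated at the current moment scores 1, and one whose last update is 30 or more days old scores 0.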
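By including the update score in the novel's quality score, the quality evaluation can take into account what readers usually care about most, namely how quickly the author "fills the pit" (update frequency) and whether the author has "abandoned the pit" (that is, given up updating), which makes the quality score a better guide for users' choices.
1.2 Empty chapter rate
An empty chapter is a chapter that exists as a chapter but has no content. The fewer empty chapters a novel has, the higher its quality, so the empty chapter rate may be calculated as: emptyChapterRate = number of empty chapters / total number of chapters.
Introducing the empty chapter rate into the quality evaluation further helps users pick, from among novels that appear to have many chapters, high-quality novels with rich actual content rather than novels whose chapter count is merely padded with empty chapters.
1.3 Useless chapter rate
A useless chapter is a chapter that has content but is not part of the novel's body text, such as an author's leave notice or afterword. The fewer useless chapters, the higher the quality of the book, so the useless chapter rate is calculated as: uselessChapterRate = number of useless chapters / total number of chapters.
As explained in 1.2 above, introducing the useless chapter rate into the quality evaluation further helps users pick, from among novels that appear to have many chapters, high-quality novels with rich actual content rather than novels whose chapter count is merely padded with useless chapters such as author's notes.
1.4 Chapter length
Because authors usually continue a long serialization when a novel is well received, the number of chapters can serve as one criterion reflecting the quality of a novel. That is, the more chapters there are, the more popular the novel and the higher the quality of the book.
In addition, once the number of chapters exceeds a certain count, for example 1000, the chapter length can be considered to have reached full marks, that is, a chapter length value of 1.
The chapter length value is therefore calculated as: chapterLength = number of chapters / 1000, where the number of chapters is an integer between 1 and 1000. When the number of chapters is greater than 1000, chapterLength is 1.
Introducing the chapter length lets the quality evaluation reflect the advantage of long-running serialized novels, which helps users choose accurately.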
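The three ratios in 1.2 to 1.4 are simple enough to state directly in code. The sketch below follows the formulas above; returning 0.0 for a novel with no chapters is an assumption of this example, since the source does not cover that case.

```python
def empty_chapter_rate(empty_chapters, total_chapters):
    """emptyChapterRate = number of empty chapters / total number of chapters."""
    # Assumption: a novel with no chapters at all gets a rate of 0.0.
    return empty_chapters / total_chapters if total_chapters else 0.0

def useless_chapter_rate(useless_chapters, total_chapters):
    """uselessChapterRate = number of useless chapters / total number of chapters."""
    return useless_chapters / total_chapters if total_chapters else 0.0

def chapter_length(chapter_count, cap=1000):
    """chapterLength = number of chapters / 1000, saturating at 1."""
    return min(chapter_count, cap) / cap
```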
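1.5 Main-site authority (main-site authority score)
Because novels serialized on the Internet are often reposted many times, a novel usually has more than one source website. For convenience and feasibility, however, one catalog source is selected for each novel, and that source is called the novel's main site.
The authority of the main site can serve as one criterion for evaluating the quality of a novel: the more authoritative the main site, the better the quality of the novel. However, a novel that breaks off on an authoritative novel site could otherwise see its score rise unreasonably, so the weight of the novel site can be smoothed to some extent. If the site's authority value is low, a larger repost count can make up for the shortfall (for example, the repost count is multiplied by a coefficient and added to the main site's own authority score to obtain the main-site authority score). But if the main site's own authority value is already high, or the novel has already been reposted by more than, for example, 30 sites, the authority score of the novel's own main site can be used directly.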
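The smoothing rule can be sketched as below. The threshold of 0.8, the coefficient of 0.01, and the cap at 1.0 are all hypothetical values picked for this example; the source gives only the 30-site figure as an example and does not specify the threshold or the coefficient.

```python
def main_site_authority(own_authority, repost_count,
                        authority_threshold=0.8,   # hypothetical threshold
                        repost_limit=30,           # example figure from the text
                        repost_coefficient=0.01):  # hypothetical coefficient
    """Main-site authority score with repost-based smoothing."""
    # A highly authoritative site, or a widely reposted novel, uses the
    # main site's own authority score directly.
    if own_authority >= authority_threshold or repost_count > repost_limit:
        return own_authority
    # Otherwise the repost count tops up the main site's own score.
    # Capping at 1.0 is an assumption of this sketch.
    return min(1.0, own_authority + repost_coefficient * repost_count)
```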
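1.6 Actual chapter rate
As described above for main-site authority, one catalog source is selected for each novel as its main site. Because the main site may not be the site where the novel was first published or is updated, the actual chapter rate is proposed here as a coefficient for judging how trustworthy the novel's main site is.
Specifically, the actual chapter rate may be calculated as follows. First the average number of chapters over all sources of the novel is calculated. If the number of chapters serialized on the novel's main site is not less than this average, the serialization is considered complete and the actual chapter rate is 1. If the number of chapters serialized on the main site is less than this average, the serialization is considered incomplete and the main site is not very trustworthy. The actual chapter rate may then be calculated as: actual chapter rate = number of chapters on the main site / average number of chapters. The actual chapter rate does not exist as a separate feature score; when the novel's quality score is calculated, it is used together with the other features as a measure of their trustworthiness.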
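A short sketch of the actual chapter rate, assuming the chapter counts of all sources are available as a list. The function name and the error raised for an empty list are choices made for this example.

```python
def actual_chapter_rate(main_site_chapters, chapters_per_source):
    """Chapters on the main site relative to the average over all sources."""
    if not chapters_per_source:
        raise ValueError("at least one source is needed to form an average")
    average = sum(chapters_per_source) / len(chapters_per_source)
    # A main site at or above the average is treated as complete.
    if main_site_chapters >= average:
        return 1.0
    return main_site_chapters / average
```

For example, a main site carrying 80 chapters when the sources average 100 chapters gets an actual chapter rate of 0.8.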
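1.7 Calculating the catalog feature score
The above gives specific calculation methods for the six features. The catalog feature score of a novel may then be calculated from these six features, for example, as follows:
The update score, empty chapter rate, useless chapter rate, and chapter length are added together (for example, with equal weight), each of them being multiplied by the novel's main-site authority score and actual chapter rate to smooth out possible errors, and the final catalog feature score takes values in the range [0, 1].
Alternatively, weights may be assigned to the update score, empty chapter rate, useless chapter rate, and chapter length according to their importance or to empirical values, and the result multiplied by the novel's main-site authority score and actual chapter rate, to calculate the final catalog feature.
Multiplying each of the four features (update score, empty chapter rate, useless chapter rate, and chapter length) by the novel's main-site authority score and actual chapter rate smooths their values by both the trustworthiness of the novel's main site and the trustworthiness of the novel on that main site, so that the quality of the novel is evaluated more objectively on the basis of these two measures.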
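One possible reading of the combination in 1.7 is sketched below. The equal weights of 0.25 and the final clamp to [0, 1] are assumptions of this example; the source says only that the four features are added (for example, equally) and that the result lies in [0, 1]. The sketch adds the two rates as the text states them; whether an implementation should instead use their complements is not specified in the source.

```python
def catalog_feature_score(update, empty_rate, useless_rate, length,
                          authority, actual_rate,
                          weights=(0.25, 0.25, 0.25, 0.25)):
    """Sum of the four catalog sub-features, each scaled by the main-site
    authority score and the actual chapter rate."""
    features = (update, empty_rate, useless_rate, length)
    total = sum(w * f * authority * actual_rate
                for w, f in zip(weights, features))
    # Keep the result in [0, 1] as the description requires.
    return min(1.0, max(0.0, total))
```

Passing a different `weights` tuple corresponds to the alternative in which weights are assigned by importance or empirical values.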
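The above gives a specific method for obtaining the catalog feature score of a novel. Although a method has been disclosed for obtaining the catalog feature score from the update score, empty chapter rate, useless chapter rate, chapter length, main-site authority score, and actual chapter rate, these items are clearly only examples of how to evaluate a novel's catalog feature, and the calculation methods for the individual scores are shown only as examples; those skilled in the art can conceive of other evaluation methods on the basis of this disclosure.
2. Source quality feature of a novel
An online novel usually has more than one source site. In general, the more popular a novel is, the more often it is reposted (that is, the more source sites it has). The number of source sites can therefore reflect one aspect of the quality of the book. If the number of sources exceeds a certain number, for example 30 sites, the source feature score can be considered to have reached full marks. Likewise, the more a novel's source sites are original-content sites and major sites, the higher the quality of the book. The final source feature score of a novel therefore needs to take into account both the number of source sites and the authority of each source site.
Accordingly, the final source feature score of each novel may be calculated, for example, as follows: first, a weight is assigned to each source site of the novel and the average weight of all source sites of the novel is calculated; next, the source-count weight of the novel is calculated; finally, the source quality feature score is calculated from the average weight of all source sites and the source-count weight.
A weight is assigned to each source site of a novel to indicate the importance and authority of that site. The average weight of all source sites of the novel is calculated as Ave_host_score = (∑host_score)/host_num, where host_num is the number of source sites and host_score is the weight of each source site; the weight of each source site is determined from the share of all popular books that is accounted for by that site's popular books.
The source-count weight may be calculated, for example, as: Host_factor = host_num>=30 ? 1 : host_num/30. That is, when host_num is an integer between 1 and 30, Host_factor = host_num/30, and when host_num is greater than 30, the source-count weight takes the value 1.
The source quality feature score is used for an overall judgment of source quality and is calculated as Host_score = Ave_host_score*(1+Host_factor), with Host_score taking values in the range [0, 1].
Evaluating the source quality feature of a novel from both its main site and its repost sites further increases the trustworthiness of the novel's quality score, which helps users choose novels.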
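The source quality formula can be sketched as follows. Since Ave_host_score*(1+Host_factor) can exceed 1 when the average weight is above 0.5, this example clamps the result to keep it in the stated range [0, 1]; the clamp and the handling of an empty source list are assumptions, as the source does not say how the range is enforced.

```python
def source_quality_score(host_scores, host_cap=30):
    """Host_score = Ave_host_score * (1 + Host_factor), kept in [0, 1].

    host_scores: weights of the novel's source sites (one entry per site)
    """
    host_num = len(host_scores)
    if host_num == 0:
        return 0.0  # assumption: no known source gives a score of 0
    ave_host_score = sum(host_scores) / host_num
    host_factor = 1.0 if host_num >= host_cap else host_num / host_cap
    # Clamping to 1.0 is an assumption made to satisfy the stated range.
    return min(1.0, ave_host_score * (1.0 + host_factor))
```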
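3. Meta-information feature of a novel
For a novel, the meta information can usually include five items: first-level catalog, second-level catalog, image information, tags, and synopsis. Whether the meta information of a novel is complete can serve as a reference for evaluating its quality. That is, the more of these meta-information items a novel contains, the higher its meta-information completeness score and the better the quality of the book.
The meta-information completeness score (meta-information feature score) is therefore calculated as follows: it is determined whether the novel contains each of the above five meta-information items, and the completeness score MetaScore is increased by 0.2 for each item present; when all five items are present, MetaScore is 1. MetaScore takes values in the range [0, 1].
Introducing the completeness of a novel's meta information into the quality evaluation lets the final quality score reflect the production standard of the novel file itself (independent of its content), so that the quality of the novel is evaluated more objectively.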
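A minimal sketch of MetaScore, assuming the meta information is held in a dictionary. The field names are hypothetical stand-ins for the five items listed above.

```python
# Hypothetical field names for the five meta-information items.
META_FIELDS = ("first_level_catalog", "second_level_catalog",
               "image", "tags", "synopsis")

def meta_score(meta):
    """Each item that is present contributes 0.2, so five items give 1.0."""
    present = sum(1 for field in META_FIELDS if meta.get(field))
    # Dividing by the item count avoids accumulating rounding error from
    # repeatedly adding 0.2.
    return present / len(META_FIELDS)
```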
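4. Body-text quality feature of a novel
In this disclosure, the body-text quality feature of a novel refers to whether the body text contains excessive punctuation or watermarks, whether the body content is complete, and so on. The body-text quality feature is computed by calling a separate internal interface of the system, which is not the main subject of this solution and is therefore not described in detail here. The body-text quality feature score is aveChapterScore = total score of all chapters / number of chapters, and aveChapterScore takes values in the range [0, 1].
Evaluating the body-text quality of a novel allows the quality of the novel itself to be assessed more accurately, so that the final quality score reflects the novel's quality more faithfully.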
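The averaging step can be sketched as below. How each chapter is scored is left to the separate interface mentioned above, so this example simply takes the per-chapter scores as input; the empty-list behavior and the clamp are assumptions.

```python
def body_quality_score(chapter_scores):
    """aveChapterScore = total score of all chapters / number of chapters."""
    if not chapter_scores:
        return 0.0  # assumption: a novel with no scored chapters gets 0
    average = sum(chapter_scores) / len(chapter_scores)
    # Keep the result in [0, 1] even if a chapter score falls outside it.
    return min(1.0, max(0.0, average))
```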
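5. Calculating the quality score of a novel
Exemplary methods have been disclosed above for obtaining the scores of four features: the catalog feature, the source quality feature, the meta-information feature, and the body-text quality feature. How to calculate the final quality score of a novel from these four features is discussed below.
The quality score of a novel may be calculated by adding the scores of the catalog feature, source quality feature, meta-information feature, and body-text quality feature directly, or by assigning a suitable weight to each and computing a weighted sum.
In a preferred embodiment, experiments with different weight assignments on data showed that adding the catalog feature, meta-information feature, source feature, and body-text quality feature with weights in the ratio 6:1:3:5 gives a final book quality score that works comparatively well.
Of course, this ratio is clearly only an empirical value, and those skilled in the art can choose different weights or ratios depending on the specific situation.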
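The 6:1:3:5 combination can be written as a short weighted sum. Dividing by the total weight of 15 so that the result stays in [0, 1] is an option added for this example; the source gives only the ratio and does not say whether the sum is normalized. The dictionary keys are illustrative.

```python
# Ratio from the preferred embodiment: catalog : meta : source : body = 6 : 1 : 3 : 5
WEIGHTS = {"catalog": 6, "meta": 1, "source": 3, "body": 5}

def quality_score(scores, weights=WEIGHTS, normalize=True):
    """Weighted sum of the four feature scores.

    normalize=True divides by the total weight (an assumption of this sketch).
    """
    total = sum(weights[name] * scores[name] for name in weights)
    return total / sum(weights.values()) if normalize else total
```

With feature scores of 0.5, 1.0, 0.5, and 0.8 for catalog, meta, source, and body, the unnormalized sum is 3 + 1 + 1.5 + 4 = 9.5.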
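6. Summary of the above method
The above disclosure gives a specific example of the network content evaluation method. Although the example of calculating a quality score was given for books (especially online novels), it is obvious to those skilled in the art that the principles disclosed above also apply to other network content whose catalog feature can be evaluated, such as music, apps, and Internet radio.
For example, for music, quality can be evaluated from at least one of a catalog feature such as the track or disc number, a source feature such as whether it comes from QQ Music or Baidu Music, a meta-information feature such as the album, singer, and song title, and a main-body feature such as the song's star rating.
For apps and Internet radio stations, quality can likewise be evaluated from at least one of their specific catalog feature, source, meta information, and main-body quality, which is not described further here.
In addition, although specific calculation methods and formulas have been given above for each feature, the present invention is not limited to these specific examples; other methods and formulas that are more suitable may be used depending on the specific situation.
II. Method of ranking a plurality of network contents
Fig. 3 is a schematic flowchart of a method of ranking a plurality of network contents according to an embodiment of the present invention.
In step S40 of Fig. 3, the quality score calculated with the method of Fig. 1 or Fig. 2 is used as one of the ranking criteria (other criteria may include search popularity, relevance, and so on) to rank a plurality of network contents.
By introducing content quality into the network content ranking scheme, the quality of the presented network content can be assured while still presenting the most popular and most relevant content, which helps users choose network content correctly.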
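Step S40 can be illustrated with a simple blended sort. The source says only that the quality score is one of the ranking criteria alongside relevance and popularity; the linear blend, the `quality_weight` of 0.3, and the dictionary keys below are all assumptions made for this example.

```python
def rank_by_quality(contents, quality_weight=0.3):
    """Sort contents by a blend of relevance, popularity, and quality score.

    contents: list of dicts with "relevance", "popularity", and "quality"
              values in [0, 1] (hypothetical layout)
    """
    def combined(content):
        base = (content["relevance"] + content["popularity"]) / 2.0
        return (1.0 - quality_weight) * base + quality_weight * content["quality"]
    # Highest combined score first.
    return sorted(contents, key=combined, reverse=True)
```

Under this blend, two copies of the same novel with equal relevance and popularity are ordered by their quality scores, which is the situation described in the Background.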
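Fig. 4 is a flowchart of the sub-steps that step S40 shown in Fig. 3 may include.
In step S401, the network contents found by a search are ranked in response to a user's network content query request.
In step S402, the network contents are ranked as a network content category list.
In step S403, the network contents are ranked as a leaderboard.
It should be emphasized that all three of these steps may be present, or any two, or just one; when two or more are present, their order may be changed without affecting the implementation of the ranking method.
The ranking of network content is thus made concrete as ranking in response to a query, as a category list, and as a leaderboard. These three are also the most common means by which users choose network content. By basing the specific order of query results, category lists, and leaderboards at least partly on the quality score of the network content, users are given a reference to actual quality in their everyday choice of network content, which helps them end up with better and more satisfying network content.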
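III. Apparatus and system
The quality evaluation and ranking methods for network content have been described in detail above with reference to Figs. 1 to 4. The network content quality evaluation apparatus is described below with reference to Fig. 5, and the network content ranking system and the ranking apparatus it contains are described with reference to Figs. 6 and 7.
The functions of many of the units and apparatuses described below are the same as those of the corresponding steps described above with reference to Figs. 1 to 4. To avoid repetition, the description here focuses on the unit or apparatus structures that the apparatus and system may have; some details are omitted, and the corresponding description above can be consulted for them.
Fig. 5 is a schematic block diagram of a network content quality evaluation apparatus 500 according to an embodiment of the present invention.
As shown in Fig. 5, the network content quality evaluation apparatus 500 includes a feature acquisition unit 510, a feature score calculation unit 520, and a quality score calculation unit 530, and optionally includes a weight assignment unit 521 (shown in dashed lines).
The feature acquisition unit 510 is used to acquire content quality features of network content, the content quality features including at least one of a catalog feature, a source quality feature, a meta-information feature, and a main-body quality feature.
The feature score calculation unit 520 is used to calculate a feature score for each of the at least one feature.
The quality score calculation unit 530 is used to calculate the quality score of the network content from the feature scores.
In addition, where the content quality features include at least two of the catalog feature, the source quality feature, the meta-information feature, and the main-body quality feature, the optional weight assignment unit 521 is used to assign a feature weight to each of the at least two features. The quality score calculation unit 530 then calculates the quality score of the network content as a weighted sum of the feature scores of the at least two features.
Fig. 6 is a schematic block diagram of a network content ranking system 1000 for ranking a plurality of network contents according to an embodiment of the present invention.
The network content ranking system 1000 includes a network content quality evaluation apparatus 600 and a ranking apparatus 640. The network content quality evaluation apparatus 600 may be the same as or different from the network content quality evaluation apparatus 500 disclosed in Fig. 5, and is used to assign a quality score to each of the plurality of network contents.
The ranking apparatus 640 is used to rank the plurality of network contents using the quality score as one of the ranking criteria.
Fig. 7 is a schematic block diagram of a ranking apparatus 740 according to an embodiment of the present invention. The ranking apparatus 740 may be the same as or different from the ranking apparatus 640 disclosed in Fig. 6.
The ranking apparatus 740 may include any one, any two, or all of a search ranking unit 741, a category list ranking unit 742, and a leaderboard ranking unit 743.
The search ranking unit 741 may be used to rank the network contents found by a search in response to a user's network content query request.
The category list ranking unit 742 may be used to rank the network contents as a network content category list.
The leaderboard ranking unit 743 may be used to rank the network contents as a leaderboard.
The apparatus and system disclosed with reference to Figs. 5 to 7 may also use calculation means corresponding to the calculation methods and formulas for each feature in Part I (see subsections 1 to 5 of Part I) to obtain the score of each feature, which is not described further here.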
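Fig. 8 is a schematic block diagram of a computing device 800 according to an embodiment of the present invention.
The computing device 800 may be a server-side device, or a client device such as a desktop computer, notebook computer, tablet computer, or smartphone.
As shown in Fig. 8, the computing device 800 includes a memory 810 and a processor 820, and the processor 820 is connected to the memory 810.
The processor 820 is configured to acquire content quality features of network content, the content quality features including at least one of a catalog feature, a source quality feature, a meta-information feature, and a main-body quality feature, to calculate a feature score for each of the at least one feature, to calculate the quality score of the network content from the feature scores, and to store the quality score in the memory 810.
In addition, the processor 820 may also perform other steps of the network content quality evaluation method described above, or the relevant steps of the method of ranking a plurality of network contents described above, which are not described further here.
Those skilled in the art will clearly understand that the techniques in the embodiments of the present invention can be implemented by means of software plus the necessary general-purpose hardware, including general-purpose integrated circuits, general-purpose CPUs, general-purpose memory, general-purpose components, and the like. They can of course also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memory, dedicated components, and the like, but in many cases the former is the better implementation. On this understanding, the technical solutions in the embodiments of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
Accordingly, the above technical concept of the present invention can also be embodied as a non-transitory machine-readable storage medium on which executable code is stored. When the executable code is executed by a processor, the processor is caused to perform the network content quality evaluation method described above, or to perform the method of ranking a plurality of network contents described above.
Alternatively, the above technical concept of the present invention can be embodied as a computing device that includes a processor and a non-transitory machine-readable storage medium. Executable code is stored on the non-transitory machine-readable storage medium. When the executable code is executed by the processor, the processor is caused to perform the method described above.
Furthermore, the method according to the present invention can also be implemented as a computer program product that includes a computer-readable medium on which is stored a computer program for performing the above functions defined in the method of the present invention. Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure here can be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the drawings show the architecture, functions, and operations of possible implementations of systems and methods according to several embodiments of the present invention. In this regard, each block in a flowchart or block diagram can represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used here were chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed here.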

Claims (16)

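  1. A network content quality evaluation method, including:
    acquiring content quality features of network content, the content quality features including at least one of a catalog feature, a source quality feature, a meta-information feature, and a main-body quality feature;
    calculating a feature score for each of the at least one feature; and
    calculating a quality score of the network content from the feature scores.
  2. The method of claim 1, wherein the network content is any one of the following:
    a book, music, an app, an Internet radio station.
  3. The method of claim 1, wherein
    the content quality features include at least two of the catalog feature, the source quality feature, the meta-information feature, and the main-body quality feature,
    the method further including:
    assigning a feature weight to each of the at least two features,
    wherein the quality score of the network content is calculated as a weighted sum of the feature scores of the at least two features.
  4. The method of claim 1, wherein the catalog feature includes one or more of the following:
    an update score;
    an empty chapter rate;
    a useless chapter rate;
    a chapter length;
    a main-site authority score assigned to the main site of the network content; and
    the actual chapter rate of the main site.
  5. The method of claim 4, wherein
    update score = Time_gap^[1/(Time_gap+1)], where Time_gap = 1 - (current time - last update time)/30 days and lies in the interval [0, 1], and Time_gap is 0 when the last update was more than 30 days ago; and/or
    empty chapter rate = number of empty chapters / total number of chapters; and/or
    useless chapter rate = number of useless chapters / total number of chapters; and/or
    chapter length = number of chapters / 1000, where the number of chapters is an integer between 1 and 1000, and chapter length takes the value 1 when the number of chapters is greater than 1000; and/or
    the main-site authority score takes the following value:
    if the main site's authority score is above a certain threshold, or the content has been reposted by more than a certain number of sites, the main site's own authority score is used directly; otherwise the main site's own authority score is increased according to the repost count to obtain the final main-site authority score; and/or
    actual chapter rate = number of chapters on the main site / average number of chapters, where the average number of chapters is the average over all sources of the network content, and the actual chapter rate is 1 when the number of chapters on the main site is not less than the average.
  6. The method of claim 5, wherein the catalog feature score is obtained as follows:
    the update score, empty chapter rate, useless chapter rate, and chapter length are added together, each of them being multiplied by the main-site authority score and the actual chapter rate, and the catalog feature score takes values in the range [0, 1].
  7. The method of claim 1, wherein the calculation of the quality score includes one or more of the following values:
    source feature score = Ave_host_score*(1+Host_factor), with the source feature score taking values in the range [0, 1], where the average weight of all source sites of the novel is Ave_host_score = (Σhost_score)/host_num, host_num is the number of source sites and host_score is the weight of each source site, and the source-count weight is Host_factor = host_num/30, where host_num is an integer between 1 and 30 and the source-count weight takes the value 1 when host_num is greater than 30; and/or
    meta-information feature score = first-level catalog score + second-level catalog score + image information score + tag score + synopsis score, where each of these five scores is 0.2 when the first-level catalog, second-level catalog, image information, tags, or synopsis, respectively, is present, and 0 otherwise; and/or
    main-body quality score = total score of all chapters / number of chapters, with the main-body quality score taking values in the range [0, 1].
  8. The method of claim 1, wherein the quality score of the network content is obtained as follows:
    the catalog feature score, the meta-information feature score, the source feature score, and the main-body quality feature score are added with weights in the ratio 6:1:3:5 to obtain the quality score.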
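  9. A method of ranking a plurality of network contents, including:
    assigning a quality score to each of the plurality of network contents using the method of any one of claims 1 to 8; and
    ranking the plurality of network contents using the quality score as one of the ranking criteria.
  10. The method of claim 9, wherein ranking the plurality of network contents includes:
    ranking the network contents found by a search in response to a user's network content query request; or
    ranking the network contents as a network content category list; or
    ranking the network contents as a leaderboard.
  11. A network content quality evaluation apparatus, including:
    a feature acquisition unit for acquiring content quality features of network content, the content quality features including at least one of a catalog feature, a source quality feature, a meta-information feature, and a main-body quality feature;
    a feature score calculation unit for calculating a feature score for each of the at least one feature; and
    a quality score calculation unit for calculating a quality score of the network content from the feature scores.
  12. The apparatus of claim 11, wherein
    the content quality features include at least two of the catalog feature, the source quality feature, the meta-information feature, and the main-body quality feature,
    the apparatus further including:
    a weight assignment unit for assigning a feature weight to each of the at least two features,
    wherein the quality score calculation unit calculates the quality score of the network content as a weighted sum of the feature scores of the at least two features.
  13. A system for ranking a plurality of network contents, including:
    the network content quality evaluation apparatus of claim 11 or 12, the apparatus assigning a quality score to each of the plurality of network contents; and
    a ranking apparatus for ranking the plurality of network contents using the quality score as one of the ranking criteria.
  14. The system of claim 13, wherein the ranking apparatus includes:
    a search ranking unit for ranking the network contents found by a search in response to a user's network content query request; or
    a category list ranking unit for ranking the network contents as a network content category list; or
    a leaderboard ranking unit for ranking the network contents as a leaderboard.
  15. A computing device, including:
    a memory; and
    a processor connected to the memory and configured to acquire content quality features of network content, the content quality features including at least one of a catalog feature, a source quality feature, a meta-information feature, and a main-body quality feature, to calculate a feature score for each of the at least one feature, to calculate a quality score of the network content from the feature scores, and to store the quality score in the memory.
  16. A non-transitory machine-readable storage medium on which executable code is stored, wherein when the executable code is executed by a processor, the processor is caused to perform the method of any one of claims 1 to 10.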
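PCT/CN2016/082376 2015-05-26 2016-05-17 Network quality evaluation method and apparatus, network content ranking method and system, computing device, and non-transitory machine-readable storage medium WO2016188347A1 (zh)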

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510274495.1 2015-05-26
CN201510274495.1A CN104850642B (zh) 2015-05-26 2015-05-26 Network content quality evaluation method and apparatus

Publications (1)

Publication Number Publication Date
WO2016188347A1 true WO2016188347A1 (zh) 2016-12-01

Family

ID=53850286

Family Applications (1)

Application Number Title Priority Date Filing Date
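PCT/CN2016/082376 WO2016188347A1 (zh) 2015-05-26 2016-05-17 Network quality evaluation method and apparatus, network content ranking method and system, computing device, and non-transitory machine-readable storage medium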

Country Status (2)

Country Link
CN (1) CN104850642B (zh)
WO (1) WO2016188347A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115277070A (zh) * 2022-06-17 2022-11-01 西安热工研究院有限公司 Method for generating a network security operation and maintenance heat map

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
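CN104850642B (zh) * 2015-05-26 2017-05-17 广州神马移动信息科技有限公司 Network content quality evaluation method and apparatus
CN105302913B (zh) * 2015-11-12 2018-09-18 北京奇虎科技有限公司 Method and apparatus for evaluating online novel chapter lists
CN105787287B (zh) * 2016-05-06 2018-08-10 广州爱九游信息技术有限公司 System, device, apparatus and method for generating leaderboard data
CN107870912A (zh) 2016-09-22 2018-04-03 广州市动景计算机科技有限公司 Article quality scoring method, device, client, server and programmable device
CN106649468B (zh) * 2016-09-28 2023-04-07 杭州电子科技大学 CDN network content query method and system
CN108733672B (zh) * 2017-04-14 2023-01-24 腾讯科技(深圳)有限公司 Method and system for evaluating network information quality
CN107784109A (zh) * 2017-10-31 2018-03-09 任浩 Method and system for evaluating the commercial value of online novels
CN110008369A (zh) * 2018-12-26 2019-07-12 阿里巴巴集团控股有限公司 Information processing method and apparatus, electronic device, and computer-readable medium
CN110472096A (zh) * 2019-08-22 2019-11-19 腾讯音乐娱乐科技(深圳)有限公司 Song library management method, apparatus, device and storage medium
CN110727841A (zh) * 2019-09-12 2020-01-24 上海麦克风文化传媒有限公司 Method and system for evaluating the content quality of audio albums of an Internet radio station
CN110728966B (zh) * 2019-09-12 2023-05-23 上海麦克风文化传媒有限公司 Method and system for evaluating audio album content quality
CN111260197A (zh) * 2020-01-10 2020-06-09 光明网传媒有限公司 Online article evaluation method, system, computer device and readable storage medium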

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090100094A1 (en) * 2007-10-15 2009-04-16 Xavier Verdaguer Recommendation system and method for multimedia content
CN101582086A (zh) * 2009-06-11 2009-11-18 腾讯科技(深圳)有限公司 Method and apparatus for obtaining blog quality information
CN102999490A (zh) * 2011-09-08 2013-03-27 北京无限讯奇信息技术有限公司 Merchant document weight evaluation method
US20140089322A1 (en) * 2012-09-14 2014-03-27 Grail Inc. System And Method for Ranking Creator Endorsements
CN104219575A (zh) * 2013-05-29 2014-12-17 酷盛(天津)科技有限公司 Related video recommendation method and system
CN104239468A (zh) * 2014-09-02 2014-12-24 百度在线网络技术(北京)有限公司 Method and apparatus for re-ranking recommendation information
CN104850642A (zh) * 2015-05-26 2015-08-19 广州神马移动信息科技有限公司 Network content quality evaluation method and apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7840581B2 (en) * 2008-02-01 2010-11-23 Realnetworks, Inc. Method and system for improving the quality of deep metadata associated with media content
US9202269B2 (en) * 2011-06-21 2015-12-01 Thomson Licensing User terminal device, server device, system and method for assessing quality of media data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
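CN115277070A (zh) * 2022-06-17 2022-11-01 西安热工研究院有限公司 Method for generating a network security operation and maintenance heat map
CN115277070B (zh) * 2022-06-17 2023-08-29 西安热工研究院有限公司 Method for generating a network security operation and maintenance heat map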

Also Published As

Publication number Publication date
CN104850642B (zh) 2017-05-17
CN104850642A (zh) 2015-08-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16799236

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16799236

Country of ref document: EP

Kind code of ref document: A1