CN105653737A - Method, equipment and electronic equipment for content document sorting - Google Patents


Info

Publication number
CN105653737A
Authority
CN
China
Prior art keywords
content document
document
object content
ranking score
weight
Legal status
Granted
Application number
CN201610116247.9A
Other languages
Chinese (zh)
Other versions
CN105653737B (en)
Inventor
高建煌
郑颖
Current Assignee
Alibaba China Co Ltd
Original Assignee
Guangzhou Shenma Mobile Information Technology Co Ltd
Application filed by Guangzhou Shenma Mobile Information Technology Co Ltd
Priority to CN201610116247.9A
Publication of CN105653737A
Priority to PCT/CN2017/074510 (WO2017148323A1)
Application granted
Publication of CN105653737B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems

Abstract

The invention discloses a method, a device and an electronic device for ranking content documents. The method includes: building an inverted index over the reference content documents in a reference content document library based on index terms; selecting keywords in target content documents, where target content documents are the content documents to be ranked and recommended to users; searching for the index terms corresponding to the keywords to obtain the corresponding posting lists from the inverted index; setting ranking scores for the target content documents based on the reference content documents in the posting lists; and ranking the target content documents based on the ranking scores.

Description

Method, device and electronic device for ranking content documents
Technical field
The present invention relates to the field of electronic information technology, and more particularly to a method for ranking content documents, a device for ranking content documents, and an electronic device for providing content documents to a user.
Background
With the development of electronic information technology, more and more content providers proactively recommend (push) information content to users. Such information content may also be called articles or content documents. If the recommended content is useful to users, it increases user stickiness. How to recommend information content to users has therefore long been a technical problem of concern to those skilled in the art.
In addition, the inventors have recognized that in information recommendation, the problems to be solved include not only how to recommend content the user is interested in, but also how to recommend content of higher quality.
Summary
An object of the present invention is to provide a new technical solution for ranking content documents.
According to a first aspect of the present invention, a method for ranking content documents is provided, including: building an inverted index over the reference content documents in a reference content document library based on index terms; selecting keywords in a target content document, where a target content document is a content document to be ranked and recommended to users; searching for the index terms corresponding to the keywords to obtain the corresponding posting lists from the inverted index; setting a ranking score for the target content document based on the reference content documents in the posting lists; and ranking the target content documents based on the ranking scores.
Preferably, the step of building the inverted index includes: performing word segmentation on the titles of the reference content documents to obtain index terms; calculating weights that represent the importance of each index term in a reference content document; and storing the weights.
Preferably, the step of calculating the weights includes: calculating the term frequency and the inverse document frequency of each index term; calculating the product of the term frequency and the inverse document frequency; and normalizing the product.
Preferably, the step of selecting keywords in the target content document includes: performing word segmentation on the title of the target content document to obtain at least one keyword. Preferably, the step of determining the ranking score of the target content document includes: for each reference content document in the posting lists, obtaining the weights of the index terms corresponding to the at least one keyword; summing the obtained weights to determine a hit weight; when the hit weight is greater than a predetermined value, determining that the reference content document library is hit; and further setting the ranking score based on the hit reference content document libraries. Preferably, the predetermined value is 0.8.
Preferably, at least one reference content document library is hit, each hit library has a tier value representing its attention level, and the ranking score is set based on the sum of the tier values.
Preferably, the method further includes: setting the ranking score based on at least one of the word count of the target content document, whether it contains paragraph stacking, and whether it contains pictures.
Preferably, the method further includes: adjusting the click-through rate of the target content document based on at least one of the average click duration of users who click the target content document, the number of users who share it, the number of users who add it to their favorites, and the number of users who indicate that they dislike it, to obtain an adjusted click-through rate; and further setting the ranking score based on the adjusted click-through rate.
According to a second aspect of the present invention, a device for ranking content documents is provided, including: means for building an inverted index over the reference content documents in a reference content document library based on index terms; means for selecting keywords in a target content document, where a target content document is a content document to be ranked and recommended to users; means for searching for the index terms corresponding to the keywords to obtain the corresponding posting lists from the inverted index; means for setting a ranking score for the target content document based on the reference content documents in the posting lists; and means for ranking the target content documents based on the ranking scores.
Preferably, the means for building the inverted index includes: means for performing word segmentation on the titles of the reference content documents to obtain index terms; means for calculating weights that represent the importance of each index term in a reference content document; and means for storing the weights.
Preferably, the means for calculating the weights includes: means for calculating the term frequency and the inverse document frequency of each index term; means for calculating the product of the term frequency and the inverse document frequency; and means for normalizing the product.
Preferably, the means for selecting keywords in the target content document includes: means for performing word segmentation on the title of the target content document to obtain at least one keyword. Preferably, the means for determining the ranking score of the target content document includes: means for obtaining, for each reference content document in the posting lists, the weights of the index terms corresponding to the at least one keyword; means for summing the obtained weights to determine a hit weight; means for determining that the reference content document library is hit when the hit weight is greater than a predetermined value; and means for further setting the ranking score based on the hit reference content document libraries. Preferably, the predetermined value is 0.8.
Preferably, at least one reference content document library is hit, each hit library has a tier value representing its attention level, and the ranking score is set based on the sum of the tier values.
Preferably, the device further includes: means for setting the ranking score based on at least one of the word count of the target content document, whether it contains paragraph stacking, and whether it contains pictures.
Preferably, the device further includes: means for adjusting the click-through rate of the target content document, based on at least one of the average click duration of users who click the target content document, the number of users who share it, the number of users who add it to their favorites, and the number of users who indicate that they dislike it, to obtain an adjusted click-through rate; and means for further setting the ranking score based on the adjusted click-through rate.
According to a third aspect of the present invention, an electronic device for providing content documents to a user is provided, including the device for ranking content documents according to the present invention, where the electronic device provides target content documents to the user according to the result of ranking the target content documents.
According to a fourth aspect of the present invention, an electronic device for providing content documents to a user is provided, including a processor and a memory, where the memory stores instructions that control the processor to perform the method for ranking content documents according to the present invention.
Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart of a method for ranking content documents according to an embodiment of the present invention.
Fig. 2 is a schematic block diagram of an electronic device according to the present invention.
Fig. 3 is a schematic diagram of an example according to the present invention.
Detailed description of embodiments
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the invention.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention, its application or its uses.
Techniques, methods and apparatus known to persons of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification.
In all examples shown and discussed here, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following figures; once an item has been defined in one figure, it need not be discussed further in subsequent figures.
Embodiments and examples of the invention are described below with reference to the drawings.
Fig. 1 is a flowchart of a method for ranking content documents according to an embodiment of the present invention.
An embodiment of the present invention provides a prior-based ranking scheme, with which information content can be ranked without relying on posterior data such as user behavior statistics. For example, when an information recommendation system first begins recommending content to users, it has not yet stored enough data to judge which content to recommend and/or which content to recommend first. The prior-based ranking scheme of the present invention is particularly advantageous in this situation.
As shown in Fig. 1, in step S1100, an inverted index is built over the reference content documents in a reference content document library based on index terms.
Here, a content document means an item of information content or an article. A reference content document library is an existing set of recommended content, for example the trending lists of other websites, or articles curated manually by operations staff.
Because the same content may be expressed differently in different reference content document libraries (websites), especially in the document titles, deciding whether a target content document hits a reference content document library (trending list) by exact matching alone can fail. Therefore, in an embodiment of the invention, weights of the index terms can also be calculated to assist this decision.
First, word segmentation is performed on the titles of the reference content documents to obtain index terms. Word segmentation is well known to those skilled in the art and is not described here. Stop words in the title may be removed from the segmented words to improve processing efficiency. Stop words are words that carry little meaning in an article, such as common function words and pronouns like "we".
Next, a weight representing the importance of each index term in a reference content document is calculated. For example, the term frequency tf and the inverse document frequency idf of the index term are calculated, their product tf*idf is computed, and the product is normalized (for example, normalized to 1) to obtain the weight. The term frequency tf is the number of times a word occurs in a content document. The inverse document frequency idf measures the general importance of a word: the idf of a word is obtained by dividing the total number of documents by the number of documents containing the word and taking the logarithm of the quotient.
Finally, the weights are stored for subsequent use.
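The index-building steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: word segmentation is replaced by a whitespace split (a real Chinese system would need a dedicated segmenter), the stop-word list is hypothetical, idf is taken as log(N / df) as described, and "normalized to 1" is interpreted as scaling each title's weights so they sum to 1, which lets the later 0.8 hit threshold be read as a fraction of a title's total weight.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical stop-word list; a real system would load a curated one.
STOP_WORDS = {"the", "a", "of", "we"}


def segment(title: str) -> List[str]:
    """Stand-in for word segmentation: lowercase, split on whitespace, drop stop words."""
    return [w for w in title.lower().split() if w not in STOP_WORDS]


def build_inverted_index(titles: Dict[str, str]) -> Dict[str, List[Tuple[str, float]]]:
    """Map each index term to its posting list of (reference_doc_id, weight).

    weight = tf * idf with idf = log(N / df), normalized (an assumption) so that
    the weights of the terms in one title sum to 1.
    """
    docs = {doc_id: segment(t) for doc_id, t in titles.items()}
    n_docs = len(docs)
    # Document frequency: number of titles that contain each term.
    df = Counter(term for terms in docs.values() for term in set(terms))
    index: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        raw = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        total = sum(raw.values())
        for t, w in raw.items():
            # If every term has idf 0 (it occurs in every title), fall back to equal weights.
            weight = w / total if total > 0 else 1.0 / len(raw)
            index[t].append((doc_id, weight))
    # Returned to the caller, which stores it for later lookups.
    return dict(index)
```

For example, `build_inverted_index({"r1": "cat video funny", "r2": "dog video", "r3": "cat food"})` gives "funny", which occurs in only one title, a larger share of r1's weight than "cat" or "video".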
In step S1200, keywords are selected from the target content document.
A target content document is a content document that is to be ranked and recommended to users.
In one example, the keywords are selected from the title of the target content document: the title is segmented into words to obtain at least one keyword, and stop words may be removed from the segmented words.
Although step S1200 is described after step S1100, those skilled in the art will understand that this does not fix the order of the two steps. In some cases, step S1200 may be performed before step S1100.
In step S1300, the index terms corresponding to the keywords are looked up to obtain the corresponding posting lists from the inverted index.
In the present invention, the pre-built inverted index speeds up the processing of target content documents.
In step S1400, a ranking score is set for the target content document based on the reference content documents in the posting lists. Here, "based on the reference content documents" means that the reference content documents are one of several parameters used to set the ranking score.
After the posting lists have been obtained from the keywords, the ranking score can be set in several ways. In the simplest case, when a posting list is not empty, the target content document is deemed to hit the reference content document library, and the ranking score is set according to the tier value of that library.
In one embodiment, the weights described above are used to decide whether a reference content document library is hit. For each reference content document in the posting lists, the weights of the index terms corresponding to the at least one keyword are obtained and summed to give a hit weight. When the hit weight is greater than a predetermined value, for example 0.8, the reference content document library is deemed hit, and the ranking score is set based on the hit libraries.
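The posting-list lookup and hit test can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes each reference library has its own inverted index mapping each term to a posting list of (reference document id, normalized weight) pairs, that keywords match index terms exactly, and that a library counts as hit when any one of its reference documents exceeds the threshold. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

HIT_THRESHOLD = 0.8  # the example predetermined value from the text

# term -> posting list of (reference_doc_id, normalized weight), for one library
InvertedIndex = Dict[str, List[Tuple[str, float]]]


def hit_weights(index: InvertedIndex, keywords: Iterable[str]) -> Dict[str, float]:
    """Sum, per reference document, the weights of the index terms matching the keywords."""
    totals: Dict[str, float] = defaultdict(float)
    for kw in set(keywords):  # count each keyword once
        for ref_doc, weight in index.get(kw, []):  # empty posting list if no match
            totals[ref_doc] += weight
    return dict(totals)


def library_is_hit(index: InvertedIndex, keywords: Iterable[str],
                   threshold: float = HIT_THRESHOLD) -> bool:
    """A library is hit if any reference document's hit weight exceeds the threshold."""
    return any(w > threshold for w in hit_weights(index, keywords).values())
```

With weights normalized per title, a hit weight above 0.8 means the target title's keywords cover more than 80% of some reference title's weight, which tolerates small wording differences that exact matching would miss.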
Several trending lists or other collections of recommended documents can be obtained from the network or elsewhere and used as reference content document libraries. Whether each library is hit is determined separately. Each library has a tier value representing its attention level. When at least one library is hit, the ranking score can be set based on the sum of the tier values of the hit libraries.
For example, the reference content document libraries can be divided into three classes, high, medium and low, with tier values 3, 2 and 1 respectively. The sum of the tier values of all libraries hit by the target content document is calculated. When the sum is greater than or equal to 5, the target content document is considered to have a high attention level; when the sum is 3 or 4, a medium attention level; and when the sum is less than or equal to 2, a low attention level. The ranking score can then be set based on the attention level.
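The tier arithmetic in this example fits in a small function. The tier labels and the choice to return None when no library is hit are assumptions for illustration; the text only defines the levels for the case where at least one library is hit.

```python
from typing import Iterable, Optional

# Tier values from the example: high = 3, medium = 2, low = 1.
TIER_VALUES = {"high": 3, "medium": 2, "low": 1}


def attention_level(hit_library_tiers: Iterable[str]) -> Optional[str]:
    """Map the tiers of the libraries a target document hit to an attention level."""
    tiers = list(hit_library_tiers)
    if not tiers:
        return None  # no library hit: not covered by the example
    total = sum(TIER_VALUES[t] for t in tiers)
    if total >= 5:
        return "high"
    if total >= 3:  # i.e. 3 or 4
        return "medium"
    return "low"  # total <= 2
```

Note that a single hit in a high-tier library (sum 3) yields only a medium attention level; a high level needs hits in at least two libraries, for example one high and one medium.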
In addition, the ranking score can be set based on at least one of the word count of the target content document, whether it contains paragraph stacking, and whether it contains pictures. These parameters reflect the quality of the target content document to some extent.
The ranking score can be set based on parameters from several aspects, such as the library-hit result described above and the word count. In one illustrative example, the ranking score is set according to the following rules.
1. When the body text of the target content document has fewer than 15 words, the ranking score is 1.
2. When the target content document contains paragraph stacking and has no pictures, the ranking score is 1. Paragraph stacking can be detected by checking whether the document contains multiple repeated paragraphs, or whether the hash values of multiple paragraphs are similar. Those skilled in the art can devise many ways of detecting paragraph stacking, so they are not described in detail here.
3. When the target content document contains paragraph stacking but has pictures, the ranking score is 2.
4. When the title of the target content document contains a marker such as "[video]" or "[picture]" but its actual content contains no video or picture, the ranking score is 2.
5. When the target content document has a high attention level and its body text has more than 150 words, or more than 100 words together with pictures, the ranking score is 5.
6. When the target content document has a high attention level and its body text has more than 100 words, the ranking score is 4.
7. When the target content document has a medium attention level and its body text has more than 150 words, or more than 100 words together with pictures, the ranking score is 4.
8. When the target content document has a low attention level, its body text has more than 150 words and it has pictures, the ranking score is 4.
9. In some cases, the content provider may designate certain content documents as a reference content document library and assign that library a predetermined score. When the steps described above determine that the target content document hits this library, the predetermined score can be used as the ranking score.
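The nine rules can be collected into one function. This is a hedged sketch: the text does not state how the rules take precedence when several apply or which score applies when none does, so the evaluation order below (rule 9 first, then rules 1 to 8 in order) and the hypothetical DEFAULT_SCORE are assumptions. "Word count" is taken to mean non-whitespace characters, and paragraph stacking is detected only as exact duplicate paragraphs via MD5 digests, a simpler stand-in for the "similar hash values" check mentioned in rule 2. The TargetDoc type is likewise hypothetical.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_SCORE = 3  # hypothetical: the rules do not say which score applies when none match


@dataclass
class TargetDoc:
    title: str
    paragraphs: List[str]
    has_picture: bool = False
    has_video: bool = False
    attention: Optional[str] = None     # "high", "medium", "low", or None if no library was hit
    preset_score: Optional[int] = None  # rule 9: score of a provider-preset library that was hit

    @property
    def word_count(self) -> int:
        # Assumption: count non-whitespace characters, as a Chinese "word count" usually does.
        return sum(len("".join(p.split())) for p in self.paragraphs)


def has_paragraph_stacking(paragraphs: List[str]) -> bool:
    """Exact-duplicate check: True if any non-empty paragraph occurs at least twice."""
    digests = Counter(
        hashlib.md5(p.strip().encode("utf-8")).hexdigest() for p in paragraphs if p.strip()
    )
    return any(count >= 2 for count in digests.values())


def prior_score(doc: TargetDoc) -> int:
    wc, pic = doc.word_count, doc.has_picture
    if doc.preset_score is not None:  # rule 9 (checked first: an assumption)
        return doc.preset_score
    if wc < 15:  # rule 1
        return 1
    if has_paragraph_stacking(doc.paragraphs):  # rules 2 and 3
        return 2 if pic else 1
    if ("[video]" in doc.title and not doc.has_video) or ("[picture]" in doc.title and not pic):
        return 2  # rule 4: the title promises media the body lacks
    long_or_illustrated = wc > 150 or (wc > 100 and pic)
    if doc.attention == "high":
        if long_or_illustrated:
            return 5  # rule 5
        if wc > 100:
            return 4  # rule 6
    if doc.attention == "medium" and long_or_illustrated:
        return 4  # rule 7
    if doc.attention == "low" and wc > 150 and pic:
        return 4  # rule 8
    return DEFAULT_SCORE
```

Checking the quality-penalty rules (1 to 4) before the attention rules means a heavily padded or clickbait-titled document cannot earn a high prior score just because it appears on a trending list.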
In another embodiment, the prior-based scheme described above can be combined with a posterior scheme. For example, once the information recommendation system has stored enough data, that data can be used to verify how interesting the target content article is to users and/or how good it is. For example, the method for ranking content documents further includes: adjusting the click-through rate of the target content document based on at least one of the average click duration of users who click the target content document, the number of users who share it, the number of users who add it to their favorites, and the number of users who indicate that they dislike it, to obtain an adjusted click-through rate; and further setting the ranking score based on the adjusted click-through rate. In this case, in addition to the parameters described above, the inputs to the ranking score also include the adjusted click-through rate, so the ranking score more accurately reflects how likely the target content document is to interest users and/or its quality.
In this embodiment, the quality of a target content document can be determined from its performance after it is distributed. Generally, a high-quality target content document attracts more clicks, shares and/or favorites, while a low-quality one attracts fewer. In addition, some low-quality documents use very attractive titles to lure clicks. Such a document may have a high click-through rate, but users spend little time on it after clicking, rarely share or favorite it, or explicitly mark it as disliked. These posterior behaviors help determine the ranking score.
In one example, the click-through rate of the target content document is adjusted by the following formula to obtain the adjusted click-through rate:
$$\mathrm{tune\_ctr} = \mathrm{ctr} \cdot \left(1 + \frac{\mathrm{like\_num}}{\mathrm{click\_num}}\right)^{a} \cdot \left(\min\left(1, \frac{\mathrm{duration}}{60\,\mathrm{s}}\right)\right)^{b} \qquad (1)$$
where tune_ctr is the adjusted click-through rate; ctr is the original click-through rate (clicks divided by impressions); like_num = (number of users who shared the document) + (number of users who added it to favorites) - (number of users who indicated they dislike it); click_num is the number of clicks; duration is the average dwell time of users who clicked the target content document; and a and b are influence factors that can be set empirically or learned.
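Formula (1) translates directly into code. In this sketch the defaults a = b = 1.0 are placeholders, since the text leaves both factors to be set empirically or learned; the guards for zero clicks and for a negative base are additions not stated in the text.

```python
def tune_ctr(ctr: float, share_num: int, collect_num: int, dislike_num: int,
             click_num: int, avg_duration_s: float, a: float = 1.0, b: float = 1.0) -> float:
    """Formula (1): reward engagement per click and penalize short dwell times."""
    like_num = share_num + collect_num - dislike_num
    ratio = like_num / click_num if click_num > 0 else 0.0
    # Assumption: clamp at 0 so heavy dislikes cannot make the base negative
    # (a negative base with a fractional exponent is not a real number).
    engagement = max(0.0, 1.0 + ratio)
    dwell = min(1.0, avg_duration_s / 60.0)  # dwell factor saturates at 60 s
    return ctr * engagement ** a * dwell ** b
```

For instance, a clickbait-style document with ctr 0.3, 50 dislikes per 100 clicks and a 6-second average dwell drops to 0.3 x 0.5 x 0.1 = 0.015, while a document with 10 shares or favorites per 100 clicks and a 30-second dwell keeps 0.1 x 1.1 x 0.5 = 0.055 of an original 0.1.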
The distribution of tune_ctr over the target content documents in a certain period (for example, the past 7 days) can be collected, and the tune_ctr values divided into 10 buckets, for example by equal-size bucketing, to determine the posterior ranking score. For example, a 1-2-4-2-1 setting can be used: the 1st bucket has a posterior ranking score of 1; the 2nd and 3rd buckets, 2; the 4th to 7th buckets, 3; the 8th and 9th buckets, 4; and the 10th bucket, 5.
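A sketch of the 1-2-4-2-1 bucketing follows. It assumes that "equal-size" buckets means equal-frequency (decile) buckets over the documents' tune_ctr values, with the 1st bucket holding the lowest values, and that ties are broken by sort order; the text does not fix these details, so treat them as assumptions.

```python
from typing import List

# 1-2-4-2-1: bucket 1 -> 1, buckets 2-3 -> 2, buckets 4-7 -> 3, buckets 8-9 -> 4, bucket 10 -> 5
BUCKET_SCORES = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]


def posterior_scores(tune_ctrs: List[float]) -> List[int]:
    """Assign each document a posterior ranking score from its tune_ctr rank."""
    n = len(tune_ctrs)
    order = sorted(range(n), key=lambda i: tune_ctrs[i])  # ascending tune_ctr
    scores = [0] * n
    for rank, i in enumerate(order):
        bucket = rank * 10 // n  # 0..9; equal-frequency buckets
        scores[i] = BUCKET_SCORES[bucket]
    return scores
```

Under this reading, the 1-2-4-2-1 setting puts 10% of documents at each extreme score and 40% in the middle, so the posterior score is relative to the other documents in the window rather than an absolute threshold on tune_ctr.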
The posterior ranking score can then be combined with the prior ranking score from the previous example, and the combined score used as the final ranking score. In one example, the combination is:
$$\mathrm{final\_quality} = a \cdot \mathrm{posterior\_quality} + (1 - a) \cdot \mathrm{prior\_quality} \qquad (2)$$
In formula (2), final_quality is the final ranking score, posterior_quality is the posterior ranking score, and prior_quality is the prior ranking score. Here a (distinct from the influence factor a in formula (1)) is the weight given to the posterior ranking score and can be derived from the number of times the article has been shown, for example $a = \min\left(0.9, (\mathrm{show\_num}/2000)^{0.5}\right)$, where show_num is the number of impressions of the article.
In step S1500, the target content documents are ranked based on their ranking scores.
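Formula (2) and step S1500 together can be sketched as below. This is an illustrative sketch: the dictionary keys (id, prior, posterior, show_num) are hypothetical field names, and sorting in descending order of final score is assumed to be the intended ranking direction.

```python
from typing import Dict, List


def final_quality(prior_quality: float, posterior_quality: float, show_num: int) -> float:
    """Formula (2): trust the posterior score more as impressions accumulate."""
    a = min(0.9, (show_num / 2000) ** 0.5)  # 0 with no impressions, capped at 0.9
    return a * posterior_quality + (1 - a) * prior_quality


def rank_documents(docs: List[Dict]) -> List[str]:
    """Step S1500: order document ids by final ranking score, highest first."""
    scored = sorted(
        docs,
        key=lambda d: final_quality(d["prior"], d["posterior"], d["show_num"]),
        reverse=True,
    )
    return [d["id"] for d in scored]
```

With no impressions a is 0, so a new document is ranked purely on its prior score; as impressions accumulate, the posterior score takes over, reaching the 0.9 cap at 1620 impressions (since (1620/2000)^0.5 = 0.9).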
Those skilled in the art will appreciate that, in the field of electronic technology, the above method can be embodied in a product by software, by hardware, or by a combination of software and hardware. Based on the method disclosed above, those skilled in the art can readily produce a device for ranking content documents, including: means for building an inverted index over the reference content documents in a reference content document library based on index terms; means for selecting keywords in a target content document, where a target content document is a content document to be ranked and recommended to users; means for searching for the index terms corresponding to the keywords to obtain the corresponding posting lists from the inverted index; means for setting a ranking score for the target content document based on the reference content documents in the posting lists; and means for ranking the target content documents based on the ranking scores. Preferably, the means for building the inverted index includes: means for performing word segmentation on the titles of the reference content documents to obtain index terms; means for calculating weights that represent the importance of each index term in a reference content document; and means for storing the weights. Preferably, the means for calculating the weights includes: means for calculating the term frequency tf and the inverse document frequency idf of each index term; means for calculating the product tf*idf; and means for normalizing the product tf*idf (for example, normalizing it to 1 to obtain the weight).
Preferably, the means for selecting keywords in the target content document includes: means for performing word segmentation on the title of the target content document to obtain at least one keyword; and the means for determining the ranking score of the target content document includes: means for obtaining, for each reference content document in the posting lists, the weights of the index terms corresponding to the at least one keyword; means for summing the obtained weights to determine a hit weight; means for determining that the reference content document library is hit when the hit weight is greater than a predetermined value; and means for setting the ranking score based on the hit reference content document libraries. Preferably, the predetermined value is 0.8. Preferably, at least one reference content document library is hit, each hit library has a tier value representing its attention level, and the ranking score is set based on the sum of the tier values. Preferably, the device further includes: means for setting the ranking score based on at least one of the word count of the target content document, whether it contains paragraph stacking, and whether it contains pictures. Preferably, the device further includes: means for calculating the click-through rate of the target content document based on at least one of the average click duration of users who click the target content document, the number of users who share it, the number of users who add it to their favorites, and the number of users who indicate that they dislike it; and means for further setting the ranking score based on the click-through rate.
Those skilled in the art will appreciate that the above means can be implemented in various ways. For example, a processor can be configured by instructions to implement them: the instructions can be stored in ROM and, when the device starts, read from ROM into a programmable device to implement the means. The means can also be fixed in a dedicated device (such as an ASIC). The means may be divided into separate units or combined. They may be implemented by any one of the above approaches, or by a combination of two or more of them. To those skilled in the art, these implementations are all equivalent.
With the development of information technology, the gap between servers and terminal devices keeps narrowing. The technical solution of the embodiments of the invention can therefore be implemented on a network server or on a terminal device. In this respect, an embodiment of the present invention also includes an electronic device for providing content documents to a user; this electronic device may be a network server or a terminal device.
In one embodiment, the electronic device includes the device for ranking content documents described above, and provides target content documents to the user according to the result of ranking the target content documents.
It is well known to those skilled in the art that, with the development of electronic information technologies such as large-scale integrated circuits and the trend toward implementing functions in either hardware or software, it has become difficult to draw a clear boundary between the hardware and software of an electronic device. Any operation can be implemented in software or in hardware, and the execution of any instruction can likewise be performed by hardware or by software. Whether a given machine function is implemented in hardware or in software depends on non-technical factors such as price, speed, reliability, storage capacity, and product replacement cycle. Therefore, for those skilled in the art, the electronic device should be understood to cover all of these implementations.
In a more specific embodiment, the electronic device may include a processor and a memory. Fig. 2 schematically shows a block diagram of such an electronic device for providing content documents to a user. The electronic device 2000 in Fig. 2 may be, for example, a server or a terminal device.
As shown in Fig. 2, the electronic device 2000 may include a processor 2010, a memory 2020, an interface device 2030, a communication device 2040, a display device 2050, an input device 2060, a speaker 2070, a microphone 2080, and so on.
The processor 2010 may be, for example, a central processing unit (CPU) or a microcontroller unit (MCU). The memory 2020 includes, for example, ROM (read-only memory), RAM (random access memory), and non-volatile memory such as a hard disk. The interface device 2030 includes, for example, a USB interface and a headphone jack.
The communication device 2040 can, for example, perform wired or wireless communication.
The display device 2050 is, for example, a liquid crystal display or a touch display screen. The input device 2060 may include, for example, a touch screen and a keyboard. The user can input and output voice information through the speaker 2070 and the microphone 2080.
The electronic device shown in Fig. 2 is merely illustrative and is in no way intended to limit the invention, its application, or its uses.
In this specific embodiment, the memory 2020 is configured to store instructions that control the processor 2010 to perform the method for sorting content documents shown in Fig. 1. Those skilled in the art will appreciate that although multiple devices are shown in Fig. 2, the present invention may involve only some of them, for example the processor 2010 and the memory 2020. A skilled person can design the instructions according to the concepts disclosed herein. How instructions control a processor is well known in the art and is therefore not described in detail here.
Fig. 3 shows a schematic diagram of an example according to the present invention. As shown in Fig. 3, the technical solutions of the embodiments of the present invention can be applied in an information recommendation system, so as to recommend to users information content (content documents) that they are more interested in and/or that is of higher quality.
As shown in Fig. 3, the information recommendation system 3000 includes an electronic device 3020 and a client device 3030. The electronic device 3020 is, for example, the electronic device for providing content documents to a user according to the embodiments of the present invention described above. The client device 3030 is the user's device. Here, the electronic device 3020 is shown connected to the client device 3030 through a network 3010, but it may also be combined with the client device 3030. Specifically, the electronic device 3020 may be a server on the network, or it may be integrated in the client.
In Fig. 3, the electronic device 3020 obtains reference content document libraries, such as popular lists or recommended information on websites, from other content providers 3040-1 and 3040-2 through the network 3010. Using these reference content document libraries, the electronic device 3020 ranks the articles to be recommended to the user by means of the content document sorting solution of the embodiments of the present invention, and recommends and/or transmits articles to the user based on the ranking result. For example, articles with higher ranking scores are recommended to the user first.
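Preferentially recommending higher-scoring articles amounts to a top-N selection over the ranking scores. A minimal sketch, assuming the scores have already been computed and that the recommender simply returns the N best articles; `recommend` and the `(article_id, score)` input format are hypothetical.

```python
import heapq


def recommend(scored_articles, n=3):
    """Return the IDs of the n articles with the highest ranking scores, best first.

    scored_articles: iterable of (article_id, ranking_score) pairs (hypothetical format).
    """
    # nlargest avoids fully sorting a large candidate pool when only n items are needed.
    top = heapq.nlargest(n, scored_articles, key=lambda pair: pair[1])
    return [article_id for article_id, _ in top]


candidates = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
print(recommend(candidates, n=2))  # ['b', 'd']
```

`heapq.nlargest` keeps only N items in memory at a time, which suits a large pool of candidate articles from which only a short recommendation list is needed.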
The present invention may be a device, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present invention.
A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
The computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as programmable logic circuits, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) can be personalized using state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement aspects of the present invention.
Aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having the instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.
Various embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.

Claims (18)

1. A method for sorting content documents, comprising:
Building, based on index terms, an inverted index of the reference content documents in a reference content document library;
Selecting keywords in a target content document, wherein the target content document is a content document to be sorted for recommendation to a user;
Searching, based on the keywords, for the corresponding index terms to obtain the corresponding posting lists from the inverted index;
Setting a ranking score of the target content document based on the reference content documents in the posting lists; and
Sorting the target content documents based on the ranking scores.
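The five steps of claim 1 can be traced in a minimal sketch. It assumes titles are already segmented into token lists (the claim does not prescribe a segmenter) and, as a placeholder only, scores each target by the number of distinct reference documents reached through its keywords' posting lists; the dependent claims refine how the score is actually set. `build_inverted_index` and `rank_targets` are hypothetical names.

```python
from collections import defaultdict


def build_inverted_index(reference_titles):
    """Map each index term to the set of reference document IDs whose title contains it.

    reference_titles: {doc_id: list of pre-segmented title tokens} (assumed input).
    """
    index = defaultdict(set)
    for doc_id, tokens in reference_titles.items():
        for term in tokens:
            index[term].add(doc_id)
    return index


def rank_targets(target_keywords, index):
    """Score each target by how many distinct reference documents its keywords reach
    (placeholder rule), then return target IDs sorted by descending score."""
    scores = {}
    for target_id, keywords in target_keywords.items():
        hit_docs = set()
        for kw in keywords:
            # Posting list for this keyword; .get() avoids inserting missing terms.
            hit_docs |= index.get(kw, set())
        scores[target_id] = len(hit_docs)
    # sorted() is stable, so targets with equal scores keep their input order.
    order = sorted(scores, key=lambda t: scores[t], reverse=True)
    return order, scores


refs = {"r1": ["phone", "launch"], "r2": ["phone", "review"], "r3": ["football"]}
idx = build_inverted_index(refs)
order, scores = rank_targets({"t1": ["phone"], "t2": ["football"], "t3": ["cooking"]}, idx)
print(order, scores)  # ['t1', 't2', 't3'] {'t1': 2, 't2': 1, 't3': 0}
```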
2. The method according to claim 1, wherein building the inverted index comprises:
Segmenting the title of each reference content document into words to obtain index terms;
Calculating a weight representing the importance of each index term in the reference content document; and
Storing the weight.
3. The method according to claim 2, wherein calculating the weight comprises:
Calculating a term frequency and an inverse document frequency for the index term;
Calculating the product of the term frequency and the inverse document frequency; and
Normalizing the product of the term frequency and the inverse document frequency.
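Claims 2 and 3 compute, for every index term in a reference title, a normalized TF-IDF weight. The sketch below is one way to do this under stated assumptions: titles are pre-segmented, tf is the raw count of the term in the title, idf = log(N / df), and "normalizing" is taken to mean dividing by the sum of the title's products so that each title's weights sum to 1 (the claim does not name a normalization). `tfidf_weights` is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_weights(reference_titles):
    """Return {doc_id: {term: normalized weight}} for pre-segmented titles.

    Assumptions: tf = raw count in the title, idf = log(N / df), and L1
    normalization (divide by the sum of the title's tf*idf products).
    """
    n_docs = len(reference_titles)
    # Document frequency: number of titles containing each term at least once.
    df = Counter()
    for tokens in reference_titles.values():
        df.update(set(tokens))

    weights = {}
    for doc_id, tokens in reference_titles.items():
        tf = Counter(tokens)
        raw = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        total = sum(raw.values())
        # Guard: if every term occurs in all titles, all idf values (and the total) are 0.
        weights[doc_id] = {t: (v / total if total > 0 else 0.0) for t, v in raw.items()}
    return weights


refs = {
    "r1": ["new", "phone", "launch"],
    "r2": ["phone", "review"],
    "r3": ["football", "final"],
}
w = tfidf_weights(refs)
print({t: round(v, 2) for t, v in w["r1"].items()})  # {'new': 0.42, 'phone': 0.16, 'launch': 0.42}
```

The term "phone", shared by two of the three titles, receives a lower weight than the terms unique to `r1`, which is the intended effect of the inverse document frequency.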
4. The method according to claim 3, wherein selecting the keywords in the target content document comprises: segmenting the title of the target content document into words to obtain at least one keyword;
Wherein determining the ranking score of the target content document comprises:
For each reference content document in the posting list, obtaining the weights of the index terms corresponding to the at least one keyword;
Summing the obtained weights to determine a hit weight;
When the hit weight is greater than a predetermined value, determining that the reference content document library is hit; and
Setting the ranking score based on the hit reference content document library.
5. The method according to claim 4, wherein the predetermined value is 0.8.
6. The method according to claim 4, wherein at least one reference content document library is hit, each hit reference content document library is assigned a tier value representing the degree of attention it receives, and the ranking score is set based on the tier value.
7. The method according to claim 1, further comprising:
Setting the ranking score further based on at least one of the word count of the target content document, whether paragraph stacking is present, and whether pictures are present.
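Claim 7 names three content signals but not how they enter the score. The sketch below assumes a multiplicative adjustment with hypothetical factors and a hypothetical `min_words` cutoff; it also assumes paragraph stacking is flagged by some upstream detector that is not shown. `ArticleFeatures` and `adjust_for_content` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ArticleFeatures:
    word_count: int
    has_paragraph_stacking: bool  # assumed to be set by an upstream detector
    has_pictures: bool


def adjust_for_content(score, features, min_words=300):
    """Scale a ranking score by simple content-quality factors.

    All factors and the min_words cutoff are illustrative placeholders.
    """
    if features.word_count < min_words:
        score *= 0.8  # short article: mild penalty
    if features.has_paragraph_stacking:
        score *= 0.7  # stacked paragraphs: penalty
    if features.has_pictures:
        score *= 1.1  # illustrated article: small bonus
    return score


long_illustrated = ArticleFeatures(word_count=1200, has_paragraph_stacking=False, has_pictures=True)
short_stacked = ArticleFeatures(word_count=150, has_paragraph_stacking=True, has_pictures=False)
print(round(adjust_for_content(10.0, long_illustrated), 2))  # 11.0
print(round(adjust_for_content(10.0, short_stacked), 2))  # 5.6
```

A multiplicative form keeps the adjustment proportional to the score produced by the earlier steps; an additive bonus or penalty would be an equally valid design.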
8. The method according to claim 1, further comprising:
Adjusting the click-through rate of the target content document based on at least one of the average click duration of users clicking the target content document, the number of times users share the target content document, the number of times users add the target content document to their favorites, and the number of times users indicate that they dislike the target content document, to obtain an adjusted click-through rate; and
Setting the ranking score further based on the adjusted click-through rate.
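Claim 8 lists the engagement signals used to adjust the click-through rate but not the formula. The sketch below shows one hypothetical adjustment: the raw rate is scaled by a capped dwell-time factor, boosted by shares and favorites per click, and damped by dislikes per click. The function name, the reference dwell time, the cap, and both weights are assumptions chosen for illustration.

```python
def adjusted_ctr(clicks, impressions, avg_click_seconds, shares, favorites, dislikes,
                 ref_seconds=60.0, engage_w=0.5, dislike_w=1.0):
    """Adjust a raw click-through rate with post-click engagement signals.

    Hypothetical formula:
      raw      = clicks / impressions
      dwell    = min(avg_click_seconds / ref_seconds, 2.0)   # capped dwell-time factor
      engage   = (shares + favorites) / clicks               # positive actions per click
      negative = dislikes / clicks                            # negative feedback per click
      adjusted = raw * dwell * (1 + engage_w * engage) / (1 + dislike_w * negative)
    """
    if impressions <= 0 or clicks <= 0:
        return 0.0  # no exposure or no clicks: nothing to adjust
    raw = clicks / impressions
    dwell = min(avg_click_seconds / ref_seconds, 2.0)
    engage = (shares + favorites) / clicks
    negative = dislikes / clicks
    return raw * dwell * (1 + engage_w * engage) / (1 + dislike_w * negative)


print(round(adjusted_ctr(100, 1000, 60, 10, 10, 0), 4))  # 0.11
print(round(adjusted_ctr(100, 1000, 60, 0, 0, 100), 4))  # 0.05
```

Both examples start from the same raw rate of 0.1; shares and favorites raise it, while a dislike for every click halves it.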
9. A device for sorting content documents, comprising:
Means for building, based on index terms, an inverted index of the reference content documents in a reference content document library;
Means for selecting keywords in a target content document, wherein the target content document is a content document to be sorted for recommendation to a user;
Means for searching, based on the keywords, for the corresponding index terms to obtain the corresponding posting lists from the inverted index;
Means for setting a ranking score of the target content document based on the reference content documents in the posting lists; and
Means for sorting the target content documents based on the ranking scores.
10. The device according to claim 9, wherein the means for building the inverted index comprises:
Means for segmenting the title of each reference content document into words to obtain index terms;
Means for calculating a weight representing the importance of each index term in the reference content document; and
Means for storing the weight.
11. The device according to claim 10, wherein the means for calculating the weight comprises:
Means for calculating a term frequency and an inverse document frequency for the index term;
Means for calculating the product of the term frequency and the inverse document frequency; and
Means for normalizing the product of the term frequency and the inverse document frequency.
12. The device according to claim 11, wherein the means for selecting the keywords in the target content document comprises: means for segmenting the title of the target content document into words to obtain at least one keyword;
Wherein the means for determining the ranking score of the target content document comprises:
Means for obtaining, for each reference content document in the posting list, the weights of the index terms corresponding to the at least one keyword;
Means for summing the obtained weights to determine a hit weight;
Means for determining that the reference content document library is hit when the hit weight is greater than a predetermined value; and
Means for setting the ranking score based on the hit reference content document library.
13. The device according to claim 12, wherein the predetermined value is 0.8.
14. The device according to claim 12, wherein at least one reference content document library is hit, each hit reference content document library is assigned a tier value representing the degree of attention it receives, and the ranking score is set based on the tier value.
15. The device according to claim 9, further comprising:
Means for setting the ranking score based on at least one of the word count of the target content document, whether paragraph stacking is present, and whether pictures are present.
16. The device according to claim 9, further comprising:
Means for adjusting the click-through rate of the target content document based on at least one of the average click duration of users clicking the target content document, the number of times users share the target content document, the number of times users add the target content document to their favorites, and the number of times users indicate that they dislike the target content document, to obtain an adjusted click-through rate; and
Means for setting the ranking score further based on the adjusted click-through rate.
17. An electronic device for providing content documents to a user, comprising the device for sorting content documents according to any one of claims 9-16, wherein the electronic device provides target content documents to the user according to the result of sorting the target content documents.
18. An electronic device for providing content documents to a user, comprising a processor and a memory, wherein the memory is configured to store instructions, and the instructions control the processor to perform the method for sorting content documents according to any one of claims 1-8.
CN201610116247.9A 2016-03-01 2016-03-01 Method, device and electronic device for content document sequencing Active CN105653737B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610116247.9A CN105653737B (en) 2016-03-01 2016-03-01 Method, device and electronic device for content document sequencing
PCT/CN2017/074510 WO2017148323A1 (en) 2016-03-01 2017-02-23 Method and device for sorting content documents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610116247.9A CN105653737B (en) 2016-03-01 2016-03-01 Method, device and electronic device for content document sequencing

Publications (2)

Publication Number Publication Date
CN105653737A true CN105653737A (en) 2016-06-08
CN105653737B CN105653737B (en) 2020-04-17

Family

ID=56492782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610116247.9A Active CN105653737B (en) 2016-03-01 2016-03-01 Method, device and electronic device for content document sequencing

Country Status (2)

Country Link
CN (1) CN105653737B (en)
WO (1) WO2017148323A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923544A (en) * 2009-06-15 2010-12-22 北京百分通联传媒技术有限公司 Method for monitoring and displaying Internet hot spots
CN102955849A (en) * 2012-10-29 2013-03-06 新浪技术(中国)有限公司 Method for recommending documents based on tags and document recommending device
CN103049440A (en) * 2011-10-11 2013-04-17 腾讯科技(深圳)有限公司 Recommendation processing method and processing system for related articles
CN103793503A (en) * 2014-01-24 2014-05-14 北京理工大学 Opinion mining and classification method based on web texts

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064842B (en) * 2011-10-20 2016-01-20 北京中搜网络技术股份有限公司 Information subscribing treating apparatus and information subscribing disposal route
US8751505B2 (en) * 2012-03-11 2014-06-10 International Business Machines Corporation Indexing and searching entity-relationship data
CN105653737B (en) * 2016-03-01 2020-04-17 广州神马移动信息科技有限公司 Method, device and electronic device for content document sequencing

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017148323A1 (en) * 2016-03-01 2017-09-08 广州神马移动信息科技有限公司 Method and device for sorting content documents
US11429648B2 (en) 2016-12-21 2022-08-30 EMC IP Holding Company LLC Method and device for creating an index
CN108228648A (en) * 2016-12-21 2018-06-29 伊姆西Ip控股有限责任公司 The method and apparatus for creating index
CN107239497A (en) * 2017-05-02 2017-10-10 广东万丈金数信息技术股份有限公司 Hot content searching method and system
CN109672706B (en) * 2017-10-16 2022-06-14 百度在线网络技术(北京)有限公司 Information recommendation method and device, server and storage medium
CN109672706A (en) * 2017-10-16 2019-04-23 百度在线网络技术(北京)有限公司 A kind of information recommendation method, device, server and storage medium
CN107943908A (en) * 2017-11-17 2018-04-20 郑州云海信息技术有限公司 A kind of document acquisition methods and device
CN109726390A (en) * 2018-12-06 2019-05-07 天津字节跳动科技有限公司 Document processing method, device, electronic equipment and storage medium
CN109670183A (en) * 2018-12-21 2019-04-23 北京锐安科技有限公司 A kind of calculation method, device, equipment and the storage medium of text importance
CN109670183B (en) * 2018-12-21 2023-03-24 北京锐安科技有限公司 Text importance calculation method, device, equipment and storage medium
CN111444304A (en) * 2019-01-17 2020-07-24 北京京东尚科信息技术有限公司 Search ranking method and device
WO2020151548A1 (en) * 2019-01-24 2020-07-30 北京字节跳动网络技术有限公司 Method and device for sorting followed pages
CN111061830A (en) * 2019-12-27 2020-04-24 深圳市元征科技股份有限公司 Method and device for processing automobile repair data
CN111061830B (en) * 2019-12-27 2023-12-05 深圳市元征科技股份有限公司 Method and device for processing automobile repair data
CN111913912A (en) * 2020-07-16 2020-11-10 北京字节跳动网络技术有限公司 File processing method, file matching device, electronic equipment and medium
CN112417091A (en) * 2020-10-16 2021-02-26 北京斗米优聘科技发展有限公司 Text retrieval method and device

Also Published As

Publication number Publication date
WO2017148323A1 (en) 2017-09-08
CN105653737B (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN105653737A (en) Method, equipment and electronic equipment for content document sorting
US11151663B2 (en) Calculating expertise confidence based on content and social proximity
CN108153901A (en) The information-pushing method and device of knowledge based collection of illustrative plates
US9542458B2 (en) Systems and methods for processing and displaying user-generated content
US20220100807A1 (en) Systems and methods for categorizing, evaluating, and displaying user input with publishing content
KR20090081447A (en) Modifying an on-line dating search using inline editing
US10607139B2 (en) Candidate visualization techniques for use with genetic algorithms
US11151618B2 (en) Retrieving reviews based on user profile information
US20180067912A1 (en) System and method to minimally reduce characters in character limiting scenarios
US10956473B2 (en) Article quality scoring method and device, client, server, and programmable device
US9940007B2 (en) Shortening multimedia content
CN109582872A (en) A kind of information-pushing method, device, electronic equipment and storage medium
US20190089731A1 (en) Abuser detection
CN109087162A (en) Data processing method, system, medium and calculating equipment
US11157983B2 (en) Generating a framework for prioritizing machine learning model offerings via a platform
CN108259547A (en) Information push method, equipment and programmable device
CN110175264A (en) Construction method, server and the computer readable storage medium of video user portrait
US20170262161A1 (en) Systems and methods for navigating a set of data objects
US20160042370A1 (en) Providing survey content recommendations
US10699450B2 (en) Interactive tool for causal graph construction
AU2014302051A1 (en) Method and apparatus for automating network data analysis of user's activities
US11126675B2 (en) Systems and methods for optimizing the selection and display of electronic content
CN112990625A (en) Method and device for allocating annotation tasks and server
CN108139900B (en) Communicating information about updates of an application
US11645049B2 (en) Automated software application generation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200602

Address after: 310051 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 510627 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping B radio 16 floor tower square

Patentee before: GUANGZHOU SHENMA MOBILE INFORMATION TECHNOLOGY Co.,Ltd.
