CN107239497A - Hot content searching method and system - Google Patents

Info

Publication number
CN107239497A
Authority
CN
China
Prior art keywords
value
hot
content
text data
dimensional parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710301979.XA
Other languages
Chinese (zh)
Other versions
CN107239497B (en)
Inventor
覃文森
张伟力
陈鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Infinite Information Technology Co Ltd
Original Assignee
Guangdong Infinite Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Infinite Information Technology Co Ltd
Priority to CN201710301979.XA
Publication of CN107239497A
Application granted
Publication of CN107239497B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The present invention relates to a hot content search method and system. The hot content search method may comprise the following steps: obtaining a search keyword; retrieving from a preset index library according to the search keyword to obtain text data items; obtaining, according to a preset time-fluctuation heat algorithm, the heat amplification value corresponding to each dimensional parameter of each text data item; taking the product of the heat amplification value and a preset decay value as the hot value of the dimensional parameter, and summing the hot values of the dimensional parameters to obtain the content hot value of the text data item; sorting the text data items according to the content hot value to obtain sorted text data items; and displaying the sorted text data items, or sending them to a corresponding external application, as the hot content found for the search keyword. The invention can reflect the heat within a time period and the timeliness of the content hot value, and effectively improves the accuracy of the hot content information obtained.

Description

Hot content searching method and system
Technical field
The present invention relates to the technical field of data retrieval, and more particularly to a hot content search method and system.
Background
In a data retrieval service, content information is first collected, and an index is then built from the collected content information data. When an external application uses this content information data, it performs a full-text search through the index; by default, results are sorted by dimensions such as publication time, comment count, and thumbs-up count to obtain the content information that attracts the most attention.
In the course of implementation, the inventors found at least the following problem in the conventional art: with conventional content retrieval methods, the comment count, thumbs-up count, and similar figures grow as time passes, so the resulting content hot value increases continuously. A content hot value, however, is usually time-sensitive and fluctuates over time. Traditional hot content search methods cannot reflect this timeliness and cannot obtain an accurate content hot value, so the accuracy of the hot content information obtained is low.
Summary of the invention
Based on this, it is necessary to provide a hot content search method and system that address the low accuracy with which traditional hot content search methods obtain hot content information.
To achieve the above object, in one aspect, an embodiment of the present invention provides a hot content search method comprising the following steps:
Obtaining a search keyword; retrieving from a preset index library according to the search keyword to obtain text data items;
Obtaining, according to a preset time-fluctuation heat algorithm, the heat amplification value corresponding to each dimensional parameter of each text data item; taking the product of the heat amplification value and a preset decay value as the hot value of the dimensional parameter, and summing the hot values of the dimensional parameters to obtain the content hot value of the text data item;
Sorting the text data items according to the content hot value to obtain sorted text data items;
Displaying the sorted text data items, or sending them to a corresponding external application, as the hot content found for the search keyword.
In another aspect, an embodiment of the present invention further provides a hot content search system, comprising:
A full-text search unit, configured to obtain a search keyword and to retrieve from a preset index library according to the search keyword to obtain text data items;
A content hot value acquiring unit, configured to obtain, according to a preset time-fluctuation heat algorithm, the heat amplification value corresponding to each dimensional parameter of each text data item; to take the product of the heat amplification value and a preset decay value as the hot value of the dimensional parameter; and to sum the hot values of the dimensional parameters to obtain the content hot value of the text data item;
A sorting unit, configured to sort the text data items according to the content hot value to obtain sorted text data items;
A feedback unit, configured to display the sorted text data items, or send them to a corresponding external application, as the hot content found for the search keyword.
The present invention has the following advantages and beneficial effects:
With the hot content search method and system of the present invention, the content hot value of each text data item is obtained according to a preset time-fluctuation heat algorithm. For example, multiplying the heat amplification value by the time decay value to obtain the hot value can greatly reduce the bias that the passage of time introduces into the measure of content heat, so the resulting content hot value is more accurate. The text data items are then sorted according to the content hot value, yielding a ranking that accurately reflects content heat. These steps allow the invention to reflect the heat within a time period and the timeliness of the content hot value. In addition, because the calculation is based on the heat amplification value within the period and the sum of the hot values of the dimensional parameters is used as the content hot value, the accuracy of the hot content information obtained can be effectively improved.
Brief description of the drawings
Fig. 1 is a flowchart of Embodiment 1 of the hot content search method of the present invention;
Fig. 2 is a flowchart of Embodiment 2 of the hot content search method of the present invention;
Fig. 3 is a structural diagram of Embodiment 1 of the hot content search system of the present invention;
Fig. 4 is a structural diagram of Embodiment 2 of the hot content search system of the present invention.
Detailed description
To facilitate understanding of the present invention, it is described more fully below with reference to the accompanying drawings, which show preferred embodiments of the invention. The invention may, however, be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure will be thorough and complete.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person skilled in the technical field of the present invention. The terms used in this description are intended only to describe specific embodiments and are not intended to limit the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Application scenarios of the hot content search method and system of the present invention:
In conventional content search methods, the attention level is the value obtained by summing dimension data such as publication time, comment count, and thumbs-up count; the higher the value, the higher the attention. The index serves full-text retrieval, and the attention level is the reference for sorting the retrieval results. When retrieving, a traditional hot content search method obtains text data items according to the search keyword and then determines the final ranking in combination with the attention level. However, the conventional method operates directly on the values of the dimensional parameters (i.e., the dimension data), and the dimension data of different content items can differ widely; the ranking then becomes inaccurate mainly because of the sheer magnitude of the values themselves.
The hot content search method and system of the present invention are particularly applicable to targeted websites, such as industry websites. Preferably, they are applied to content-cloud software projects such as an intelligent semantic knowledge graph. The intelligent semantic knowledge graph acts as the central kitchen of media operations, playing a key role in collecting, cleaning, and storing media data and in providing retrieval services for content editing. That is, the intelligent semantic platform crawls relevant media data from the websites of cooperating media clients according to preset crawling rules and stores it in a database, accumulating media data and providing a data search service for editing media content. Because the invention can crawl relevant data from the websites of cooperating media clients, the hot content finally found is closer to the industry heat of a given industry, which improves the accuracy of the search results.
Embodiment 1 of the hot content search method of the present invention:
To address the low accuracy with which traditional hot content search methods obtain hot content information, the present invention provides Embodiment 1 of the hot content search method. Fig. 1 is a flowchart of Embodiment 1 of the hot content search method of the present invention; as shown in Fig. 1, the method may comprise the following steps:
Step S110: obtaining a search keyword; retrieving from a preset index library according to the search keyword to obtain text data items;
Step S120: obtaining, according to a preset time-fluctuation heat algorithm, the heat amplification value corresponding to each dimensional parameter of each text data item; taking the product of the heat amplification value and a preset decay value as the hot value of the dimensional parameter, and summing the hot values of the dimensional parameters to obtain the content hot value of the text data item;
Step S130: sorting the text data items according to the content hot value to obtain sorted text data items;
Step S140: displaying the sorted text data items, or sending them to a corresponding external application, as the hot content found for the search keyword.
Specifically, the present invention obtains text data items by retrieval (preferably by full-text search). For each text data item, following the preset time-fluctuation heat algorithm, the hot value of each dimensional parameter is calculated by multiplying the heat amplification value by the decay value, and the content hot value of the text data item is obtained from these hot values. When a user enters a keyword to search, a full-text search is first performed on the keyword, the text data items are then sorted by content hot value, and the sorted results are returned to the user.
Here, a dimensional parameter is a parameter, obtained from user behavior data, that measures content heat. Preferably, dimensional parameters are dimension data that reflect users' attention to a text data item (for example, data recording user behavior such as like count, thumbs-up count, comment count, and repost count). The decay value may be a numeric constant that depends on the time period and decreases gradually as time passes. The heat amplification value may be the amount by which a given dimension datum (i.e., a given dimensional parameter) has grown within a time range. The content hot value expresses how popular the content is as it changes over time; the larger the value, the more popular the content. Preferably, the heat amplification value is the increase, calculated over a period of time, in a dimensional parameter of the text data item (such as the thumbs-up count, read count, or comment count). Decay values can be assigned flexibly by time period; preferably, the decay value for three days is 0.8, for one week 0.5, and for half a month 0.3. The smaller the decay value, the greater the degree of decay.
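The example decay values above can be read as a lookup keyed by the length of the time period. The following is a minimal sketch under that reading; the function name, the treatment of half a month as 15 days, and the fallback for longer periods are assumptions for illustration and are not stated in the text.

```python
# Example constants from the text: 3 days -> 0.8, one week -> 0.5,
# half a month -> 0.3 (half a month is assumed here to mean 15 days).
DECAY_BY_PERIOD_DAYS = [(3, 0.8), (7, 0.5), (15, 0.3)]

def decay_value(period_days: float) -> float:
    """Return the decay value of the shortest listed period covering period_days."""
    for limit_days, decay in DECAY_BY_PERIOD_DAYS:
        if period_days <= limit_days:
            return decay
    # Assumption: periods longer than half a month reuse the smallest listed value.
    return DECAY_BY_PERIOD_DAYS[-1][1]

print(decay_value(3), decay_value(7), decay_value(15))  # 0.8 0.5 0.3
```

A table of (period, constant) pairs keeps the values easy to adjust, which matches the statement that decay values can be assigned flexibly by time period.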
The present invention performs a full-text search in the preset index library and then sorts the resulting text data items by content hot value. Such a ranking accurately reflects the heat of the text data and the timeliness of the content hot value, and thereby effectively improves the accuracy of the hot content information obtained.
In a specific embodiment, according to the preset time-fluctuation heat algorithm, the heat amplification value corresponding to each dimensional parameter of the text data item is obtained from the following formula:
Heat amplification value = parameter value of the dimensional parameter at the current time - parameter value of the dimensional parameter in the previous period.
Specifically, the preset time-fluctuation heat algorithm allows the present invention to reflect the heat within a time period, rather than simply operating directly on the values of dimensional parameters such as comment count, read count, and thumbs-up count. Different content items can differ widely in comment count, read count, and thumbs-up count, so the magnitude of the values themselves would otherwise distort the ranking. With the preset time-fluctuation heat algorithm, the calculation is based on the increase within the period, which effectively improves accuracy.
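The difference between ranking by raw values and ranking by the increase within a period can be seen in a small made-up example. The article names and counts below are invented for illustration only.

```python
# Made-up thumbs-up counts: (value in the previous period, value at the current time).
articles = {
    "old_article": (10_000, 10_020),  # large total, almost no recent growth
    "new_article": (50, 900),         # small total, strong recent growth
}

# Conventional approach: rank by the raw current value.
by_total = sorted(articles, key=lambda a: articles[a][1], reverse=True)

# Approach described in the text: rank by the increase within the period.
by_increase = sorted(
    articles, key=lambda a: articles[a][1] - articles[a][0], reverse=True
)

print(by_total)     # ['old_article', 'new_article']
print(by_increase)  # ['new_article', 'old_article']
```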
Preferably, the preset time-fluctuation heat algorithm can be implemented with the following formulas: (1) heat amplification value = value of a given dimension (i.e., a given dimensional parameter) at the current time - value of that dimension in the previous period; (2) the decay value is a specific constant that decreases continuously as time passes; (3) hot value of a dimension = decay value * heat amplification value of that dimension (i.e., the product of the two); (4) content hot value = the sum of the hot values of the multiple dimensions.
Further, when there is only one dimensional parameter, the content hot value may also be obtained by multiplying the decay value by the heat amplification value of that dimensional parameter.
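Formulas (1) to (4) can be sketched as follows. This is an illustrative reading, not the patented implementation: the function names, the dictionary layout, and the sample counts are assumptions, and a single decay value is applied to all dimensions for simplicity.

```python
from typing import Mapping

def heat_amplification(current: float, previous: float) -> float:
    """Formula (1): value at the current time minus value in the previous period."""
    return current - previous

def content_hot_value(current: Mapping[str, float],
                      previous: Mapping[str, float],
                      decay: float) -> float:
    """Formulas (3) and (4): sum over dimensions of decay * heat amplification."""
    return sum(decay * heat_amplification(current[d], previous[d]) for d in current)

# Made-up counts for one text data item.
current = {"thumbs_up": 120, "comments": 45, "reads": 2000}
previous = {"thumbs_up": 100, "comments": 40, "reads": 1500}
score = content_hot_value(current, previous, decay=0.8)
print(round(score, 2))  # 420.0
```

With a single dimension in the dictionaries, the same function reduces to the decay value multiplied by that dimension's heat amplification value, which is the single-parameter case described above.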
In a specific embodiment, the dimensional parameters include a thumbs-up parameter, a comment parameter, and a reading parameter;
The step of summing the hot values of the dimensional parameters to obtain the content hot value of the text data item comprises:
Obtaining the product of each hot value and the heat weight corresponding to its dimensional parameter, and summing the products to obtain the content hot value.
Specifically, to obtain the content hot value more accurately, a heat weight can be set for each dimensional parameter according to how frequently users perform the corresponding behavior (such as thumbs-up, like, or comment). Preferably, the heat weight for likes is set to the largest value, with comments next. Each heat weight is then multiplied by the hot value of its dimensional parameter, and the products are summed to obtain the content hot value of the text data item. In this way, the true heat of the text data can be reflected more accurately.
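The weighted variant can be sketched as below. The weight values are hypothetical; the text only states that the weight for likes is preferably the largest and that comments come next. The per-dimension hot values are assumed to have been computed already (decay value multiplied by heat amplification value).

```python
from typing import Mapping

def weighted_content_hot_value(hot_values: Mapping[str, float],
                               weights: Mapping[str, float]) -> float:
    """Sum of (hot value * heat weight) over all dimensional parameters."""
    return sum(hot_values[d] * weights[d] for d in hot_values)

# Hypothetical per-dimension hot values and heat weights.
hot_values = {"likes": 16.0, "comments": 4.0, "reads": 400.0}
weights = {"likes": 0.5, "comments": 0.3, "reads": 0.2}

total = weighted_content_hot_value(hot_values, weights)
print(round(total, 2))
```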
In a specific embodiment, the following steps are further included before the step of obtaining the search keyword:
Crawling the content information of a website according to preset crawling rules to obtain the text data of the content information;
Performing word segmentation on the text data to obtain the segmented words and sentences;
Building an inverted index from the segmented words and sentences, and constructing the preset index library from the inverted index.
Specifically, the industry website in the present invention may be an industry portal website. The invention can build an index from the text of the crawled website content: the text data is first segmented, and an inverted index is then built from the resulting words and sentences. In full-text search, such an index stores the mapping from a word to its storage locations in a document or a group of documents. By crawling relevant data from industry portal websites, the hot content finally found is closer to the industry heat of a given industry, which can further improve the accuracy of the search results.
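An inverted index of the kind described, mapping each word to its positions in a set of documents, can be sketched in a few lines. The document IDs and tokens are invented, and the input is assumed to be already segmented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_inverted_index(docs: Dict[int, List[str]]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each token to a list of (document id, position) pairs."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for doc_id, tokens in docs.items():
        for position, token in enumerate(tokens):
            index[token].append((doc_id, position))
    return dict(index)

# Made-up, already segmented documents.
docs = {
    1: ["hot", "content", "search"],
    2: ["content", "index"],
}
index = build_inverted_index(docs)
print(index["content"])  # [(1, 1), (2, 0)]
```

Looking up a keyword then returns every document and position where it occurs, which is the starting point for the full-text search step.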
It should be noted that the preset crawling rules may be implemented by a web crawler. The step of segmenting the text data to obtain the segmented words and sentences can be implemented with, for example, a dictionary-matching segmentation method, a semantic-analysis segmentation algorithm, or a segmentation method based on a probabilistic statistical model.
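As one illustration of dictionary-matching segmentation, the sketch below uses forward maximum matching, a common technique of that family. The text does not say which dictionary method is used, so this choice, the tiny dictionary, and the maximum word length are all assumptions.

```python
from typing import List, Set

def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedily take the longest dictionary word starting at each position."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

# Hypothetical dictionary for illustration.
dictionary = {"热门", "内容", "搜索", "热门内容"}
print(forward_max_match("热门内容搜索", dictionary))  # ['热门内容', '搜索']
```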
Preferably, the step of performing a full-text search according to the search keyword to obtain text data items can be implemented with Solr (an enterprise-level search application server), which further improves the match between each text data item and the keyword and helps ensure the accuracy of the hot content search. It also provides technical support for building the index and for obtaining accurate heat amplification values of the dimensional parameters.
In Embodiment 1 of the hot content search method of the present invention, the content hot value of each text data item is obtained according to a preset time-fluctuation heat algorithm. The hot value obtained by multiplying the heat amplification value by the time decay value can greatly reduce the bias that the passage of time introduces into the measure of content heat, so the resulting content hot value is more accurate. The text data items are then sorted according to the content hot value, yielding a ranking that accurately reflects content heat. These steps allow the invention to reflect the heat within a time period and the timeliness of the content hot value. In addition, because the calculation is based on the heat amplification value within the period and the sum of the hot values of the dimensional parameters is used as the content hot value, the accuracy of the hot content information obtained can be effectively improved.
Embodiment 2 of the hot content search method of the present invention:
To address the low accuracy with which traditional hot content search methods obtain hot content information, the present invention also provides Embodiment 2 of the hot content search method. Compared with Embodiment 1 above, in addition to sorting the text data items by content hot value, Embodiment 2 also calculates a matching value (score) from the degree of text matching when the full-text search is performed, and returns the retrieved content sorted by the combination of the matching value and the hot value. Such a ranking better reflects the heat of an article. Fig. 2 is a flowchart of Embodiment 2 of the hot content search method of the present invention; as shown in Fig. 2, the method may comprise the following steps:
Step S210: obtaining a search keyword; retrieving from a preset index library according to the search keyword to obtain text data items;
Step S220: obtaining, according to a preset time-fluctuation heat algorithm, the heat amplification value corresponding to each dimensional parameter of each text data item; taking the product of the heat amplification value and a preset decay value as the hot value of the dimensional parameter, and summing the hot values of the dimensional parameters to obtain the content hot value of the text data item;
Step S230: obtaining the matching value of each text data item according to the degree of matching between the search keyword and the words and sentences in the preset index library;
Step S240: summing the content hot value and the matching value to obtain a final score;
Step S250: sorting the text data items in descending order of final score to obtain sorted text data items;
Step S260: displaying the sorted text data items, or sending them to a corresponding external application, as the hot content found for the search keyword.
Specifically, before the step in Embodiment 1 of sorting the text data items according to the content hot value, the method further comprises the step of:
Obtaining the matching value of each text data item according to the degree of matching between the search keyword and the words and sentences in the preset index library;
The step in Embodiment 1 of sorting the text data items to obtain sorted text data items may comprise:
Summing the content hot value and the matching value to obtain a final score;
Sorting the text data items in descending order of final score to obtain sorted text data items.
Preferably, when retrieving, the present invention first performs full-text matching against the index library according to the keyword and calculates a score from the degree to which the keyword matches the words in the index library (for example, obtaining the matching value with a similarity algorithm). This score is then added to the content hot value to obtain the final score, and the text data items are returned in descending order of the final score. Such a ranking better reflects the heat of the text data.
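Steps S240 and S250 amount to adding two numbers per item and sorting in descending order. The sketch below uses invented records and field names; the matching values stand in for whatever similarity score the search engine returns.

```python
from typing import Dict, List

def rank_by_final_score(items: List[Dict]) -> List[Dict]:
    """Sort items by (matching value + content hot value), largest first."""
    return sorted(items, key=lambda r: r["match"] + r["hot"], reverse=True)

# Made-up search results with hypothetical field names.
results = [
    {"id": "a", "match": 0.9, "hot": 12.0},
    {"id": "b", "match": 0.4, "hot": 30.0},
    {"id": "c", "match": 0.95, "hot": 5.0},
]
ranked = rank_by_final_score(results)
print([r["id"] for r in ranked])  # ['b', 'a', 'c']
```

The text adds the two values directly and does not say whether they are first brought to a common scale; if the content hot value is much larger than the matching value, as in this example, it will dominate the ranking.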
It should be understood that the other steps of Embodiment 2 of the hot content search method may be the same as the corresponding steps in Embodiment 1 above and achieve the same or better technical effects (for example, reflecting the heat of the searched content more accurately, or obtaining a more accurate content hot value); they are not repeated here.
Embodiment 1 of the hot content search system of the present invention:
Based on the technical solutions of the above embodiments of the hot content search method, and to address the low accuracy with which traditional hot content search methods obtain hot content information, the present invention also provides Embodiment 1 of a hot content search system. Fig. 3 is a structural diagram of Embodiment 1 of the hot content search system of the present invention; as shown in Fig. 3, the system may include:
A full-text search unit 310, configured to obtain a search keyword and to perform a full-text search in a preset index library according to the search keyword to obtain text data items;
A content hot value acquiring unit 320, configured to obtain, according to a preset time-fluctuation heat algorithm, the heat amplification value corresponding to each dimensional parameter of each text data item; to take the product of the heat amplification value and a preset decay value as the hot value of the dimensional parameter; and to sum the hot values of the dimensional parameters to obtain the content hot value of the text data item;
A sorting unit 330, configured to sort the text data items according to the content hot value to obtain sorted text data items;
A feedback unit 340, configured to display the sorted text data items, or send them to a corresponding external application, as the hot content found for the search keyword.
In a specific embodiment, the content hot value acquiring unit obtains, according to the preset time-fluctuation heat algorithm, the heat amplification value corresponding to each dimensional parameter of the text data item from the following formula:
Heat amplification value = parameter value of the dimensional parameter at the current time - parameter value of the dimensional parameter in the previous period.
In a specific embodiment, the dimensional parameters include a thumbs-up parameter, a comment parameter, and a reading parameter;
The content hot value acquiring unit 320 is further configured to obtain the product of each hot value and the heat weight corresponding to its dimensional parameter, and to sum the products to obtain the content hot value.
In a specific embodiment, the hot content search system further includes an index library construction unit 350;
The index library construction unit 350 includes:
A crawling module 352, configured to crawl the content information of an industry website according to preset crawling rules to obtain the text data of the content information;
A word segmentation module 354, configured to segment the text data to obtain the segmented words and sentences;
An index library building module 356, configured to build an inverted index from the segmented words and sentences and to construct the preset index library from the inverted index.
It should be noted in particular that Embodiment 1 of the hot content search system can correspondingly implement the method steps of Embodiment 1 of the hot content search method above; they are not repeated here.
Embodiment 2 of the hot content search system of the present invention:
Based on the technical solutions of the above embodiments of the hot content search method, and to address the low accuracy with which traditional hot content search methods obtain hot content information, the present invention also provides Embodiment 2 of the hot content search system, built on the system structure of Embodiment 1. Fig. 4 is a structural diagram of Embodiment 2 of the hot content search system of the present invention; as shown in Fig. 4, the hot content search system may further include:
A matching value acquiring unit 460, configured to obtain the matching value of each text data item according to the degree of matching between the search keyword and the words and sentences in the preset index library;
The sorting unit 430 may include:
A summing module 432, configured to sum the content hot value and the matching value to obtain a final score;
A sorting module 434, configured to sort the text data items in descending order of final score to obtain sorted text data items.
It should be noted in particular that Embodiment 2 of the hot content search system can correspondingly implement the method steps of Embodiment 2 of the hot content search method above; they are not repeated here.
In each embodiment of the hot content search system of the present invention, the content hot value of each text data item is obtained according to a preset time-fluctuation heat algorithm. The hot value obtained by multiplying the heat amplification value by the time decay value can greatly reduce the bias that the passage of time introduces into the measure of content heat, so the resulting content hot value is more accurate. The text data items are then sorted according to the content hot value, yielding a ranking that accurately reflects content heat. These steps allow the invention to reflect the heat within a time period and the timeliness of the content hot value. In addition, because the calculation is based on the heat amplification value within the period and the sum of the hot values of the dimensional parameters is used as the content hot value, the accuracy of the hot content information obtained can be effectively improved.
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification. A person of ordinary skill in the art will appreciate that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, includes the steps of the methods described above. The storage medium may be, for example, ROM/RAM, a magnetic disk, or an optical disc.
The embodiments described above express only several implementations of the present invention. Although they are described specifically and in detail, they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (10)

1. A hot content search method, characterized by comprising the following steps:
Obtaining a search keyword; retrieving in a preset index database according to the search keyword to obtain each piece of text data;
Obtaining, according to a preset time-fluctuation heat algorithm, the heat increase value corresponding to each dimensional parameter of the text data; taking the product of the heat increase value and a preset decay value as the hot value of the dimensional parameter, and summing the hot values of the dimensional parameters to obtain the content hot value of the text data;
Sorting the text data according to the content hot value to obtain the sorted text data;
Displaying the sorted text data as the hot content found according to the search keyword, or sending them to a corresponding external application.
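The scoring and sorting steps of claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the `TextData` record, the parameter names (`likes`, `comments`, `reads`), and the decay value of 0.5 are assumptions chosen only for illustration, and retrieval and display are left out.

```python
from dataclasses import dataclass


@dataclass
class TextData:
    """Hypothetical record for one retrieved piece of text data."""
    title: str
    increases: dict[str, float]  # increase value per dimensional parameter


def content_hot_value(increases: dict[str, float], decay: float) -> float:
    """Sum of (increase value * preset decay value) over all dimensional parameters."""
    return sum(value * decay for value in increases.values())


def rank_by_hot_value(items: list[TextData], decay: float) -> list[TextData]:
    """Sort text data from the highest content hot value to the lowest."""
    return sorted(
        items,
        key=lambda item: content_hot_value(item.increases, decay),
        reverse=True,
    )


# Illustrative data only; the numbers are made up.
results = [
    TextData("A", {"likes": 10, "comments": 4, "reads": 100}),
    TextData("B", {"likes": 40, "comments": 20, "reads": 300}),
]
ranked = rank_by_hot_value(results, decay=0.5)
```

A single decay value is applied to every dimensional parameter here because the claim speaks of one preset decay value; a per-item decay that depends on content age would be a straightforward variation.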
2. The hot content search method according to claim 1, characterized in that, according to the preset time-fluctuation heat algorithm, the heat increase value corresponding to each dimensional parameter of the text data is obtained based on the following formula:
heat increase value = parameter value of the dimensional parameter at the current time - parameter value of the dimensional parameter in the previous time period.
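As a hedged illustration of the formula in claim 2, the following sketch subtracts the previous period's snapshot from the current one for each dimensional parameter. The snapshot dictionaries and their keys are hypothetical, and treating a parameter that is absent from the previous snapshot as 0 is an assumption of this sketch, not something the claim states.

```python
def heat_increase(current: dict[str, int], previous: dict[str, int]) -> dict[str, int]:
    """Increase value = value at the current time - value in the previous period."""
    # Assumption: a parameter with no previous snapshot counts as 0.
    return {name: value - previous.get(name, 0) for name, value in current.items()}


# Illustrative snapshots of one article; the numbers are made up.
now = {"likes": 120, "comments": 30, "reads": 1500}
last_period = {"likes": 100, "comments": 25, "reads": 1000}
delta = heat_increase(now, last_period)
```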
3. The hot content search method according to claim 1, characterized in that the dimensional parameters include a like parameter, a comment parameter, and a reading parameter;
The step of summing the hot values of the dimensional parameters to obtain the content hot value of the text data comprises:
Obtaining the product of each hot value and the heat weight corresponding to its dimensional parameter, and summing the products to obtain the content hot value.
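The weighted summation of claim 3 might look like the sketch below. The weight values are invented for illustration; the patent does not give concrete weights in this section.

```python
def weighted_content_hot_value(
    hot_values: dict[str, float], weights: dict[str, float]
) -> float:
    """Sum of (hot value * heat weight) over the dimensional parameters."""
    return sum(hot_values[name] * weights[name] for name in hot_values)


# Hypothetical per-dimension hot values and weights.
hot_values = {"likes": 10.0, "comments": 2.5, "reads": 250.0}
weights = {"likes": 2.0, "comments": 4.0, "reads": 0.5}
total = weighted_content_hot_value(hot_values, weights)
```

With these made-up numbers a comment counts for more than a like, and a read counts for least, which is one plausible way to keep high-volume signals such as reads from dominating the score.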
4. The hot content search method according to any one of claims 1 to 3, characterized in that, before the step of sorting the text data according to the content hot value, the method further comprises:
Obtaining the matching value of each piece of text data according to the degree of matching between the search keyword and the words and sentences in the preset index database;
The step of sorting the text data to obtain the sorted text data comprises:
Adding the content hot value and the matching value to obtain a final score;
Sorting the text data in descending order of the final score to obtain the sorted text data.
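A minimal sketch of the final-score ranking in claim 4 follows. How the matching value is computed is not specified in the claim, so it is taken here as a given input; the tuple layout and the numbers are assumptions.

```python
def rank_by_final_score(
    candidates: list[tuple[str, float, float]]
) -> list[tuple[str, float]]:
    """Rank (doc_id, content_hot_value, matching_value) triples.

    Final score = content hot value + matching value, sorted in descending order.
    """
    scored = [(doc_id, hot + match) for doc_id, hot, match in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative candidates: (identifier, content hot value, matching value).
ranking = rank_by_final_score([("a", 5.0, 1.0), ("b", 2.0, 6.5), ("c", 3.0, 2.0)])
```

In this example document "b" has the lowest content hot value but ranks first because its matching value is high, which shows how the plain sum trades heat against relevance.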
5. The hot content search method according to claim 4, characterized in that, before the step of obtaining the search keyword, the method further comprises the steps of:
Crawling the content information of a website according to preset crawling rules to obtain the text data of the content information;
Performing word segmentation on the text data to obtain the segmented words and sentences;
Building an inverted index from the segmented words and sentences, and constructing the preset index database from the inverted index.
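The index-building steps of claim 5 can be sketched as follows. Crawling is omitted and the documents are passed in directly. The `tokenize` function is a stand-in that splits lowercase text on word characters; Chinese text would need a real word segmenter, which this sketch does not attempt. All names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Stand-in for word segmentation: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the set of document ids that contain it."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


# Illustrative documents standing in for crawled text data.
docs = {1: "Hot content search", 2: "Content ranking by heat"}
index = build_inverted_index(docs)
hits = index.get("content", set())
```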
6. A hot content search system, characterized by comprising:
A full-text search unit, configured to obtain a search keyword and to retrieve in a preset index database according to the search keyword to obtain each piece of text data;
A content hot value acquisition unit, configured to obtain, according to a preset time-fluctuation heat algorithm, the heat increase value corresponding to each dimensional parameter of the text data; to take the product of the heat increase value and a preset decay value as the hot value of the dimensional parameter; and to sum the hot values of the dimensional parameters to obtain the content hot value of the text data;
A sorting unit, configured to sort the text data according to the content hot value to obtain the sorted text data;
A feedback unit, configured to display the sorted text data as the hot content found according to the search keyword, or to send them to a corresponding external application.
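One way to picture how the four units of claim 6 fit together is the sketch below, in which each unit is a plain callable injected into a small class. The class name, the callables, and the stub data are all assumptions made for illustration; the patent does not prescribe this structure.

```python
from typing import Callable


class HotContentSearchSystem:
    """Wires together the four units named in the claim (hypothetical design)."""

    def __init__(
        self,
        full_text_search: Callable[[str], list[dict]],
        hot_value_of: Callable[[dict], float],
        deliver: Callable[[list[dict]], None],
    ) -> None:
        self.full_text_search = full_text_search  # full-text search unit
        self.hot_value_of = hot_value_of          # content hot value acquisition unit
        self.deliver = deliver                    # feedback unit

    def search(self, keyword: str) -> list[dict]:
        docs = self.full_text_search(keyword)
        # Sorting unit: highest content hot value first.
        ranked = sorted(docs, key=self.hot_value_of, reverse=True)
        self.deliver(ranked)
        return ranked


# Stub units with made-up data, standing in for real retrieval and display.
shown: list[list[dict]] = []
system = HotContentSearchSystem(
    full_text_search=lambda kw: [{"id": 1, "hot": 3.0}, {"id": 2, "hot": 9.0}],
    hot_value_of=lambda doc: doc["hot"],
    deliver=shown.append,
)
ranked_docs = system.search("example")
```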
7. The hot content search system according to claim 6, characterized in that the content hot value acquisition unit obtains, according to the preset time-fluctuation heat algorithm, the heat increase value corresponding to each dimensional parameter of the text data based on the following formula:
heat increase value = parameter value of the dimensional parameter at the current time - parameter value of the dimensional parameter in the previous time period.
8. The hot content search system according to claim 6, characterized in that the dimensional parameters include a like parameter, a comment parameter, and a reading parameter;
The content hot value acquisition unit is further configured to obtain the product of each hot value and the heat weight corresponding to its dimensional parameter, and to sum the products to obtain the content hot value.
9. The hot content search system according to any one of claims 6 to 8, characterized by further comprising:
A matching value acquisition unit, configured to obtain the matching value of each piece of text data according to the degree of matching between the search keyword and the words and sentences in the preset index database;
The sorting unit comprises:
A summation module, configured to add the content hot value and the matching value to obtain a final score;
A sorting module, configured to sort the text data in descending order of the final score to obtain the sorted text data.
10. The hot content search system according to claim 9, characterized in that the hot content search system further comprises an index database construction unit;
The index database construction unit comprises:
A crawling module, configured to crawl the content information of a website according to preset crawling rules to obtain the text data of the content information;
A word segmentation module, configured to perform word segmentation on the text data to obtain the segmented words and sentences;
An index database building module, configured to build an inverted index from the segmented words and sentences, and to construct the preset index database from the inverted index.
CN201710301979.XA 2017-05-02 2017-05-02 Hot content search method and system Active CN107239497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710301979.XA CN107239497B (en) 2017-05-02 2017-05-02 Hot content search method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710301979.XA CN107239497B (en) 2017-05-02 2017-05-02 Hot content search method and system

Publications (2)

Publication Number Publication Date
CN107239497A true CN107239497A (en) 2017-10-10
CN107239497B CN107239497B (en) 2020-11-03

Family

ID=59984213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710301979.XA Active CN107239497B (en) 2017-05-02 2017-05-02 Hot content search method and system

Country Status (1)

Country Link
CN (1) CN107239497B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145246A (en) * 2018-07-31 2019-01-04 成都华栖云科技有限公司 A kind of news virtual click amount implementation method based on paas media cloud multi-tenant platform
CN109275031A (en) * 2018-09-25 2019-01-25 有米科技股份有限公司 A kind of temperature appraisal procedure, device and the electronic equipment of video
CN109582852A (en) * 2018-12-05 2019-04-05 中国银行股份有限公司 A kind of sort method and system of full-text search result
CN110532419A (en) * 2019-08-29 2019-12-03 腾讯科技(深圳)有限公司 A kind of processing method and processing device of audio
CN110704436A (en) * 2019-09-26 2020-01-17 郑州阿帕斯科技有限公司 Hbase-based index generation method and device
CN111026958A (en) * 2019-11-29 2020-04-17 微梦创科网络科技(中国)有限公司 Hot microblog sorting method and device
CN113886685A (en) * 2021-09-23 2022-01-04 北京三快在线科技有限公司 Searching method, searching device, storage medium and electronic equipment
CN114154075A (en) * 2022-02-08 2022-03-08 北京大氪信息科技有限公司 Hot information determination method, hot information determination device, computer equipment and medium
CN114996550A (en) * 2021-05-24 2022-09-02 中移互联网有限公司 Information retrieval method and device
CN111159312B (en) * 2019-12-27 2024-04-26 东软集团股份有限公司 Fault related information auxiliary retrieval method and device, storage medium and electronic equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246499A (en) * 2008-03-27 2008-08-20 腾讯科技(深圳)有限公司 Network information search method and system
CN101742722A (en) * 2009-12-22 2010-06-16 卓望数码技术(深圳)有限公司 Service searching method and device
CN102937960A (en) * 2012-09-06 2013-02-20 北京邮电大学 Device and method for identifying and evaluating emergency hot topic
CN103077190A (en) * 2012-12-20 2013-05-01 人民搜索网络股份公司 Hot event ranking method based on order learning technology
CN103365902A (en) * 2012-03-31 2013-10-23 北大方正集团有限公司 Method and device for evaluating Internet News
CN104657496A (en) * 2015-03-09 2015-05-27 杭州朗和科技有限公司 Method and equipment for calculating information hot value
CN105488196A (en) * 2015-12-07 2016-04-13 中国人民大学 Automatic hot topic mining system based on internet corpora
CN105653705A (en) * 2015-12-30 2016-06-08 北京奇艺世纪科技有限公司 Hot event searching method and device
CN105653661A (en) * 2015-12-29 2016-06-08 云南电网有限责任公司电力科学研究院 Search result re-ranking method and device
CN105653737A (en) * 2016-03-01 2016-06-08 广州神马移动信息科技有限公司 Method, equipment and electronic equipment for content document sorting
CN105718598A (en) * 2016-03-07 2016-06-29 天津大学 AT based time model construction method and network emergency early warning method
CN106599181A (en) * 2016-12-13 2017-04-26 浙江网新恒天软件有限公司 Hot news detecting method based on topic model

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246499A (en) * 2008-03-27 2008-08-20 腾讯科技(深圳)有限公司 Network information search method and system
CN101742722A (en) * 2009-12-22 2010-06-16 卓望数码技术(深圳)有限公司 Service searching method and device
CN103365902A (en) * 2012-03-31 2013-10-23 北大方正集团有限公司 Method and device for evaluating Internet News
CN102937960A (en) * 2012-09-06 2013-02-20 北京邮电大学 Device and method for identifying and evaluating emergency hot topic
CN103077190A (en) * 2012-12-20 2013-05-01 人民搜索网络股份公司 Hot event ranking method based on order learning technology
CN104657496A (en) * 2015-03-09 2015-05-27 杭州朗和科技有限公司 Method and equipment for calculating information hot value
CN105488196A (en) * 2015-12-07 2016-04-13 中国人民大学 Automatic hot topic mining system based on internet corpora
CN105653661A (en) * 2015-12-29 2016-06-08 云南电网有限责任公司电力科学研究院 Search result re-ranking method and device
CN105653705A (en) * 2015-12-30 2016-06-08 北京奇艺世纪科技有限公司 Hot event searching method and device
CN105653737A (en) * 2016-03-01 2016-06-08 广州神马移动信息科技有限公司 Method, equipment and electronic equipment for content document sorting
CN105718598A (en) * 2016-03-07 2016-06-29 天津大学 AT based time model construction method and network emergency early warning method
CN106599181A (en) * 2016-12-13 2017-04-26 浙江网新恒天软件有限公司 Hot news detecting method based on topic model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wei Meng et al.: "Keyword-based real-time detection method for microblog hot topics", Computer and Modernization *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145246A (en) * 2018-07-31 2019-01-04 成都华栖云科技有限公司 A kind of news virtual click amount implementation method based on paas media cloud multi-tenant platform
CN109275031A (en) * 2018-09-25 2019-01-25 有米科技股份有限公司 A kind of temperature appraisal procedure, device and the electronic equipment of video
CN109275031B (en) * 2018-09-25 2021-09-28 有米科技股份有限公司 Video popularity evaluation method and device and electronic equipment
CN109582852B (en) * 2018-12-05 2021-04-09 中国银行股份有限公司 Method and system for sorting full-text retrieval results
CN109582852A (en) * 2018-12-05 2019-04-05 中国银行股份有限公司 A kind of sort method and system of full-text search result
CN110532419A (en) * 2019-08-29 2019-12-03 腾讯科技(深圳)有限公司 A kind of processing method and processing device of audio
CN110704436A (en) * 2019-09-26 2020-01-17 郑州阿帕斯科技有限公司 Hbase-based index generation method and device
CN111026958A (en) * 2019-11-29 2020-04-17 微梦创科网络科技(中国)有限公司 Hot microblog sorting method and device
CN111159312B (en) * 2019-12-27 2024-04-26 东软集团股份有限公司 Fault related information auxiliary retrieval method and device, storage medium and electronic equipment
CN114996550A (en) * 2021-05-24 2022-09-02 中移互联网有限公司 Information retrieval method and device
CN114996550B (en) * 2021-05-24 2024-03-19 中移互联网有限公司 Information retrieval method and device
CN113886685A (en) * 2021-09-23 2022-01-04 北京三快在线科技有限公司 Searching method, searching device, storage medium and electronic equipment
CN113886685B (en) * 2021-09-23 2023-01-06 北京三快在线科技有限公司 Searching method, searching device, storage medium and electronic equipment
CN114154075A (en) * 2022-02-08 2022-03-08 北京大氪信息科技有限公司 Hot information determination method, hot information determination device, computer equipment and medium
CN114154075B (en) * 2022-02-08 2022-05-17 北京大氪信息科技有限公司 Hot information determination method, device, computer equipment and medium

Also Published As

Publication number Publication date
CN107239497B (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN107239497A (en) Hot content searching method and system
JP5391634B2 (en) Selecting tags for a document through paragraph analysis
US9317593B2 (en) Modeling topics using statistical distributions
CN106339502A (en) Modeling recommendation method based on user behavior data fragmentation cluster
Eom Author Cocitation Analysis: Quantitative Methods for Mapping the Intellectual Structure of an Academic Discipline
JP2009093649A (en) Recommendation for term specifying ontology space
US20170083564A1 (en) Computer-Implemented System And Method For Assigning Document Classifications
US20150363405A1 (en) Method and apparatus for generating ordered user expert lists for a shared digital document
Zhou et al. Relevance feature mapping for content-based multimedia information retrieval
CN107958014A (en) Search engine
Levine-Clark et al. A new comparative citation analysis: Google Scholar, Microsoft Academic, Scopus, and Web of Science
Yang et al. Exploiting various implicit feedback for collaborative filtering
CN108009194A (en) A kind of books method for pushing, electronic equipment, storage medium and device
US20120191725A1 (en) Document ranking system with user-defined continuous term weighting
Homocianu et al. An Analysis of Scientific Publications on 'Decision Support Systems' and 'Business Intelligence' Regarding Related Concepts Using Natural Language Processing Tools
Sharma et al. A trend analysis of significant topics over time in machine learning research
CN106951517B (en) Method for inquiring diversity of documents in narrow range
Al Zamil et al. A model based on multi-features to enhance healthcare and medical document retrieval
Basuki et al. Detection of reference topics and suggestions using latent Dirichlet allocation (LDA)
Ayyasamy et al. Mining Wikipedia knowledge to improve document indexing and classification
CN114461778A (en) Comprehensive scientific research result recommendation method and device for mass scientific research data
Ependi et al. Sentiment Analysis of Covid-19 Handling in Indonesia Based on Lexicon Weighting
MirShojaee et al. Biogeography-based optimization algorithm for automatic extractive text summarization
Arjannikov et al. Verifying tag annotations through association analysis
Pu et al. Enriching user‐oriented class associations for library classification schemes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant