CN103885989B - Method and device for estimating the document frequency of neologisms
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The present invention discloses a method and device for estimating the document frequency of neologisms. The method includes: obtaining a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set; counting the document frequency of each preset common word in the first document set and in the second document set; counting the document frequency of each preset neologism in the second document set; obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set; and obtaining the document frequency of each preset neologism in the first document set from that fitting relationship and the neologism's document frequency in the second document set. The invention improves the accuracy of document frequency statistics for neologisms and remedies the large error that traditional statistical methods produce for them. It is also useful wherever neologisms are applied in feature selection, keyword extraction, vector space model representation, and similar techniques.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a method and device for estimating the document frequency of neologisms.
Background art
With the development of Internet technology, neologisms are appearing in ever greater numbers and have become an increasingly common phenomenon on the Internet. Neologisms, also called unregistered words (out-of-vocabulary words), are meaningful words that had never appeared before and have recently become popular. They generally arise alongside trending events and public figures and often carry a great deal of information, which makes them indispensable feature items for techniques such as text classification and keyword extraction. Document frequency (DF), a classical information measure, is likewise widely used in these fields, for example in vector space models, feature selection, and feature weighting.
Generally, the document frequency of a word is the number of documents in a massive document collection in which the word appears. Traditional methods compute it from statistics over such a collection: a large number of documents (for example, 1,000,000) is first sampled at random from the full document collection; every document in the sample is then segmented into words, and the number of documents in which each word appears is counted. That count is the word's document frequency.
For common words, this statistical method based on a massive document collection yields stable and fairly accurate document frequencies. Neologisms, however, appear only in a small number of highly time-sensitive documents, so the document frequency that the traditional method produces for them has a large error and is typically far below the true value.
Traditional document frequency computation based on massive document set statistics is therefore poorly suited to neologisms, and finding a better way to compute their document frequency is particularly important.
Summary of the invention
The main object of the present invention is to provide a method and device for estimating the document frequency of neologisms, with the aim of improving the accuracy of document frequency statistics for neologisms.
To achieve this object, the present invention proposes a method for estimating the document frequency of neologisms, including:
obtaining a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set;
counting the document frequency of each preset common word in the first document set and in the second document set, and counting the document frequency of each preset neologism in the second document set;
obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set; and
obtaining the document frequency of each preset neologism in the first document set from the fitting relationship and the neologism's document frequency in the second document set.
The present invention also proposes a device for estimating the document frequency of neologisms, including:
a document set acquisition module, for obtaining a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set;
a statistical module, for counting the document frequency of each preset common word in the first document set and in the second document set, and for counting the document frequency of each preset neologism in the second document set;
a fitting relationship acquisition module, for obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set; and
a neologism document frequency acquisition module, for obtaining the document frequency of each preset neologism in the first document set from the fitting relationship and the neologism's document frequency in the second document set.
The method and device proposed by the present invention determine a massive document set (the first document set) and a new document set (the second document set), count the document frequencies of common words in both sets, and find the relationship between the two document frequencies. The document frequency of a neologism in the new document set is then used to estimate its document frequency in the massive document set. This improves the accuracy of document frequency statistics for neologisms and remedies the large error that traditional statistical methods produce for them. The invention is also useful wherever neologisms are applied in feature selection, keyword extraction, vector space model representation, and similar techniques.
Brief description of the drawings
Fig. 1 is a flowchart of a preferred embodiment of the method of the present invention for estimating the document frequency of neologisms;
Fig. 2 is an example document frequency scatter plot from the preferred embodiment of the method;
Fig. 3 is a structural diagram of a preferred embodiment of the device of the present invention for estimating the document frequency of neologisms;
Fig. 4 is a structural diagram of the fitting relationship acquisition module in the preferred embodiment of the device.
To make the technical solution of the present invention clearer, it is described in further detail below with reference to the accompanying drawings.
Detailed description of the embodiments
The solution of the embodiments of the present invention is mainly as follows: determine a massive document set (the first document set) and a new document set (the second document set); count the document frequencies of common words in both sets; find the relationship between the two document frequencies; and finally use the document frequency of a neologism in the new document set to estimate its document frequency in the massive document set. This improves the accuracy of document frequency statistics for neologisms and remedies the large error that traditional statistical methods produce for them.
As shown in Fig. 1, a preferred embodiment of the present invention proposes a method for estimating the document frequency of neologisms, including:
Step S101: obtain a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set.
Because neologisms tend to appear only in highly time-sensitive pages, traditional document frequency computation based on massive document set statistics has a large error for them. This embodiment therefore introduces the concept of a new document set and estimates the document frequency of a neologism in the massive document set from both the massive document set and the new document set.
Specifically, two document collections are determined first: a massive document set A (the first document set in this embodiment) and a new document set B (the second document set in this embodiment), where:
Preferably, massive document set A contains about 1,000,000 documents in total, selected at random from the full document collection; the documents in A are mostly data from at least two years ago.
New document set B contains about 50,000 documents in total, which can be crawled from the home pages of major portal websites; the documents in B are mostly data from the most recent month.
Note that the documents in massive document set A need not be from at least two years ago; they may, for example, be from at least one year ago. Likewise, the documents in new document set B need not be from the most recent month; they may, for example, be from the most recent half month.
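The selection of the two document sets can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: the `Document` record, its `created` field, the function name `build_document_sets`, and the default sizes and age cutoffs are hypothetical and merely mirror the preferred values above. The embodiment crawls B from portal home pages; the sketch simply filters an in-memory corpus by date.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Document:
    text: str
    created: datetime  # hypothetical field: when the document was produced


def build_document_sets(corpus, now, size_a=1_000_000, size_b=50_000,
                        min_age_a=timedelta(days=730),
                        max_age_b=timedelta(days=30), seed=0):
    """Return (set_a, set_b): a random sample of old documents and the recent ones."""
    rng = random.Random(seed)
    old = [d for d in corpus if now - d.created >= min_age_a]
    recent = [d for d in corpus if now - d.created <= max_age_b]
    # A: random sample of documents at least min_age_a old.
    set_a = rng.sample(old, min(size_a, len(old)))
    # B: the newest documents first, capped at size_b (an assumed policy).
    recent.sort(key=lambda d: d.created, reverse=True)
    set_b = recent[:size_b]
    return set_a, set_b
```

The age cutoffs are parameters because the embodiment allows other windows, such as one year for A or half a month for B.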
Step S102: count the document frequency of each preset common word in the first document set and in the second document set, and count the document frequency of each preset neologism in the second document set.
Here, preset common words are words that occur frequently; about 70,000 common words are currently defined. Preset neologisms are words that have emerged with the development of Internet technology and appear in highly time-sensitive documents. They generally arise alongside trending events and public figures and have existed only for a short time.
Let w denote a common word and t a neologism. Once the two document sets A and B have been determined, the document frequency of each common word w is counted in A and in B and denoted DF_A_w and DF_B_w respectively. DF_A_w is the true document frequency of common word w in massive document set A; DF_B_w is used later for comparison with the neologisms in new document set B.
In addition, the document frequency DF_B_t of each neologism t in new document set B is counted. Once the fitting relationship between the document frequencies of the common words in massive document set A and in new document set B has been obtained, DF_B_t is used to derive the neologism's document frequency DF_A_t in massive document set A.
The document frequencies of the common words w in A and B, and of the neologisms t in B, can be counted as follows:
First, every document in the document set (A or B) is segmented into words; then the number of documents in which each word appears is counted. That count is the word's document frequency.
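The counting scheme above can be sketched as follows. The function name `document_frequency` and the `tokenize` parameter are assumptions for illustration; whitespace splitting is only a stand-in, since Chinese text would need a real word segmenter, which the patent does not name.

```python
from collections import Counter


def document_frequency(documents, tokenize=str.split):
    """Count, for each word, the number of documents that contain it."""
    df = Counter()
    for text in documents:
        # A set ensures each word is counted at most once per document.
        df.update(set(tokenize(text)))
    return df
```

Calling this once on A and once on B gives DF_A_w and DF_B_w for the common words, and the call on B also gives DF_B_t for the neologisms.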
Step S103: obtain the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set.
Step S104: obtain the document frequency of each preset neologism in the first document set from the fitting relationship and the neologism's document frequency in the second document set.
In steps S103 and S104, after the document frequency DF_A_w of each common word w in massive document set A and its document frequency DF_B_w in new document set B have been obtained, the relationship between the common words' document frequencies in A and in B is analyzed.
First, all common words are sorted by their document frequency in massive document set A in ascending order, giving a sorted sequence. The sorted sequence is then divided into groups. Here the group interval is 100; that is, 0-100 is one group, 101-200 is the next, and so on.
Next, for each group, the average DF_B_w of all common words in the group is calculated. A plot is then drawn with each group's average DF_B_w as the abscissa and the sorted value at the center of the group as the ordinate, which yields the document frequency fitting curve. The document frequency scatter plot obtained from the data of the first 50 groups is shown in Fig. 2.
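The sorting, grouping, and averaging steps can be sketched as follows. The sketch rests on an interpretation that the text does not state outright: groups are taken to be bins of width 100 over the DF_A_w values, and the "sorted value at the center of the group" is taken to be the midpoint of the bin. Integer division gives bins 0-99, 100-199, and so on, which only approximates the 0-100, 101-200 grouping in the text. The function name `fitting_points` is hypothetical.

```python
from collections import defaultdict


def fitting_points(df_a, df_b, common_words, interval=100):
    """Return sorted (mean DF_B_w, DF_A value at group center) pairs."""
    groups = defaultdict(list)
    # Sorting by DF_A_w mirrors the sorting step; grouping uses DF_A bins.
    for w in sorted(common_words, key=lambda w: df_a.get(w, 0)):
        groups[df_a.get(w, 0) // interval].append(df_b.get(w, 0))
    points = []
    for index, values in groups.items():
        center = index * interval + interval / 2  # assumed group center
        points.append((sum(values) / len(values), center))
    return sorted(points)
```

Each returned pair is one point of the scatter plot: the abscissa is the group's average document frequency in B and the ordinate is the group's representative document frequency in A.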
The scatter plot in Fig. 2 shows that the document frequencies of the common words in massive document set A and in new document set B follow a nearly linear fitting relationship; that is, the document frequencies of a common word in the two document sets A and B are linearly related.
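Since the relationship is close to linear, one way to capture it is a straight-line fit through the scatter points. The patent text does not name the fitting procedure; ordinary least squares is an assumption here, and `fit_line` is a hypothetical helper.

```python
def fit_line(points):
    """Ordinary least squares fit y = slope * x + intercept.

    `points` are (x, y) pairs: x is a group's mean DF in B, y its DF in A.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # requires at least two distinct x values
    return slope, mean_y - slope * mean_x
```

With the fitted slope and intercept, a document frequency in A can be predicted from a document frequency in B as `slope * df_b + intercept`.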
Given that neologisms eventually stabilize and become common words as well, the document frequency DF_B_t of a neologism in new document set B is taken as the abscissa, and the ordinate read from the scatter plot in Fig. 2 is the neologism's document frequency DF_A_t in massive document set A.
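Reading an ordinate off the curve for a given abscissa can be sketched as piecewise-linear interpolation between neighboring scatter points. The function name `estimate_df_a` is hypothetical, and clamping to the end points outside the plotted range is an assumption; extrapolating along a fitted line would be another reasonable choice.

```python
from bisect import bisect_left


def estimate_df_a(df_b_t, points):
    """Read the ordinate for abscissa df_b_t off a piecewise-linear curve.

    `points` are (x, y) pairs sorted by x with distinct x values.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if df_b_t <= xs[0]:
        return ys[0]  # clamp below the plotted range (assumption)
    if df_b_t >= xs[-1]:
        return ys[-1]  # clamp above the plotted range (assumption)
    i = bisect_left(xs, df_b_t)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (df_b_t - x0) / (x1 - x0)
```

For example, with curve points (3, 50) and (9, 150), a neologism with DF_B_t = 6 lies halfway between them and is assigned DF_A_t = 100.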
Compared with traditional document frequency computation, which relies only on statistics over a massive document collection and therefore has a large error, the scheme of this embodiment improves the accuracy of document frequency statistics for neologisms and thus remedies the shortcoming of the traditional statistical method. This embodiment is also useful wherever neologisms are applied in feature selection, keyword extraction, vector space model representation, and similar techniques.
As shown in Fig. 3, a preferred embodiment of the present invention proposes a device for estimating the document frequency of neologisms, including a document set acquisition module 201, a statistical module 202, a fitting relationship acquisition module 203, and a neologism document frequency acquisition module 204, where:
the document set acquisition module 201 obtains a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set;
the statistical module 202 counts the document frequency of each preset common word in the first document set and in the second document set, and counts the document frequency of each preset neologism in the second document set;
the fitting relationship acquisition module 203 obtains the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set; and
the neologism document frequency acquisition module 204 obtains the document frequency of each preset neologism in the first document set from the fitting relationship and the neologism's document frequency in the second document set.
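The way the four modules hand data to one another can be sketched as a small pipeline. This is only an illustrative wiring: the class name `NeologismDfEstimator`, the callable signatures, and the `run` method are assumptions, and each callable is a stand-in for the corresponding module rather than its actual implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NeologismDfEstimator:
    """Wires together stand-ins for modules 201 to 204."""
    acquire: Callable   # module 201: () -> (set_a, set_b)
    count: Callable     # module 202: documents -> {word: DF}
    fit: Callable       # module 203: (df_a, df_b, common_words) -> model
    predict: Callable   # module 204: (model, DF_B_t) -> DF_A_t

    def run(self, common_words, neologisms):
        set_a, set_b = self.acquire()
        df_a, df_b = self.count(set_a), self.count(set_b)
        model = self.fit(df_a, df_b, common_words)
        # Each neologism is mapped from its DF in B to an estimated DF in A.
        return {t: self.predict(model, df_b.get(t, 0)) for t in neologisms}
```

Keeping the modules as injected callables makes it easy to swap the fitting step, for example between a curve lookup and a straight-line fit.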
Because neologisms tend to appear only in highly time-sensitive pages, traditional document frequency computation based on massive document set statistics has a large error for them. This embodiment therefore introduces the concept of a new document set and estimates the document frequency of a neologism in the massive document set from both the massive document set and the new document set.
Specifically, two document collections are determined first: a massive document set A (the first document set in this embodiment) and a new document set B (the second document set in this embodiment), where:
Preferably, massive document set A contains about 1,000,000 documents in total, selected at random from the full document collection; the documents in A are mostly data from at least two years ago.
New document set B contains about 50,000 documents in total, which can be crawled from the home pages of major portal websites; the documents in B are mostly data from the most recent month.
Note that the documents in massive document set A need not be from at least two years ago; they may, for example, be from at least one year ago. Likewise, the documents in new document set B need not be from the most recent month; they may, for example, be from the most recent half month.
Then, the document frequency of each preset common word is counted in the first document set and in the second document set, and the document frequency of each preset neologism is counted in the second document set.
Here, preset common words are words that occur frequently; about 70,000 common words are currently defined. Preset neologisms are words that have emerged with the development of Internet technology and appear in highly time-sensitive documents. They generally arise alongside trending events and public figures and have existed only for a short time.
Let w denote a common word and t a neologism. Once the two document sets A and B have been determined, the document frequency of each common word w is counted in A and in B and denoted DF_A_w and DF_B_w respectively. DF_A_w is the true document frequency of common word w in massive document set A; DF_B_w is used later for comparison with the neologisms in new document set B.
In addition, the document frequency DF_B_t of each neologism t in new document set B is counted. Once the fitting relationship between the document frequencies of the common words in massive document set A and in new document set B has been obtained, DF_B_t is used to derive the neologism's document frequency DF_A_t in massive document set A.
The document frequencies of the common words w in A and B, and of the neologisms t in B, can be counted as follows:
First, every document in the document set (A or B) is segmented into words; then the number of documents in which each word appears is counted. That count is the word's document frequency.
After the document frequency DF_A_w of each common word w in massive document set A and its document frequency DF_B_w in new document set B have been obtained, the relationship between the common words' document frequencies in A and in B is analyzed.
First, all common words are sorted by their document frequency in massive document set A in ascending order, giving a sorted sequence. The sorted sequence is then divided into groups. Here the group interval is 100; that is, 0-100 is one group, 101-200 is the next, and so on.
Next, for each group, the average DF_B_w of all common words in the group is calculated. A plot is then drawn with each group's average DF_B_w as the abscissa and the sorted value at the center of the group as the ordinate, which yields the document frequency fitting curve. The document frequency scatter plot obtained from the data of the first 50 groups is shown in Fig. 2.
The scatter plot in Fig. 2 shows that the document frequencies of the common words in massive document set A and in new document set B follow a nearly linear fitting relationship; that is, the document frequencies of a common word in the two document sets A and B are linearly related.
Given that neologisms eventually stabilize and become common words as well, the document frequency DF_B_t of a neologism in new document set B is taken as the abscissa, and the ordinate read from the scatter plot in Fig. 2 is the neologism's document frequency DF_A_t in massive document set A.
In a specific implementation, as shown in Fig. 4, the fitting relationship acquisition module 203 may include a sorting unit 2031, a segmenting unit 2032, a computing unit 2033, and a drawing unit 2034, where:
the sorting unit 2031 sorts all preset common words by their document frequency in the first document set in ascending order, giving a sorted sequence;
the segmenting unit 2032 divides the sorted sequence into groups;
the computing unit 2033 calculates, for each group, the average document frequency in the second document set of all preset common words in the group; and
the drawing unit 2034 draws the document frequency fitting curve, with each group's average document frequency as the abscissa and the sorted value at the center of the group as the ordinate.
The method and device of the embodiments of the present invention determine a massive document set (the first document set) and a new document set (the second document set), count the document frequencies of common words in both sets, and find the relationship between the two document frequencies. The document frequency of a neologism in the new document set is then used to estimate its document frequency in the massive document set. This improves the accuracy of document frequency statistics for neologisms and remedies the large error that traditional statistical methods produce for them. The invention is also useful wherever neologisms are applied in feature selection, keyword extraction, vector space model representation, and similar techniques.
The foregoing describes only preferred embodiments of the present invention and does not limit its scope. Any equivalent structure or process transformation made using the description and drawings of the present invention, or any direct or indirect use in other related technical fields, likewise falls within the scope of protection of the present invention.
Claims (10)
- 1. A method for estimating the document frequency of neologisms, characterized by comprising: obtaining a first document set and a second document set, wherein the documents in the first document set were produced earlier than those in the second document set; counting the document frequency of each preset common word in the first document set and in the second document set; counting the document frequency of each preset neologism in the second document set; obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set; and obtaining the document frequency of the preset neologism in the first document set according to the fitting relationship and the document frequency of the preset neologism in the second document set.
- 2. The method according to claim 1, characterized in that the step of obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set comprises: sorting all preset common words by their document frequency in the first document set in ascending order to obtain a sorted sequence; dividing the sorted sequence into groups; calculating, for each group, the average document frequency in the second document set of all preset common words in the group; and drawing a document frequency fitting curve with each group's average document frequency as the abscissa and the sorted value at the center of the group as the ordinate.
- 3. The method according to claim 2, characterized in that the step of obtaining the document frequency of the preset neologism in the first document set according to the fitting relationship and the document frequency of the preset neologism in the second document set comprises: taking the document frequency of the preset neologism in the second document set as the abscissa and looking up the corresponding ordinate on the document frequency fitting curve, that ordinate being the document frequency of the preset neologism in the first document set.
- 4. The method according to claim 1, 2 or 3, characterized in that the step of obtaining the first document set and the second document set comprises: selecting at random a first predetermined number of documents from a given full document collection as the first document set; and crawling a second predetermined number of new documents from the home pages of predetermined portal websites as the second document set; wherein the first predetermined number is greater than the second predetermined number.
- 5. The method according to claim 4, characterized in that the documents in the first document set were produced at least 2 years ago, and the documents in the second document set were produced within the last month.
- 6. A device for estimating the document frequency of neologisms, characterized by comprising: a document set acquisition module, for obtaining a first document set and a second document set, wherein the documents in the first document set were produced earlier than those in the second document set; a statistical module, for counting the document frequency of each preset common word in the first document set and in the second document set, and for counting the document frequency of each preset neologism in the second document set; a fitting relationship acquisition module, for obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set; and a neologism document frequency acquisition module, for obtaining the document frequency of the preset neologism in the first document set according to the fitting relationship and the document frequency of the preset neologism in the second document set.
- 7. The device according to claim 6, characterized in that the fitting relationship acquisition module comprises: a sorting unit, for sorting all preset common words by their document frequency in the first document set in ascending order to obtain a sorted sequence; a segmenting unit, for dividing the sorted sequence into groups; a computing unit, for calculating, for each group, the average document frequency in the second document set of all preset common words in the group; and a drawing unit, for drawing a document frequency fitting curve with each group's average document frequency as the abscissa and the sorted value at the center of the group as the ordinate.
- 8. The device according to claim 7, characterized in that the neologism document frequency acquisition module is further configured to take the document frequency of the preset neologism in the second document set as the abscissa and look up the corresponding ordinate on the document frequency fitting curve, that ordinate being the document frequency of the preset neologism in the first document set.
- 9. The device according to claim 6, 7 or 8, characterized in that the document set acquisition module is further configured to select at random a first predetermined number of documents from a given full document collection as the first document set, and to crawl a second predetermined number of new documents from the home pages of predetermined portal websites as the second document set; wherein the first predetermined number is greater than the second predetermined number.
- 10. The device according to claim 9, characterized in that the documents in the first document set were produced at least 2 years ago, and the documents in the second document set were produced within the last month.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210566103.5A CN103885989B (en) | 2012-12-24 | 2012-12-24 | Method and device for estimating the document frequency of neologisms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210566103.5A CN103885989B (en) | 2012-12-24 | 2012-12-24 | Method and device for estimating the document frequency of neologisms |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103885989A (en) | 2014-06-25 |
CN103885989B (en) | 2017-12-01 |
Family
ID=50954884
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210566103.5A (Active; CN103885989B (en)) | Method and device for estimating the document frequency of neologisms | 2012-12-24 | 2012-12-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103885989B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108241611B (en) * | 2016-12-26 | 2021-08-17 | 北京国双科技有限公司 | Keyword extraction method and extraction equipment |
CN112883186B (en) * | 2019-11-29 | 2024-04-12 | 智慧芽信息科技(苏州)有限公司 | Method, system, equipment and storage medium for generating information map |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6549897B1 (en) * | 1998-10-09 | 2003-04-15 | Microsoft Corporation | Method and system for calculating phrase-document importance |
WO2007005742A2 (en) * | 2005-07-01 | 2007-01-11 | Ebrary, Inc. | Method and apparatus for document clustering and document sketching |
CN101196904A (en) * | 2007-11-09 | 2008-06-11 | 清华大学 | News keyword abstraction method based on word frequency and multi-component grammar |
CN102662952A (en) * | 2012-03-02 | 2012-09-12 | 成都康赛电子科大信息技术有限责任公司 | Chinese text parallel data mining method based on hierarchy |
Non-Patent Citations (2)
Title |
---|
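A mathematical model for automatic classification of Chinese text; Cao Suqing et al.; 《情报学报》 (Journal of the China Society for Scientific and Technical Information); 1999-02-28; full text * |
The least squares principle and its MATLAB implementation; Liu Zhiping et al.; 《开发应用》; 2008-06-30; full text * |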
Also Published As
Publication number | Publication date |
---|---|
CN103885989A (en) | 2014-06-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||