CN103885989B - Method and device for estimating the document frequency of new words - Google Patents

Method and device for estimating the document frequency of new words

Info

Publication number
CN103885989B
CN103885989B CN201210566103.5A CN201210566103A CN103885989B CN 103885989 B CN103885989 B CN 103885989B CN 201210566103 A CN201210566103 A CN 201210566103A CN 103885989 B CN103885989 B CN 103885989B
Authority
CN
China
Prior art keywords
document
sets
frequency
neologisms
document sets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210566103.5A
Other languages
Chinese (zh)
Other versions
CN103885989A (en)
Inventor
蔡兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Wuhan Co Ltd
Original Assignee
Tencent Technology Wuhan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Wuhan Co Ltd filed Critical Tencent Technology Wuhan Co Ltd
Priority to CN201210566103.5A priority Critical patent/CN103885989B/en
Publication of CN103885989A publication Critical patent/CN103885989A/en
Application granted granted Critical
Publication of CN103885989B publication Critical patent/CN103885989B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method and device for estimating the document frequency of new words. The method includes: obtaining a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set; counting the document frequency of each preset common word in the first document set and in the second document set; counting the document frequency of each preset new word in the second document set; obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set; and obtaining, from the fitting relationship and the document frequency of a preset new word in the second document set, the document frequency of that new word in the first document set. The invention improves the accuracy of document frequency statistics for new words and remedies the large error that traditional statistical methods show for new words. It is also significant for applying new words in technical fields such as feature selection, keyword extraction, and vector space model representation.

Description

Method and device for estimating the document frequency of new words
Technical field
The present invention relates to the field of Internet technology, and in particular to a method and device for estimating the document frequency of new words.
Background
With the development of Internet technology, new words are increasingly numerous and have become an ever more common phenomenon on the Internet. New words, also called unregistered words, are meaningful words that had never appeared before and have recently become popular. They generally arise from hot events and hot figures and often carry a great deal of information, which makes them indispensable feature items for technologies such as text classification and keyword extraction. Document frequency (DF), a classic measure of information, is also widely used in these related fields, for example in vector space models, feature selection, and feature weighting.
In general, document frequency is the number of documents in a massive document collection in which a word appears. Traditional methods compute it from statistics over such a collection: a relatively large document set (for example, 1,000,000 documents) is first selected at random from the full corpus, every document in the set is segmented into words, and the number of documents in which each word appears is counted. That count is taken as the word's document frequency.
This approach is stable and accurate for the document frequency of common words. New words, however, appear only in a small number of highly timely documents, so the approach yields a large error for them: the counted document frequency is typically far below the true value.
Traditional document frequency computation based on massive document set statistics is therefore poorly suited to new words, and finding a better way to compute the document frequency of new words is particularly important.
Summary of the invention
The main object of the present invention is to provide a method and device for estimating the document frequency of new words, aiming to improve the accuracy of document frequency statistics for new words.
To achieve this object, the present invention proposes a method for estimating the document frequency of new words, including:
obtaining a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set;
counting the document frequency of each preset common word in the first document set and in the second document set, and counting the document frequency of each preset new word in the second document set;
obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set;
obtaining, from the fitting relationship and the document frequency of the preset new word in the second document set, the document frequency of the preset new word in the first document set.
The present invention also proposes a device for estimating the document frequency of new words, including:
a document set acquisition module, for obtaining a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set;
a statistics module, for counting the document frequency of each preset common word in the first document set and in the second document set, and for counting the document frequency of each preset new word in the second document set;
a fitting relationship acquisition module, for obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set;
a new word document frequency acquisition module, for obtaining, from the fitting relationship and the document frequency of the preset new word in the second document set, the document frequency of the preset new word in the first document set.
The method and device proposed by the present invention determine a massive document set (the first document set) and a new document set (the second document set), count the document frequency of common words in both sets, find the relationship between the two document frequencies, and finally use the document frequency of a new word in the new document set to estimate its document frequency in the massive document set. This improves the accuracy of document frequency statistics for new words and remedies the large error that traditional statistical methods show for new words. The invention is also significant for applying new words in technical fields such as feature selection, keyword extraction, and vector space model representation.
Brief description of the drawings
Fig. 1 is a flow chart of a preferred embodiment of the method of the present invention for estimating the document frequency of new words;
Fig. 2 is an example document frequency scatter plot from the preferred embodiment of the method of the present invention for estimating the document frequency of new words;
Fig. 3 is a structural diagram of a preferred embodiment of the device of the present invention for estimating the document frequency of new words;
Fig. 4 is a structural diagram of the fitting relationship acquisition module in the preferred embodiment of the device of the present invention for estimating the document frequency of new words.
To make the technical solution of the present invention clearer, it is described in further detail below with reference to the accompanying drawings.
Detailed description of the embodiments
The solution of the embodiments of the present invention is mainly as follows: determine a massive document set (the first document set) and a new document set (the second document set), count the document frequency of common words in both sets, find the relationship between the two document frequencies, and finally use the document frequency of a new word in the new document set to estimate its document frequency in the massive document set. This improves the accuracy of document frequency statistics for new words and remedies the large error that traditional statistical methods show for new words.
As shown in Fig. 1, a preferred embodiment of the present invention proposes a method for estimating the document frequency of new words, including:
Step S101: obtain a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set;
Because new words often appear only in highly timely pages, traditional document frequency computation based on massive document set statistics has a large error for them. This embodiment therefore introduces the concept of a new document set and estimates the document frequency of new words in the massive document set from both the massive document set and the new document set.
Specifically, two document collections are first determined: a massive document set A (the first document set in this embodiment) and a new document set B (the second document set in this embodiment), where:
Preferably, the massive document set A contains about 1,000,000 documents in total, selected at random from the full corpus; the documents in A are essentially data from two years ago.
The new document set B contains about 50,000 documents in total, which can be crawled from the home pages of major portal websites; the documents in B are essentially data from within the most recent month.
It should be noted that the documents in the massive document set A need not be limited to data from two years ago; data from one year ago, for example, may also be used. Likewise, the documents in the new document set B need not be limited to the most recent month; the most recent half month, for example, may also be used.
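The selection of the two document sets can be sketched as follows. This is only an illustration under stated assumptions: the `Document` type, its `created` field, and the function `build_document_sets` are hypothetical names, and the sketch picks set B by timestamp from a single corpus rather than crawling portal home pages as the embodiment describes.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    text: str
    created: date  # hypothetical field: when the document was produced


def build_document_sets(corpus, today, size_a, size_b,
                        min_age_days=730, max_recent_days=30, seed=0):
    """Return (set_a, set_b): old documents sampled at random, plus recent ones.

    The thresholds (about two years, about one month) follow the preferred
    values in the text; both are adjustable, as the text allows.
    """
    old = [d for d in corpus if (today - d.created).days >= min_age_days]
    recent = [d for d in corpus if (today - d.created).days <= max_recent_days]
    rng = random.Random(seed)
    # Set A: random sample of the old documents (the "massive" set).
    set_a = rng.sample(old, min(size_a, len(old)))
    # Set B: stand-in for freshly crawled pages; simply the first size_b recent documents.
    set_b = recent[:size_b]
    return set_a, set_b
```

With the preferred sizes from the text, the call would use `size_a=1_000_000` and `size_b=50_000`.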
Step S102: count the document frequency of each preset common word in the first document set and in the second document set, and count the document frequency of each preset new word in the second document set;
A preset common word is a word that occurs frequently; about 70,000 common words are currently defined. A preset new word is a word that has emerged with the development of Internet technology and appears in highly timely documents. New words generally arise from hot events and hot figures and have been in existence only a short time.
Let w denote a common word and t a new word. Once the two document sets A and B are determined, the document frequency of each common word w in A and in B is counted, denoted DF_A_w and DF_B_w respectively. DF_A_w is the true document frequency of the common word w in the massive document set A; DF_B_w is used later for comparison with the new words in the new document set B.
In addition, the document frequency DF_B_t of each new word t in the new document set B is counted, so that once the fitting relationship between the document frequencies of common words in A and B has been obtained, the document frequency DF_A_t of the new word in A can be derived from DF_B_t.
Counting the document frequency of the common words w in A and B, and of the new words t in B, can be done as follows:
Every document in the document set (A or B) is first segmented into words, and the number of documents in which each word appears is then counted. That count is the word's document frequency.
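A minimal sketch of this counting step is shown below. The function name `document_frequency` is hypothetical, and whitespace splitting is only a placeholder assumption: Chinese text would need a real word segmenter passed in as `tokenize`.

```python
from collections import Counter


def document_frequency(documents, tokenize=str.split):
    """Count, for every word, the number of documents that contain it.

    `documents` is an iterable of text strings. `tokenize` is a stand-in
    for the word segmentation step; the default splits on whitespace.
    """
    df = Counter()
    for text in documents:
        # A word counts once per document, however often it repeats there.
        df.update(set(tokenize(text)))
    return df
```

Running the same function over set A and over set B would give DF_A_w and DF_B_w for the common words, and DF_B_t for the new words.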
Step S103: obtain the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set;
Step S104: obtain, from the fitting relationship and the document frequency of the preset new word in the second document set, the document frequency of the preset new word in the first document set.
In steps S103 and S104 above, after the document frequency DF_A_w of each common word w in the massive document set A and its document frequency DF_B_w in the new document set B have been obtained, the relationship between the document frequencies of common words in A and in B is analyzed.
First, all common words are sorted in ascending order of their document frequency in the massive document set A, giving a sorted sequence. The sorted sequence is then segmented into groups, using a segment interval of 100: 0-100 is one group, 101-200 is the next, and so on.
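The sorting and segmentation can be sketched as follows. One assumption is made explicit here: the interval of 100 is read as 100 consecutive positions in the sorted sequence, which the text does not state unambiguously. The name `sort_and_segment` is hypothetical.

```python
def sort_and_segment(df_a, group_size=100):
    """Sort common words by ascending DF in set A, then cut into groups.

    `df_a` maps each common word to its document frequency in set A.
    Ties are broken alphabetically so the result is deterministic.
    Assumption: a group is `group_size` consecutive positions in the
    sorted sequence; the last group may be shorter.
    """
    ordered = sorted(df_a, key=lambda w: (df_a[w], w))
    return [ordered[i:i + group_size]
            for i in range(0, len(ordered), group_size)]
```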
Next, group by group, the average DF_B_w of all common words in each group is calculated. Taking each group's average DF_B_w as the abscissa and the sorted value at the center of the group as the ordinate, the points are plotted to obtain the document frequency fitting curve. The document frequency scatter plot obtained from the data of the first 50 groups is shown in Fig. 2.
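The per-group points can be computed as in the sketch below. It assumes that the "sorted value at the center of the group" means the DF_A value of the word in the middle position of the group; that reading is an interpretation, and `curve_points` is a hypothetical name.

```python
from statistics import mean


def curve_points(groups, df_a, df_b):
    """Return one (x, y) point per group of common words.

    x: average document frequency of the group's words in set B.
    y: DF in set A of the word at the group's center position
       (assumed meaning of "sorted value at the center of the group").
    """
    points = []
    for group in groups:
        x = mean(df_b.get(word, 0) for word in group)
        center_word = group[len(group) // 2]
        points.append((x, df_a[center_word]))
    return points
```

Plotting only the first 50 entries of the returned list would correspond to the data range used for the scatter plot in Fig. 2.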
The scatter plot in Fig. 2 shows that the document frequencies of common words in the massive document set A and the new document set B follow a nearly linear fitting relationship; that is, the document frequencies of common words in the two document sets A and B are linearly related.
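Given such a nearly linear relationship, one way to express it numerically is a straight-line fit. The sketch below uses ordinary least squares, which is an assumption: the text does not say which fitting procedure is used. The name `fit_line` is hypothetical.

```python
def fit_line(points):
    """Fit y = slope * x + intercept to (x, y) points by least squares.

    Here x is a group's average DF in set B and y is the matching DF in
    set A. Least squares is an assumed choice, not stated in the text.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```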
Considering that new words may eventually settle down and become common words, the document frequency DF_B_t of a new word in the new document set B is taken as the abscissa, and the ordinate read from the scatter plot in Fig. 2 is the document frequency DF_A_t of the new word in the massive document set A.
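Reading the ordinate off the curve can be sketched as a lookup with linear interpolation between neighboring points. The interpolation, the extrapolation beyond the end points, and the name `estimate_df_a` are assumptions made for illustration; the text only says the value is read from the plotted curve.

```python
from bisect import bisect_left


def estimate_df_a(curve, df_b_t):
    """Estimate DF_A_t for a new word from its DF_B_t.

    `curve` is a list of (average DF in B, DF in A) points from the
    common words. Values outside the curve's range are extrapolated
    along the nearest segment (an assumed choice).
    """
    pts = sorted(curve)
    if len(pts) < 2:
        raise ValueError("need at least two curve points")
    xs = [x for x, _ in pts]
    # Index of the right end of the segment used for the lookup.
    i = min(max(bisect_left(xs, df_b_t), 1), len(pts) - 1)
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    if x1 == x0:
        return (y0 + y1) / 2
    return y0 + (y1 - y0) * (df_b_t - x0) / (x1 - x0)
```

If the relationship is treated as a single straight line instead, the estimate reduces to evaluating that line at DF_B_t.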
Compared with traditional document frequency computation, which relies solely on statistics over a massive document collection and therefore has a large error, this embodiment improves the accuracy of document frequency statistics for new words and remedies that defect. It is also significant for applying new words in technical fields such as feature selection, keyword extraction, and vector space model representation.
As shown in Fig. 3, a preferred embodiment of the present invention proposes a device for estimating the document frequency of new words, including a document set acquisition module 201, a statistics module 202, a fitting relationship acquisition module 203, and a new word document frequency acquisition module 204, where:
the document set acquisition module 201 is for obtaining a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set;
the statistics module 202 is for counting the document frequency of each preset common word in the first document set and in the second document set, and for counting the document frequency of each preset new word in the second document set;
the fitting relationship acquisition module 203 is for obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set;
the new word document frequency acquisition module 204 is for obtaining, from the fitting relationship and the document frequency of the preset new word in the second document set, the document frequency of the preset new word in the first document set.
Because new words often appear only in highly timely pages, traditional document frequency computation based on massive document set statistics has a large error for them. This embodiment therefore introduces the concept of a new document set and estimates the document frequency of new words in the massive document set from both the massive document set and the new document set.
Specifically, two document collections are first determined: a massive document set A (the first document set in this embodiment) and a new document set B (the second document set in this embodiment), where:
Preferably, the massive document set A contains about 1,000,000 documents in total, selected at random from the full corpus; the documents in A are essentially data from two years ago.
The new document set B contains about 50,000 documents in total, which can be crawled from the home pages of major portal websites; the documents in B are essentially data from within the most recent month.
It should be noted that the documents in the massive document set A need not be limited to data from two years ago; data from one year ago, for example, may also be used. Likewise, the documents in the new document set B need not be limited to the most recent month; the most recent half month, for example, may also be used.
Then, the document frequency of each preset common word in the first document set and in the second document set is counted, and the document frequency of each preset new word in the second document set is counted.
A preset common word is a word that occurs frequently; about 70,000 common words are currently defined. A preset new word is a word that has emerged with the development of Internet technology and appears in highly timely documents. New words generally arise from hot events and hot figures and have been in existence only a short time.
Let w denote a common word and t a new word. Once the two document sets A and B are determined, the document frequency of each common word w in A and in B is counted, denoted DF_A_w and DF_B_w respectively. DF_A_w is the true document frequency of the common word w in the massive document set A; DF_B_w is used later for comparison with the new words in the new document set B.
In addition, the document frequency DF_B_t of each new word t in the new document set B is counted, so that once the fitting relationship between the document frequencies of common words in A and B has been obtained, the document frequency DF_A_t of the new word in A can be derived from DF_B_t.
Counting the document frequency of the common words w in A and B, and of the new words t in B, can be done as follows:
Every document in the document set (A or B) is first segmented into words, and the number of documents in which each word appears is then counted. That count is the word's document frequency.
After the document frequency DF_A_w of each common word w in the massive document set A and its document frequency DF_B_w in the new document set B have been obtained, the relationship between the document frequencies of common words in A and in B is analyzed.
First, all common words are sorted in ascending order of their document frequency in the massive document set A, giving a sorted sequence. The sorted sequence is then segmented into groups, using a segment interval of 100: 0-100 is one group, 101-200 is the next, and so on.
Next, group by group, the average DF_B_w of all common words in each group is calculated. Taking each group's average DF_B_w as the abscissa and the sorted value at the center of the group as the ordinate, the points are plotted to obtain the document frequency fitting curve. The document frequency scatter plot obtained from the data of the first 50 groups is shown in Fig. 2.
The scatter plot in Fig. 2 shows that the document frequencies of common words in the massive document set A and the new document set B follow a nearly linear fitting relationship; that is, the document frequencies of common words in the two document sets A and B are linearly related.
Considering that new words may eventually settle down and become common words, the document frequency DF_B_t of a new word in the new document set B is taken as the abscissa, and the ordinate read from the scatter plot in Fig. 2 is the document frequency DF_A_t of the new word in the massive document set A.
In a specific implementation, as shown in Fig. 4, the fitting relationship acquisition module 203 may include a sorting unit 2031, a segmentation unit 2032, a calculation unit 2033, and a drawing unit 2034, where:
the sorting unit 2031 is for sorting all preset common words in ascending order of their document frequency in the first document set to obtain a sorted sequence;
the segmentation unit 2032 is for segmenting the sorted sequence into groups;
the calculation unit 2033 is for calculating the average document frequency, in the second document set, of all preset common words in each group;
the drawing unit 2034 is for plotting the document frequency fitting curve, with each group's average document frequency as the abscissa and the sorted value at the center of the group as the ordinate.
The method and device of the embodiments of the present invention determine a massive document set (the first document set) and a new document set (the second document set), count the document frequency of common words in both sets, find the relationship between the two document frequencies, and finally use the document frequency of a new word in the new document set to estimate its document frequency in the massive document set. This improves the accuracy of document frequency statistics for new words and remedies the large error that traditional statistical methods show for new words. The invention is also significant for applying new words in technical fields such as feature selection, keyword extraction, and vector space model representation.
The above are only preferred embodiments of the present invention and do not limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

  1. A method for estimating the document frequency of new words, characterized by including:
    obtaining a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set;
    counting the document frequency of each preset common word in the first document set and in the second document set, and counting the document frequency of each preset new word in the second document set;
    obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set;
    obtaining, from the fitting relationship and the document frequency of the preset new word in the second document set, the document frequency of the preset new word in the first document set.
  2. The method according to claim 1, characterized in that the step of obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set includes:
    sorting all preset common words in ascending order of their document frequency in the first document set to obtain a sorted sequence;
    segmenting the sorted sequence into groups;
    calculating the average document frequency, in the second document set, of all preset common words in each group;
    plotting the document frequency fitting curve, with each group's average document frequency as the abscissa and the sorted value at the center of the group as the ordinate.
  3. The method according to claim 2, characterized in that the step of obtaining, from the fitting relationship and the document frequency of the preset new word in the second document set, the document frequency of the preset new word in the first document set includes:
    taking the document frequency of the preset new word in the second document set as the abscissa and looking up the corresponding ordinate in the document frequency fitting curve, that ordinate being the document frequency of the preset new word in the first document set.
  4. The method according to claim 1, 2 or 3, characterized in that the step of obtaining the first document set and the second document set includes:
    selecting at random a first predetermined quantity of documents from a given full corpus as the first document set; crawling a second predetermined quantity of new documents from the home pages of predetermined portal websites as the second document set; the first predetermined quantity being greater than the second predetermined quantity.
  5. The method according to claim 4, characterized in that the documents in the first document set were produced at least 2 years ago, and the documents in the second document set were produced within the past month.
  6. A device for estimating the document frequency of new words, characterized by including:
    a document set acquisition module, for obtaining a first document set and a second document set, where the documents in the first document set were produced earlier than those in the second document set;
    a statistics module, for counting the document frequency of each preset common word in the first document set and in the second document set, and for counting the document frequency of each preset new word in the second document set;
    a fitting relationship acquisition module, for obtaining the fitting relationship between the document frequencies of the preset common words in the first document set and in the second document set;
    a new word document frequency acquisition module, for obtaining, from the fitting relationship and the document frequency of the preset new word in the second document set, the document frequency of the preset new word in the first document set.
  7. The device according to claim 6, characterized in that the fitting relationship acquisition module includes:
    a sorting unit, for sorting all preset common words in ascending order of their document frequency in the first document set to obtain a sorted sequence;
    a segmentation unit, for segmenting the sorted sequence into groups;
    a calculation unit, for calculating the average document frequency, in the second document set, of all preset common words in each group;
    a drawing unit, for plotting the document frequency fitting curve, with each group's average document frequency as the abscissa and the sorted value at the center of the group as the ordinate.
  8. The device according to claim 7, characterized in that the new word document frequency acquisition module is further for taking the document frequency of the preset new word in the second document set as the abscissa and looking up the corresponding ordinate in the document frequency fitting curve, that ordinate being the document frequency of the preset new word in the first document set.
  9. The device according to claim 6, 7 or 8, characterized in that the document set acquisition module is further for selecting at random a first predetermined quantity of documents from a given full corpus as the first document set, and for crawling a second predetermined quantity of new documents from the home pages of predetermined portal websites as the second document set; the first predetermined quantity being greater than the second predetermined quantity.
  10. The device according to claim 9, characterized in that the documents in the first document set were produced at least 2 years ago, and the documents in the second document set were produced within the past month.
CN201210566103.5A 2012-12-24 2012-12-24 Method and device for estimating the document frequency of new words Active CN103885989B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210566103.5A CN103885989B (en) 2012-12-24 2012-12-24 Method and device for estimating the document frequency of new words

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210566103.5A CN103885989B (en) 2012-12-24 2012-12-24 Method and device for estimating the document frequency of new words

Publications (2)

Publication Number Publication Date
CN103885989A CN103885989A (en) 2014-06-25
CN103885989B true CN103885989B (en) 2017-12-01

Family

ID=50954884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210566103.5A Active CN103885989B (en) 2012-12-24 2012-12-24 Method and device for estimating the document frequency of new words

Country Status (1)

Country Link
CN (1) CN103885989B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241611B (en) * 2016-12-26 2021-08-17 北京国双科技有限公司 Keyword extraction method and extraction equipment
CN112883186B (en) * 2019-11-29 2024-04-12 智慧芽信息科技(苏州)有限公司 Method, system, equipment and storage medium for generating information map

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6549897B1 (en) * 1998-10-09 2003-04-15 Microsoft Corporation Method and system for calculating phrase-document importance
WO2007005742A2 (en) * 2005-07-01 2007-01-11 Ebrary, Inc. Method and apparatus for document clustering and document sketching
CN101196904A (en) * 2007-11-09 2008-06-11 清华大学 News keyword abstraction method based on word frequency and multi-component grammar
CN102662952A (en) * 2012-03-02 2012-09-12 成都康赛电子科大信息技术有限责任公司 Chinese text parallel data mining method based on hierarchy

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
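A Mathematical Model for Automatic Classification of Chinese Text (一个中文文本自动分类数学模型); Cao Suqing et al.; 《情报学报》; 19990228; full text *
The Least-Squares Principle and Its MATLAB Implementation (最小二乘原理及其matlab实现); Liu Zhiping et al.; 《开发应用》; 20080630; full text *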

Also Published As

Publication number Publication date
CN103885989A (en) 2014-06-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant