CN106021526B - News category method and device - Google Patents

Info

Publication number
CN106021526B
Authority
CN
China
Prior art keywords
keyword
press release
score
preliminary classification
determined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610352644.6A
Other languages
Chinese (zh)
Other versions
CN106021526A (en
Inventor
麦涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN201610352644.6A priority Critical patent/CN106021526B/en
Publication of CN106021526A publication Critical patent/CN106021526A/en
Application granted granted Critical
Publication of CN106021526B publication Critical patent/CN106021526B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/353: Clustering; Classification into predefined classes

Abstract

The application proposes a news category method and device. The method comprises: receiving a press release; determining the matching degree between the press release and each preset information template, wherein each information template corresponds to one news category; determining, according to the matching degrees, the preliminary classification to which the press release belongs; determining, according to a preset algorithm, the score of each keyword in the press release; and determining, according to the score of each keyword, the dimension of the press release within the preliminary classification. Press releases are thereby classified automatically, which improves classification efficiency; and because the classification results are not influenced by personal subjective judgment, they are more accurate.

Description

News category method and device
Technical field
This application relates to the technical field of information processing, and in particular to a news category method and device.
Background art
At present, news reading products are usually organized and arranged according to the field to which the news content belongs. For example, news is first classified at the top level into hot topics, domestic, international and so on, then sub-classified within each category, and finally published by category.
Currently, this process of classifying news for publication is usually carried out manually. This not only wastes manpower but also leaves the classification strongly influenced by personal subjective judgment, so the classification results are not accurate enough.
Summary of the invention
The application is intended to solve, at least to some extent, one of the technical problems in the related art.
To this end, the first purpose of the application is to propose a news category method that classifies press releases automatically, improves the efficiency of press release classification, and produces more accurate classification results because they are not influenced by personal subjective judgment.
The second purpose of the application is to propose a news category device.
To achieve the above purpose, an embodiment of the first aspect of the application proposes a news category method, comprising:
receiving a press release;
determining the matching degree between the press release and each preset information template, wherein each information template corresponds to one news category;
determining, according to the matching degrees, the preliminary classification to which the press release belongs;
determining, according to a preset algorithm, the score of each keyword in the press release;
determining, according to the score of each keyword, the dimension of the press release within the preliminary classification, wherein each dimension in the preliminary classification corresponds to N keywords, and N is a positive integer greater than or equal to 1.
In the news category method of the embodiment of the present application, after a press release is received, the matching degree between the press release and each preset information template is first determined, and the preliminary classification to which the press release belongs is determined according to the matching degrees. The score of each keyword in the press release is then determined according to a preset algorithm, and the dimension of the press release within the preliminary classification is determined according to the keyword scores. Press releases are thereby classified automatically, which improves classification efficiency; and because the classification results are not influenced by personal subjective judgment, they are more accurate.
To achieve the above purpose, an embodiment of the second aspect of the application proposes a news category device, comprising:
a receiving module, for receiving a press release;
a first determining module, for determining the matching degree between the press release and each preset information template, wherein each information template corresponds to one news category;
a second determining module, for determining, according to the matching degrees, the preliminary classification to which the press release belongs;
a computing module, for determining, according to a preset algorithm, the score of each keyword in the press release;
a third determining module, for determining, according to the score of each keyword, the dimension of the press release within the preliminary classification, wherein each dimension in the preliminary classification corresponds to N keywords, and N is a positive integer greater than or equal to 1.
In the news category device of the embodiment of the present application, after a press release is received, the matching degree between the press release and each preset information template is first determined, and the preliminary classification to which the press release belongs is determined according to the matching degrees. The score of each keyword in the press release is then determined according to a preset algorithm, and the dimension of the press release within the preliminary classification is determined according to the keyword scores. Press releases are thereby classified automatically, which improves classification efficiency; and because the classification results are not influenced by personal subjective judgment, they are more accurate.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow chart of the news category method of one embodiment of the application;
Fig. 2 is a flow chart of the news category method of another embodiment of the application;
Fig. 3 is a structural schematic diagram of the news category device of one embodiment of the application;
Fig. 4 is a structural schematic diagram of the news category device of another embodiment of the application.
Detailed description of the embodiments
Embodiments of the application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the application and should not be construed as limiting it.
The news category method and device of the embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of the news category method of one embodiment of the application.
As shown in Figure 1, the news category method includes:
Step 101, Press release is received.
Specifically, the executing subject of news category method provided by the embodiments of the present application is news category device.
Step 102, the matching degree between the press release and each preset information template is determined, wherein each information template corresponds to one news category.
A variety of information templates may be stored in the news category device in advance, each corresponding to one news category.
For example, the information templates for military news may include: military-weapons, military-military situation, military-military history, military-current events, and so on.
After a press release is received, it can be matched against the preset information templates to determine the matching degree between the press release and each information template.
Specifically, the matching degree between the press release and an information template can be determined according to the number of words in the press release that are identical to words in the information template.
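The word-overlap matching of step 102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `matching_degree`, the sample word lists, and in particular the normalization by template size (so the result falls between 0 and 1) are assumptions; the text only states that the matching degree depends on the number of identical words.

```python
def matching_degree(article_words, template_words):
    """Fraction of template words that also occur in the article.

    Dividing by the template size is an assumption made here so that
    the result lies in [0, 1]; the source only specifies that the
    count of identical words drives the matching degree.
    """
    template_set = set(template_words)
    if not template_set:
        return 0.0
    shared = template_set & set(article_words)
    return len(shared) / len(template_set)


# Hypothetical tokenized article and template, for illustration only.
article = ["aircraft", "carrier", "missile", "navy", "crew"]
template = ["missile", "tank", "aircraft", "rifle"]
degree = matching_degree(article, template)
print(degree)
```

Here two of the four template words occur in the article, so the sketch yields a matching degree of one half.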
Step 103, according to the matching degrees, the preliminary classification to which the press release belongs is determined.
In general, the news category corresponding to the information template that has the highest matching degree with the press release is taken as the preliminary classification of the press release.
For example, if the matching degree of a press release is 0.9 with military-weapons, 0.88 with military-military history, 0.5 with military-military situation and 0.7 with military-current events, the preliminary classification of the press release can be determined to be military-weapons.
It should be noted that a matching degree threshold can also be set: when the matching degree between the press release and an information template is greater than the threshold, the press release is considered to belong to the preliminary classification corresponding to that template. The threshold can be set and adjusted according to the text category. For example, if the threshold is specified as 0.8, the press release above is considered to belong to the news categories corresponding to both the military-weapons and military-military history templates; that is, the press release can be assigned under both information templates.
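The two selection rules of step 103 (highest matching degree, or every template above a threshold) can be sketched as below, using the matching degrees from the example. The function name `preliminary_classes` and the dictionary layout are illustrative assumptions.

```python
def preliminary_classes(degrees, threshold=None):
    """Pick preliminary classifications from template matching degrees.

    Without a threshold, only the best-matching template is returned.
    With a threshold, every template whose matching degree is greater
    than the threshold is returned, so one article may get several
    preliminary classifications.
    """
    if not degrees:
        return []
    if threshold is None:
        return [max(degrees, key=degrees.get)]
    return [name for name, degree in degrees.items() if degree > threshold]


# Matching degrees taken from the example in the text.
degrees = {
    "military-weapons": 0.9,
    "military-military history": 0.88,
    "military-military situation": 0.5,
    "military-current events": 0.7,
}
print(preliminary_classes(degrees))
print(preliminary_classes(degrees, threshold=0.8))
```

With no threshold the sketch selects only military-weapons; with a threshold of 0.8 it selects both military-weapons and military-military history, as in the example.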
Step 104, according to a preset algorithm, the score of each keyword in the press release is determined.
The keywords in the press release can be obtained with any keyword extraction method; alternatively, words that occur in the title and the body of the press release can be determined as keywords, or words whose number of occurrences in the press release exceeds a preset value can be determined as keywords. This embodiment places no limitation on this.
Specifically, the score of each keyword can be determined with the following formula:
$s = a \times t_1 + b \times t_2 + c \times t_3$
where $s$ is the score of the keyword; $a$, $b$ and $c$ are proportionality constants; $t_1$ is the number of times the keyword occurs in the title; $t_2$ is the number of times the keyword occurs in the body; and $t_3$ is the number of times that words similar to the keyword, obtained according to the preliminary classification, occur in the press release.
The constants $a$, $b$ and $c$ sum to 1. For example, when the score of each keyword in a press release is calculated, $a$ may be 0.5, $b$ 0.3 and $c$ 0.2.
It should be noted that the values of the proportionality constants $a$, $b$ and $c$ can change dynamically: the constants may take different values for different keywords.
For example, suppose the content of the received press release is as follows:
[March 8 International Women's Day special] Pushing missiles, flying warplanes, boarding the aircraft carrier, entering the jungle: who says a woman is not as good as a man
This is a collective that has a strong fighting spirit without losing the delicacy characteristic of women. The professions they are engaged in are no longer confined to the medical and service fields as in the past; they serve in nearly all combat departments of the Liaoning warship, such as steering, electromechanical, damage control, supervision and radar, injecting more vigor into a Chinese Navy that is moving toward the deep blue.
Since its establishment, this glorious collective of more than 90 female soldiers, the female crew of the navy's Liaoning warship, has excellently completed major tasks such as the successive test voyages of the Liaoning warship and the landing and takeoff of carrier-based fighter planes.
After keyword extraction, the determined keywords include: female soldier, Liaoning warship, fighter plane, missile, aircraft carrier, warplane.
The keyword "female soldier" does not occur in the title and occurs once in the body. Among the words similar to "female soldier", "woman" occurs once in the title, and "women" and "female crew" each occur once in the body. According to the formula above, the score of the keyword "female soldier" is therefore:
$s = a \times 0 + b \times 1 + c \times 3$
The scores of the other keywords can be determined in the same way.
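As a worked check of the formula, the sketch below evaluates the example above with the sample constants mentioned earlier (a = 0.5, b = 0.3, c = 0.2). The function name `keyword_score` is an illustrative assumption; since the text notes that the constants may differ per keyword, they are passed as parameters rather than fixed.

```python
def keyword_score(t1, t2, t3, a=0.5, b=0.3, c=0.2):
    """Score s = a*t1 + b*t2 + c*t3 for one keyword.

    t1: occurrences of the keyword in the title
    t2: occurrences of the keyword in the body
    t3: occurrences of words similar to the keyword in the article
    a, b, c: proportionality constants (sample values from the text)
    """
    return a * t1 + b * t2 + c * t3


# Counts from the worked example: 0 in the title, 1 in the body,
# and 3 occurrences of similar words.
s = keyword_score(t1=0, t2=1, t3=3)
print(round(s, 2))
```

With these sample constants the score works out to 0.5 × 0 + 0.3 × 1 + 0.2 × 3, which is approximately 0.9.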
It should be noted that a near-synonym dictionary can also be stored in the news category device. After the keywords are obtained, the words similar to each keyword can be found by querying the dictionary, and the number of times those similar words occur in the press release can then be determined.
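The counting of t1, t2 and t3 with a near-synonym dictionary can be sketched as follows. Everything here is an illustrative assumption: the function name `count_terms`, the shortened English title and body, the dictionary contents, and the use of plain substring counting (a real system would tokenize the text first).

```python
def count_terms(keyword, title, body, synonyms):
    """Return (t1, t2, t3) for one keyword.

    t1 and t2 count the keyword in the title and body; t3 counts the
    occurrences, anywhere in the article, of the words that the
    near-synonym dictionary lists for the keyword. Substring counting
    is a simplification used only for this sketch.
    """
    t1 = title.count(keyword)
    t2 = body.count(keyword)
    text = title + " " + body
    t3 = sum(text.count(word) for word in synonyms.get(keyword, []))
    return t1, t2, t3


# Hypothetical, shortened article text and near-synonym dictionary.
title = "Who says a woman is not as good as a man"
body = ("A team of more than 90 female soldiers, the female crew of "
        "the ship, shows that women can serve in every post.")
synonyms = {"female soldier": ["woman", "women", "female crew"]}

counts = count_terms("female soldier", title, body, synonyms)
print(counts)
```

The resulting counts match the pattern of the worked example: the keyword is absent from the title, occurs once in the body, and its similar words occur three times in total.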
Step 105, according to the score of each keyword, the dimension of the press release within the preliminary classification is determined, wherein each dimension in the preliminary classification corresponds to N keywords, and N is a positive integer greater than or equal to 1.
Specifically, in order to classify press releases accurately, each news category can be further divided into different dimensions according to keywords, so that a press release is classified more precisely within its preliminary classification.
In practice, when the dimension of the press release within the preliminary classification is determined according to the keyword scores, the dimension to which each keyword belongs can be determined in turn, from the highest score to the lowest, and the dimension to which the press release belongs is then determined.
Specifically, step 105 comprises:
1051: according to the score of each keyword, determining the keyword sorted list of the press release;
In this embodiment, if there are n keywords in total, the score of every keyword can be calculated as in the steps above, and the keywords are sorted by score from high to low.
1052: choosing the top N keywords from the keyword sorted list;
It is understood that, to improve classification accuracy, only a part of the keywords may be chosen rather than all of them, where 1≤N≤n/2 and N is an integer.
For example, if there are 5 keywords, the 1 or 2 highest-scoring keywords in the keyword sorted list can be chosen for the subsequent operations.
1053: according to the top N keywords, determining the dimension of the press release within the preliminary classification.
It should be noted that, besides choosing the top N highest-scoring keywords as the basis for determining the dimension as described above, the dimension of the press release can also be determined according to all keywords. This makes the determined dimension more precise, but places higher demands on the data-processing capability and speed of the news category device.
For example, suppose that matching against the preset information templates determines that the preliminary classification of the press release above is "military-weapons", and that the highest-scoring keyword chosen for the press release in the manner described above is "Liaoning warship". The preliminary classification "military-weapons" includes 8 classifications, each defined by 1 keyword, such as "fighter plane", "warship", "rifle", "missile", "tank", "submarine" and "nuclear weapon". By querying the near-synonym dictionary in the news category device, it is determined that "Liaoning warship" is close to "warship", or that "Liaoning warship", once replaced by its hypernym, falls under "warship". The press release can therefore be specifically classified as "military-weapons-warship", which achieves a precise classification of the press release.
It should be noted that if each dimension under a preset preliminary classification corresponds to 1 keyword, and the keywords chosen for a press release in the manner above include 2 or more keywords with the same score, the press release can be assigned to the corresponding two or more dimensions at the same time.
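Steps 1051 to 1053 can be sketched as below. The keyword scores, the hypernym table and the function names `top_n_keywords` and `dimensions_for` are hypothetical, chosen only to mirror the "military-weapons" example; only the dimension keywords actually listed in the text are used.

```python
def top_n_keywords(scores, n):
    """Sort keywords by score, high to low, and keep the first n."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


def dimensions_for(keywords, dimension_keywords, hypernyms):
    """Map each chosen keyword to a dimension of the preliminary class.

    A keyword matches a dimension directly, or through the hypernym
    listed for it in the (hypothetical) dictionary.
    """
    found = []
    for word in keywords:
        candidate = word if word in dimension_keywords else hypernyms.get(word)
        if candidate in dimension_keywords and candidate not in found:
            found.append(candidate)
    return found


# Hypothetical scores for the keywords of the example article.
scores = {"Liaoning warship": 2.1, "aircraft carrier": 1.4,
          "missile": 0.9, "female soldier": 0.9, "fighter plane": 0.8}
# Dimension keywords listed in the text for "military-weapons".
dimension_keywords = {"fighter plane", "warship", "rifle", "missile",
                      "tank", "submarine", "nuclear weapon"}
hypernyms = {"Liaoning warship": "warship"}

chosen = top_n_keywords(scores, 1)
dims = dimensions_for(chosen, dimension_keywords, hypernyms)
print(chosen, dims)
```

With N = 1 the highest-scoring keyword is "Liaoning warship", which maps through its hypernym to the "warship" dimension. If several tied keywords mapped to different dimensions, the returned list would contain each of them, matching the note that an article may belong to more than one dimension.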
In the news category method of the embodiment of the present application, after a press release is received, the matching degree between the press release and each preset information template is first determined, and the preliminary classification to which the press release belongs is determined according to the matching degrees. The score of each keyword in the press release is then determined according to a preset algorithm, and the dimension of the press release within the preliminary classification is determined according to the keyword scores. Press releases are thereby classified automatically, which improves classification efficiency; and because the classification results are not influenced by personal subjective judgment, they are more accurate.
From the analysis above, after receiving a press release, the news category device can determine the preliminary classification of the press release according to the matching degree between the press release and the preset information templates. Accordingly, the information template corresponding to each preliminary classification needs to be stored in the news category device in advance; alternatively, the information templates can be obtained by the news category device through model training on all press releases in a news library. That is, the method further comprises:
Carrying out model training on a press release library to determine the information template corresponding to each preliminary classification.
For example, a support vector machine (SVM) algorithm can be used to carry out model training on the press release library, thereby determining the information template corresponding to each preliminary classification.
It can be understood from the embodiment above that the preliminary classification corresponding to an information template is defined by two features. For example, the "military-weapons" classification is defined by the two features "military" and "weapons". Therefore, the SVM algorithm can first be used to classify the news in the press release library at a first level, for example into "current events", "entertainment", "real estate", "economy", "military" and so on. The SVM algorithm is then used again to classify each first-level category at a second level, so that, for example, "military" is finally divided into military-weapons, military-military situation, military-military history and military-current events, with each second-level classification corresponding to one information template. The preliminary classification of a press release can thus be determined directly from its matching degree with each information template.
It should be noted that, after the information templates have been determined from the press release library, the news category device can also continue model training on newly received news, so as to supplement and refine the determined information templates and make the preliminary classifications determined from them increasingly accurate.
Further, in the embodiment above, when the score of each keyword is determined, the dictionary in the news category device can be used to determine the words similar to each keyword. To further improve classification precision, duplicate removal can also be performed on the keywords when the dimension of the press release is determined according to the keyword scores. The news category method provided by the application is further described below with reference to Fig. 2.
Fig. 2 is the flow chart of the news category method of the application another embodiment.
As shown in Fig. 2, the news category method may comprise steps of:
Step 201, model training is carried out to Press release library, determines the corresponding information template of each preliminary classification.
Step 202, Press release is received.
Step 203, the matching degree between the press release and each preset information template is determined, wherein each information template corresponds to one news category.
Step 204, according to the news category corresponding to the information template that has the highest matching degree with the press release, the preliminary classification to which the press release belongs is determined.
Step 205, each keyword in the press release is obtained.
Specifically, the keywords in the press release can be obtained with an existing keyword extraction method; words that occur in the title and the body can also be chosen as keywords, or words whose number of occurrences in the press release reaches a preset value can be chosen as keywords. This embodiment places no limitation on this.
Step 206, a preset dictionary is queried to determine the near synonyms and/or substitutes of each keyword.
Specifically, the preset dictionary may be generated by the news category device itself through training on the press release library, or may be determined according to user input.
The preset dictionary may include the near synonyms and substitutes of various words. A substitute can refer to the hypernym of a word; for example, "hydrogen bomb" can be replaced with its substitute "nuclear weapon".
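One possible layout for the preset dictionary of step 206 is sketched below. The structure, the function name `expand_keyword` and the near-synonym entries are assumptions made for illustration; only the "hydrogen bomb" to "nuclear weapon" substitute comes from the text.

```python
# Hypothetical preset dictionary: each entry lists near synonyms and
# an optional substitute (hypernym) for a word.
PRESET_DICTIONARY = {
    "hydrogen bomb": {
        "near_synonyms": ["thermonuclear bomb"],  # illustrative entry
        "substitute": "nuclear weapon",           # example from the text
    },
    "female soldier": {
        "near_synonyms": ["woman", "women", "female crew"],
        "substitute": None,
    },
}


def expand_keyword(keyword, dictionary):
    """Return (near_synonyms, substitute) for a keyword.

    Unknown keywords yield an empty synonym list and no substitute.
    """
    entry = dictionary.get(keyword, {})
    return entry.get("near_synonyms", []), entry.get("substitute")


near_synonyms, substitute = expand_keyword("hydrogen bomb", PRESET_DICTIONARY)
print(near_synonyms, substitute)
```

Keeping near synonyms and substitutes in separate fields lets the scoring step use the former for counting similar words, while the dimension step uses the latter to generalize a keyword to its hypernym.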
Step 207, using preset algorithm, the score of each keyword is determined.
Step 208, according to the score of each keyword, the keyword sorted lists of the Press release are determined.
Step 209, the higher top n keyword of score is chosen from the keyword sorted lists.
N can be a fixed value, such as 1, 3, 5, 6 or 8, or can be determined according to the actual scenario.
For example, only the first 2 or 3 keywords in the keyword sorted list are chosen at first. If the dimension of the press release can be determined accurately from the first 2 keywords alone, only those 2 keywords need to be selected. If the dimension determined from the first 3 keywords is not unique, further keywords can be selected to revise or correct the dimension determined so far, so that the dimension to which the press release belongs is finally determined.
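The scenario-dependent choice of N can be sketched as an incremental loop: start with the first few ranked keywords and add more only while the dimension is ambiguous. The text does not say how ambiguity is resolved, so the majority vote used here is purely an assumption, as are the function name `resolve_dimension` and the sample data.

```python
from collections import Counter


def resolve_dimension(ranked_keywords, keyword_to_dimension, start=2):
    """Grow N from `start` until one dimension clearly leads.

    Each keyword with a known dimension casts one vote. The loop stops
    as soon as a single dimension has strictly more votes than any
    other; it returns None if the keywords run out first. The voting
    rule is an assumption, not taken from the source.
    """
    if not ranked_keywords:
        return None
    n = min(start, len(ranked_keywords))
    while True:
        votes = Counter(keyword_to_dimension[k]
                        for k in ranked_keywords[:n]
                        if k in keyword_to_dimension)
        top = votes.most_common(2)
        if len(top) == 1 or (len(top) == 2 and top[0][1] > top[1][1]):
            return top[0][0]
        if n == len(ranked_keywords):
            return None
        n += 1


# Hypothetical ranked keywords and keyword-to-dimension mapping.
ranked = ["Liaoning warship", "missile", "aircraft carrier"]
mapping = {"Liaoning warship": "warship", "missile": "missile",
           "aircraft carrier": "warship"}
result = resolve_dimension(ranked, mapping)
print(result)
```

In this sample the first two keywords point to different dimensions, so a third keyword is consulted and tips the result to "warship".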
Step 210, according to the N keywords, the dimension of the press release within the preliminary classification is determined.
In the news category method of the embodiment of the present application, a press release is first received, and the matching degree between the press release and each preset information template is determined. The preliminary classification of the press release is determined according to the news category corresponding to the information template with the highest matching degree. Keywords are then chosen from the press release, and the near synonyms and/or substitutes of each keyword are determined by querying a preset dictionary. The score of each keyword in the press release is determined according to a preset algorithm, the keyword sorted list of the press release is determined according to the scores, and after the top N keywords are chosen from the sorted list, the dimension of the press release is determined according to those keywords. Press releases are thereby classified automatically, which improves classification efficiency; and because the classification results are not influenced by personal subjective judgment, they are more accurate.
In order to realize above-described embodiment, the application also proposes a kind of news category device.
Fig. 3 is the structural schematic diagram of the news category device of the application one embodiment.
As shown in figure 3, the news category device includes:
Receiving module 31, for receiving Press release;
First determining module 32, for determining the matching degree between the press release and each preset information template, wherein each information template corresponds to one news category;
Second determining module 33, for determining, according to the matching degrees, the preliminary classification to which the press release belongs;
Computing module 34, for determining, according to a preset algorithm, the score of each keyword in the press release;
Third determining module 35, for determining, according to the score of each keyword, the dimension of the press release within the preliminary classification, wherein each dimension in the preliminary classification corresponds to N keywords, and N is a positive integer greater than or equal to 1.
Wherein, news category device provided in this embodiment, for executing news category method provided by the above embodiment.
Specifically, the computing module 34 is used for:
determining the score of each keyword using $s = a \times t_1 + b \times t_2 + c \times t_3$;
where $s$ is the score of the keyword; $a$, $b$ and $c$ are proportionality constants; $t_1$ is the number of times the keyword occurs in the title; $t_2$ is the number of times the keyword occurs in the body; and $t_3$ is the number of times that words similar to the keyword, obtained according to the preliminary classification, occur in the press release.
In one embodiment, third determining module 35, is specifically used for:
According to the score of each keyword, the keyword sorted lists of the Press release are determined;
choosing the top N highest-scoring keywords from the keyword sorted list;
determining, according to the N keywords, the dimension of the press release within the preliminary classification.
It should be noted that the foregoing explanation of the news category method embodiments also applies to the news category device of this embodiment, and details are not repeated here.
In the news category device of the embodiment of the present application, after a press release is received, the matching degree between the press release and each preset information template is first determined, and the preliminary classification to which the press release belongs is determined according to the matching degrees. The score of each keyword in the press release is then determined according to a preset algorithm, and the dimension of the press release within the preliminary classification is determined according to the keyword scores. Press releases are thereby classified automatically, which improves classification efficiency; and because the classification results are not influenced by personal subjective judgment, they are more accurate.
Fig. 4 is the structural schematic diagram of the news category device of the application another embodiment.
As shown in Fig. 4, on the basis of the device shown in Fig. 3, the news category device further includes:
Enquiry module 41, for querying a preset dictionary to determine the near synonyms and/or substitutes of each keyword.
Further, from the analysis above, after receiving a press release, the news category device can determine the preliminary classification of the press release according to the matching degree between the press release and the preset information templates. Accordingly, the information template corresponding to each preliminary classification needs to be stored in the news category device in advance; alternatively, the information templates can be obtained by the news category device through model training on all press releases in a news library. The device then further includes:
Training module 42, for carrying out model training on a press release library to determine the information template corresponding to each preliminary classification.
It should be noted that the foregoing explanation of the news category method embodiments also applies to the news category device of this embodiment, and details are not repeated here.
In the news category device of the embodiment of the present application, a press release is first received, and the matching degree between the press release and each preset information template is determined. The preliminary classification of the press release is determined according to the news category corresponding to the information template with the highest matching degree. Keywords are then chosen from the press release, and the near synonyms and/or substitutes of each keyword are determined by querying a preset dictionary. The score of each keyword in the press release is determined according to a preset algorithm, the keyword sorted list of the press release is determined according to the scores, and after the top N keywords are chosen from the sorted list, the dimension of the press release is determined according to those keywords. Press releases are thereby classified automatically, which improves classification efficiency; and because the classification results are not influenced by personal subjective judgment, they are more accurate.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the application. In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to.
It should be understood that each part of the application can be realized with hardware, software, firmware or a combination thereof. In the embodiments above, multiple steps or methods can be realized with software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if realized with hardware, as in another embodiment, any one or a combination of the following technologies well known in the art can be used: a discrete logic circuit having logic gates for realizing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those skilled in the art will understand that all or part of the steps carried by the method embodiments above can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
The storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc or the like. Although embodiments of the application have been shown and described above, it is to be understood that the embodiments are exemplary and should not be construed as limiting the application; those skilled in the art can change, modify, replace and vary the embodiments within the scope of the application.

Claims (6)

1. A news classification method, characterized by comprising the following steps:
performing model training on a news release library using a support vector machine, and determining the information template corresponding to each preliminary classification;
receiving a news release;
determining, according to the number of words in the news release that are identical to words in each preset information template, each degree of matching between the news release and each preset information template, wherein each information template corresponds to one news category;
determining, according to each degree of matching, the preliminary classification to which the news release belongs;
determining, according to a preset algorithm, the score of each keyword in the news release;
determining, according to the score of each keyword, the dimension of the news release within the preliminary classification, wherein each dimension in the preliminary classification corresponds to N keywords, N being a positive integer greater than or equal to 1;
wherein determining, according to the preset algorithm, the score of each keyword in the news release comprises:
determining the score of each keyword using s = a × t₁ + b × t₂ + c × t₃;
wherein s is the score of the keyword; a, b, and c are proportionality constants; t₁ is the number of times the keyword occurs in the title; t₂ is the number of times the keyword occurs in the body; and t₃ is the number of times that words similar to the keyword, obtained according to the preliminary classification, occur in the news release.
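The scoring formula recited in claim 1 can be illustrated with a short sketch. The weights a, b, and c below are placeholder values chosen only for the example (the claim calls them proportionality constants without fixing them), and counting similar words over the title and body together is likewise an assumption.

```python
from typing import List


def keyword_score(
    keyword: str,
    title_tokens: List[str],
    body_tokens: List[str],
    similar_words: List[str],
    a: float = 3.0,  # weight of title occurrences (assumed value)
    b: float = 1.0,  # weight of body occurrences (assumed value)
    c: float = 0.5,  # weight of similar-word occurrences (assumed value)
) -> float:
    """Compute s = a*t1 + b*t2 + c*t3 for one keyword."""
    t1 = title_tokens.count(keyword)  # occurrences of the keyword in the title
    t2 = body_tokens.count(keyword)   # occurrences of the keyword in the body
    # Occurrences of similar words anywhere in the release (title + body, assumed).
    release_tokens = title_tokens + body_tokens
    t3 = sum(release_tokens.count(word) for word in similar_words)
    return a * t1 + b * t2 + c * t3


title = ["football", "final"]
body = ["the", "football", "match", "was", "a", "soccer", "classic", "football"]
score = keyword_score("football", title, body, similar_words=["soccer"])
print(score)  # 3.0*1 + 1.0*2 + 0.5*1 = 5.5
```

Giving the title a larger weight than the body is one plausible choice; the claim itself leaves the relative sizes of the constants open.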
2. The method according to claim 1, characterized in that, before determining the score of each keyword in the news release according to the preset algorithm, the method further comprises:
querying a preset dictionary to determine the near-synonyms and/or substitute words of each keyword.
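The dictionary query of claim 2 amounts to a lookup keyed by keyword. A minimal sketch follows; the dictionary layout, its entries, and the function name are hypothetical, since the claim does not specify how the preset dictionary is stored.

```python
from typing import Dict, List

# Hypothetical preset dictionary: keyword -> near-synonyms and substitute words.
PRESET_DICTIONARY: Dict[str, Dict[str, List[str]]] = {
    "football": {"near_synonyms": ["soccer"], "substitutes": ["footie"]},
    "shares": {"near_synonyms": ["stock"], "substitutes": []},
}


def related_words(
    keyword: str,
    dictionary: Dict[str, Dict[str, List[str]]] = PRESET_DICTIONARY,
) -> List[str]:
    """Return near-synonyms followed by substitute words; empty list if unknown."""
    entry = dictionary.get(keyword, {})
    return entry.get("near_synonyms", []) + entry.get("substitutes", [])


print(related_words("football"))
print(related_words("weather"))
```

The returned words could then serve as the similar words counted in t₃ of the scoring formula.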
3. The method according to claim 1, characterized in that determining, according to the score of each keyword, the dimension of the news release within the preliminary classification comprises:
determining a ranked keyword list for the news release according to the score of each keyword;
selecting the top N highest-scoring keywords from the ranked keyword list;
determining, according to the N keywords, the dimension of the news release within the preliminary classification.
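The ranking and selection steps of claim 3 can be sketched as below. How the N keywords are mapped to a dimension is not spelled out in the claim; this example assumes that each dimension has its own keyword set and that the dimension sharing the most keywords with the top N is chosen. All names and sample data are hypothetical.

```python
from typing import Dict, List, Set


def top_n_keywords(scores: Dict[str, float], n: int) -> List[str]:
    """Rank keywords by descending score (ties broken alphabetically) and keep the top n."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _ in ranked[:n]]


def choose_dimension(
    top_keywords: List[str], dimension_keywords: Dict[str, Set[str]]
) -> str:
    """Pick the dimension whose keyword set overlaps most with the top keywords.

    Overlap counting is an assumed matching rule, not taken from the claims.
    """
    top = set(top_keywords)
    return max(dimension_keywords, key=lambda dim: len(top & dimension_keywords[dim]))


scores = {"football": 5.5, "transfer": 4.0, "injury": 1.0, "stadium": 0.5}
# Hypothetical dimensions within a "sports" preliminary classification.
dimensions = {
    "transfers": {"transfer", "contract"},
    "match reports": {"goal", "injury"},
}
top = top_n_keywords(scores, n=2)
dimension = choose_dimension(top, dimensions)
print(top, dimension)
```

With N = 2 the selected keywords are "football" and "transfer", and only the hypothetical "transfers" dimension shares a keyword with them.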
4. A news classification device, characterized by comprising:
a training module, configured to perform model training on a news release library using a support vector machine and determine the information template corresponding to each preliminary classification;
a receiving module, configured to receive a news release;
a first determining module, configured to determine, according to the number of words in the news release that are identical to words in each preset information template, each degree of matching between the news release and each preset information template, wherein each information template corresponds to one news category;
a second determining module, configured to determine, according to each degree of matching, the preliminary classification to which the news release belongs;
a computing module, configured to determine, according to a preset algorithm, the score of each keyword in the news release;
a third determining module, configured to determine, according to the score of each keyword, the dimension of the news release within the preliminary classification, wherein each dimension in the preliminary classification corresponds to N keywords, N being a positive integer greater than or equal to 1;
wherein the computing module is specifically configured to:
determine the score of each keyword using s = a × t₁ + b × t₂ + c × t₃;
wherein s is the score of the keyword; a, b, and c are proportionality constants; t₁ is the number of times the keyword occurs in the title; t₂ is the number of times the keyword occurs in the body; and t₃ is the number of times that words similar to the keyword, obtained according to the preliminary classification, occur in the news release.
5. The device according to claim 4, characterized by further comprising:
a query module, configured to query a preset dictionary and determine the near-synonyms and/or substitute words of each keyword.
6. The device according to claim 4, characterized in that the third determining module is specifically configured to:
determine a ranked keyword list for the news release according to the score of each keyword;
select the top N highest-scoring keywords from the ranked keyword list;
determine, according to the N keywords, the dimension of the news release within the preliminary classification.
CN201610352644.6A 2016-05-25 2016-05-25 News category method and device Active CN106021526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610352644.6A CN106021526B (en) 2016-05-25 2016-05-25 News category method and device

Publications (2)

Publication Number Publication Date
CN106021526A CN106021526A (en) 2016-10-12
CN106021526B true CN106021526B (en) 2019-09-27

Family

ID=57093745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610352644.6A Active CN106021526B (en) 2016-05-25 2016-05-25 News category method and device

Country Status (1)

Country Link
CN (1) CN106021526B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209390B (en) * 2020-01-06 2023-09-05 新方正控股发展有限责任公司 News display method and system and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218432A (en) * 2013-04-15 2013-07-24 北京邮电大学 Named entity recognition-based news search result similarity calculation method
CN103530334A (en) * 2013-09-29 2014-01-22 方正国际软件有限公司 System and method for data matching based on comparison module
CN103870474A (en) * 2012-12-11 2014-06-18 北京百度网讯科技有限公司 News topic organizing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9448992B2 (en) * 2013-06-04 2016-09-20 Google Inc. Natural language search results for intent queries

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
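Research on text-content-based information extraction and classification of agricultural web pages (基于文本内容的农业网页信息抽取和分类研究); Zhu Xuefang (朱学芳); Information Science (《情报科学》); 2012-07-31; Vol. 30, No. 7; pp. 1012-1015 *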

Also Published As

Publication number Publication date
CN106021526A (en) 2016-10-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant