CN1480875A - System for registering key words of articles and its method - Google Patents
- Publication number
- CN1480875A
- Authority
- CN
- China
- Prior art keywords
- article
- synonym
- keyword
- words
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The system has a processor and a data storage device containing a symbol library, a function-word dictionary and a keyword database. The processor compares an article with the symbol library and deletes from the article every symbol that appears in the symbol library. It then deletes from the article every function word that appears in the function-word dictionary. Next, it counts how many times each word appears in the article, which yields a plurality of candidate words and their occurrence counts. Finally, it selects a plurality of keywords from the candidate words according to a preset condition and registers the selected keywords in the keyword database.
Description
Technical field
The present invention relates to a system and method for registering article keywords. In particular, it relates to a system and method that automatically register keywords that occur repeatedly in an article.
Background art
In an age of information overload, ordinary readers do not have enough time to digest large numbers of articles. An effective way to identify the topic of an article, or the field it discusses, would therefore let users screen articles and read only those in the fields they care about. They would no longer need to spend a great deal of time reading every article.
The topic or field of an article is usually judged from the keywords mentioned most often in it. Known methods of analyzing and registering article keywords rely mainly on manual screening. Fig. 1 is a schematic diagram of a known method for analyzing and registering article keywords. First, analysts examine a large number of articles 10 one by one (11) to obtain the relevant keywords 12 of each article 10. They then register the keywords in a keyword database 14 by manual entry (13).
Because known keyword analysis and registration is done manually, article by article, it consumes a great deal of time and labor. Moreover, synonymous keywords can only be handled correctly through the memory and experience of the analysts.
Summary of the invention
In view of this, the main object of the present invention is to provide a system and method for registering article keywords that automatically register keywords that occur repeatedly in an article. The invention can also automatically recognize synonymous words in an article, which improves the accuracy of keyword analysis.
These objects are achieved by the system and method for registering article keywords provided by the present invention.
According to an embodiment of the invention, a system for registering article keywords comprises a processor and a data storage device that holds a symbol library, a function-word dictionary and a keyword database. The processor compares the article with the symbol library and deletes from the article every symbol recorded in the symbol library. It also deletes from the article every function word recorded in the function-word dictionary. It then counts the occurrences of every word in the article, which yields a plurality of candidate words and their occurrence counts. Finally, it selects a plurality of keywords from the candidate words according to a preset condition and registers the selected keywords in the keyword database.
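The following Python sketch gives a rough illustration of this processing chain. It deletes symbols, deletes function words, counts the remaining words and keeps those above a minimum count. Several details are assumptions made for illustration and are not taken from the embodiment: the function name `register_keywords`, the case-insensitive matching, and the use of a plain `set` in place of the keyword database.

```python
from collections import Counter


def register_keywords(article, symbols, function_words, min_count, keyword_db):
    """Sketch of the basic pipeline: symbols -> function words -> counts -> selection."""
    # Delete every character recorded in the symbol library.
    cleaned = "".join(ch for ch in article if ch not in symbols)
    # Drop function words; matching ignores case (an assumption of this sketch).
    stop = {w.lower() for w in function_words}
    words = [w for w in cleaned.split() if w.lower() not in stop]
    # Candidate words and their occurrence counts.
    candidates = Counter(words)
    # Preset condition: occurrence count greater than min_count.
    keywords = [w for w, n in candidates.items() if n > min_count]
    # "Register" the keywords; a set stands in for the keyword database.
    keyword_db.update(keywords)
    return candidates, keywords


db = set()
cands, kws = register_keywords(
    "The chip is fast. The chip is cool; the chip sells.",
    symbols={".", ";"},
    function_words={"the", "is"},
    min_count=1,
    keyword_db=db,
)
print(kws)  # ['chip']
```

Because symbols are deleted character by character, a token such as "0.13" becomes "013". This matches the result shown in the worked example later in the description.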
The data storage device may also hold a synonym dictionary. In that case, the processor also compares the article with the synonym dictionary and deletes from the article every synonym recorded in it. It records how many times each synonym occurs in the article and stores in a synonym buffer the word that the synonym is equivalent to, together with that count. The processor then adds the words and counts recorded in the synonym buffer to the corresponding candidate words and their occurrence counts.
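The synonym handling can be sketched as follows, assuming the synonym dictionary is held as a mapping from a headword to its synonyms. `extract_synonyms` is a hypothetical helper. It deletes each recorded synonym from the article by replacing it with a space, and it counts the deletions per headword in a `Counter` that plays the role of the synonym buffer. Two matching rules are design choices of this sketch: longer synonyms are tried first, and a match must not be joined to neighbouring letters or digits.

```python
import re
from collections import Counter


def extract_synonyms(article, synonym_dict):
    """Delete recorded synonyms from the article and buffer their counts.

    synonym_dict maps a headword to its synonyms, e.g.
    {"VIA": ["VIA Tech", "VIA Technologies, Inc."]}. The returned buffer maps
    each headword to the number of synonym occurrences that were deleted.
    """
    buffer = Counter()
    # Longest synonyms first, so a longer phrase is never split by a shorter one.
    pairs = sorted(
        ((syn, head) for head, syns in synonym_dict.items() for syn in syns),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    for syn, head in pairs:
        # Match the synonym only when it is not joined to other letters/digits.
        pattern = r"(?<!\w)" + re.escape(syn) + r"(?!\w)"
        article, n = re.subn(pattern, " ", article)
        if n:
            buffer[head] += n
    return article, buffer


text = "The VIA C3 runs cool. VIA Technologies, Inc. also makes chipsets."
rest, buf = extract_synonyms(text, {"VIA": ["VIA Tech", "VIA Technologies, Inc."]})
print(buf)  # Counter({'VIA': 1})
```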
According to an embodiment of the invention, the article keyword registration method begins by receiving an article. The article is then compared with the symbol library, and every symbol recorded in the symbol library is deleted from the article. Next, every function word recorded in the function-word dictionary is deleted from the article.
The occurrences of every word in the article are then counted, which yields a plurality of candidate words and their occurrence counts. Finally, a plurality of keywords is selected from the candidate words according to a preset condition, and the keywords are registered in the keyword database.
The article can also be compared with the synonym dictionary. In that case, every synonym recorded in the synonym dictionary is deleted from the article, and the number of times each synonym occurs in the article is recorded. The word that the synonym is equivalent to is then stored in a synonym buffer together with that count. Afterwards, the words and counts recorded in the synonym buffer are added to the corresponding candidate words and their occurrence counts.
According to the embodiment, the preset condition may be a minimum occurrence count. Candidate words whose occurrence count is greater than this minimum are selected as keywords and registered in the keyword database. The processor may also sort the candidate words by occurrence count. The preset condition can then be a rank cutoff, and candidate words ranked within the cutoff are selected as keywords and registered in the keyword database.
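The two preset conditions can be written as two small selection functions. This is a hedged sketch and the function names are hypothetical. The strict "greater than" comparison follows the wording above, so an inclusive threshold such as "three or more" corresponds to `min_count=2`.

```python
from collections import Counter


def select_by_min_count(candidates, min_count):
    """Keep candidate words whose occurrence count is greater than min_count."""
    return [w for w, n in candidates.items() if n > min_count]


def select_by_rank(candidates, cutoff):
    """Sort by occurrence count (descending) and keep the top `cutoff` words.

    sorted() is stable, so tied words keep the order in which they were counted.
    """
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, _ in ranked[:cutoff]]


counts = Counter({"chip": 5, "cache": 3, "core": 3, "bus": 1})
print(select_by_min_count(counts, 2))  # ['chip', 'cache', 'core']
print(select_by_rank(counts, 2))       # ['chip', 'cache']
```

The tie between "cache" and "core" shows why the tie-breaking rule matters for a rank cutoff. The embodiment does not say how ties are resolved, so keeping first-seen order is only one reasonable choice.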
Description of drawings
To make the above objects, features and advantages of the invention clearer, a preferred embodiment is described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a known method for analyzing and registering article keywords.
Fig. 2 is a schematic diagram of the architecture of the system for registering article keywords according to an embodiment of the invention.
Fig. 3 is a flowchart of the article keyword registration method according to an embodiment of the invention.
Embodiment
Fig. 2 is a schematic diagram of the architecture of the system for registering article keywords according to the embodiment of the invention.
The system for registering article keywords according to the embodiment comprises a data storage device 200 and a processor 210. The data storage device 200 holds a synonym dictionary 201, a symbol library 202, a function-word dictionary 203, a keyword database 204 and a synonym buffer 205.
The synonym dictionary 201 records correspondences between synonymous words. For example, the synonyms of "VIA" include "VIA Tech" and "VIA Technologies, Inc.". The symbol library 202 records special symbols such as punctuation marks. The function-word dictionary 203 records words that carry no substantive meaning in ordinary articles, such as verbs, adjectives, adverbs, auxiliary words and other words without meaning. Examples are "a", "is", "on" and "he". The keyword database 204 stores the keywords produced by the analysis.
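One way to picture the contents of data storage device 200 is as a small in-memory structure. The `DataStore` class below is hypothetical. It holds the five components with the sample entries mentioned above. It also adds a reverse lookup, because the synonym dictionary is naturally written headword-first, while scanning an article requires going from a synonym back to its headword.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataStore:
    """Hypothetical in-memory stand-in for data storage device 200."""

    synonyms: dict            # synonym dictionary 201: headword -> list of synonyms
    symbols: set              # symbol library 202
    function_words: set       # function-word dictionary 203
    keywords: set = field(default_factory=set)                # keyword database 204
    synonym_buffer: Counter = field(default_factory=Counter)  # synonym buffer 205

    def headword_of(self, phrase: str) -> Optional[str]:
        """Reverse lookup: the headword a recorded synonym stands for, if any."""
        for head, syns in self.synonyms.items():
            if phrase in syns:
                return head
        return None


store = DataStore(
    synonyms={"VIA": ["VIA Tech", "VIA Technologies, Inc."]},
    symbols={",", ".", ";", "!"},
    function_words={"a", "is", "on", "he"},
)
print(store.headword_of("VIA Technologies, Inc."))  # VIA
```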
Processor 210 first deletes from the article the synonyms, symbols and function words recorded in the synonym dictionary 201, the symbol library 202 and the function-word dictionary 203. For each deleted synonym, it records the equivalent word and its count in the synonym buffer 205. It then counts the occurrences of every remaining word in the article, which yields a plurality of candidate words and their occurrence counts. Next, processor 210 adds the words and counts recorded in the synonym buffer 205 to the corresponding candidate words and their occurrence counts.
Finally, processor 210 sorts the candidate words by occurrence count and selects keywords from them according to a preset condition. The condition may be a minimum occurrence count (for example, 10 or more occurrences) or a rank cutoff (for example, the top 5). Processor 210 then registers the selected keywords in the keyword database 204.
Fig. 3 is a flowchart of the article keyword registration method according to the embodiment of the invention. The method is described below with reference to Figs. 2 and 3.
First, in step S30, an article is received. Then, in step S31, the article is compared with the synonym dictionary 201, and every synonym recorded in the synonym dictionary 201 is deleted from the article. The number of times each synonym occurs in the article is recorded, and the word that the synonym is equivalent to is stored in the synonym buffer 205 together with that count.
Then, in step S32, the article is compared with the symbol library 202, and every symbol recorded in the symbol library 202 is deleted. In step S33, the article is compared with the function-word dictionary 203, and every function word recorded in the function-word dictionary 203 is deleted.
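Steps S32 and S33 can be sketched with two functions. The symbol set below is a small assumed subset that also includes the apostrophe. The worked example turns "off-the-shelf" into "off shelf", so this sketch assumes the hyphen is treated as a word separator rather than simply deleted. The function words are the ones listed in the example's function-word dictionary, matched without regard to case.

```python
def delete_symbols(text, symbols, separators=("-",)):
    """S32 sketch: remove symbol characters in place.

    Assumption: characters in `separators` (here the hyphen) are turned into
    spaces instead, which is what the worked example's "off shelf" implies.
    """
    for sep in separators:
        text = text.replace(sep, " ")
    return "".join(ch for ch in text if ch not in symbols)


def delete_function_words(text, function_words):
    """S33 sketch: drop whole words found in the function-word dictionary, ignoring case."""
    stop = {w.lower() for w in function_words}
    return " ".join(w for w in text.split() if w.lower() not in stop)


symbols = {",", ".", ";", "'"}
function_words = {"A", "It", "this", "by", "Is", "On", "Are", "she", "The", "He", "that", "I"}
s = ("by allowing the use of inexpensive, off-the-shelf components. "
     "The first 0.13 micron process, the world's smallest")
print(delete_function_words(delete_symbols(s, symbols), function_words))
# allowing use of inexpensive off shelf components first 013 micron process worlds smallest
```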
Afterwards, in step S34, the occurrences of every remaining word in the article are counted, which yields a plurality of candidate words and their occurrence counts. Then, in step S35, the words and counts recorded in the synonym buffer 205 are added to the corresponding candidate words and their occurrence counts.
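Steps S34 and S35 map naturally onto Python's `collections.Counter`, as the sketch below shows. The helper names are hypothetical. "chipset" is an illustrative headword that does not occur in the remaining text, which shows that merging the buffer can also introduce a new candidate word.

```python
from collections import Counter


def count_candidates(cleaned_text):
    """S34 sketch: occurrence count of every word left in the cleaned article."""
    return Counter(cleaned_text.split())


def merge_synonym_buffer(candidates, synonym_buffer):
    """S35 sketch: add buffered headword counts to the candidate counts.

    Counter.update adds counts; a headword that no longer occurs in the
    article simply becomes a new candidate.
    """
    merged = candidates.copy()
    merged.update(synonym_buffer)
    return merged


cands = count_candidates("VIA C3 1GHz processor coolest 1GHz processor VIA C3 1GHz processor")
merged = merge_synonym_buffer(cands, Counter({"VIA": 1, "chipset": 2}))
print(cands["VIA"], merged["VIA"], merged["chipset"])  # 2 3 2
```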
Finally, in step S36, the candidate words are sorted by occurrence count. In step S37, keywords that satisfy the preset condition, such as a minimum occurrence count or a rank cutoff, are selected from the candidate words. In step S38, the selected keywords are registered in the keyword database 204.
If the preset condition is a minimum occurrence count, candidate words whose occurrence count is greater than the minimum are selected as keywords and registered in the keyword database 204. If the preset condition is a rank cutoff, candidate words ranked within the cutoff are selected as keywords and registered in the keyword database 204.
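Steps S36 to S38 are sketched below, with an in-memory SQLite database standing in for keyword database 204. The table layout, the `article_id` column and the function name `select_and_register` are all assumptions, because the embodiment does not specify how the keyword database is organized. The counts are illustrative.

```python
import sqlite3
from collections import Counter

SCHEMA = """CREATE TABLE IF NOT EXISTS keywords (
    article_id  TEXT NOT NULL,
    word        TEXT NOT NULL,
    occurrences INTEGER NOT NULL,
    PRIMARY KEY (article_id, word))"""


def select_and_register(conn, article_id, candidates, min_count=None, top_n=None):
    """S36-S38 sketch: sort, select by one preset condition, register in SQLite."""
    if (min_count is None) == (top_n is None):
        raise ValueError("give exactly one of min_count or top_n")
    # S36: sort by occurrence count, highest first (stable for ties).
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    # S37: apply the preset condition.
    if min_count is not None:
        chosen = [(w, n) for w, n in ranked if n > min_count]
    else:
        chosen = ranked[:top_n]
    # S38: register; a word registered again for the same article replaces its old row.
    conn.executemany(
        "INSERT OR REPLACE INTO keywords (article_id, word, occurrences) VALUES (?, ?, ?)",
        [(article_id, w, n) for w, n in chosen],
    )
    conn.commit()
    return [w for w, _ in chosen]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
counts = Counter({"processor": 6, "VIA": 3, "1GHz": 3, "C3": 2, "coolest": 1})
print(select_and_register(conn, "example", counts, min_count=2))  # ['processor', 'VIA', '1GHz']
print(select_and_register(conn, "example", counts, top_n=4))      # ['processor', 'VIA', '1GHz', 'C3']
```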
Note that in this embodiment, steps S31, S32 and S33 each delete a different kind of target from the article and are independent of one another, so their order can be interchanged. In addition, if the preset condition is only a minimum occurrence count, step S36 (sorting the candidate words by occurrence count) can be omitted.
In another form, the symbol library 202 and the function-word dictionary 203 serve the same purpose, namely deleting special symbols and function words from the article. They can therefore be combined into a single character-and-word library that records every symbol and word to be deleted from the article.
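A combined character-and-word library can be sketched as a single set. This hedged sketch assumes the library still has to tell symbols and words apart, and it does so by content. Entries with no letters or digits are treated as single-character symbols, and every other entry is treated as a whole word. Without that distinction, the one-letter function words "A" and "I" from the example dictionary would be stripped out of words such as "VIA". The function names are hypothetical.

```python
def build_deletion_library(symbols, function_words):
    """One combined character-and-word library, as in the alternative form."""
    return set(symbols) | set(function_words)


def apply_deletion_library(text, library):
    """Delete library entries from the text using the single combined library.

    Assumption: entries with no letters or digits are single-character symbols
    and are removed wherever they appear; all other entries are whole words,
    matched ignoring case.
    """
    symbol_chars = {e for e in library if not any(c.isalnum() for c in e)}
    words = {e.lower() for e in library if any(c.isalnum() for c in e)}
    text = "".join(ch for ch in text if ch not in symbol_chars)
    return " ".join(w for w in text.split() if w.lower() not in words)


lib = build_deletion_library({",", ".", "!"}, {"A", "I", "The", "is"})
print(apply_deletion_library("The VIA chip is A fast, cool one.", lib))  # VIA chip fast cool one
```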
An example follows.
Suppose an article reads as follows:

Original article

The VIA C3 1GHz processor is the coolest 1GHz processor on the market, saving energy and maximizing total system savings by allowing the use of inexpensive, off-the-shelf components. The processor runs so cool that it can operate with standard small coolers and power supplies, making it the ideal solution for ergonomic small footprint quiet PC designs. The first processor in the world to be manufactured using a leading edge 0.13 micron manufacturing process, the VIA C3 1GHz processor has the world's smallest x86 processor die size. VIA Technologies, Inc. is a leading innovator and developer of PC core logic chipsets, microprocessors, and multimedia and communications chips
The synonym dictionary is as follows:

Synonym dictionary

VIA | VIA Tech |
VIA | VIA Technologies, Inc. |
First, the article is compared with the synonym dictionary. Synonyms in the article that are recorded in the dictionary, such as "VIA Technologies, Inc.", are deleted, and the number of times each one occurs in the article is counted. The word they are equivalent to, "VIA", is then recorded in the synonym buffer together with that count:

Synonym buffer

VIA(1) |

After the synonyms are deleted, the article is as follows:

Article

The VIA C3 1GHz processor is the coolest 1GHz processor on the market, saving energy and maximizing total system savings by allowing the use of inexpensive, off-the-shelf components. The processor runs so cool that it can operate with standard small coolers and power supplies, making it the ideal solution for ergonomic small footprint quiet PC designs. The first processor in the world to be manufactured using a leading edge 0.13 micron manufacturing process, the VIA C3 1GHz processor has the world's smallest x86 processor die size. is a leading innovator and developer of PC core logic chipsets, microprocessors, and multimedia and communications chips
The symbol library and the function-word dictionary are as follows:

Symbol library

, | . | ‘ | “ |
; | [ | 、 | ! |
@ | # | $ | % |

Function-word dictionary

A | It | this | by |
Is | On | Are | she |
The | He | that | I |
The article is then compared with the symbol library and the function-word dictionary, and the matching symbols and function words are deleted. The article now reads as follows:

Article

VIA C3 1GHz processor coolest 1GHz processor market saving energy and maximizing total system savings allowing use of inexpensive off shelf components processor runs so cool can operate with standard small coolers and power supplies making ideal solution for ergonomic small footprint quiet PC designs first processor in world to be manufactured using leading edge 013 micron manufacturing process VIA C3 1GHz processor has worlds smallest x86 processor die size leading innovator and developer of PC core logic chipsets microprocessors and multimedia and communications chips
The occurrences of all remaining words in the article are then counted. The candidate words and their occurrence counts (in parentheses) are as follows:

Candidate words

VIA(2) | C3(2) | 1GHz(3) | processor(6) |
coolest(1) | Viatech(1) | ... |

Next, the data in the synonym buffer are added:

Candidate words

VIA(3) | C3(2) | 1GHz(3) | processor(6) |
coolest(1) | Viatech(1) | ... |

The candidate words are then sorted by occurrence count, with the following result:

Ranking

processor(6) VIA(3) 1GHz(3) C3(2) coolest(1) Viatech(1) |

Finally, the keywords that satisfy the preset condition are selected from the candidate words and registered in the keyword database. If the condition is three or more occurrences in the article, "processor", "VIA" and "1GHz" are selected as keywords and registered in the keyword database. If the condition is a rank within the top 4, "processor", "VIA", "1GHz" and "C3" are selected as keywords and registered in the keyword database.
According to another variation of the invention, the method can also be encoded as a computer program on a computer-readable medium that causes a computer to register article keywords as described in the embodiment.
The system and method for registering article keywords provided by the invention can therefore automatically register keywords that occur repeatedly in an article. The invention can also automatically recognize synonymous words in an article, which improves the accuracy of keyword analysis.
Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, so the scope of protection is defined by the appended claims.
Claims (10)
1. A system for registering article keywords, comprising:
a data storage device having a symbol library, a function-word dictionary and a keyword database; and
a processor, which compares an article with the symbol library and deletes from the article every symbol recorded in the symbol library; compares the article with the function-word dictionary and deletes from the article every function word recorded in the function-word dictionary; then counts the occurrences of every word in the article to obtain a plurality of candidate words and their occurrence counts; and finally selects a plurality of keywords from the candidate words according to a preset condition and registers the keywords in the keyword database.
2. The system for registering article keywords as claimed in claim 1, wherein the data storage device further has a synonym dictionary, and the processor further compares the article with the synonym dictionary, deletes from the article every synonym recorded in the synonym dictionary, records the number of times the synonym occurs in the article, and stores in a synonym buffer the word that is synonymous with the synonym together with the number of times the synonym occurred.
3. The system for registering article keywords as claimed in claim 2, wherein the processor further adds the synonymous words and the synonym counts recorded in the synonym buffer to the corresponding candidate words and their occurrence counts.
4. The system for registering article keywords as claimed in claim 1, wherein the preset condition is a minimum occurrence count, and only candidate words whose occurrence count is greater than the minimum occurrence count are selected as the keywords and registered in the keyword database.
5. The system for registering article keywords as claimed in claim 1, wherein the preset condition is a rank cutoff, and the processor further sorts the candidate words by their occurrence counts, wherein only candidate words ranked within the rank cutoff are selected as the keywords and registered in the keyword database.
6. A method for registering article keywords, comprising the following steps:
receiving an article;
comparing the article with a symbol library and deleting from the article every symbol recorded in the symbol library;
comparing the article with a function-word dictionary and deleting from the article every function word recorded in the function-word dictionary;
counting the occurrences of every word in the article to obtain a plurality of candidate words and their occurrence counts;
selecting a plurality of keywords from the candidate words according to a preset condition; and
registering the keywords in a keyword database.
7. The method for registering article keywords as claimed in claim 6, further comprising the following steps:
comparing the article with a synonym dictionary and deleting from the article every synonym recorded in the synonym dictionary;
recording the number of times the synonym occurs in the article; and
storing in a synonym buffer the word that is synonymous with the synonym together with the number of times the synonym occurred.
8. The method for registering article keywords as claimed in claim 7, further comprising adding the synonymous words and the synonym counts recorded in the synonym buffer to the corresponding candidate words and their occurrence counts.
9. The method for registering article keywords as claimed in claim 6, wherein the preset condition is a minimum occurrence count, and only candidate words whose occurrence count is greater than the minimum occurrence count are selected as the keywords and registered in the keyword database.
10. The method for registering article keywords as claimed in claim 6, wherein the preset condition is a rank cutoff, the method further comprising sorting the candidate words by their occurrence counts, wherein only candidate words ranked within the rank cutoff are selected as the keywords and registered in the keyword database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB02131859XA CN1211747C (en) | 2002-09-06 | 2002-09-06 | System for registering key words of articles and its method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB02131859XA CN1211747C (en) | 2002-09-06 | 2002-09-06 | System for registering key words of articles and its method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1480875A true CN1480875A (en) | 2004-03-10 |
CN1211747C CN1211747C (en) | 2005-07-20 |
Family
ID=34145051
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB02131859XA Expired - Lifetime CN1211747C (en) | 2002-09-06 | 2002-09-06 | System for registering key words of articles and its method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1211747C (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8180772B2 (en) | 2008-02-26 | 2012-05-15 | Sharp Kabushiki Kaisha | Electronic data retrieving apparatus |
CN101980196A (en) * | 2010-10-25 | 2011-02-23 | 中国农业大学 | Article comparison method and device |
CN113096635A (en) * | 2021-03-31 | 2021-07-09 | 北京字节跳动网络技术有限公司 | Audio and text synchronization method, device, equipment and medium |
CN113096635B (en) * | 2021-03-31 | 2024-01-09 | 抖音视界有限公司 | Audio and text synchronization method, device, equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN1211747C (en) | 2005-07-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109992645B (en) | Data management system and method based on text data | |
US9990421B2 (en) | Phrase-based searching in an information retrieval system | |
Ceccarelli et al. | Learning relatedness measures for entity linking | |
CN1240011C (en) | File classifying management system and method for operation system | |
US8781817B2 (en) | Phrase based document clustering with automatic phrase extraction | |
US8266121B2 (en) | Identifying related objects using quantum clustering | |
US7580921B2 (en) | Phrase identification in an information retrieval system | |
US20160012061A1 (en) | Similar document detection and electronic discovery | |
US20070217693A1 (en) | Automated evaluation systems & methods | |
EP1622054A1 (en) | Phrase-based searching in an information retrieval system | |
US20070019864A1 (en) | Image search system, image search method, and storage medium | |
US20080319971A1 (en) | Phrase-based personalization of searches in an information retrieval system | |
EP1622052A1 (en) | Phrase-based generation of document description | |
GB2391087A (en) | Content extraction configured to automatically accommodate new raw data extraction algorithms | |
CN1609859A (en) | Search result clustering method | |
CN1694101A (en) | Reinforced clustering of multi-type data objects for search term suggestion | |
US20080288483A1 (en) | Efficient retrieval algorithm by query term discrimination | |
Deselaers et al. | Automatic medical image annotation in ImageCLEF 2007: Overview, results, and discussion | |
CN1145899C (en) | Method for automatic generating abstract from word or file | |
Urena-López et al. | Integrating linguistic resources in TC through WSD | |
CN115618014A (en) | Standard document analysis management system and method applying big data technology | |
Ko et al. | Learning with unlabeled data for text categorization using a bootstrapping and a feature projection technique | |
TWI289770B (en) | Keyword register system of articles and computer readable recording medium | |
CN1211747C (en) | System for registering key words of articles and its method | |
Kim et al. | Image retrieval model based on weighted visual features determined by relevance feedback |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CX01 | Expiry of patent term | Granted publication date: 20050720 |