CN1480875A - System for registering key words of articles and its method - Google Patents

System for registering key words of articles and its method

Info

Publication number
CN1480875A
Authority
CN
China
Prior art keywords
article
synonym
keyword
words
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA02131859XA
Other languages
Chinese (zh)
Other versions
CN1211747C (en)
Inventor
陈丁豪
赖文树
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Via Technologies Inc
Priority to CNB02131859XA (patent CN1211747C)
Publication of CN1480875A
Application granted
Publication of CN1211747C
Anticipated expiration
Expired - Lifetime (current legal status)

Abstract

The system comprises a processor and a data storage device that includes a symbol library, a function word library, and a keyword database. The processor compares an article with the symbol library and deletes from the article any symbols recorded in the symbol library. It likewise deletes from the article any function words recorded in the function word library. The processor then counts the number of times each remaining word appears in the article, yielding a plurality of candidate words and their corresponding occurrence counts. Finally, according to a preset condition, a plurality of keywords are selected from the candidate words and registered in the keyword database.

Description

System for registering key words of articles and method
Technical field
The present invention relates to a system and method for registering the keywords of an article, and more particularly to a system and method that can automatically register the keywords that appear repeatedly in an article.
Background technology
In an era of information overload, most people do not have enough time to digest a large number of articles. For this reason, if there were an effective way to identify the subject of an article or the fields it touches on, users could directly read only those articles that, after screening, fall within the fields they are interested in, without spending a great deal of time reading every article.
The subject or related field of an article is usually judged from the keywords mentioned most often in the article. Conventionally, the keywords of an article are analyzed and registered mainly by hand. Fig. 1 is a schematic diagram of the conventional method for analyzing and registering article keywords. First, a large number of articles 10 are analyzed manually one by one (11) to obtain the relevant keywords 12 of each article 10. The analyst then registers the keywords in the keyword database 14 by manual entry (13).
Because conventional keyword analysis and registration rely on people analyzing articles one by one, keyword analysis takes a great deal of time and manpower. In addition, synonymous keywords can be analyzed correctly only by relying on the analyst's memory and experience.
Summary of the invention
In view of this, the main object of the present invention is to provide a system and method for registering article keywords that can automatically register the keywords that appear repeatedly in an article. In addition, the invention can automatically recognize synonyms in the article, improving the accuracy of keyword analysis.
The above object is achieved by the article keyword registration system and method provided by the present invention.
The article keyword registration system according to an embodiment of the invention comprises a processor and a data storage device having a symbol library, a function word library, and a keyword database. The processor compares an article with the symbol library and deletes from the article the symbols recorded in the symbol library, and deletes from the article the function words recorded in the function word library. It then counts the number of times each word appears in the article, thereby obtaining a plurality of candidate words and their corresponding occurrence counts. Finally, according to a preset condition, it selects a plurality of keywords from the candidate words and registers the selected keywords in the keyword database.
The data storage device may also have a synonym library. The processor compares the article with the synonym library, deletes from the article the synonyms recorded in the synonym library, records the number of times each synonym appears in the article, and stores the term the synonym is synonymous with, together with the synonym's occurrence count, in a synonym buffer. The processor further adds the terms and occurrence counts recorded in the synonym buffer to the corresponding candidate words and their occurrence counts.
In the article keyword registration method according to an embodiment of the invention, an article is first received. The article is then compared with a symbol library, and the symbols recorded in the symbol library are deleted from the article. Next, the function words recorded in a function word library are deleted from the article.
The number of times each word appears in the article is then counted, thereby obtaining a plurality of candidate words and their corresponding occurrence counts. Finally, according to a preset condition, a plurality of keywords are selected from the candidate words and registered in a keyword database.
In addition, the article may be compared with a synonym library: the synonyms recorded in the synonym library are deleted from the article, the number of times each synonym appears in the article is recorded, and the term the synonym is synonymous with is stored in a synonym buffer together with the synonym's occurrence count. The terms and occurrence counts recorded in the synonym buffer are afterwards added to the corresponding candidate words and their occurrence counts.
According to an embodiment of the invention, the preset condition may be an occurrence-count lower limit: candidate words whose occurrence count is greater than the lower limit are selected as keywords and registered in the keyword database. The processor may also sort the candidate words by their occurrence counts. In that case the preset condition may be a rank lower limit: candidate words ranked above the rank lower limit are selected as keywords and registered in the keyword database.
Description of drawings
To make the above objects, features, and advantages of the present invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of the conventional method for analyzing and registering article keywords.
Fig. 2 is a schematic diagram of the system architecture of the article keyword registration system according to an embodiment of the invention.
Fig. 3 is a flowchart of the article keyword registration method according to an embodiment of the invention.
Embodiment
Fig. 2 is a schematic diagram of the system architecture of the article keyword registration system according to an embodiment of the invention.
The article keyword registration system according to an embodiment of the invention comprises a data storage device 200 and a processor 210. The data storage device 200 holds a synonym library 201, a symbol library 202, a function word library 203, a keyword database 204, and a synonym buffer 205.
The synonym library 201 records the correspondence between synonymous terms; for example, the synonyms of "VIA" include "VIA Tech" and "VIA Technologies, Inc.". The symbol library 202 records special symbols, such as punctuation marks. The function word library 203 records the function words that carry no meaning of their own in ordinary articles, such as verbs, adjectives, adverbs, auxiliary words, and other meaningless words; examples are "a", "is", "on", and "he". The keyword database 204 stores the keywords obtained from the analysis.
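As an illustration only, the contents of the data storage device 200 could be modeled in memory as follows. The patent does not prescribe any representation; the dict and set layout and all names here (`SYNONYM_LIBRARY`, `canonical_term`, and so on) are assumptions made for this sketch.

```python
# Hypothetical in-memory stand-ins for the contents of data storage device 200.
# The patent does not prescribe a representation; plain dicts and sets are assumed.

# Synonym library 201: maps each synonym to the term it is synonymous with.
SYNONYM_LIBRARY = {
    "VIA Tech": "VIA",
    "VIA Technologies, Inc.": "VIA",
}

# Symbol library 202: special symbols such as punctuation marks.
SYMBOL_LIBRARY = {",", ".", ";", "!", "@", "#", "$", "%"}

# Function word library 203: words that carry no meaning of their own.
FUNCTION_WORD_LIBRARY = {"a", "is", "on", "he"}

# Keyword database 204: assumed here to map an article id to its keywords.
keyword_database = {}

# Synonym buffer 205: term -> number of synonym occurrences removed from the article.
synonym_buffer = {}


def canonical_term(term):
    """Return the term a synonym stands for, or the term itself if none is recorded."""
    return SYNONYM_LIBRARY.get(term, term)
```

Keeping the synonym library as a synonym-to-term mapping makes the later merge into the candidate counts a simple lookup.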
The processor 210 compares the article with the synonym library 201, deletes from the article the synonyms recorded in the synonym library 201, records the number of times each synonym appears in the article, and stores the term the synonym is synonymous with, together with the synonym's occurrence count, in the synonym buffer 205.
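The synonym step might be sketched as below. This is a minimal illustration, not the patented implementation: the function name `extract_synonyms` is hypothetical, matching is plain substring matching, longer synonyms are tried first so that a short synonym cannot consume part of a longer one, and each deleted synonym is replaced by a space so that neighboring words do not fuse. All of these are assumptions.

```python
import re
from collections import Counter


def extract_synonyms(article, synonym_library):
    """Delete recorded synonyms from the article and count them per term.

    synonym_library maps synonym -> term it is synonymous with (assumed layout).
    Returns the remaining text and the synonym buffer.
    """
    synonym_buffer = Counter()
    # Longest synonyms first, so "VIA Technologies, Inc." is matched
    # before a shorter synonym such as "VIA Tech" can match its prefix.
    for synonym in sorted(synonym_library, key=len, reverse=True):
        article, hits = re.subn(re.escape(synonym), " ", article)
        if hits:
            synonym_buffer[synonym_library[synonym]] += hits
    return article, synonym_buffer


text = "VIA Technologies, Inc. makes chips. VIA Tech ships them."
library = {"VIA Tech": "VIA", "VIA Technologies, Inc.": "VIA"}
remaining, buffer = extract_synonyms(text, library)
```

Here both synonyms are removed from `text`, and `buffer` records two occurrences under "VIA".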
The processor 210 compares the article with the symbol library 202 and deletes from the article the symbols recorded in the symbol library 202. The processor 210 also compares the article with the function word library 203 and deletes from the article the function words recorded in the function word library 203.
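A possible sketch of the two deletion steps follows. The helper names are hypothetical, symbols are assumed to be single characters that are dropped outright, and function words are assumed to be compared case-insensitively against whitespace-separated words; the patent text does not specify these details.

```python
def remove_symbols(text, symbol_library):
    """Drop every character recorded in the symbol library (assumed single characters)."""
    return "".join(ch for ch in text if ch not in symbol_library)


def remove_function_words(text, function_word_library):
    """Split on whitespace and drop recorded function words, ignoring case."""
    recorded = {word.lower() for word in function_word_library}
    return [word for word in text.split() if word.lower() not in recorded]


sentence = "The processor is cool, and it runs on 0.13 micron."
no_symbols = remove_symbols(sentence, {",", "."})
words = remove_function_words(no_symbols, {"The", "is", "it", "on"})
```

Because symbols are removed without inserting a space, "0.13" becomes "013" in this sketch.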
The processor 210 then counts the number of times each remaining word appears in the article, thereby obtaining a plurality of candidate words and their corresponding occurrence counts. Next, the processor 210 adds the terms and occurrence counts recorded in the synonym buffer 205 to the corresponding candidate words and their occurrence counts.
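Counting and merging can be illustrated with Python's `collections.Counter`. The function names and the sample word list are hypothetical; only the idea of adding the buffered synonym counts to the matching candidate words comes from the description.

```python
from collections import Counter


def count_candidates(words):
    """Count how often each remaining word occurs; the result is the candidate list."""
    return Counter(words)


def merge_synonym_buffer(candidates, synonym_buffer):
    """Return a new Counter with the buffered synonym counts added to the candidates."""
    merged = Counter(candidates)
    merged.update(synonym_buffer)  # adds counts, creating entries that are missing
    return merged


remaining_words = ["VIA", "C3", "processor", "VIA", "processor", "processor"]
candidates = count_candidates(remaining_words)
merged = merge_synonym_buffer(candidates, {"VIA": 1})
```

A term that occurs only through its synonyms still becomes a candidate, because `Counter.update` creates the missing entry.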
Finally, the processor 210 sorts the candidate words by occurrence count and, according to a preset condition such as an occurrence-count lower limit (for example, 10 or more occurrences) or a rank lower limit (for example, the top 5), selects keywords from the candidate words and registers the selected keywords in the keyword database 204.
Fig. 3 is a flowchart of the article keyword registration method according to an embodiment of the invention. The method is described below with reference to Fig. 2 and Fig. 3.
In the article keyword registration method according to an embodiment of the invention, an article is first received in step S30. Then, in step S31, the article is compared with the synonym library 201, the synonyms recorded in the synonym library 201 are deleted from the article, the number of times each synonym appears in the article is recorded, and the term the synonym is synonymous with is stored in the synonym buffer 205 together with the synonym's occurrence count.
Next, in step S32, the article is compared with the symbol library 202, and the symbols recorded in the symbol library 202 are deleted from the article. In step S33, the article is compared with the function word library 203, and the function words recorded in the function word library 203 are deleted from the article.
Then, in step S34, the number of times each remaining word appears in the article is counted, thereby obtaining a plurality of candidate words and their corresponding occurrence counts. In step S35, the terms and occurrence counts recorded in the synonym buffer 205 are added to the corresponding candidate words and their occurrence counts.
Finally, in step S36, the candidate words are sorted by occurrence count; in step S37, the keywords that satisfy the preset condition, such as the occurrence-count lower limit or the rank lower limit, are selected from the candidate words; and in step S38, the selected keywords are registered in the keyword database 204.
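Steps S30 to S38 can be strung together as in the following sketch. It is one possible reading of the flowchart, not the patented implementation: the function name `register_keywords`, its parameters, the dict used as keyword database, the case-insensitive function word comparison, and the inclusive count comparison are all assumptions.

```python
import re
from collections import Counter


def register_keywords(article, synonyms, symbols, function_words,
                      keyword_db, article_id, min_count=None, top_n=None):
    """Sketch of steps S30-S38; pass either min_count or top_n as the preset condition."""
    # S31: delete recorded synonyms (longest first) and buffer their counts.
    synonym_buffer = Counter()
    for synonym in sorted(synonyms, key=len, reverse=True):
        article, hits = re.subn(re.escape(synonym), " ", article)
        if hits:
            synonym_buffer[synonyms[synonym]] += hits
    # S32: delete the symbols recorded in the symbol library.
    article = "".join(ch for ch in article if ch not in symbols)
    # S33: delete function words (compared case-insensitively, an assumption).
    recorded = {word.lower() for word in function_words}
    words = [w for w in article.split() if w.lower() not in recorded]
    # S34: count the remaining words to obtain the candidate words.
    candidates = Counter(words)
    # S35: add the buffered synonym counts to the matching candidates.
    candidates.update(synonym_buffer)
    # S36: sort by occurrence count, highest first (ties keep first-seen order).
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    # S37: apply the preset condition.
    if min_count is not None:
        keywords = [word for word, count in ranked if count >= min_count]
    else:
        keywords = [word for word, _ in ranked[:top_n]]
    # S38: register the selected keywords.
    keyword_db[article_id] = keywords
    return keywords


db = {}
text = "The VIA chip is fast. VIA Technologies, Inc. makes the chip. The chip is small."
result = register_keywords(text, {"VIA Technologies, Inc.": "VIA"}, {",", "."},
                           {"the", "is"}, db, "doc1", min_count=2)
```

In this toy input "chip" occurs three times and "VIA" twice (once directly and once through its synonym), so both are registered under `"doc1"`.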
If the preset condition is the occurrence-count lower limit, candidate words whose occurrence count is greater than that limit are selected as keywords and registered in the keyword database 204. If the preset condition is the rank lower limit, candidate words ranked above that limit are selected as keywords and registered in the keyword database 204.
Note that in this embodiment, steps S31, S32, and S33 delete different targets from the article and are independent of one another, so their order may be interchanged. In addition, if the preset condition is only the occurrence-count lower limit, step S36 (sorting the candidate words by occurrence count) may be omitted.
In another variation, because the symbol library 202 and the function word library 203 serve the same purpose, namely deleting special symbols and function words from the article, they may be merged into a single word library that records the symbols and words to be deleted from the article.
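One way to picture the merged library is a single set consulted once per token, as sketched below. The tokenization rule (runs of word characters, or single non-space symbols) and both function names are assumptions for illustration; note that this tokenizer splits "0.13" into "0" and "13", which differs from deleting the symbol character outright.

```python
import re


def build_deletion_library(symbols, function_words):
    """Merge both libraries into one lookup set (function words lowercased)."""
    return set(symbols) | {word.lower() for word in function_words}


def delete_recorded_items(text, deletion_library):
    """Tokenize into words and single symbols, then drop every recorded token."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [tok for tok in tokens if tok.lower() not in deletion_library]


library = build_deletion_library({",", "!"}, {"The", "a", "is", "on"})
kept = delete_recorded_items("The chip, a small one, is on sale!", library)
```

With one merged set, a single pass over the tokens replaces the separate symbol and function word passes.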
An example is described below.
Suppose the text of an article is as follows:
Original article
The VIA C3 1GHz processor is the coolest 1GHz processor on the market, saving energy and maximizing total system savings by allowing the use of inexpensive, off-the-shelf components. The processor runs so cool that it can operate with standard small coolers and power supplies, making it the ideal solution for ergonomic small footprint quiet PC designs. The first processor in the world to be manufactured using a leading edge 0.13 micron manufacturing process, the VIA C3 1GHz processor has the world's smallest x86 processor die size. VIA Technologies, Inc. is a leading innovator and developer of PC core logic chipsets, microprocessors, and multimedia and communications chips
In addition, the synonym library is as follows:
Synonym library
VIA    VIATech
VIA    VIA Technologies, Inc.
First, the article is compared with the synonym library. Synonyms in the article that are recorded in the synonym library, such as "VIA Technologies, Inc.", are deleted, and the number of times each appears in the article is counted. The term the synonym is synonymous with, "VIA", is then recorded in the synonym buffer together with the occurrence count, as follows:
Synonym buffer
VIA(1)
The article after the synonyms are deleted is as follows:
Article
The VIA C3 1GHz processor is the coolest 1GHz processor on the market, saving energy and maximizing total system savings by allowing the use of inexpensive, off-the-shelf components. The processor runs so cool that it can operate with standard small coolers and power supplies, making it the ideal solution for ergonomic small footprint quiet PC designs. The first processor in the world to be manufactured using a leading edge 0.13 micron manufacturing process, the VIA C3 1GHz processor has the world's smallest x86 processor die size.                        is a leading innovator and developer of PC core logic chipsets, microprocessors, and multimedia and communications chips
Assume the symbol library and the function word library are as follows:
Symbol library
,    .    ‘    “
;    [    、    !
@    #    $    %
Function word library
A      It    this    by
Is     On    Are     she
The    He    that    I
After the article is compared with the symbol library and the function word library and the recorded symbols and function words are deleted, the article is as follows:
Article
VIA C3 1GHz processor coolest 1GHz processor market saving energy and maximizing total system savings allowing use of inexpensive off shelf components processor runs so cool can operate with standard small coolers and power supplies making ideal solution for ergonomic small footprint quiet PC designs first processor in world to be manufactured using leading edge 013 micron manufacturing process VIA C3 1GHz processor has worlds smallest x86 processor die size                        leading innovator and developer of PC core logic chipsets microprocessors and multimedia and communications chips
Next, the number of times each remaining word appears in the article is counted. The candidate words and their occurrence counts (in parentheses) are as follows:
Candidate words
VIA(3)        C3(2)         1GHz(3)    processor(6)
coolest(1)    Viatech(1)    ...
Then the data in the synonym buffer is added:
Candidate words
VIA(4)        C3(2)         1GHz(3)    processor(6)
coolest(1)    Viatech(1)    ...
The candidate words are then sorted by occurrence count. The sorted result is as follows:
Sorted result
processor(6)    VIA(4)    1GHz(3)    C3(2)    coolest(1)    Viatech(1)
Finally, the keywords that satisfy the preset condition are selected from the candidate words and registered in the keyword database. If the preset condition is that a word appears three or more times in the article, "processor", "VIA", and "1GHz" are selected as keywords and registered in the keyword database. If the preset condition is a rank within the top four, "processor", "VIA", "1GHz", and "C3" are selected as keywords and registered in the keyword database.
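The two selections in this example can be checked with a short sketch. The counts are the merged candidate counts listed in the sorted result above; the function names are hypothetical, and the count comparison is assumed to be inclusive because the example selects "1GHz", which appears exactly three times.

```python
# Merged candidate counts as listed in the sorted result of the example.
candidates = {"processor": 6, "VIA": 4, "1GHz": 3, "C3": 2, "coolest": 1, "Viatech": 1}


def select_by_count(candidates, min_count):
    """Keep candidates that occur at least min_count times (no sorting needed)."""
    return [word for word, count in candidates.items() if count >= min_count]


def select_by_rank(candidates, top_n):
    """Sort by occurrence count, highest first, and keep the first top_n candidates."""
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:top_n]]


by_count = select_by_count(candidates, 3)  # ["processor", "VIA", "1GHz"]
by_rank = select_by_rank(candidates, 4)    # ["processor", "VIA", "1GHz", "C3"]
```

Only the rank-based condition needs the sort, which is why the sorting step can be skipped when the count condition is used alone.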
In addition, in another variation of the invention, a computer program encoded on a computer-readable medium may be used to enable a computer to carry out the article keyword registration described in the embodiments.
Thus, the article keyword registration system and method provided by the present invention can automatically register the keywords that appear repeatedly in an article. In addition, the invention can automatically recognize synonyms in the article, improving the accuracy of keyword analysis.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Any person skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the appended claims.

Claims (10)

1. A system for registering keywords of an article, comprising:
a data storage device having a symbol library, a function word library, and a keyword database; and
a processor that compares an article with the symbol library and deletes from the article the symbols recorded in the symbol library; compares the article with the function word library and deletes from the article the function words recorded in the function word library; then counts the number of times each word appears in the article, thereby obtaining a plurality of candidate words and their corresponding occurrence counts; and finally selects a plurality of keywords from the candidate words according to a preset condition and registers the keywords in the keyword database.
2. The system for registering keywords of an article as claimed in claim 1, wherein the data storage device also has a synonym library, and the processor also compares the article with the synonym library, deletes from the article the synonyms recorded in the synonym library, records the number of times each synonym appears in the article, and stores the term the synonym is synonymous with, together with the synonym's occurrence count, in a synonym buffer.
3. The system for registering keywords of an article as claimed in claim 2, wherein the processor also adds the terms and occurrence counts recorded in the synonym buffer to the corresponding candidate words and their occurrence counts.
4. The system for registering keywords of an article as claimed in claim 1, wherein the preset condition is an occurrence-count lower limit, and only candidate words whose occurrence count is greater than the occurrence-count lower limit are selected as the keywords and registered in the keyword database.
5. The system for registering keywords of an article as claimed in claim 1, wherein the preset condition is a rank lower limit, and the processor also sorts the candidate words by their corresponding occurrence counts, wherein only candidate words ranked above the rank lower limit are selected as the keywords and registered in the keyword database.
6. A method for registering keywords of an article, comprising the following steps:
receiving an article;
comparing the article with a symbol library, and deleting from the article the symbols recorded in the symbol library;
comparing the article with a function word library, and deleting from the article the function words recorded in the function word library;
counting the number of times each word appears in the article, thereby obtaining a plurality of candidate words and their corresponding occurrence counts;
selecting a plurality of keywords from the candidate words according to a preset condition; and
registering the keywords in a keyword database.
7. The method for registering keywords of an article as claimed in claim 6, further comprising the following steps:
comparing the article with a synonym library, and deleting from the article the synonyms recorded in the synonym library;
recording the number of times each synonym appears in the article; and
storing the term the synonym is synonymous with, together with the synonym's occurrence count, in a synonym buffer.
8. The method for registering keywords of an article as claimed in claim 7, further comprising adding the terms and occurrence counts recorded in the synonym buffer to the corresponding candidate words and their occurrence counts.
9. The method for registering keywords of an article as claimed in claim 6, wherein the preset condition is an occurrence-count lower limit, and only candidate words whose occurrence count is greater than the occurrence-count lower limit are selected as the keywords and registered in the keyword database.
10. The method for registering keywords of an article as claimed in claim 6, wherein the preset condition is a rank lower limit, the method further comprising sorting the candidate words by their corresponding occurrence counts, wherein only candidate words ranked above the rank lower limit are selected as the keywords and registered in the keyword database.
CNB02131859XA 2002-09-06 2002-09-06 System for registering key words of articles and its method Expired - Lifetime CN1211747C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB02131859XA CN1211747C (en) 2002-09-06 2002-09-06 System for registering key words of articles and its method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB02131859XA CN1211747C (en) 2002-09-06 2002-09-06 System for registering key words of articles and its method

Publications (2)

Publication Number Publication Date
CN1480875A 2004-03-10
CN1211747C 2005-07-20

Family

ID=34145051

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB02131859XA Expired - Lifetime CN1211747C (en) 2002-09-06 2002-09-06 System for registering key words of articles and its method

Country Status (1)

Country Link
CN (1) CN1211747C (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8180772B2 (en) 2008-02-26 2012-05-15 Sharp Kabushiki Kaisha Electronic data retrieving apparatus
CN101980196A (en) * 2010-10-25 2011-02-23 中国农业大学 Article comparison method and device
CN113096635A (en) * 2021-03-31 2021-07-09 北京字节跳动网络技术有限公司 Audio and text synchronization method, device, equipment and medium
CN113096635B (en) * 2021-03-31 2024-01-09 抖音视界有限公司 Audio and text synchronization method, device, equipment and medium

Also Published As

Publication number Publication date
CN1211747C (en) 2005-07-20

Similar Documents

Publication Publication Date Title
CN109992645B (en) Data management system and method based on text data
US9990421B2 (en) Phrase-based searching in an information retrieval system
Ceccarelli et al. Learning relatedness measures for entity linking
CN1240011C (en) File classifying management system and method for operation system
US8781817B2 (en) Phrase based document clustering with automatic phrase extraction
US8266121B2 (en) Identifying related objects using quantum clustering
US7580921B2 (en) Phrase identification in an information retrieval system
US20160012061A1 (en) Similar document detection and electronic discovery
US20070217693A1 (en) Automated evaluation systems & methods
EP1622054A1 (en) Phrase-based searching in an information retrieval system
US20070019864A1 (en) Image search system, image search method, and storage medium
US20080319971A1 (en) Phrase-based personalization of searches in an information retrieval system
EP1622052A1 (en) Phrase-based generation of document description
GB2391087A (en) Content extraction configured to automatically accommodate new raw data extraction algorithms
CN1609859A (en) Search result clustering method
CN1694101A (en) Reinforced clustering of multi-type data objects for search term suggestion
US20080288483A1 (en) Efficient retrieval algorithm by query term discrimination
Deselaers et al. Automatic medical image annotation in ImageCLEF 2007: Overview, results, and discussion
CN1145899C (en) Method for automatic generating abstract from word or file
Urena-López et al. Integrating linguistic resources in TC through WSD
CN115618014A (en) Standard document analysis management system and method applying big data technology
Ko et al. Learning with unlabeled data for text categorization using a bootstrapping and a feature projection technique
TWI289770B (en) Keyword register system of articles and computer readable recording medium
CN1211747C (en) System for registering key words of articles and its method
Kim et al. Image retrieval model based on weighted visual features determined by relevance feedback

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20050720
