CN104598593A - Traditional Mongolian webpage recognition method and traditional Mongolian webpage recognition system - Google Patents
- Publication number
- CN104598593A (application number CN201510033629.0A)
- Authority
- CN
- China
- Prior art keywords
- web pages
- word
- traditional mongolian
- webpage
- characters
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The invention relates to a traditional Mongolian webpage recognition method and a traditional Mongolian webpage recognition system. The method includes the following steps: the word frequency and document frequency of each word in a traditional Mongolian webpage corpus are obtained and counted, and the harmonic mean of the two is calculated for each word; the words are ranked by harmonic mean in descending order, a first number of top-ranked words is chosen, and the harmonic means of those words are summed to obtain a first cumulative sum; the word frequencies of the same words in a webpage to be identified are obtained, counted, and summed to obtain a second cumulative sum; when the difference between the first cumulative sum and the second cumulative sum is less than or equal to a first threshold, the webpage to be identified is determined to be a traditional Mongolian webpage. The method can recognize traditional Mongolian webpages with high accuracy and efficiency, and can thereby help to collect traditional Mongolian webpages and implement a traditional Mongolian full-text search engine.
Description
Technical field
The present invention relates to the field of network technology, and in particular to a traditional Mongolian webpage recognition method and device.
Background
Traditional Mongolian is the official Mongolian writing system of the Inner Mongolia Autonomous Region (that is, Mongolian written in the traditional Mongolian script). Traditional Mongolian Internet resources are an important channel through which Mongolian people transmit information and share resources in their own script, and a main platform for passing on traditional Mongolian culture. They are significant for the study of the Mongolian language and Mongolian culture and for implementing a traditional Mongolian full-text search engine. Compared with Chinese and English Internet resources, traditional Mongolian Internet resources in China are few in number and use complicated encodings, so collecting them accurately and efficiently is very important. Earlier research found that the key to collecting traditional Mongolian Internet resources accurately and efficiently is the accurate recognition of traditional Mongolian webpages.
At present, webpage language identification methods include the following: 1) Determining the language of a webpage from the LANG attribute of the HyperText Markup Language (HTML). The LANG attribute declares the language used by the webpage, which allows search engines and browsers to read the page content accurately. 2) Determining the language of a webpage from the "font-family" and "charset" attributes of HTML. HTML specifies the character encoding of the webpage, and different character encodings can use different fonts, so the script of a webpage can be judged from the "font-family" attribute. For example, if the "charset" of a webpage is GB2312 and its "font-family" is "BZDBT" or "TIBETBT", or if the "charset" is UTF8 and the "font-family" is "Microsoft Himalaya", the webpage can be judged to be Tibetan. 3) Identifying the language of a webpage from the high-frequency words of a specific language. Every language has its own high-frequency linguistic units, so the language of a webpage can be judged from how often those units occur in the page under analysis. For example, whether a webpage is Tibetan can be judged from the frequency of the Tibetan syllable point and of Tibetan high-frequency words.
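The first two checks can be illustrated with a minimal sketch that reads the declared `lang` and `charset` hints from a page. The names `LangHintParser` and `declared_hints` are hypothetical, and only Python's standard `html.parser` module is used; this illustrates the metadata check in general and is not code from the patent.

```python
from html.parser import HTMLParser


class LangHintParser(HTMLParser):
    """Collect the declared language and charset of an HTML page, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.lang = attrs["lang"].lower()
        if tag == "meta" and attrs.get("charset"):
            self.charset = attrs["charset"].lower()


def declared_hints(html_text):
    """Return (lang, charset); either value is None when not declared."""
    parser = LangHintParser()
    parser.feed(html_text)
    return parser.lang, parser.charset
```

A page that declares neither hint yields `(None, None)`, in which case a metadata-based check has nothing to work with.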
For the method for the LANG determined property webpage word according to HTML, according to World Wide Web Consortium (WorldWide Web Consortium, W3C) standard, each webpage should declare LANG attribute, owing to there is no the LANG attribute of html language in a lot of traditional Mongolian Characters in Web Pages, therefore, can not whether be only traditional Mongolian according to the LANG determined property homepages language of webpage.For the method for language belonging to " font-family " and " charset " determined property webpage word of HTML, a lot of traditional Mongolian Characters in Web Pages only has " charset " information, does not have " font-family " information, therefore can not judge whether webpage word is traditional Mongolian according to " charset " and " font-family ".For language belonging to the high frequency words identification webpage word based on specific languages, different language has oneself language feature, therefore the high frequency words of various language is not identical, such as: " ", " " be the word that Chinese frequency of utilization is higher, " it ", " the " are the words that in English, frequency of utilization is higher
(he, she, it),
(with) be the word that in Uighur, frequency of utilization is higher, the high frequency syntactic units come out towards same language, different pieces of information also has a great difference.Existing three kinds identify in the technology of homepages language, homepages language recognition technology based on high frequency words is comparatively effective relative to other two kinds of methods, but this technology only considers the absolute frequency of linguistic unit, the wording characteristics do not considered in different field text, and therefore the accuracy of identification of homepages language differs greatly.
Summary of the invention
The object of the invention is to address the defects of the prior art by providing a traditional Mongolian webpage recognition method that recognizes traditional Mongolian webpages with high accuracy and efficiency.
To achieve the above object, the invention provides a traditional Mongolian webpage recognition method, the method comprising:
Obtaining and counting the word frequency $TF_i$ and document frequency $DF_i$ of each word in a traditional Mongolian webpage corpus, where $i \ge 0$;
Obtaining the harmonic mean $F_i$ of each word in the traditional Mongolian webpage corpus according to $F_i = \frac{2 \cdot TF_i \cdot DF_i}{TF_i + DF_i}$;
Sorting the words of the traditional Mongolian webpage corpus by $F_i$ in descending order, choosing the first number of top-ranked words, and summing the $F_i$ values of those words to obtain a first cumulative sum;
Obtaining and counting the word frequency $TF_j$ of each of the first number of words in the webpage to be identified, where $j \ge 0$;
Summing the $TF_j$ values of the first number of words in the webpage to be identified to obtain a second cumulative sum;
When the difference between the first cumulative sum and the second cumulative sum is less than or equal to a first threshold, determining that the webpage to be identified is a traditional Mongolian webpage.
In another aspect, the invention also provides a traditional Mongolian webpage recognition device, the device comprising:
A first acquisition unit, configured to obtain and count the word frequency $TF_i$ and document frequency $DF_i$ of each word in a traditional Mongolian webpage corpus, where $i \ge 0$;
A first computing unit, configured to obtain the harmonic mean $F_i$ of each word in the traditional Mongolian webpage corpus according to $F_i = \frac{2 \cdot TF_i \cdot DF_i}{TF_i + DF_i}$;
A second computing unit, configured to sort the words of the traditional Mongolian webpage corpus by $F_i$ in descending order, choose the first number of top-ranked words, and sum the $F_i$ values of those words to obtain a first cumulative sum;
A second acquisition unit, configured to obtain and count the word frequency $TF_j$ of each of the first number of words in the webpage to be identified, where $j \ge 0$;
A third computing unit, configured to sum the $TF_j$ values of the first number of words in the webpage to be identified to obtain a second cumulative sum;
A decision unit, configured to determine that the webpage to be identified is a traditional Mongolian webpage when the difference between the first cumulative sum and the second cumulative sum is less than or equal to a first threshold.
The traditional Mongolian webpage recognition method and device provided by the invention judge whether the language of a webpage is traditional Mongolian based on the harmonic mean of word frequency and document frequency in a traditional Mongolian webpage corpus. They recognize traditional Mongolian webpages with high accuracy and efficiency, which in turn helps to collect traditional Mongolian webpages and implement a traditional Mongolian full-text search engine.
Brief description of the drawings
Fig. 1 is a flow chart of the traditional Mongolian webpage recognition method provided by Embodiment 1 of the invention;
Fig. 2 is a schematic diagram of the traditional Mongolian webpage recognition device provided by Embodiment 2 of the invention.
Detailed description
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flow chart of the traditional Mongolian webpage recognition method provided by Embodiment 1. As shown in Fig. 1, the method comprises:
Step S101: obtain and count the word frequency and document frequency of each word in the traditional Mongolian webpage corpus.
Specifically, each word in the traditional Mongolian webpage corpus is obtained, and its word frequency $TF_i$ and document frequency $DF_i$ are counted, where $i \ge 0$.
Here, in a given document, the word frequency (term frequency, TF) is the number of times a given word occurs in that document.
In a given document set, the document frequency (DF) is the number of documents in the set in which a given word occurs.
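The two counts defined above can be sketched as follows. This is a minimal illustration that assumes the corpus has already been tokenized into one list of words per webpage and that TF is taken as the raw occurrence count over the whole corpus; the function name `term_and_document_frequencies` is hypothetical.

```python
from collections import Counter


def term_and_document_frequencies(documents):
    """Count TF and DF over a corpus.

    documents: list of token lists, one list per webpage.
    Returns (tf, df), where tf[w] is the total number of occurrences of w
    and df[w] is the number of documents that contain w at least once.
    """
    tf = Counter()
    df = Counter()
    for tokens in documents:
        tf.update(tokens)        # every occurrence counts toward TF
        df.update(set(tokens))   # each document counts once toward DF
    return tf, df
```

For a two-page corpus `[["a", "b", "a"], ["a", "c"]]`, the word `"a"` occurs three times in two documents, so its TF is 3 and its DF is 2.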
Optionally, before the word frequency and document frequency of each word in the traditional Mongolian webpage corpus are obtained and counted, the method further comprises:
Downloading traditional Mongolian webpages and pre-processing them;
Building the traditional Mongolian webpage corpus.
Note that the following points need attention when building the traditional Mongolian corpus:
(1) The corpus is large
The corpus contains at least one million words, and its time span covers the webpages of a given website within a given year.
(2) The corpus covers all webpage types
The corpus should include webpages of the news, education, culture (especially ethnic culture), science and technology, entertainment, forum, business, and other types.
(3) The corpus composition is reasonable
Based on the linguistic characteristics of traditional Mongolian and the state of its network resources, the approximate proportions of these types are: news, culture, and forum 20% each; education, entertainment, business, and other types 10% each.
(4) All website encoding types are covered
Because the encodings of traditional Mongolian webpages are fairly complicated, recognizing webpages in every traditional Mongolian encoding requires downloading webpages in each encoding currently in use, such as the Menksoft (Meng Keli), Unicode, Saiyin, and Mingantu encodings.
Building a large-scale, multi-domain traditional Mongolian webpage corpus requires downloading a batch of webpages that balances encoding type, website type, and corpus proportions, and then pre-processing the downloaded Mongolian webpages: junk-information filtering, conversion to Extensible Markup Language (XML) format, and encoding conversion (other encodings are converted to Unicode).
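As a rough sketch of the pre-processing stage, the snippet below strips markup and extracts runs of Mongolian letters from a page that is already Unicode-encoded. It rests on several assumptions that are not in the source: tags are removed with a crude regular expression, a "word" is approximated as a run of code points in U+1820 to U+18AA (letters inside the Unicode Mongolian block, U+1800 to U+18AF), and the name `extract_mongolian_tokens` is hypothetical. Conversion from the other encodings to Unicode is not shown.

```python
import re

# Crude tag stripper; a real pipeline would use a proper HTML parser.
TAG = re.compile(r"<[^>]+>")
# Assumed approximation of a word: a run of letters in the Mongolian block.
MONGOLIAN_RUN = re.compile(r"[\u1820-\u18AA]+")


def extract_mongolian_tokens(html_text):
    """Return the runs of Mongolian letters found in the page text."""
    text = TAG.sub(" ", html_text)
    return MONGOLIAN_RUN.findall(text)
```

Non-Mongolian text is dropped by this filter, so the token list it returns can feed the frequency counts directly.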
Step S102: calculate the harmonic mean of each word in the traditional Mongolian webpage corpus according to the harmonic mean formula.
Specifically, the harmonic mean $F_i$ of each word in the traditional Mongolian webpage corpus is calculated according to the harmonic mean formula $F_i = \frac{2 \cdot TF_i \cdot DF_i}{TF_i + DF_i}$, where $i \ge 0$.
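A minimal sketch of step S102 follows. It assumes the formula is the standard harmonic mean of two values, $2ab/(a+b)$, applied to a word's TF and DF; the function name `harmonic_mean` and the choice to return 0 when both inputs are 0 are illustrative assumptions.

```python
def harmonic_mean(tf, df):
    """Harmonic mean of a word's TF and DF (standard two-value form).

    Returns 0.0 when both inputs are 0 to avoid dividing by zero.
    """
    if tf + df == 0:
        return 0.0
    return 2.0 * tf * df / (tf + df)
```

For a word with TF 3 and DF 2 this gives 12 / 5 = 2.4. The harmonic mean stays low when either value is low, so a word scores highly only if it is both frequent and spread across many documents.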
Step S103: sort the words of the traditional Mongolian webpage corpus by harmonic mean in descending order, choose the first number of top-ranked words, and sum the harmonic means of those words to obtain the first cumulative sum.
Specifically, the harmonic means $F_i$ calculated in step S102 are sorted in descending order, the first number of top-ranked words is chosen, and their harmonic means are summed to obtain the first cumulative sum.
For example, the $F_i$ values ranked in the top 5% in descending order of harmonic mean are summed to obtain the first cumulative sum A, calculated as follows: $A = \sum_{i} F_i$, where the sum runs over the words ranked in the top 5%.
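The selection and summation in step S103 can be sketched as below. The function name `top_words_and_sum`, the rounding up of the word count, and the floor of one word are assumptions made for the illustration; the 5% default mirrors the example in the text.

```python
import math


def top_words_and_sum(f_values, fraction=0.05):
    """Pick the top fraction of words by harmonic mean and sum their scores.

    f_values: dict mapping word -> harmonic mean F.
    Returns (words, a), where words is the selected list in rank order
    and a is the first cumulative sum A.
    """
    ranked = sorted(f_values.items(), key=lambda kv: kv[1], reverse=True)
    # Assumption: round up and keep at least one word.
    n = max(1, math.ceil(len(ranked) * fraction))
    top = ranked[:n]
    words = [word for word, _ in top]
    a = sum(score for _, score in top)
    return words, a
```

The returned word list is what the later steps look up in the webpage to be identified.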
Step S104: obtain and count the word frequencies of the first number of words in the webpage to be identified.
Specifically, the first number of words obtained in step S103 are looked up in the webpage to be identified, and the word frequency $TF_j$ of each of those words is obtained from the webpage, where $j \ge 0$.
Optionally, before the word frequencies of the first number of words in the webpage to be identified are obtained and counted, the method further comprises: performing junk-information filtering, format conversion, and encoding conversion on the webpage to be identified to obtain the processed webpage to be identified.
Step S105: sum the word frequencies of the first number of words to obtain the second cumulative sum.
For example, the word frequencies $TF_j$ in the webpage to be identified of the top 5% of words are summed to obtain the second cumulative sum B, calculated as follows: $B = \sum_{j} TF_j$, where the sum runs over the same top 5% of words.
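Steps S104 and S105 can be sketched together. The snippet assumes the webpage to be identified has already been pre-processed and tokenized, and that TF on the page is a raw occurrence count; the function name `second_cumulative_sum` is hypothetical.

```python
from collections import Counter


def second_cumulative_sum(page_tokens, top_words):
    """Sum the page-level word frequencies of the selected corpus words.

    page_tokens: list of words from the webpage to be identified.
    top_words: words chosen from the corpus ranking.
    Words that do not occur on the page contribute 0.
    """
    counts = Counter(page_tokens)
    return sum(counts[word] for word in top_words)
```

For page tokens `["a", "b", "a", "z"]` and selected words `["a", "b", "c"]`, the result is 2 + 1 + 0 = 3.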
Step S106: when the difference between the first cumulative sum and the second cumulative sum is less than or equal to the first threshold, determine that the webpage to be identified is a traditional Mongolian webpage.
For example, let the first threshold be α and judge whether $|A - B|$ is less than or equal to α. If so, the webpage to be identified is a traditional Mongolian webpage; if not, it is not. Here α is a constant, determined by experiment, that characterizes the difference between the two sums.
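The decision rule itself is a single comparison, sketched below with a hypothetical function name. The source does not say whether the two sums are normalized to a common scale, so the sketch simply takes A, B, and α as given and leaves the calibration of α to experiment, as the text describes.

```python
def is_traditional_mongolian(first_sum, second_sum, alpha):
    """Return True when |A - B| <= alpha, False otherwise."""
    return abs(first_sum - second_sum) <= alpha
```

With A = 10.0 and α = 1.0, a page with B = 9.5 is accepted and a page with B = 2.0 is rejected.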
The traditional Mongolian webpage recognition method provided by the invention judges whether the language of a webpage is traditional Mongolian based on the harmonic mean of word frequency and document frequency in a traditional Mongolian webpage corpus. It recognizes traditional Mongolian webpages with high accuracy and efficiency, which in turn helps to collect traditional Mongolian webpages and implement a traditional Mongolian full-text search engine.
The traditional Mongolian webpage recognition method provided by the invention has been described in detail above; the traditional Mongolian webpage recognition device provided by the invention is described in detail below.
Fig. 2 is a schematic diagram of the traditional Mongolian webpage recognition device provided by Embodiment 2 of the invention. As shown in Fig. 2, the device comprises: a first acquisition unit 201, a first computing unit 202, a second computing unit 203, a second acquisition unit 204, a third computing unit 205, and a decision unit 206.
The first acquisition unit 201 is configured to obtain and count the word frequency $TF_i$ and document frequency $DF_i$ of each word in the traditional Mongolian webpage corpus, where $i \ge 0$;
The first computing unit 202 is configured to obtain the harmonic mean $F_i$ of each word in the traditional Mongolian webpage corpus according to $F_i = \frac{2 \cdot TF_i \cdot DF_i}{TF_i + DF_i}$;
The second computing unit 203 is configured to sort the words of the traditional Mongolian webpage corpus by $F_i$ in descending order, choose the first number of top-ranked words, and sum the $F_i$ values of those words to obtain the first cumulative sum;
The second acquisition unit 204 is configured to obtain and count the word frequency $TF_j$ of each of the first number of words in the webpage to be identified, where $j \ge 0$;
The third computing unit 205 is configured to sum the $TF_j$ values of the first number of words in the webpage to be identified to obtain the second cumulative sum;
The decision unit 206 is configured to determine that the webpage to be identified is a traditional Mongolian webpage when the difference between the first cumulative sum and the second cumulative sum is less than or equal to the first threshold.
Optionally, the device further comprises:
A first processing unit 207, configured to download traditional Mongolian webpages and pre-process them;
A creating unit 208, configured to build the traditional Mongolian webpage corpus.
Optionally, the device further comprises:
A second processing unit 209, configured to perform junk-information filtering, format conversion, and encoding conversion on the webpage to be identified to obtain the processed webpage to be identified.
Optionally, the traditional Mongolian webpage corpus comprises at least 1,000,000 traditional Mongolian words.
The device provided by Embodiment 2 of the application implements the method provided by Embodiment 1 of the application, so its specific working process is not repeated here.
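One way to picture how the units fit together is a small class whose methods correspond to units 201 to 206. This is a sketch under stated assumptions, not the patented implementation: the class name `WebpageRecognizer`, the `fit`/`predict` split, the standard two-value harmonic mean, raw counts for TF, and the default values of `fraction` and `alpha` are all illustrative choices.

```python
import math
from collections import Counter


class WebpageRecognizer:
    """Illustrative composition of units 201-206 (names are hypothetical)."""

    def __init__(self, fraction=0.05, alpha=1.0):
        self.fraction = fraction   # share of top-ranked words to keep
        self.alpha = alpha         # first threshold, set by experiment
        self.top_words = []
        self.first_sum = 0.0

    def fit(self, corpus_docs):
        """corpus_docs: list of token lists, one per corpus webpage."""
        tf, df = Counter(), Counter()            # first acquisition unit
        for tokens in corpus_docs:
            tf.update(tokens)
            df.update(set(tokens))
        # First computing unit: harmonic mean of TF and DF per word.
        f = {w: 2.0 * tf[w] * df[w] / (tf[w] + df[w]) for w in tf}
        # Second computing unit: rank, select, and sum.
        ranked = sorted(f, key=f.get, reverse=True)
        n = max(1, math.ceil(len(ranked) * self.fraction))
        self.top_words = ranked[:n]
        self.first_sum = sum(f[w] for w in self.top_words)
        return self

    def predict(self, page_tokens):
        """Return True if the page is judged to be traditional Mongolian."""
        counts = Counter(page_tokens)            # second acquisition unit
        second_sum = sum(counts[w] for w in self.top_words)  # third computing unit
        return abs(self.first_sum - second_sum) <= self.alpha  # decision unit
```

In use, `fit` would be called once on the tokenized corpus and `predict` once per tokenized page; the optional processing units 207 to 209 correspond to the download and pre-processing that happen before either call.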
The traditional Mongolian webpage recognition device provided by the invention judges whether the language of a webpage is traditional Mongolian based on the harmonic mean of word frequency and document frequency in a traditional Mongolian webpage corpus. It recognizes traditional Mongolian webpages with high accuracy and efficiency, which in turn helps to collect traditional Mongolian webpages and implement a traditional Mongolian full-text search engine.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each specific application, but such implementations should not be considered to go beyond the scope of the invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module can reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The embodiments above further describe the object, technical solution, and beneficial effects of the invention. It should be understood that the foregoing describes only specific embodiments of the invention and is not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (8)
1. A traditional Mongolian webpage recognition method, characterized in that the method comprises:
Obtaining and counting the word frequency $TF_i$ and document frequency $DF_i$ of each word in a traditional Mongolian webpage corpus, where $i \ge 0$;
Obtaining the harmonic mean $F_i$ of each word in the traditional Mongolian webpage corpus according to $F_i = \frac{2 \cdot TF_i \cdot DF_i}{TF_i + DF_i}$;
Sorting the words of the traditional Mongolian webpage corpus by $F_i$ in descending order, choosing the first number of top-ranked words, and summing the $F_i$ values of those words to obtain a first cumulative sum;
Obtaining and counting the word frequency $TF_j$ of each of the first number of words in the webpage to be identified, where $j \ge 0$;
Summing the $TF_j$ values of the first number of words in the webpage to be identified to obtain a second cumulative sum;
When the difference between the first cumulative sum and the second cumulative sum is less than or equal to a first threshold, determining that the webpage to be identified is a traditional Mongolian webpage.
2. The traditional Mongolian webpage recognition method according to claim 1, characterized in that, before obtaining and counting the word frequency $TF_i$ and document frequency $DF_i$ of each word in the traditional Mongolian webpage corpus, the method further comprises:
Downloading traditional Mongolian webpages and pre-processing them;
Building the traditional Mongolian webpage corpus.
3. The traditional Mongolian webpage recognition method according to claim 1, characterized in that, before obtaining and counting the word frequency $TF_j$ of each of the first number of words in the webpage to be identified, the method further comprises:
Performing junk-information filtering, format conversion, and encoding conversion on the webpage to be identified to obtain the processed webpage to be identified.
4. The traditional Mongolian webpage recognition method according to any one of claims 1-3, characterized in that the traditional Mongolian webpage corpus comprises at least 1,000,000 traditional Mongolian words.
5. A traditional Mongolian webpage recognition device, characterized in that the device comprises:
A first acquisition unit, configured to obtain and count the word frequency $TF_i$ and document frequency $DF_i$ of each word in a traditional Mongolian webpage corpus, where $i \ge 0$;
A first computing unit, configured to obtain the harmonic mean $F_i$ of each word in the traditional Mongolian webpage corpus according to $F_i = \frac{2 \cdot TF_i \cdot DF_i}{TF_i + DF_i}$;
A second computing unit, configured to sort the words of the traditional Mongolian webpage corpus by $F_i$ in descending order, choose the first number of top-ranked words, and sum the $F_i$ values of those words to obtain a first cumulative sum;
A second acquisition unit, configured to obtain and count the word frequency $TF_j$ of each of the first number of words in the webpage to be identified, where $j \ge 0$;
A third computing unit, configured to sum the $TF_j$ values of the first number of words in the webpage to be identified to obtain a second cumulative sum;
A decision unit, configured to determine that the webpage to be identified is a traditional Mongolian webpage when the difference between the first cumulative sum and the second cumulative sum is less than or equal to a first threshold.
6. The traditional Mongolian webpage recognition device according to claim 5, characterized in that the device further comprises:
A first processing unit, configured to download traditional Mongolian webpages and pre-process them;
A creating unit, configured to build the traditional Mongolian webpage corpus.
7. The traditional Mongolian webpage recognition device according to claim 5, characterized in that the device further comprises:
A second processing unit, configured to perform junk-information filtering, format conversion, and encoding conversion on the webpage to be identified to obtain the processed webpage to be identified.
8. The traditional Mongolian webpage recognition device according to any one of claims 5-7, characterized in that the traditional Mongolian webpage corpus comprises at least 1,000,000 traditional Mongolian words.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510033629.0A CN104598593B (en) | 2015-01-22 | 2015-01-22 | Traditional Mongolian webpage recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510033629.0A CN104598593B (en) | 2015-01-22 | 2015-01-22 | Traditional Mongolian webpage recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104598593A (en) | 2015-05-06 |
CN104598593B (en) | 2017-12-22 |
Family
ID=53124378
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510033629.0A Expired - Fee Related CN104598593B (en) | 2015-01-22 | 2015-01-22 | Traditional Mongolian webpage recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104598593B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020156760A1 (en) * | 1998-01-05 | 2002-10-24 | Nec Research Institute, Inc. | Autonomous citation indexing and literature browsing using citation context |
CN102129479A (en) * | 2011-04-29 | 2011-07-20 | 南京邮电大学 | World wide web service discovery method based on probabilistic latent semantic analysis model |
CN103942188A (en) * | 2013-01-22 | 2014-07-23 | 腾讯科技(深圳)有限公司 | Method and device for identifying corpus languages |
Also Published As
Publication number | Publication date |
---|---|
CN104598593B (en) | 2017-12-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20171222; Termination date: 20210122 |