CN102855324A - Automatic extracting method and device for network information - Google Patents

Automatic extracting method and device for network information

Info

Publication number
CN102855324A
Authority
CN
China
Prior art keywords
webpage
regular expression
ssub
intersection
information
Legal status
Granted
Application number
CN2012103357191A
Other languages
Chinese (zh)
Other versions
CN102855324B (en)
Inventor
杨俊拯
温予
张旸
黄百宁
王世平
葛猛
孟玲会
Current Assignee
BEIJING YUNHONG DAOYUAN INFORMATION TECHNOLOGY Co Ltd
Original Assignee
BEIJING YUNHONG DAOYUAN INFORMATION TECHNOLOGY Co Ltd
Priority date
2012-09-11
Filing date
2012-09-11
Publication date
2013-01-02
Application filed by BEIJING YUNHONG DAOYUAN INFORMATION TECHNOLOGY Co Ltd
Priority to CN201210335719.1A
Publication of CN102855324A
Application granted
Publication of CN102855324B
Expired - Fee Related (current status)
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention provides an automatic extraction method and device for network information. The method comprises the following steps: from the webpage set W related to given information S, finding the webpages W' that contain elements of a subset Ssub of S; generating an information pattern set P' according to a predefined rule, and combining P' with the regular expression set P to obtain a set P1; matching P1 against all webpages in the webpage set W related to the given information to obtain a set Ssub'; and ending the crawl when Ssub == Ssub'. With the method and device, a corresponding regular expression set is generated for each kind of webpage, so that webpage content can be extracted automatically and a large amount of manual work is saved.

Description

Automatic extraction method and device for network information
Technical field
The present invention relates to an automatic extraction method and device for network information, and belongs to the technical field of network information extraction.
Background
In the prior art, information presented on webpages is generally described by regular expressions. Because different webpages usually require different regular expressions, extracting network information involves a large amount of work.
Summary of the invention
To solve the problem that existing network information extraction requires a large amount of work, the present invention provides an automatic extraction method and device for network information, using the following technical solutions:
An automatic extraction method for network information, comprising:
finding, in the webpage set W related to given information S, the webpages W' that contain elements of a subset Ssub of the given information S;
generating an information pattern set P' according to a predefined rule, and intersecting the information pattern set P' with the regular expression set P to obtain a set P1;
matching the set P1 against all webpages in the webpage set W related to the given information to obtain a set Ssub', the crawl process ending when Ssub == Ssub'.
An automatic extraction device for network information, comprising:
a webpage selection unit, configured to find, in the webpage set W related to given information S, the webpages W' that contain elements of a subset Ssub of the given information S;
a set selection unit, configured to generate an information pattern set P' according to a predefined rule, and to intersect the information pattern set P' with the regular expression set P to obtain a set P1;
a content extraction unit, configured to match the set P1 against all webpages in the webpage set W related to the given information to obtain a set Ssub', the crawl process ending when Ssub == Ssub'.
By generating a corresponding regular expression set for each kind of webpage, the technical solution provided by the invention extracts webpage content automatically and saves a large amount of work.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of obtaining information from two webpages according to a specific embodiment of the present invention;
Fig. 2 is a schematic diagram of obtaining information from n webpages according to a specific embodiment of the present invention;
Fig. 3 is a flowchart of the automatic extraction method for network information according to a specific embodiment of the present invention;
Fig. 4 is a flowchart of generating the information pattern set P' according to a specific embodiment of the present invention;
Fig. 5 is a flowchart of verifying the regular expression set according to a specific embodiment of the present invention;
Fig. 6 is a structural diagram of the automatic extraction device for network information according to a specific embodiment of the present invention.
Embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
The principle of the technical solution in this embodiment is as follows. Different types of webpages may contain the same information, but the same information is expressed differently on different websites. In the music domain, for example, the Internet hosts many websites and forums carrying music information. Their page structures and forms of presentation generally differ, yet they contain much information of the same kind, such as song titles, singer names, and albums. For one kind of information, webpages of the same type (denoted urlpattern1) can be described by one regular expression (prefix1 info suffix1), and the set of values it records is denoted V1. Webpages of a different type (urlpattern2) have a different regular expression (prefix2 info suffix2), and the set of values from that website is denoted V2. The intersection of V1 and V2 is then non-empty, and the values in V1 and V2 describe the same kind of information. By extension, if there are n different types of webpages, there should be at most n value sets and at most n regular expressions. The specific logic is shown in Figs. 1 and 2.
Therefore, for a partial set of the given information (a sample of, for example, 10 to 100 items), denoted Ssub, an information set S' can be obtained through the webpage set W. Coverage is defined as |S ∩ S'| / |S| and accuracy as |S ∩ S'| / |S'|. For webpage content extraction, accuracy matters more than coverage: if accuracy is too low, the results are meaningless for most applications, whereas low coverage can be compensated for by the massive number of webpages available. The technical solution of this embodiment is therefore proposed to improve the accuracy of webpage content extraction. It is described in detail below with reference to the drawings. As shown in Fig. 3, the automatic extraction method for network information comprises:
Step 31: find, in the webpage set W related to the given information S, the webpages W' that contain elements of the subset Ssub of the given information S.
Specifically, the elements of the subset Ssub of the given information S are enumerable, and a regular expression set is defined as:
[Formula image: definition of the regular expression set]
First, traverse the webpage set W related to the given information S and find in it the webpages W' that contain elements of the subset Ssub.
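Step 31 amounts to a filter over W. A minimal Python sketch, assuming that W is held as a dictionary from URL to page HTML and that "contains an element" means a plain substring match (the text does not specify the test); the function name `find_pages_containing` and the example URLs are hypothetical:

```python
from __future__ import annotations


def find_pages_containing(pages: dict[str, str], ssub: set[str]) -> dict[str, str]:
    """Return W': the pages of W whose HTML contains at least one element of Ssub."""
    return {url: html for url, html in pages.items()
            if any(s in html for s in ssub)}


# Hypothetical webpage set W related to the given information (song titles).
W = {
    "http://example.com/a": "<td>Song A</td>",
    "http://example.com/b": "<p>Weather report</p>",
    "http://example.com/c": "<li>Song B</li>",
}
W_prime = find_pages_containing(W, {"Song A", "Song B"})
print(sorted(W_prime))  # ['http://example.com/a', 'http://example.com/c']
```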
Step 32: generate an information pattern set P' according to a predefined rule, and intersect the information pattern set P' with the regular expression set P to obtain a set P1.
The information pattern set P' is generated according to the predefined rule, letting W' = Ssub. As shown in Fig. 4, generating P' may specifically include:
First, the pattern of a regular expression is defined as p = prefix info suffix, and the following sets serve as its components: the digit set NumberSet, the letter set EnglishSet, the special symbol set SpecialSet, the Chinese character set ChineseSet, and the webpage tag set MetaSet. The info part of the regular expression is described by NumberSet, EnglishSet, SpecialSet, and ChineseSet, while prefix and suffix are described by MetaSet;
traverse the subset Ssub of the given information S, take an element s, and find the position of s in a webpage w;
scan toward the beginning of the page for the first webpage tag, denoted prefix; scan toward the end of the page for the first webpage tag, denoted suffix;
describe the content between prefix and suffix in terms of NumberSet, EnglishSet, SpecialSet, and ChineseSet according to the rule, generating the regular expression set of element s on webpage w;
from the regular expression sets of the elements s on webpage w, generate the regular expression set of Ssub on webpage w, recorded as P' = {p1, p2, ..., pn}.
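The Fig. 4 procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: a "webpage tag" is taken to be any `<...>` token, NumberSet, EnglishSet, and ChineseSet are mapped to the regex ranges `0-9`, `A-Za-z`, and the CJK Unified Ideographs block U+4E00 to U+9FFF, SpecialSet characters are escaped one by one, and only the first occurrence of s in w is used. The names `pattern_for`, `patterns_for_page`, and `component_set` are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: a "webpage tag" (MetaSet member) is any <...> markup token.
TAG = re.compile(r"<[^<>]+>")

# Assumed regex ranges for three component sets; SpecialSet has no fixed range,
# so its characters are escaped individually below.
RANGES = {"NumberSet": "0-9", "EnglishSet": "A-Za-z", "ChineseSet": "\u4e00-\u9fff"}


def component_set(ch: str) -> str:
    """Assign one character to NumberSet, EnglishSet, ChineseSet or SpecialSet."""
    if ch.isascii() and ch.isdigit():
        return "NumberSet"
    if ch.isascii() and ch.isalpha():
        return "EnglishSet"
    if "\u4e00" <= ch <= "\u9fff":
        return "ChineseSet"
    return "SpecialSet"


def pattern_for(s: str, html: str) -> str | None:
    """Build one regex p = prefix info suffix for element s on page w (html)."""
    pos = html.find(s)
    if pos < 0:
        return None
    tags = list(TAG.finditer(html))
    before = [t for t in tags if t.end() <= pos]             # scan toward page start
    after = [t for t in tags if t.start() >= pos + len(s)]   # scan toward page end
    if not before or not after:
        return None
    prefix, suffix = before[-1], after[0]
    content = html[prefix.end():suffix.start()]
    # Describe the content between prefix and suffix by the sets it draws from.
    kinds = {component_set(ch) for ch in content}
    char_class = "".join(RANGES[k] for k in sorted(kinds) if k in RANGES)
    char_class += "".join(re.escape(ch) for ch in sorted(set(content))
                          if component_set(ch) == "SpecialSet")
    return re.escape(prefix.group()) + "([" + char_class + "]+)" + re.escape(suffix.group())


def patterns_for_page(ssub: set[str], html: str) -> set[str]:
    """P' for one page w: the patterns generated from every element of Ssub found on w."""
    return {p for s in ssub if (p := pattern_for(s, html)) is not None}


html = '<tr><td class="song">Song A</td><td class="singer">Singer X</td></tr>'
pattern = next(iter(patterns_for_page({"Song A"}, html)))
print(re.findall(pattern, '<td class="song">Another Song</td>'))  # ['Another Song']
```

The generated pattern keeps the exact surrounding tags, so it picks up other song titles laid out the same way while ignoring the singer cell, whose opening tag differs.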
Step 33: match the set P1 against all webpages in the webpage set W related to the given information to obtain a set Ssub'; the crawl process ends when Ssub == Ssub'.
Specifically, P1 is matched against all webpages in W to obtain Ssub'. If Ssub ≠ Ssub', set Ssub = Ssub' and execute step 31 again; the crawl process ends when Ssub == Ssub'.
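The iteration of steps 31 to 33 can be sketched as a bootstrapping loop. This is a hedged illustration only: to stay self-contained it uses a simplified pattern generator (the nearest tags around s with a generic `[^<>]+` info group, rather than the character-set rule of Fig. 4), and because the definition of P is given only as an image, it assumes P1 = P' ∩ P when P is non-empty and P1 = P' otherwise. The names `bootstrap` and `simple_patterns` and the `max_rounds` safety cap are hypothetical.

```python
from __future__ import annotations

import re
from typing import Callable

# Assumption: a webpage tag is any <...> markup token.
TAG = re.compile(r"<[^<>]+>")


def simple_patterns(s: str, html: str) -> set[str]:
    """Simplified stand-in for step 32: nearest tag before and after s, with a
    generic [^<>]+ info group instead of the character-set rule of Fig. 4."""
    pos = html.find(s)
    if pos < 0:
        return set()
    tags = list(TAG.finditer(html))
    before = [t for t in tags if t.end() <= pos]
    after = [t for t in tags if t.start() >= pos + len(s)]
    if not before or not after:
        return set()
    return {re.escape(before[-1].group()) + "([^<>]+)" + re.escape(after[0].group())}


def bootstrap(
    pages: dict[str, str],
    ssub: set[str],
    p: set[str],
    generate: Callable[[str, str], set[str]] = simple_patterns,
    max_rounds: int = 10,
) -> tuple[set[str], set[str]]:
    """Repeat steps 31-33 until Ssub == Ssub' (or max_rounds is reached)."""
    ssub = set(ssub)
    p1: set[str] = set()
    for _ in range(max_rounds):
        # Step 31: W' = pages of W containing an element of Ssub.
        w_prime = [html for html in pages.values() if any(s in html for s in ssub)]
        # Step 32: P' from Ssub on W'; assumed P1 = P' ∩ P if P is non-empty, else P'.
        p_prime = {pat for html in w_prime for s in ssub for pat in generate(s, html)}
        p1 = (p_prime & p) if p else p_prime
        # Step 33: match P1 against every page of W to obtain Ssub'.
        ssub_new = {m for pat in p1 for html in pages.values() for m in re.findall(pat, html)}
        if ssub_new == ssub:
            break          # Ssub == Ssub': the crawl ends
        ssub = ssub_new    # otherwise set Ssub = Ssub' and repeat from step 31
    return ssub, p1


W = {
    "http://example.com/1": '<td class="song">Song A</td><td class="singer">Singer X</td>',
    "http://example.com/2": '<td class="song">Song B</td><td class="singer">Singer Y</td>',
    "http://example.com/3": '<td class="song">Song C</td>',
}
values, P1 = bootstrap(W, {"Song A"}, set())
print(sorted(values))  # ['Song A', 'Song B', 'Song C']
```

Starting from a single seed title, the first round's pattern already pulls in every title laid out the same way; the second round reproduces the same set, which is the Ssub == Ssub' stopping condition.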
Further, this embodiment may also include a process of verifying the regular expression set which, as shown in Fig. 5, may specifically include:
take the product of each webpage W' with the subset Ssub of the given information to obtain the regular expression sets Tt = {T1, T2, ..., Tn};
traverse Tt to obtain a regular expression set Tn, traverse Tn, and match any regular expression p ∈ Tn against the webpage W' to obtain a set S of values;
if S − Ssub ≠ ∅, discard that expression (this step removes regular expressions that also match other content); if S − Ssub = ∅, the number Scount of elements of Ssub found in S equals the number of elements in S;
traverse Tt; for any Tn ∈ Tt containing more than one regular expression, keep the regular expression with the largest Scount in Tn and discard the rest (this step keeps, among several expressions for the same match, the one that matches the most);
traverse Tt and compare any two sets Tn; if their regular expressions are identical, discard one of them (this step removes duplicate regular expressions);
form the remaining regular expressions into a set, denoted P' = {p1, p2, ..., pn}.
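The verification of Fig. 5 can be sketched as a filter over candidate regular expressions. This sketch assumes the candidate sets Tt have already been produced (one set Tn per page W', for example by a generator like the one for Fig. 4) and pairs each page's HTML with its Tn; it also drops expressions that match nothing, which the text does not state. `verify_patterns` and the sample regexes are hypothetical.

```python
from __future__ import annotations

import re


def verify_patterns(tt: list[tuple[str, set[str]]], ssub: set[str]) -> set[str]:
    """Filter candidate sets Tt = [(page W', Tn), ...] down to the verified set P'."""
    chosen = []
    for html, tn in tt:
        scored = []
        for p in tn:
            s = set(re.findall(p, html))   # S: the values p extracts from W'
            if not s or s - ssub:          # S - Ssub != empty: p also matches other content
                continue                   # (dropping an empty S is an added assumption)
            scored.append((len(s & ssub), p))  # Scount
        if scored:
            # Keep the expression with the largest Scount in this Tn
            # (ties broken by string order so the result is deterministic).
            chosen.append(max(scored)[1])
    # Discarding identical expressions across different Tn is a set construction.
    return set(chosen)


ssub = {"Song A", "Song B"}
tt = [
    ('<td class="song">Song A</td><td class="singer">Singer X</td>',
     {'<td class="song">([^<>]+)</td>', '<td[^>]*>([^<>]+)</td>'}),
    ('<ul><li>Song A</li><li>Song B</li></ul>',
     {'<li>([^<>]+)</li>', '<ul><li>([^<>]+)</li>'}),
]
print(sorted(verify_patterns(tt, ssub)))
# ['<li>([^<>]+)</li>', '<td class="song">([^<>]+)</td>']
```

On the first page the looser `<td[^>]*>` expression is rejected because it also captures "Singer X"; on the second page both candidates are clean, and the one with the larger Scount (2 versus 1) is kept.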
With the technical solution of this embodiment, a corresponding regular expression set is generated for each kind of webpage, so that webpage content is extracted automatically, a large amount of work is saved, and the correctness of the regular expressions can be verified.
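The coverage |S ∩ S'| / |S| and accuracy |S ∩ S'| / |S'| measures from the principle of this embodiment are simple to compute once both sets are available. A minimal Python sketch with hypothetical sample data; returning 0.0 for an empty denominator is an assumption the text does not address:

```python
from __future__ import annotations


def coverage_and_accuracy(s: set[str], s_prime: set[str]) -> tuple[float, float]:
    """Coverage = |S ∩ S'| / |S|; accuracy = |S ∩ S'| / |S'|."""
    hits = len(s & s_prime)
    coverage = hits / len(s) if s else 0.0          # empty-set handling is an assumption
    accuracy = hits / len(s_prime) if s_prime else 0.0
    return coverage, accuracy


# Hypothetical data: 4 known song titles; 5 values extracted from webpages,
# two of which ("Home", "Login") are navigation noise.
S = {"Song A", "Song B", "Song C", "Song D"}
S_prime = {"Song A", "Song B", "Song C", "Home", "Login"}
print(coverage_and_accuracy(S, S_prime))  # (0.75, 0.6)
```

In this example the lower accuracy comes from the two navigation strings, which is the kind of over-matching the verification step is meant to remove.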
It should be noted that those of ordinary skill in the art will understand that all or some of the steps in the above method embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory (ROM), a magnetic disk, or an optical disc.
A specific embodiment of the present invention also provides an automatic extraction device for network information which, as shown in Fig. 6, comprises:
a webpage selection unit 61, configured to find, in the webpage set W related to given information S, the webpages W' that contain elements of a subset Ssub of the given information S;
a set selection unit 62, configured to generate an information pattern set P' according to a predefined rule, and to intersect the information pattern set P' with the regular expression set P to obtain a set P1;
a content extraction unit 63, configured to match the set P1 against all webpages in the webpage set W related to the given information to obtain a set Ssub', the crawl process ending when Ssub == Ssub'.
Optionally, the set selection unit 62 comprises a traversal subunit, a backtracking subunit, a regular-set description subunit, and a regular-set generation subunit. The traversal subunit is configured to traverse the subset Ssub of the given information S, take an element s, and find the position of s in a webpage w. The backtracking subunit is configured to scan toward the beginning of the page for the first webpage tag, denoted prefix, and toward the end of the page for the first webpage tag, denoted suffix. The regular-set description subunit is configured to describe the content between prefix and suffix in terms of NumberSet, EnglishSet, SpecialSet, and ChineseSet according to the rule, generating the regular expression set of element s on webpage w. The regular-set generation subunit is configured to generate, from the regular expression sets of the elements s on webpage w, the regular expression set of Ssub on webpage w, recorded as P' = {p1, p2, ..., pn}.
Optionally, the device may further comprise a verification unit, which comprises a product subunit, a matching subunit, an element-count determination subunit, a first screening subunit, a second screening subunit, and a regular-set determination subunit. The product subunit is configured to take the product of each webpage W' with the subset Ssub of the given information to obtain the regular expression sets Tt = {T1, T2, ..., Tn}. The matching subunit is configured to traverse Tt to obtain a regular expression set Tn, traverse Tn, and match any regular expression p ∈ Tn against the webpage W' to obtain a set S of values. The element-count determination subunit is configured to discard the expression if S − Ssub ≠ ∅ and, if S − Ssub = ∅, to determine that the number Scount of elements of Ssub found in S equals the number of elements in S. The first screening subunit is configured to traverse Tt and, for any Tn ∈ Tt containing more than one regular expression, keep the regular expression with the largest Scount in Tn and discard the rest. The second screening subunit is configured to traverse Tt, compare any two sets Tn, and discard one of them if their regular expressions are identical. The regular-set determination subunit is configured to form the remaining regular expressions into a set, denoted P' = {p1, p2, ..., pn}.
The specific implementation of the processing functions of each unit in the above automatic extraction device has been described in the preceding method embodiment and is not repeated here.
With the device provided by this embodiment, a corresponding regular expression set is generated for each kind of webpage, so that webpage content is extracted automatically, a large amount of work is saved, and the correctness of the regular expressions can be verified.
It should be noted that in the above device embodiment, the units are divided only by functional logic; the division is not limited to the above as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the scope of protection of the present invention.
The above are only preferred embodiments of the present invention, and the scope of protection of the present invention is not limited to them. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the embodiments of the invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (6)

1. An automatic extraction method for network information, characterized by comprising:
finding, in the webpage set W related to given information S, the webpages W' that contain elements of a subset Ssub of the given information S;
generating an information pattern set P' according to a predefined rule, and intersecting the information pattern set P' with the regular expression set P to obtain a set P1;
matching the set P1 against all webpages in the webpage set W related to the given information to obtain a set Ssub', the crawl process ending when Ssub == Ssub'.
2. The method according to claim 1, characterized in that generating the information pattern set P' according to the predefined rule comprises:
traversing the subset Ssub of the given information S, taking an element s, and finding the position of s in a webpage w;
scanning toward the beginning of the page for the first webpage tag, denoted prefix, and toward the end of the page for the first webpage tag, denoted suffix;
describing the content between prefix and suffix in terms of the digit set NumberSet, the letter set EnglishSet, the special symbol set SpecialSet, and the Chinese character set ChineseSet according to the rule, generating the regular expression set of element s on webpage w;
generating, from the regular expression sets of the elements s on webpage w, the regular expression set of Ssub on webpage w, recorded as P' = {p1, p2, ..., pn}.
3. The method according to claim 1, characterized in that the method further comprises verifying the regular expression set, the verifying comprising:
taking the product of each webpage W' with the subset Ssub of the given information to obtain the regular expression sets Tt = {T1, T2, ..., Tn};
traversing Tt to obtain a regular expression set Tn, traversing Tn, and matching any regular expression p ∈ Tn against the webpage W' to obtain a set S of values;
if S − Ssub ≠ ∅, discarding that expression; if S − Ssub = ∅, the number Scount of elements of Ssub found in S being equal to the number of elements in S;
traversing Tt and, for any Tn ∈ Tt containing more than one regular expression, keeping the regular expression with the largest Scount in Tn and discarding the rest;
traversing Tt, comparing any two sets Tn, and discarding one of them if their regular expressions are identical;
forming the remaining regular expressions into a set, denoted P' = {p1, p2, ..., pn}.
4. An automatic extraction device for network information, characterized by comprising:
a webpage selection unit, configured to find, in the webpage set W related to given information S, the webpages W' that contain elements of a subset Ssub of the given information S;
a set selection unit, configured to generate an information pattern set P' according to a predefined rule, and to intersect the information pattern set P' with the regular expression set P to obtain a set P1;
a content extraction unit, configured to match the set P1 against all webpages in the webpage set W related to the given information to obtain a set Ssub', the crawl process ending when Ssub == Ssub'.
5. The device according to claim 4, characterized in that the set selection unit comprises:
a traversal subunit, configured to traverse the subset Ssub of the given information S, take an element s, and find the position of s in a webpage w;
a backtracking subunit, configured to scan toward the beginning of the page for the first webpage tag, denoted prefix, and toward the end of the page for the first webpage tag, denoted suffix;
a regular-set description subunit, configured to describe the content between prefix and suffix in terms of the digit set NumberSet, the letter set EnglishSet, the special symbol set SpecialSet, and the Chinese character set ChineseSet according to the rule, generating the regular expression set of element s on webpage w;
a regular-set generation subunit, configured to generate, from the regular expression sets of the elements s on webpage w, the regular expression set of Ssub on webpage w, recorded as P' = {p1, p2, ..., pn}.
6. The device according to claim 4, characterized in that the device further comprises a verification unit, the verification unit comprising:
a product subunit, configured to take the product of each webpage W' with the subset Ssub of the given information to obtain the regular expression sets Tt = {T1, T2, ..., Tn};
a matching subunit, configured to traverse Tt to obtain a regular expression set Tn, traverse Tn, and match any regular expression p ∈ Tn against the webpage W' to obtain a set S of values;
an element-count determination subunit, configured to discard the expression if S − Ssub ≠ ∅ and, if S − Ssub = ∅, to determine that the number Scount of elements of Ssub found in S equals the number of elements in S;
a first screening subunit, configured to traverse Tt and, for any Tn ∈ Tt containing more than one regular expression, keep the regular expression with the largest Scount in Tn and discard the rest;
a second screening subunit, configured to traverse Tt, compare any two sets Tn, and discard one of them if their regular expressions are identical;
a regular-set determination subunit, configured to form the remaining regular expressions into a set, denoted P' = {p1, p2, ..., pn}.
CN201210335719.1A 2012-09-11 2012-09-11 Automatic extraction method and device for network information Expired - Fee Related CN102855324B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210335719.1A CN102855324B (en) 2012-09-11 2012-09-11 Automatic extraction method and device for network information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210335719.1A CN102855324B (en) 2012-09-11 2012-09-11 Automatic extraction method and device for network information

Publications (2)

Publication Number Publication Date
CN102855324A true CN102855324A (en) 2013-01-02
CN102855324B CN102855324B (en) 2015-08-26

Family

ID=47401912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210335719.1A Expired - Fee Related CN102855324B (en) 2012-09-11 2012-09-11 Automatic extraction method and device for network information

Country Status (1)

Country Link
CN (1) CN102855324B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080243793A1 (en) * 2007-03-21 2008-10-02 Paul Hallett Contact Information Capture and Link Redirection
CN101582075A (en) * 2009-06-24 2009-11-18 大连海事大学 Web information extraction system
CN101727447A (en) * 2008-10-10 2010-06-09 浙江搜富网络技术有限公司 Generation method and device of regular expression based on URL
CN102456050A (en) * 2010-10-27 2012-05-16 中国移动通信集团四川有限公司 Method and device for extracting data from webpage

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
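张树壮 (Zhang Shuzhuang) et al.: "Research on large-scale complex rule matching technology" (大规模复杂规则匹配技术研究), 《高技术通讯》 (Chinese High Technology Letters), vol. 20, no. 12, 30 March 2011 (2011-03-30), pages 1217 - 1223 *
程岚岚 (Cheng Lanlan): "Research on extracting term pairs from large-scale webpages based on regular expressions" (基于正则表达式的大规模网页术语对抽取研究), 《情报杂志》 (Journal of Intelligence), vol. 27, no. 11, 16 February 2009 (2009-02-16) *
胡军伟 (Hu Junwei) et al.: "Application of regular expressions in Web information extraction" (正则表达式在Web信息抽取中的应用), 《北京信息科技大学学报》 (Journal of Beijing Information Science and Technology University), vol. 26, no. 6, 31 December 2011 (2011-12-31), pages 86 - 89 *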

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902578B (en) * 2012-12-27 2017-05-31 中国移动通信集团四川有限公司 Method and device for extracting webpage information
CN105740355A (en) * 2016-01-26 2016-07-06 中国人民解放军国防科学技术大学 Aggregated text density based webpage body text extraction method and apparatus
CN105740355B (en) * 2016-01-26 2019-03-26 中国人民解放军国防科学技术大学 Webpage context extraction method and device based on aggregation text density
CN106126684A (en) * 2016-06-29 2016-11-16 联想(北京)有限公司 Method and device for generating a web crawler configuration file
CN106126684B (en) * 2016-06-29 2019-12-24 联想(北京)有限公司 Method and device for generating network crawler configuration file

Also Published As

Publication number Publication date
CN102855324B (en) 2015-08-26

Similar Documents

Publication Publication Date Title
CN105404699A (en) Method, device and server for searching articles of finance and economics
CN103703467A (en) Method and apparatus for storing data
US8732199B2 (en) System, method, and computer readable media for identifying a user-initiated log file record in a log file
US10878020B2 (en) Automated extraction tools and their use in social content tagging systems
CN104933056A (en) Uniform resource locator (URL) de-duplication method and device
CN105989082A (en) Report view generation method and apparatus
CN103365928B (en) Information recommendation method and information recommendation device
CN105893385B (en) Method and apparatus for analyzing user behavior
CN104462547A (en) Configurable webpage data acquisition method and system
CN103823892A (en) Method and device of determining webpage clustering mode
CN106933916B (en) JSON character string processing method and device
CN104899281A (en) Academic article processing method and search processing method and apparatus for academic articles
CN103853654A (en) Method and device for selecting webpage testing paths
CN108804472A (en) A kind of webpage content extraction method, device and server
CN102855324A (en) Automatic extracting method and device for network information
CN105550179A (en) Webpage collection method and browser plug-in
CN106202050B (en) Theme information acquisition method and device and electronic equipment
CN108228546A (en) A kind of text feature, device, equipment and readable storage medium storing program for executing
CN104899203A (en) Webpage generating method, webpage generating device and terminal equipment
CN103902578B (en) A kind of method for abstracting web page information and device
CN106484746A (en) The analysis method of website transformation event and device
CN110309364A (en) A kind of information extraction method and device
CN106339381B (en) Information processing method and device
CN106547774B (en) Website content detection method and device
JP5761029B2 (en) Dictionary creation device, word collection method, and program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150826

Termination date: 20160911