CN101727438A - Method for automatically extracting layout information of digital newspaper - Google Patents

Info

Publication number
CN101727438A
CN101727438A CN200810225320A CN200810225320A CN101727438A CN 101727438 A CN101727438 A CN 101727438A CN 200810225320 A CN200810225320 A CN 200810225320A CN 200810225320 A CN200810225320 A CN 200810225320A CN 101727438 A CN101727438 A CN 101727438A
Authority
CN
China
Prior art keywords
content
character
content piece
layout information
week
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200810225320A
Other languages
Chinese (zh)
Other versions
CN101727438B (en)
Inventor
徐剑波
董宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Founder Apabi Technology Ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Apabi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co Ltd, Beijing Founder Apabi Technology Co Ltd
Priority to CN 200810225320 (CN101727438B)
Publication of CN101727438A
Application granted
Publication of CN101727438B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention relates to pattern recognition technology in the field of computer information processing, and in particular to a method for automatically extracting the layout information of a digital newspaper. The method first merges the independent characters in a layout and organizes them into several content blocks, then automatically extracts the publication date, edition number and edition name from the layout according to the position and semantic information of the relevant content on the newspaper layout. Through simple automated operation, the method improves efficiency when large volumes of layout data are processed in batches, reduces the workload of staff, and makes the indexing of digital newspapers more convenient and accurate.

Description

A method for automatically extracting layout information of a digital newspaper
Technical field
The present invention relates to pattern recognition technology in the field of computer information processing, and in particular to a method for automatically extracting layout information of a digital newspaper.
Background art
With the development of information technology and Internet technology, the digitization of newspapers and periodicals is accelerating. Advanced Internet technology lets the public browse digitized newspaper and periodical content conveniently and efficiently, makes full use of a newspaper office's resources, spreads news faster and more widely, and makes the newspaper office's website more attractive to readers.
At present, when a digital newspaper is indexed (that is, when the content information in the newspaper is organized, for example by marking layout information such as the publication date, edition number and edition name), these data appear in different forms on different newspaper layouts. This makes it difficult for an indexing tool to extract them automatically, so the layout date, edition number and edition name are generally extracted by manual indexing.
Because manual indexing is slow, it limits processing efficiency when large volumes of layout data must be processed in batches. A way of automatically extracting this information, which is always present on a layout, is therefore needed to improve the indexing efficiency of digital newspapers.
Summary of the invention
The object of the present invention is to address the shortcomings of current digital newspaper indexing by providing a method for automatically extracting layout information of a digital newspaper. The method judges content using both spatial and semantic information, and thereby automatically extracts the date, edition name and edition number from a newspaper layout.
The technical solution of the present invention is as follows. A method for automatically extracting layout information of a digital newspaper comprises the following steps:
(1) merging the independent characters in the layout and organizing them into several content blocks;
(2) selecting candidate content blocks according to the positions that may contain the required layout information;
(3) screening the candidate content blocks obtained in step (2) according to the features of date content, judging whether each is a content block containing the publication date, and extracting the content block that contains the publication date;
(4) screening the candidate content blocks obtained in step (2) according to the features of edition-number content, judging whether each is a content block containing the edition number, and extracting the content block that contains the edition number;
(5) screening the candidate content blocks obtained in step (2) according to the features of edition-name content, judging whether each is a content block containing the edition name, and extracting the content block that contains the edition name.
Further, in the method described above, in step (2), the positions that may contain the required layout information include the upper-left corner, the left side, the upper-right corner and the top of the layout.
Further, in the method described above, in step (3), when judging whether a content block contains the publication date, coarse matching is performed first, followed by fine matching; if fine matching is unsuccessful, a general matching rule is used; among the matching results, the content block closest to the top is selected.
Further, in the method described above, in step (3), the date content for coarse matching has any one of the following features:
1. xxxx年xx月xx日 星期x (year, month, day, weekday), where "星期" and "日" are separated by 0-2 characters;
2. xxxx.xx.xx 星期x, where "星期" and the preceding "xx" are separated by 0-2 characters;
3. xxxx年xx月 星期x, where "星期" and "月" are separated by 0-8 characters;
4. xxxx.xx 星期x, where "星期" and the preceding "xx" are separated by 0-8 characters;
where xxxx is 1-4 characters, xx is 1-2 characters and x is 1 character, each character being chosen from the set {0123456789 一二三四五六七八九}.
Further, in the method described above, in step (3), the date content for fine matching has any one of the following features:
1. xxxx年xx月 星期x, where "星期" and "月" are separated by 0-8 characters;
2. xxxx.xx 星期x, where "星期" and the preceding "xx" are separated by 0-8 characters;
where xxxx is 1-4 characters, xx is 1-2 characters and x is 1 character, each character being chosen from the set {0123456789 一二三四五六七八九}.
Further, in the method described above, in step (3), the date content for the general matching rule has any one of the following features:
1. xxxx年xx月;
2. xxxx.xx;
where xxxx is 1-4 characters and xx is 1-2 characters, each character being chosen from the set {0123456789 一二三四五六七八九}.
Further, in the method described above, in step (3), if none of the candidate content blocks satisfies the date-content features, all candidate content blocks are merged, and the merged content block is judged again against the date-content features.
Further, in the method described above, in step (4), if a candidate content block contains any two of the following edition-number features:
1. a phrase of the form "current issue, xx 报, xx 叠, xx 版" (newspaper, section, edition),
where the xx before "报" is any number of arbitrary characters, the xx before "叠" is any number of arbitrary characters, and the xx before "版" is 1-3 arbitrary characters;
2. xx期, xx号 (issue xx, number xx),
where xx is any 1-5 characters;
3. a lunar date is present;
then the content block contains edition-number information, and the edition is the front page.
A lunar date is identified by the following features:
A) it begins with the two characters "农历" (lunar calendar);
B) the year in the date is an ordered arrangement of two characters taken from the character sets [甲乙丙丁戊己庚辛壬癸] and [子丑寅卯辰巳午未申酉戌亥];
C) the month is 1-3 characters.
Further, in the method described above, in step (4), if a candidate content block contains any one of the following edition-number features:
1. xx版,
where xx is 1-3 characters;
2. a letter followed by digits, or digits without a letter, with no more than three digits;
then the content block contains edition-number information.
Further, in the method described above, in step (5), a candidate content block contains the edition name if it has the following edition-name features:
1. the block intersects, in the x-axis direction or the y-axis direction, the content block already determined to contain the publication date or the content block containing the edition number;
2. the content of the block is a single line, with a font size greater than 15 and a character count between 2 and 9;
3. the horizontal position of the block lies between 30% and 70% of the layout width, and its vertical position lies between 5% and 30% of the layout height.
Further, in the method described above, in step (5), if several candidate content blocks have the edition-name features, the block positioned highest on the layout is selected.
The beneficial effects of the present invention are as follows. Using the position and semantic information of the relevant content on a newspaper layout, the invention automatically extracts the publication date, edition number and edition name from the layout. This simple automated operation improves efficiency when large volumes of layout data are processed in batches, reduces the workload of staff, and makes digital newspaper indexing faster and more accurate.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the independent characters extracted from a digital newspaper layout.
Fig. 3 is a schematic diagram of the independent characters in the layout merged into content blocks.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments.
The present invention is applied to extracting layout information during PDF parsing. First, an automatic merging method merges the independent characters in the layout and organizes them into content blocks; information is then extracted according to the positions and contents of these blocks. The automatic merging method is described in the patent application "A PDF-based indexing method for complex layouts" (200710179938.4); see the specification of that application for details, which are not repeated here. With this method, the independent characters shown in Fig. 2 are merged into the content blocks shown in Fig. 3.
Because the positions that may contain layout information are fairly specific, once the characters have been merged into several content blocks, candidate content blocks are selected according to the positions that may contain the required layout information. These positions are generally the upper-left corner, the left side, the upper-right corner and the top of the layout. The publication date, edition number and edition name are then extracted in turn; the matching in the actual program uses regular expressions.
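The position-based selection of candidate blocks can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Block` type, the coordinate convention (origin at the top-left, y growing downward) and the band sizes `top_frac` and `left_frac` are all assumptions, since the text names the regions but gives no thresholds for them.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A merged content block with its bounding box (hypothetical type)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward; an assumption)
    x1: float  # right edge
    y1: float  # bottom edge


def is_candidate(block, page_w, page_h, top_frac=0.3, left_frac=0.2):
    """Return True if the block lies in the top band or the left band.

    The top band covers the top, upper-left and upper-right regions;
    the left band covers the left side. Both fractions are assumed values.
    """
    in_top = block.y1 <= page_h * top_frac
    in_left = block.x1 <= page_w * left_frac
    return in_top or in_left


def candidate_blocks(blocks, page_w, page_h):
    """Keep only the blocks that sit where layout information may appear."""
    return [b for b in blocks if is_candidate(b, page_w, page_h)]
```

Only the blocks returned by `candidate_blocks` would then be passed to the date, edition-number and edition-name rules.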
I. Obtaining the publication date of the layout
The candidate blocks are screened against content matching rules to judge whether each is a date-type content block. When judging whether a block contains the publication date, coarse matching is performed first, followed by fine matching; if fine matching is unsuccessful, the general matching rule is used; among the matching results, the content block closest to the top is selected.
The date content for coarse matching has any one of the following features:
1. xxxx年xx月xx日 星期x (year, month, day, weekday), where "星期" and "日" are separated by 0-2 characters;
2. xxxx.xx.xx 星期x, where "星期" and the preceding "xx" are separated by 0-2 characters;
3. xxxx年xx月 星期x, where "星期" and "月" are separated by 0-8 characters;
4. xxxx.xx 星期x, where "星期" and the preceding "xx" are separated by 0-8 characters;
where xxxx is 1-4 characters, xx is 1-2 characters and x is 1 character, each character being chosen from the set {0123456789 一二三四五六七八九}.
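Since the text states that the matching is done with regular expressions, the four coarse-matching features can be sketched as patterns. This is an illustrative reading, not the program's actual expressions: it assumes the year, month, day and weekday markers are the Chinese characters 年, 月, 日 and 星期, and that the character set is the digits 0-9 plus the numerals 一 to 九. The names `D`, `COARSE_RULES` and `coarse_match` are hypothetical.

```python
import re

# One character from the assumed digit set.
D = "[0-9一二三四五六七八九]"

COARSE_RULES = [
    # 1. xxxx年xx月xx日, then 0-2 characters, then 星期x
    re.compile(rf"{D}{{1,4}}年{D}{{1,2}}月{D}{{1,2}}日.{{0,2}}星期{D}"),
    # 2. xxxx.xx.xx, then 0-2 characters, then 星期x
    re.compile(rf"{D}{{1,4}}\.{D}{{1,2}}\.{D}{{1,2}}.{{0,2}}星期{D}"),
    # 3. xxxx年xx月, then 0-8 characters, then 星期x
    re.compile(rf"{D}{{1,4}}年{D}{{1,2}}月.{{0,8}}星期{D}"),
    # 4. xxxx.xx, then 0-8 characters, then 星期x
    re.compile(rf"{D}{{1,4}}\.{D}{{1,2}}.{{0,8}}星期{D}"),
]


def coarse_match(text):
    """Return the first regex match among the coarse rules, or None."""
    for rule in COARSE_RULES:
        m = rule.search(text)
        if m:
            return m
    return None
```

The returned match object carries the start and end positions of the matched string, which is what a later extraction step would need.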
The date content for fine matching has any one of the following features:
1. xxxx年xx月 星期x, where "星期" and "月" are separated by 0-8 characters;
2. xxxx.xx 星期x, where "星期" and the preceding "xx" are separated by 0-8 characters;
where xxxx is 1-4 characters, xx is 1-2 characters and x is 1 character, each character being chosen from the set {0123456789 一二三四五六七八九}.
The date content for the general matching rule has any one of the following features:
1. xxxx年xx月;
2. xxxx.xx;
where xxxx is 1-4 characters and xx is 1-2 characters, each character being chosen from the set {0123456789 一二三四五六七八九}.
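The coarse, fine and general stages can be chained as in the sketch below. It is one possible reading of the procedure and makes several assumptions: blocks are plain dictionaries with `text` and `top` keys, a smaller `top` means closer to the top of the page, the Chinese markers 年, 月 and 星期 are back-translations, and the general rule is applied to all candidate blocks when fine matching finds nothing. All names are hypothetical.

```python
import re

D = "[0-9一二三四五六七八九]"  # assumed digit set

FINE = [
    re.compile(rf"{D}{{1,4}}年{D}{{1,2}}月.{{0,8}}星期{D}"),
    re.compile(rf"{D}{{1,4}}\.{D}{{1,2}}.{{0,8}}星期{D}"),
]
GENERAL = [
    re.compile(rf"{D}{{1,4}}年{D}{{1,2}}月"),
    re.compile(rf"{D}{{1,4}}\.{D}{{1,2}}"),
]


def matching(rules, blocks):
    """Blocks whose text satisfies at least one of the rules."""
    return [b for b in blocks if any(r.search(b["text"]) for r in rules)]


def pick_date_block(blocks, coarse, fine=FINE, general=GENERAL):
    """Coarse match, then fine match, then fall back to the general rule.

    Among the blocks that match, the one closest to the top is returned.
    """
    survivors = matching(coarse, blocks)
    hits = matching(fine, survivors) or matching(general, blocks)
    return min(hits, key=lambda b: b["top"]) if hits else None
```

Passing the rule lists as parameters keeps the cascade separate from the patterns themselves, so the rules can be adjusted per newspaper without touching the selection logic.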
If none of the candidate blocks satisfies the matching conditions, the date may have been split across several candidate blocks, so these candidate blocks need to be merged. The merging again preferably uses the approach described in the patent application "A PDF-based indexing method for complex layouts". As far as possible, blocks are merged in normal reading order according to relations such as block position. After the characters have been merged, the start and end positions of the matched string can be obtained from the coarse-matching result, and the date text can then be extracted. A target content block found without merging may also contain other characters that were wrongly merged in before or after the date, so it needs to be split in the same way to extract the date text.
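The merge-then-extract fallback might look like the following sketch. The merging method of the cited application is not reproduced here; sorting by top edge and then by left edge is a simplified stand-in for "normal reading order", and the single date pattern, the dictionary keys and the function names are assumptions made for illustration.

```python
import re

D = "[0-9一二三四五六七八九]"  # assumed digit set
# One coarse pattern (year, month, day, weekday) used for illustration.
DATE = re.compile(rf"{D}{{1,4}}年{D}{{1,2}}月{D}{{1,2}}日.{{0,2}}星期{D}")


def merge_in_reading_order(blocks):
    """Concatenate block texts top-to-bottom, then left-to-right."""
    ordered = sorted(blocks, key=lambda b: (b["top"], b["left"]))
    return "".join(b["text"] for b in ordered)


def extract_date(text):
    """Cut the date out of a string using the match start and end positions."""
    m = DATE.search(text)
    return text[m.start():m.end()] if m else None
```

The same `extract_date` call serves both cases in the paragraph above: it recovers a date that was split across blocks once they are merged, and it trims stray characters from a block in which the date was merged with neighbouring text.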
II. Obtaining the edition number of the layout
After the publication date of the layout has been obtained, the edition number is extracted. If a candidate content block contains any two of the following edition-number features:
1. a phrase of the form "current issue, xx 报, xx 叠, xx 版" (newspaper, section, edition),
where the xx before "报" is any number of arbitrary characters, the xx before "叠" is any number of arbitrary characters, and the xx before "版" is 1-3 arbitrary characters;
2. xx期, xx号 (issue xx, number xx),
where xx is any 1-5 characters;
3. a lunar date is present;
then the content block contains edition-number information, and the edition is the front page.
A lunar date is identified by the following features:
A) it begins with the two characters "农历" (lunar calendar);
B) the year in the date is an ordered arrangement of two characters taken from the character sets [甲乙丙丁戊己庚辛壬癸] and [子丑寅卯辰巳午未申酉戌亥];
C) the month is 1-3 characters.
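The "any two of three" front-page test can be sketched as a simple vote over three patterns. The patterns are assumptions layered on the description: the markers 报, 叠, 版, 期, 号, 农历, 年 and 月 are back-translations, the second feature is read as "期 or 号", and the lunar year is read as one Heavenly Stem followed by one Earthly Branch. The names `SHEETS`, `ISSUE`, `LUNAR` and `is_front_page` are hypothetical.

```python
import re

STEMS = "甲乙丙丁戊己庚辛壬癸"          # ten Heavenly Stems
BRANCHES = "子丑寅卯辰巳午未申酉戌亥"    # twelve Earthly Branches

# Feature 3: 农历 + stem + branch + 年 + a 1-3 character month + 月
LUNAR = re.compile(rf"农历[{STEMS}][{BRANCHES}]年.{{1,3}}月")
# Feature 1: ...报 ...叠 xx版, with 1-3 characters before 版
SHEETS = re.compile(r"报.*叠.{1,3}版")
# Feature 2: 1-5 characters before 期 or before 号
ISSUE = re.compile(r".{1,5}期|.{1,5}号")


def is_front_page(text):
    """True if the text shows at least two of the three features."""
    score = sum(bool(p.search(text)) for p in (SHEETS, ISSUE, LUNAR))
    return score >= 2
```

Under this reading, the character 期 inside 星期 (weekday) would also satisfy the second pattern, so a stricter expression may be needed in practice.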
If the layout is not the front page, the blocks are screened according to the following features:
1. xx版,
where xx is 1-3 characters;
2. a letter followed by digits, or digits without a letter, with no more than three digits.
If a content block contains any one of the above edition-number features, it contains edition-number information.
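The two inner-page features can be sketched as a pair of regular expressions. This is an assumption-laden illustration: the marker 版 is a back-translation, "letter" is taken to mean an ASCII letter, and the lookarounds that stop a longer number from being matched in part are an added design choice, not something the text specifies.

```python
import re

# Feature 1: 1-3 characters followed by 版
EDITION_WORD = re.compile(r".{1,3}版")
# Feature 2: an optional letter followed by at most three digits,
# not embedded in a longer run of letters or digits
EDITION_CODE = re.compile(r"(?<![0-9A-Za-z])[A-Za-z]?[0-9]{1,3}(?![0-9])")


def edition_label(text):
    """Return the matched edition text, or None if neither feature is found."""
    for rule in (EDITION_WORD, EDITION_CODE):
        m = rule.search(text)
        if m:
            return m.group(0)
    return None
```

For example, `edition_label("A12")` returns `"A12"`, while a four-digit string such as `"2008"` is rejected because it exceeds three digits.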
III. Finding the edition name
The content blocks are screened by the following features:
1. an edition-name block must generally intersect the edition-number block or the date block in the x direction or the y direction; a block that intersects neither is not an edition-name block;
2. the content of an edition-name block is a single line, with a font size greater than 15 and a character count between 2 and 9;
3. the horizontal position of an edition-name block generally lies between 30% and 70% of the layout width, and its vertical position generally lies between 5% and 30% of the layout height.
The blocks are screened according to the above features; if several candidate blocks remain, the block positioned highest on the layout is selected.
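The three edition-name features and the final "highest block" choice can be sketched geometrically. Several points are assumptions: the `Block` type and its fields, a top-left origin with y growing downward, reading "intersects in the x or y direction" as overlapping projections on that axis, using the block centre as its position, and requiring all three features at once.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A content block with bounding box and text attributes (hypothetical)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float
    font_size: float
    lines: int = 1


def overlaps(a0, a1, b0, b1):
    """True if the intervals [a0, a1] and [b0, b1] share some extent."""
    return a0 < b1 and b0 < a1


def intersects(a, b):
    """True if the projections of a and b overlap on the x or the y axis."""
    return overlaps(a.x0, a.x1, b.x0, b.x1) or overlaps(a.y0, a.y1, b.y0, b.y1)


def is_name_candidate(b, anchors, page_w, page_h):
    """Check the three features against the date and edition-number blocks."""
    cx = (b.x0 + b.x1) / 2 / page_w
    cy = (b.y0 + b.y1) / 2 / page_h
    return (any(intersects(b, a) for a in anchors)
            and b.lines == 1 and b.font_size > 15 and 2 <= len(b.text) <= 9
            and 0.30 <= cx <= 0.70 and 0.05 <= cy <= 0.30)


def pick_edition_name(blocks, anchors, page_w, page_h):
    """Return the topmost block that has all edition-name features, or None."""
    hits = [b for b in blocks if is_name_candidate(b, anchors, page_w, page_h)]
    return min(hits, key=lambda b: b.y0) if hits else None
```

Here `anchors` stands for the blocks already identified as containing the publication date or the edition number.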
The method of the present invention is not limited to the examples given in the detailed description; other embodiments derived by those skilled in the art from the technical solution of the present invention likewise fall within the scope of the technical innovation of the present invention.

Claims (12)

1. A method for automatically extracting layout information of a digital newspaper, comprising the following steps:
(1) merging the independent characters in the layout and organizing them into several content blocks;
(2) selecting candidate content blocks according to the positions that may contain the required layout information;
(3) screening the candidate content blocks obtained in step (2) according to the features of date content, judging whether each is a content block containing the publication date, and extracting the content block that contains the publication date;
(4) screening the candidate content blocks obtained in step (2) according to the features of edition-number content, judging whether each is a content block containing the edition number, and extracting the content block that contains the edition number;
(5) screening the candidate content blocks obtained in step (2) according to the features of edition-name content, judging whether each is a content block containing the edition name, and extracting the content block that contains the edition name.
2. The method for automatically extracting layout information of a digital newspaper according to claim 1, characterized in that: in step (2), the positions that may contain the required layout information include the upper-left corner, the left side, the upper-right corner and the top of the layout.
3. The method for automatically extracting layout information of a digital newspaper according to claim 1, characterized in that: in step (3), when judging whether a content block contains the publication date, coarse matching is performed first, followed by fine matching; if fine matching is unsuccessful, a general matching rule is used; among the matching results, the content block closest to the top is selected.
4. The method for automatically extracting layout information of a digital newspaper according to claim 3, characterized in that: in step (3), the date content for coarse matching has any one of the following features:
1) xxxx年xx月xx日 星期x (year, month, day, weekday), where "星期" and "日" are separated by 0-2 characters;
2) xxxx.xx.xx 星期x, where "星期" and the preceding "xx" are separated by 0-2 characters;
3) xxxx年xx月 星期x, where "星期" and "月" are separated by 0-8 characters;
4) xxxx.xx 星期x, where "星期" and the preceding "xx" are separated by 0-8 characters;
where xxxx is 1-4 characters, xx is 1-2 characters and x is 1 character, each character being chosen from the set {0123456789 一二三四五六七八九}.
5. The method for automatically extracting layout information of a digital newspaper according to claim 3, characterized in that: in step (3), the date content for fine matching has any one of the following features:
1) xxxx年xx月 星期x, where "星期" and "月" are separated by 0-8 characters;
2) xxxx.xx 星期x, where "星期" and the preceding "xx" are separated by 0-8 characters;
where xxxx is 1-4 characters, xx is 1-2 characters and x is 1 character, each character being chosen from the set {0123456789 一二三四五六七八九}.
6. The method for automatically extracting layout information of a digital newspaper according to claim 3, characterized in that: in step (3), the date content for the general matching rule has any one of the following features:
1) xxxx年xx月;
2) xxxx.xx;
where xxxx is 1-4 characters and xx is 1-2 characters, each character being chosen from the set {0123456789 一二三四五六七八九}.
7. The method for automatically extracting layout information of a digital newspaper according to claim 3, 4, 5 or 6, characterized in that: in step (3), if none of the candidate content blocks satisfies the date-content features, all candidate content blocks are merged, and the merged content block is judged again against the date-content features.
8. The method for automatically extracting layout information of a digital newspaper according to claim 1, characterized in that: in step (4), if a candidate content block contains any two of the following edition-number features:
1) a phrase of the form "current issue, xx 报, xx 叠, xx 版" (newspaper, section, edition),
where the xx before "报" is any number of arbitrary characters, the xx before "叠" is any number of arbitrary characters, and the xx before "版" is 1-3 arbitrary characters;
2) xx期, xx号 (issue xx, number xx),
where xx is any 1-5 characters;
3) a lunar date is present;
then the content block contains edition-number information, and the edition is the front page.
9. The method for automatically extracting layout information of a digital newspaper according to claim 8, characterized in that: in step (4), a lunar date is identified by the following features:
A) it begins with the two characters "农历" (lunar calendar);
B) the year in the date is an ordered arrangement of two characters taken from the character sets [甲乙丙丁戊己庚辛壬癸] and [子丑寅卯辰巳午未申酉戌亥];
C) the month is 1-3 characters.
10. The method for automatically extracting layout information of a digital newspaper according to claim 1, characterized in that: in step (4), if a candidate content block contains any one of the following edition-number features:
1) xx版,
where xx is 1-3 characters;
2) a letter followed by digits, or digits without a letter, with no more than three digits;
then the content block contains edition-number information.
11. The method for automatically extracting layout information of a digital newspaper according to claim 1, characterized in that: in step (5), a candidate content block contains the edition name if it has the following edition-name features:
1) the block intersects, in the x-axis direction or the y-axis direction, the content block already determined to contain the publication date or the content block containing the edition number;
2) the content of the block is a single line, with a font size greater than 15 and a character count between 2 and 9;
3) the horizontal position of the block lies between 30% and 70% of the layout width, and its vertical position lies between 5% and 30% of the layout height.
12. The method for automatically extracting layout information of a digital newspaper according to claim 11, characterized in that: in step (5), if several candidate content blocks have the edition-name features, the block positioned highest on the layout is selected.
CN 200810225320 2008-10-30 2008-10-30 Method for automatically extracting layout information of digital newspaper Expired - Fee Related CN101727438B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200810225320 CN101727438B (en) 2008-10-30 2008-10-30 Method for automatically extracting layout information of digital newspaper

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200810225320 CN101727438B (en) 2008-10-30 2008-10-30 Method for automatically extracting layout information of digital newspaper

Publications (2)

Publication Number Publication Date
CN101727438A true CN101727438A (en) 2010-06-09
CN101727438B CN101727438B (en) 2012-07-18

Family

ID=42448341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200810225320 Expired - Fee Related CN101727438B (en) 2008-10-30 2008-10-30 Method for automatically extracting layout information of digital newspaper

Country Status (1)

Country Link
CN (1) CN101727438B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102841888A (en) * 2012-09-14 2012-12-26 《中国学术期刊(光盘版)》电子杂志社 Rapid typesetting system and method
CN103425651A (en) * 2012-05-15 2013-12-04 北大方正集团有限公司 Method and equipment for detecting data integrity
CN104679875A (en) * 2015-03-10 2015-06-03 杭州凡闻科技有限公司 Method for classifying information data based on digital newspaper
WO2015192567A1 (en) * 2014-06-17 2015-12-23 中兴通讯股份有限公司 Method and device for extracting chinese lunar time from text, and computer storage medium
CN106021218A (en) * 2016-05-26 2016-10-12 北京金山安全软件有限公司 Word processing method and device
CN106156058A (en) * 2015-03-27 2016-11-23 北大方正集团有限公司 Electronics report grasping means and device
CN107153689A (en) * 2017-04-29 2017-09-12 安徽富驰信息技术有限公司 A kind of case search method based on Topic Similarity
CN107193840A (en) * 2016-03-15 2017-09-22 北大方正集团有限公司 Turn the storage method and device of version file

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4907283A (en) * 1987-03-13 1990-03-06 Canon Kabushiki Kaisha Image processing apparatus
CN1320481C (en) * 2004-11-22 2007-06-06 北京北大方正技术研究院有限公司 Method for conducting title and text logic connection for newspaper pages
CN1912874A (en) * 2006-08-30 2007-02-14 北京大学 Method for abstracting document data information appeared in newspaper
CN101206639B (en) * 2007-12-20 2012-05-23 北大方正集团有限公司 Method for indexing complex impression based on PDF

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425651B (en) * 2012-05-15 2017-10-24 北大方正集团有限公司 A kind of method and apparatus of data integrity detection
CN103425651A (en) * 2012-05-15 2013-12-04 北大方正集团有限公司 Method and equipment for detecting data integrity
CN102841888B (en) * 2012-09-14 2015-10-14 《中国学术期刊(光盘版)》电子杂志社有限公司 A kind of composing system and method fast
CN102841888A (en) * 2012-09-14 2012-12-26 《中国学术期刊(光盘版)》电子杂志社 Rapid typesetting system and method
WO2015192567A1 (en) * 2014-06-17 2015-12-23 中兴通讯股份有限公司 Method and device for extracting chinese lunar time from text, and computer storage medium
CN104679875A (en) * 2015-03-10 2015-06-03 杭州凡闻科技有限公司 Method for classifying information data based on digital newspaper
CN104679875B (en) * 2015-03-10 2017-12-15 杭州凡闻科技有限公司 A kind of information data classification method based on digital newspaper
CN106156058A (en) * 2015-03-27 2016-11-23 北大方正集团有限公司 Electronics report grasping means and device
CN106156058B (en) * 2015-03-27 2019-10-15 北大方正集团有限公司 The grasping means of electronics report and device
CN107193840A (en) * 2016-03-15 2017-09-22 北大方正集团有限公司 Turn the storage method and device of version file
CN107193840B (en) * 2016-03-15 2019-12-31 北大方正集团有限公司 Storage method and device of version conversion file
CN106021218A (en) * 2016-05-26 2016-10-12 北京金山安全软件有限公司 Word processing method and device
CN107153689A (en) * 2017-04-29 2017-09-12 安徽富驰信息技术有限公司 A kind of case search method based on Topic Similarity

Also Published As

Publication number Publication date
CN101727438B (en) 2012-07-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220621

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: FOUNDER APABI TECHNOLOGY Ltd.

Address before: 100871, fangzheng building, 298 Fu Cheng Road, Beijing, Haidian District

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: FOUNDER APABI TECHNOLOGY Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120718
