CN1808424A - Method of abstracting key information from documents - Google Patents
Method of abstracting key information from documents
- Publication number
- CN1808424A (application CN 200510002458)
- Authority
- CN
- China
- Prior art keywords
- document
- key message
- character string
- template
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Document Processing Apparatus (AREA)
Abstract
The invention discloses a method for extracting key information from documents, comprising the steps of: making a document template whose page layout is divided into a table according to a specific format, each cell of the table holding one piece of document information; saving the information category and position of each cell in a marker string, which is stored in a specific area of the document template; and generating documents from the template, then obtaining the position of the cell that holds a specific piece of key information by reading the information categories in the marker string. The invention also provides a document template for implementing the method.
Description
Technical field
The present invention relates to a method of processing document information, in particular a method that uses a template to obtain key information from document files and thereby support office automation (OA). It belongs to the field of computer data processing.
Background art
Government informatization, with e-government at its core, is key to advancing the informatization of China's national economy. Office automation is one of the core applications of e-government and is of great significance for improving government efficiency and simplifying the work of civil servants.
At present, the state imposes strict requirements on the format of official government documents and has issued a corresponding national standard; every kind of government document must conform to the format it prescribes. At the same time, e-government in practice requires official documents to be classified, retrieved, queried and counted, which has to be done through a database. A database needs to tell key information from non-key information, so the data passed to it must be "structured": the key information must be distinguishable, and each key information item must carry a clear identifier or description. However, the document formats provided by the most widely used office suites, such as MS Office, are unstructured: to a program, every piece of information in the document looks the same, and it cannot tell which information is key and which is secondary. Existing office suites can therefore meet the formatting requirements for official documents, but they cannot meet the needs of database retrieval and so cannot fully satisfy the practical needs of office automation.
Several approaches have been proposed to meet this need. To the applicant's knowledge, there are three main ones: 1. the "field" technique of Lotus Notes; 2. electronic form software; 3. filling in key information directly in a web page. Each is described below.
A "field" is a special concept in Lotus Notes and usually appears as an edit box into which text can be entered. Lotus Notes can read or set the content of a field and save it to a database. A Lotus Notes field also has a counterpart in MS Office: a special piece of text that is specially marked within the document. Lotus Notes fields and MS Office fields correspond one to one. An OA system only needs to present every key information item to the user as a field in order to extract key information from an office suite document. However, a field is usually a feature specific to one office suite. Each office suite handles fields differently, so compatibility is poor. Fields are also usually plain text with no additional controls. If an official document uses several fields, there is no structural relationship between them, which makes complex, interrelated operations difficult, for example generating XML information from the fields.
A typical example of electronic form software is Microsoft InfoPath. Such software builds forms with complex layouts through elaborate format control and can extract information from those forms. Its drawback is that its layout capabilities are not powerful enough: for documents with complex formatting, such as official government documents, it cannot produce the required layout, and it lacks the corresponding special functions, for example "revision".
Filling in key information directly in a web page means that the key information is first entered in a web page, then inserted into the document, and the official document is then edited. In essence, this approach has the user specify the key information manually. The key information can only flow one way, from the web page to the office suite; it cannot be extracted directly from the office suite document. This leads to some serious defects: data consistency cannot be guaranteed, there are too many operating steps, the workflow has poor fault tolerance, the approach is inconvenient to use, and it is not WYSIWYG (what you see is what you get).
Patent application No. 02159844 also discloses a document information processing method. That method generates, from the document information, intermediate information containing the same character information as the document information; extracts word information representing words from the document information or the intermediate information; and adds the extracted word information to the intermediate information to generate the resulting information, which can then be used in search processing based on character information, such as full-text search. Clearly, this method meets only part of the functional requirements of official document processing and has considerable limitations.
Summary of the invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide a method that can extract key information from a document. With this method, the user edits the official document directly in an existing office suite, so that the formatting requirements for official documents are met, and once editing is finished the OA system can easily extract the key information that the database needs.
To achieve this object, the present invention adopts the following technical solution:
A method of extracting key information from a document, characterized in that:
(1) a document template is made, the page layout of the document template is divided into a table according to a specific format, and each cell of the table is used to hold information in the document;
(2) the information category and position information of each cell are saved in a marker string, and the marker string is stored in a specific area of the document template;
(3) a document is generated from the document template, and the position of the cell holding a specific piece of key information is obtained by reading the information categories in the marker string, so that the key information in the document is obtained.
Wherein the document is a document in Doc format.
Preferably, the marker string is stored in the remarks area of the document template.
Preferably, the specific format is the official document format prescribed by the state.
Preferably, the marker string is an XML file generated from the document template.
Preferably, each key information item has a unique cell address.
Preferably, the office suite software recognizes and reads the marker string through an ActiveX control, and thereby obtains the specific key information in the document.
A document template for implementing the above method, characterized in that:
The page layout of the document template is divided into a table according to a specific format, and each cell of the table is used to hold information in the document; the information category and position information of each cell are saved in a marker string, and the marker string is stored in a specific area of the document template.
Preferably, the document template is in Dot format, the marker string is an XML file generated from the document template, and this file is stored in the remarks area of the document template.
Preferably, the document template is one whose page layout conforms to the official document format prescribed by the state.
The method of extracting key information from a document according to the invention uses a table to plan the page layout; it is flexible and powerful and can produce official documents that fully conform to national regulations. When documents processed with this method are handled in different office suites, no information is lost, so an OA system can use different office suites side by side and the method is widely applicable.
Description of drawings
The present invention is further described below with reference to the drawings and specific embodiments.
Fig. 1 is a schematic flow chart of the basic steps of the method of the invention.
Fig. 2 is a schematic diagram of a table-based official document template.
Fig. 3 shows the marker string corresponding to the official document template of Fig. 2.
Fig. 4 is a schematic diagram of the step of selecting among different pre-made official document templates in an office suite.
Embodiment
Many office suites are currently used to produce official documents; typical examples are MS OFFICE, Create OFFICE and WPS. Among their formats, the Doc format, which originated in MS OFFICE, is the most widely used and has become a de facto industry standard. To accommodate users' habits, other office suites such as Create OFFICE and WPS fully support the Doc format: they can read and edit Doc documents and save edited documents in Doc format. In addition, the basic data storage structure of the Doc format is known art that is generally followed and used in the industry. For these reasons, the following embodiments describe the method of the invention mainly using Doc documents as the example.
The basic technical idea of the invention is to give structure to an unstructured document by using a specially made template. The template ensures that the formatting requirements for official documents are met, and at the same time information can easily be extracted from documents based on the template.
Based on this idea, the method of extracting key information from a document according to the invention comprises the following steps, as shown in Fig. 1:
(1) a document template is made, the page layout of the template is divided into a table according to a specific format, and each cell of the table is used to hold information in the document;
(2) the information category and position information of each cell are saved in a marker string, and the marker string is stored in a specific area of the document template;
(3) a document is generated from the document template, and the position of the cell holding a specific piece of key information is obtained by reading the information categories in the marker string, so that the key information in the document is obtained.
The first task of the invention is therefore to make a document template that meets these requirements. The document template has the following technical features: it uses a table to plan the page layout, which makes it possible to produce an official document layout that strictly conforms to national regulations; it uses a marker string to specify where key information is located in the document, and through an XML file it can specify which cell of the table corresponds to which key information item; and the marker string is stored losslessly in the Doc document, so that it is not lost when the official document is opened in different office suites.
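The three steps can be pictured with a small, self-contained Python sketch. It is only an illustration under stated assumptions: the dictionary layout, the key names (`table`, `remarks`, `copy_serial_number`, `issuing_authority`) and the sample values are hypothetical and do not come from the patent, and the marker mapping is kept as a plain dictionary rather than as XML inside a real Doc file.

```python
from copy import deepcopy

# Step 1: a template whose layout is a table; cells are addressed as (row, column), 1-based.
# Step 2: the marker mapping (information category -> cell address) is kept in a
#         metadata area of the template, separate from the table body.
# All names and values here are hypothetical illustrations.
template = {
    "table": {(1, 1): "", (2, 1): "", (4, 1): ""},
    "remarks": {"copy_serial_number": (1, 1), "issuing_authority": (4, 1)},
}


def new_document(tmpl):
    """Create a document from a template; it inherits the table layout and the markers."""
    return deepcopy(tmpl)


def extract_key_information(doc):
    """Step 3: read each category's cell address from the markers, then read that cell."""
    return {category: doc["table"][address] for category, address in doc["remarks"].items()}


doc = new_document(template)
doc["table"][(1, 1)] = "000001"
doc["table"][(4, 1)] = "Example Municipal Government"
doc["table"][(2, 1)] = "free text that no marker points to"

key_info = extract_key_information(doc)
print(key_info)
```

Only the cells named in the marker mapping are returned, which is the sense in which the template turns an unstructured document into a structured one.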
The above technical steps are explained in turn below.
In the present invention, the main purpose of making the template is to use a table to divide the page layout so that each key information item sits in exactly one cell of the table. Key information can then be told apart from non-key information, and a program can automatically obtain the key information that the user requires.
Taking an official document used by the government as an example, Fig. 2 shows an official document template made with a table. In Fig. 2, the different kinds of key information in the official document are placed in different cells. For example, the "copy serial number" is placed in the 1st cell of the 1st row, Cell(1,1); the "classification level" and the "secrecy period" are both placed in the 2nd cell of the 1st row, Cell(1,2); and the "issuing authority" is placed in the 1st cell of the 4th row, Cell(4,1).
When making the template, it must be ensured that each key information item is located in only one cell, so that the item has a unique address. Several key information items may nevertheless share one cell, provided that the developer of the office suite concerned defines a way to tell them apart. For example, the two items "classification level" and "secrecy period" in Fig. 2 share one cell.
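How two items in one cell are told apart is left to the developer. As a minimal sketch of one possible convention, assume the two items are separated by a "★" character; the separator, the function name `split_shared_cell` and the result keys are all assumptions made for illustration, not something the patent specifies.

```python
# Hypothetical convention: the two items in the shared cell are separated by "★".
SEPARATOR = "★"


def split_shared_cell(cell_text: str) -> dict:
    """Split the text of a shared cell into its two key information items."""
    level, _, period = cell_text.partition(SEPARATOR)
    return {
        "classification_level": level.strip(),
        "secrecy_period": period.strip(),
    }


print(split_shared_cell("Secret ★ 1 year"))
```

If the separator is absent, the whole text is treated as the first item and the second item is left empty.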
In the above table, every cell has "address" information; that is, every cell of the table has a unique identifier by which a program can find it. For example, in the table shown in Table 1, the address of the black cell is Cell(2,2). This is only an example: different office suites define cell addresses in different ways, but any definition is sufficient as long as it clearly distinguishes the addresses of different cells.
Table 1
By laying out the page with such a table and placing each key information item in exactly one cell, each item is in effect given an address that a program can look up.
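A short Python sketch of such addressing follows. It assumes the Cell(row, column) convention used in the examples, with both indices counted from 1; the class name `LayoutTable` and the sample cell contents are hypothetical, and a real office suite may define its addresses differently.

```python
class LayoutTable:
    """A table whose cells are addressed as Cell(row, column), both counted from 1."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def cell(self, row: int, col: int) -> str:
        """Return the text of Cell(row, col), or raise KeyError if no such cell exists."""
        if not (1 <= row <= len(self._rows)) or not (1 <= col <= len(self._rows[row - 1])):
            raise KeyError(f"no such cell: Cell({row},{col})")
        return self._rows[row - 1][col - 1]

    def addresses(self):
        """Yield every cell address exactly once."""
        for r, row in enumerate(self._rows, start=1):
            for c, _ in enumerate(row, start=1):
                yield (r, c)


table = LayoutTable([
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "i"],
])
print(table.cell(2, 2))
```

Because every address is produced once and maps to one cell, an item placed in a cell can always be found again by a program.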
Another advantage of using a table to divide the page layout is that tables have strong typesetting capabilities, so an official document layout that fully conforms to national regulations can be produced, which is difficult to achieve by other means. Using a table does not distort the layout either, because most existing office suites can hide the gridlines of a table. Once the gridlines are hidden, users are unaware that a table exists, and all operations remain the same as before without any inconvenience.
Once the layout has been planned with a table, the user can edit the official document directly in an existing office suite and meet the formatting requirements. The next task is to enable the OA system to extract the key information that the database needs from documents based on this template, which requires the program to know how each key information item corresponds to a cell. Taking as an example how to let the OA system know that Cell(4,1) holds the key information item "issuing authority", the following approaches can be used:
The first approach is to bind the key information item "issuing authority" directly to the cell, for example by adding an attribute named "issuing authority" to Cell(4,1). This is the simplest approach, but its applicability is limited: it suits only office suites whose document format can be changed freely. It is unsuitable for a widely used format such as Doc, because the Doc format does not support custom cell attributes.
The other approach was designed by the present invention specifically for Doc documents. It solves the problem by setting a "marker string", a character string that maps key information items to cells. The marker string may be in any format; the preferred choice is XML.
XML stands for Extensible Markup Language. It is a self-describing data format: the metadata that describes the content is carried together with the content itself. In other words, an XML document (or a file containing XML markup) carries within it the information that tells the recipient how to interpret the marked-up content and the XML structure. XML can therefore serve as a uniform format for electronic data interchange and suits data exchange across platforms. As a fairly mature technology, XML has become a common standard for electronic official document exchange systems and is supported by mainstream office suites such as MS OFFICE (2000 and later). The present invention therefore also adopts XML as the basic tool for document data exchange.
Based on XML, the marker string for the "issuing authority" is set as:
<IssuingAuthorityName>!A Table(t1)!Cell(4,1)</IssuingAuthorityName>
The complete marker string corresponding to the official document of Fig. 2 is shown in Fig. 3. As Fig. 3 shows, the marker string is in fact data that records the information category of each key information item and its position in the layout. For example, the marker string for the "issuing authority" indicates that the information category is the issuing authority's name and that the item is located in Cell(4,1). By reading the marker string, the office suite concerned can follow this pointer and easily find the information it needs.
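Reading such a marker string can be sketched in Python with the standard `xml.etree.ElementTree` module. This is a hedged illustration, not the patent's implementation: Fig. 3 is not available here, so the root element `Markers`, the `CopySerialNumber` entry and the English element names are assumptions, and the regular expression simply looks for a `Table(...)!Cell(row,col)` pattern inside each element's text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical marker string: the root element and element names are illustrative
# stand-ins; only the "Table(t1)!Cell(4,1)" position syntax follows the text.
MARKER_XML = """
<Markers>
  <CopySerialNumber>!A Table(t1)!Cell(1,1)</CopySerialNumber>
  <IssuingAuthorityName>!A Table(t1)!Cell(4,1)</IssuingAuthorityName>
</Markers>
"""

POSITION = re.compile(r"Table\((\w+)\)\s*!\s*Cell\((\d+)\s*,\s*(\d+)\)")


def parse_markers(xml_text: str) -> dict:
    """Map each information category to (table id, row, column)."""
    positions = {}
    for element in ET.fromstring(xml_text.strip()):
        match = POSITION.search(element.text or "")
        if match is None:
            raise ValueError(f"unrecognized position in <{element.tag}>")
        table_id, row, col = match.groups()
        positions[element.tag] = (table_id, int(row), int(col))
    return positions


markers = parse_markers(MARKER_XML)
print(markers["IssuingAuthorityName"])
```

The resulting dictionary is the index a program would consult before reading the addressed cell from the table.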
In practice, as shown in Fig. 4, different official document templates are made in advance for the different types of official document formats. Because the various information items occupy different positions in each layout, each template yields a different marker string when it is made. Once an official document template is fixed, its marker string is fixed as well and is stored in the template. Although the text of each official document created from the template differs, the way the information is distributed does not change, so each document automatically inherits the marker string. Once the office suite concerned can recognize and read the marker string, it knows how the relevant information is distributed over the layout and can read and retrieve it, meeting the OA system's needs for office automation.
Through the steps described above, an official document is in effect divided into two interrelated parts. One part is the document itself, generated from a table-based official document template; the template ensures that the resulting official document has a conforming format. The other part is the marker string corresponding to the template, which stands in a one-to-one relationship with the template.
The essence of the method is thus to split one document into two parts and process them separately. If users had to handle two different files whenever they saved, edited or modified a document, this would cause them considerable trouble. The method therefore stores the marker string in the document itself. The user only needs to edit and modify a single official document, while the corresponding marker string, acting as a "mirror" of the document template, accompanies the editing process automatically in the background and is saved within the document.
Different office suites can implement this idea in different ways. For an office suite whose document format can be changed freely, the marker string can be stored directly in a data stream of the document format. For a widely used format such as Doc, however, data stored in such a stream may be lost when the document is saved. Nor can the marker string be stored in the body text, because it might distort the layout or be deleted by the user by mistake.
For Doc documents, the method provided by the invention stores the marker string in the "Remarks" information. In Word, for example, choose "File" > "Properties", open the "Summary" tab, and place the marker string in the "Remarks" (Comments) area. However the document itself is modified, the marker string in the "Remarks" area does not change, which ensures that the office suite concerned can always obtain the distribution of the relevant information over the layout and makes processing by the back-end database easier.
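The separation between the editable body and the remarks area can be modeled with a small Python sketch. It is a simplified stand-in, not real Doc file handling: the `Document` class, the function names and the sample values are hypothetical, and the mapping is serialized as JSON purely for brevity, whereas the text prefers XML.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for a document file: an editable body plus a separate remarks field."""
    cells: dict = field(default_factory=dict)   # (row, col) -> text, the visible body
    remarks: str = ""                           # metadata area that holds the marker string


def store_markers(doc: Document, markers: dict) -> None:
    """Serialize the category -> [row, col] mapping into the remarks field."""
    doc.remarks = json.dumps(markers, sort_keys=True)


def read_key_information(doc: Document, category: str) -> str:
    """Look up the cell address in the remarks field, then read that cell from the body."""
    row, col = json.loads(doc.remarks)[category]
    return doc.cells.get((row, col), "")


doc = Document()
store_markers(doc, {"issuing_authority": [4, 1]})
saved_remarks = doc.remarks

# Editing the body never touches the remarks field.
doc.cells[(4, 1)] = "Example Municipal Government"
doc.cells[(5, 1)] = "Body text of the notice"
doc.cells[(4, 1)] = "Example Provincial Government"

print(read_key_information(doc, "issuing_authority"))
```

Because the index lives outside the body, repeated edits change what is extracted but never how it is located.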
The "Remarks" area is not unique to Doc documents. Existing document editing software generally provides such an area in its storage format for storing related information, and the area does not change as the document content changes. For example, SXW documents of Create OFFICE have the same "Remarks" area, so the marker string can be stored there. For Adobe PDF documents, the "Document Properties" option has a "Custom Properties" area in which the information can also be stored.
A Doc document obtained through the above steps, based on an official document template and carrying an embedded marker string, has a layout that fully conforms to the official document format prescribed by the state. In addition, its marker string serves as a position index of the key information in the document and can be read and parsed by the office suite concerned to extract the key information that the user needs. Such a Doc document can therefore be called an "intelligent official document".
Existing office suites cannot automatically obtain the marker string from the "Remarks" area of such an "intelligent official document" in order to read and retrieve the relevant information, so existing office suite software needs to be modified and upgraded. The applicant does this by adding a control to the existing office suite. The control is an ActiveX control under Windows and a Mozilla Plugin under Linux. An office suite developer therefore only needs to use the ready-made API to operate the control in order to add support for "intelligent official documents" to the office suite.
The name of the ActiveX control differs between office suites and the APIs differ slightly, but they are largely the same. The API functions of the ActiveX control are briefly introduced below, taking Create Office as the example.
The API functions fall into the following categories:
● Document operations
■ New document
■ Open document
■ Save document
■ Close document
● Document property settings and functions
■ Interface properties
■ Revision (track changes) and read-only
■ Function execution
● Document transmission
● Extracting information from Doc documents
■ Processing of element information
■ Processing of key information
● File names and paths
Because most OA systems currently use a B/S (browser/server) architecture, the API functions below are all introduced using JavaScript as the example. They are called in essentially the same way from other languages.
To create a new document, the corresponding API is:
void newDoc(
string aDocType="private:factory/swriter",
bool bEmbed=True,
bool bInProcess=True
);
Parameters:
Parameter | Type | Default value | Explanation |
---|---|---|---|
aDocType (input) | string | "private:factory/swriter" | Type of the new document. Word processing: "private:factory/swriter"; spreadsheet: "private:factory/scalc"; presentation: "private:factory/simpress". |
bEmbed (input) | bool | True | Whether the office suite is embedded in the browser. True: embedded in the browser. False: opened outside the browser as a stand-alone program. |
bInProcess (input) | bool | True | Whether the office suite is controlled by the browser. True: the browser can control the office suite. False: the office suite is not controlled by the browser. |
Return value: none
State when called: none
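The argument rules for `newDoc` can be summarized in a short Python sketch. It does not call any real control: `DocType` and `build_new_doc_call` are hypothetical helpers that only validate the three arguments against the values listed in the parameter table and apply the same defaults.

```python
from enum import Enum


class DocType(str, Enum):
    """Document type URLs listed in the parameter table."""
    WORD_PROCESSING = "private:factory/swriter"
    SPREADSHEET = "private:factory/scalc"
    PRESENTATION = "private:factory/simpress"


def build_new_doc_call(doc_type: str = DocType.WORD_PROCESSING.value,
                       embed: bool = True,
                       in_process: bool = True) -> dict:
    """Validate the arguments and return them under newDoc's parameter names."""
    doc_type = DocType(doc_type).value      # raises ValueError for an unknown type
    return {"aDocType": doc_type, "bEmbed": bool(embed), "bInProcess": bool(in_process)}


print(build_new_doc_call())
print(build_new_doc_call(DocType.SPREADSHEET.value, embed=False))
```

Rejecting unknown type strings before the call is a design choice of this sketch; the source does not say how the control itself reacts to them.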
To set the element value of a key information item, the corresponding API is:
bool setElement(
string aDocName,
string aKey,
string aValue,
bool bOAInfo,
bool bDraft
);
Parameters:
Parameter | Type | Default value | Explanation |
---|---|---|---|
aDocName (input) | string | | URL of the file |
aKey (input) | string | | Element name of the key information item |
aValue (input) | string | | Corresponding element value |
bOAInfo (input) | bool | | Sets the OA attribute. 0: workflow; 1: OA |
bDraft (input) | bool | | Sets the draft attribute. 1: save to the draft; 0: save to the document |
Return value:
Return value | Type | Default value | Explanation |
---|---|---|---|
bSucc | bool | | Whether the value was recorded successfully |
State when called: document closed
To obtain the element value of a key information item, the API is:
string getElement(
string aDocName,
string aKey,
bool bOAInfo,
bool bDraft
);
Parameters:
Parameter | Type | Default value | Explanation |
---|---|---|---|
aDocName (input) | string | | File name |
aKey (input) | string | | Element name of the key information item |
bOAInfo (input) | bool | | OA attribute. 0: workflow; 1: OA |
bDraft (input) | bool | | Draft attribute. 1: draft; 0: document |
Return value:
Return value | Type | Default value | Explanation |
---|---|---|---|
bSucc | string | | Element value |
State when called: document closed
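The calling contract of `setElement` and `getElement` can be exercised with an in-memory stand-in written in Python. This is a sketch under assumptions, not the real control: `FakeOfficeControl` and its `open_doc` helper are hypothetical, and the behavior when the document is open (returning `False` or an empty string) is a guess used only to model the "document closed" precondition.

```python
class FakeOfficeControl:
    """In-memory stand-in for the control; it is not the real ActiveX or plugin API."""

    def __init__(self):
        # (document name, is OA info, is draft) -> {element name: element value}
        self._store = {}
        self._open_documents = set()

    def open_doc(self, doc_name: str) -> None:
        """Hypothetical helper used only to model the 'document closed' precondition."""
        self._open_documents.add(doc_name)

    def setElement(self, aDocName: str, aKey: str, aValue: str,
                   bOAInfo: bool, bDraft: bool) -> bool:
        """Record a value; return False when the document is open or the key is empty."""
        if aDocName in self._open_documents or not aKey:
            return False
        self._store.setdefault((aDocName, bool(bOAInfo), bool(bDraft)), {})[aKey] = aValue
        return True

    def getElement(self, aDocName: str, aKey: str,
                   bOAInfo: bool, bDraft: bool) -> str:
        """Return the recorded value, or an empty string when nothing is recorded."""
        if aDocName in self._open_documents:
            return ""
        return self._store.get((aDocName, bool(bOAInfo), bool(bDraft)), {}).get(aKey, "")


control = FakeOfficeControl()
ok = control.setElement("file:///tmp/notice.doc", "IssuingAuthorityName",
                        "Example Municipal Government", False, False)
value = control.getElement("file:///tmp/notice.doc", "IssuingAuthorityName", False, False)
print(ok, value)
```

Values are keyed by the `bOAInfo` and `bDraft` flags as well as by document and element name, so a value written to the document is not visible when the draft is queried.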
There are many more APIs of this kind. A person of ordinary skill in the art can readily implement them, so they are not described further here.
Through the above technical steps, the method of extracting key information from a document provided by the invention can be applied to many existing office suites; users only need a simple upgrade. This is as convenient as possible for users and widens the method's scope of application.
The embodiment above was described with Doc as the document format and an XML file as the marker string, but the document formats to which the invention applies are clearly not limited to these. For example, the document format is SXW for Create OFFICE and may be WPS for WPS OFFICE. Likewise, although XML is the preferred format for the marker string, any other format satisfies the method as long as the office suite concerned can read it.
The specific embodiments above have been described in order to explain fully how the invention is implemented. It should be understood that other variations and modifications of the invention will be apparent to those skilled in the art and that the invention is not limited to the described embodiments. Any and all modifications, variations or equivalent transformations within the true spirit and basic principles of the disclosed content therefore fall within the scope of protection of the claims.
Claims (10)
1. method of extracting key message from document is characterized in that:
(1) make document template, the space of a whole page of described document template is divided into form according to special style, and each cell in the described form is respectively applied for the information of depositing in the document;
(2) information category of described each cell and positional information are kept in the tab character string, and described tab character string is kept at the specific region of the document template;
(3) generate document based on described document template, by reading the information category in the described tab character string, obtain the cell location at specific key message place, thereby obtain the key message in the document.
2. the method for extracting key message from document as claimed in claim 1 is characterized in that:
Described document is the document of Doc form.
3. the method for extracting key message from document as claimed in claim 2 is characterized in that:
Described tab character string leaves the remarks zone of described document template in.
4. the method for extracting key message from document as claimed in claim 1 is characterized in that:
Described special style is the official document pattern that meets national regulation.
5. the method for extracting key message from document as claimed in claim 1 is characterized in that:
The XML file of described tab character string for generating based on described document template.
6. the method for extracting key message from document as claimed in claim 1 is characterized in that:
Each described key message all has unique cell address.
7. the method for extracting key message from document as claimed in claim 1 is characterized in that:
Office suite software is by ActiveX control identification and read described tab character string, thereby obtains the specific key message in the document.
8. A document template for implementing the method as claimed in claim 1, characterized in that:
the layout of said document template is divided into a table according to a special style, and each cell in said table is used to hold information of the document; the information category and position information of each said cell are stored in a tab character string, and said tab character string is stored in a specific region of said document template.
9. The document template as claimed in claim 8, characterized in that:
said document template is a document template in Dot format, said tab character string is an XML file generated based on said document template, and this file is stored in the remarks region of said document template.
10. The document template as claimed in claim 9, characterized in that:
said document template is a document template whose layout complies with the official document pattern prescribed by national regulations.
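The mechanism recited in the claims above can be illustrated with a minimal Python sketch. It is not the patented implementation: the XML element and attribute names (`cells`, `cell`, `category`, `row`, `col`), the category names, and the sample document contents are all hypothetical, and the table is modelled as a plain grid of strings rather than a real Doc/Dot file read through an ActiveX control. The sketch only shows the core idea, namely that a string stored with the template maps each information category to a cell position, so key information can be fetched by position instead of by searching the document text.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: information category -> (row, column) in the template table.
LAYOUT = {
    "title": (0, 0),
    "issuer": (1, 0),
    "document_number": (1, 1),
}


def build_mark_string(layout):
    """Serialize the category/position pairs into an XML string (assumed schema)."""
    root = ET.Element("cells")
    for category, (row, col) in layout.items():
        ET.SubElement(root, "cell", category=category, row=str(row), col=str(col))
    return ET.tostring(root, encoding="unicode")


def extract_key_info(mark_string, table, category):
    """Find the cell position recorded for a category and return that cell's text."""
    for cell in ET.fromstring(mark_string).iter("cell"):
        if cell.get("category") == category:
            return table[int(cell.get("row"))][int(cell.get("col"))]
    raise KeyError(category)


# A document generated from the template, modelled as a grid of cell texts
# (illustrative sample data only).
document_table = [
    ["Notice on Office Automation", ""],
    ["General Office", "No. 12 (2005)"],
]

mark_string = build_mark_string(LAYOUT)
print(extract_key_info(mark_string, document_table, "document_number"))
```

Because every category resolves to exactly one (row, column) pair, the lookup reflects the unique cell address of claim 6; in a real deployment the string would be read from the template's remarks region rather than built in memory.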
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200510002458 CN1808424A (en) | 2005-01-21 | 2005-01-21 | Method of abstracting key information from documents |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1808424A (en) | 2006-07-26 |
Family
ID=36840324
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200510002458 Pending CN1808424A (en) | 2005-01-21 | 2005-01-21 | Method of abstracting key information from documents |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1808424A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102567303A (en) * | 2010-12-24 | 2012-07-11 | 北京大学 | Typesetting method and device for variable official document data |
CN102708336A (en) * | 2012-05-02 | 2012-10-03 | 四川建设网有限责任公司 | Method and system for electronic document processing based on separation of key data from customized template |
CN102841888A (en) * | 2012-09-14 | 2012-12-26 | 《中国学术期刊(光盘版)》电子杂志社 | Rapid typesetting system and method |
CN103136314A (en) * | 2012-01-13 | 2013-06-05 | 北京麦克斯泰科技有限公司 | Method and system of newspaper clipping generation in online public opinion monitoring |
CN103678268A (en) * | 2012-09-19 | 2014-03-26 | 北京大学 | Automatic typesetting method and device for official documents |
CN103744983A (en) * | 2014-01-15 | 2014-04-23 | 北京理工大学 | Method for extracting meta-information of electronic documents |
CN104199975A (en) * | 2014-09-23 | 2014-12-10 | 中国南方电网有限责任公司 | Configurable WORD file structured extraction method |
CN105138563A (en) * | 2015-07-23 | 2015-12-09 | 浪潮电子信息产业股份有限公司 | Method for rapidly extracting key information of test log |
CN106021213A (en) * | 2016-05-18 | 2016-10-12 | 广东源恒软件科技有限公司 | Generation system and method of enterprise taxation review point template |
CN106598919A (en) * | 2015-10-14 | 2017-04-26 | 中兴通讯股份有限公司 | Document generation method and device |
CN107797979A (en) * | 2016-09-02 | 2018-03-13 | 株式会社日立制作所 | Analytical equipment and analysis method |
CN111506588A (en) * | 2020-04-10 | 2020-08-07 | 创景未来(北京)科技有限公司 | Method and device for extracting key information of electronic document |
CN112199467A (en) * | 2020-09-08 | 2021-01-08 | 深圳价值在线信息科技股份有限公司 | Method and device for configuring letter display page |
CN112668316A (en) * | 2020-11-17 | 2021-04-16 | 国家计算机网络与信息安全管理中心 | word document key information extraction method |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102567303A (en) * | 2010-12-24 | 2012-07-11 | 北京大学 | Typesetting method and device for variable official document data |
CN103136314A (en) * | 2012-01-13 | 2013-06-05 | 北京麦克斯泰科技有限公司 | Method and system of newspaper clipping generation in online public opinion monitoring |
CN102708336B (en) * | 2012-05-02 | 2015-04-22 | 四川建设网有限责任公司 | Method and system for electronic document processing based on separation of key data from customized template |
CN102708336A (en) * | 2012-05-02 | 2012-10-03 | 四川建设网有限责任公司 | Method and system for electronic document processing based on separation of key data from customized template |
CN102841888A (en) * | 2012-09-14 | 2012-12-26 | 《中国学术期刊(光盘版)》电子杂志社 | Rapid typesetting system and method |
CN102841888B (en) * | 2012-09-14 | 2015-10-14 | 《中国学术期刊(光盘版)》电子杂志社有限公司 | Rapid typesetting system and method |
CN103678268A (en) * | 2012-09-19 | 2014-03-26 | 北京大学 | Automatic typesetting method and device for official documents |
CN103678268B (en) * | 2012-09-19 | 2016-08-31 | 北京大学 | Official document automatic composing method and device |
CN103744983A (en) * | 2014-01-15 | 2014-04-23 | 北京理工大学 | Method for extracting meta-information of electronic documents |
CN104199975A (en) * | 2014-09-23 | 2014-12-10 | 中国南方电网有限责任公司 | Configurable WORD file structured extraction method |
CN105138563A (en) * | 2015-07-23 | 2015-12-09 | 浪潮电子信息产业股份有限公司 | Method for rapidly extracting key information of test log |
CN106598919A (en) * | 2015-10-14 | 2017-04-26 | 中兴通讯股份有限公司 | Document generation method and device |
CN106021213A (en) * | 2016-05-18 | 2016-10-12 | 广东源恒软件科技有限公司 | Generation system and method of enterprise taxation review point template |
CN107797979A (en) * | 2016-09-02 | 2018-03-13 | 株式会社日立制作所 | Analytical equipment and analysis method |
CN111506588A (en) * | 2020-04-10 | 2020-08-07 | 创景未来(北京)科技有限公司 | Method and device for extracting key information of electronic document |
CN112199467A (en) * | 2020-09-08 | 2021-01-08 | 深圳价值在线信息科技股份有限公司 | Method and device for configuring letter display page |
CN112199467B (en) * | 2020-09-08 | 2023-12-08 | 深圳价值在线信息科技股份有限公司 | Configuration method and device for mail display page |
CN112668316A (en) * | 2020-11-17 | 2021-04-16 | 国家计算机网络与信息安全管理中心 | word document key information extraction method |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
WD01 | Invention patent application deemed withdrawn after publication | |