CN102819612A - Full text search method based on print documents - Google Patents
- Publication number
- CN102819612A (application CN2012103106954A, also listed as CN201210310695A)
- Authority
- CN
- China
- Prior art keywords
- search
- document printing
- document
- user
- printing
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a full text search technique based on print documents. The full text search technique based on print documents includes a printing content character extraction module and a search engine module, wherein the printing content character extraction module is used for extracting character contents in successfully printed documents, and the search engine module is used for analyzing the extracted printing contents to form an index database. As for keywords searched by users, the index database is searched to retrieve a print document list meeting search criteria.
Description
Technical field
The present invention relates to the fields of print-related information management and information security, and in particular to a full-text search method based on printed documents.
Background art
The print management and print security systems used by government bodies, enterprises, the military, and defense-industry organizations manage print-related authentication, watermarking, log tracking, auditing, and statistical analysis. However, no mature solution yet exists for tracing, or mining in depth, the key information held in a printed-document database. Without global search technology, querying and classifying confidential key information in a massive printed-document repository is like looking for a needle in a haystack, which makes managing, monitoring, and tracking print information very difficult. Deep data mining and statistical analysis of an enterprise's print records, which could support decisions on corporate strategy, are likewise out of reach.
Search technology is widely used on the Internet: the data sources to be searched are processed to build an information database and an index database, so that the system can respond to the queries users submit and return the required information or pointers to it. Users mainly search through free-text full-text retrieval, keyword retrieval, category retrieval, and retrieval of other specific information. In essence, search technology organizes the data sources and returns information to the user on request. Its work falls into three parts: building the index database, searching and ranking within the index database, and returning database records to the user.
Indexing is one of the core technologies of search: the collected information is organized, classified, and indexed to produce an index database. The core of Chinese-language search is word segmentation, which uses rules and a dictionary to split a sentence into individual words ready for searching. Indexing produces an index table that maps keywords to indexed resource units. The index table generally takes some form of inverted list, in which the corresponding resource units are looked up by index term. The index table also records the positions at which each index term occurs in a document, so that the searcher can compute adjacency or proximity between index terms, and it is physically stored in a specific data structure.
The searcher retrieves from the inverted list produced by indexing according to the keywords the user enters, evaluates the relevance between each page and the query, sorts the results to be output, and implements some form of user relevance feedback. A search engine often returns hundreds or thousands of results, so to surface the useful ones a common technique is to rate each result by importance or relevance and rank accordingly. Relevance here means the number of times the search keyword occurs in a document: the more occurrences, the more relevant the document is considered to be.
Existing search technology is widely used on the Internet, in e-books, and in industry application systems, but it has not yet been applied to print management and print security systems. In the course of realizing the present invention, the inventor found that existing print systems mostly focus on authentication, printed-document management, closed-loop tracking of printed documents, and printed-document statistics, but do not provide search over printed-document data or statistical analysis of the key information in that data.
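The inverted list with term positions described in the background can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the sample documents, and the whitespace-style tokenizer (a stand-in for dictionary-based Chinese word segmentation) are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Stand-in for word segmentation: lowercase runs of word characters.
    Chinese text would need a dictionary-based segmenter instead."""
    return re.findall(r"\w+", text.lower())

def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}, i.e. an inverted list with positions."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def adjacent(index, first, second):
    """Return ids of documents in which `second` directly follows `first`."""
    hits = set()
    for doc_id, positions in index.get(first, {}).items():
        following = set(index.get(second, {}).get(doc_id, []))
        if any(p + 1 in following for p in positions):
            hits.add(doc_id)
    return hits

# Hypothetical sample data.
docs = {
    "job-1": "quarterly budget report for the finance department",
    "job-2": "report on the budget meeting",
}
index = build_positional_index(docs)
print(sorted(index["budget"]))              # documents containing "budget"
print(adjacent(index, "budget", "report"))  # documents containing the phrase "budget report"
```

Storing positions alongside document ids is what makes the adjacency check possible without rereading the documents themselves.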
Summary of the invention
The present invention aims to provide a full-text search method based on printed documents. It addresses the lack, in existing systems, of in-depth mining of the key information in the printed-document database, which leaves security early warning for classified documents incomplete, data statistics and analysis inadequate, tracing of leaked documents not thorough enough, and statistics on classified documents insufficiently accurate. Implementing a full-text search method based on printed documents also lays the groundwork for making print systems more intelligent.
A full-text search method based on printed documents comprises: a print-content text extraction module, used to extract the text of printed documents as the data source for full-text search; and a search engine module, used to search and analyze the key information the user enters and to return the search results to the user.
Preferably, the print-content text extraction module comprises: a print-job interception unit, used to intercept all print jobs in preparation for extracting the printed-document content; a text extraction unit, used to extract the text content of the printed document; and a text storage unit, used to save the extracted text content to a file so that full-text search can be performed.
Preferably, the search engine module comprises: a search UI unit, used to interact with the user, receive the search conditions the user enters, and display the search results; an indexer unit, which works from the printed documents and, taking each printed document as a unit, extracts that document's index terms and records them in the index database; and a searcher unit, which, according to the user's query, finds printed documents in the index database, performs relevance matching, and returns the matching printed documents as search results.
Preferably, the search UI unit comprises search-scope setting, keyword setting, keyword search, search-result sorting, and print-job search-result list display units. On the UI page the user enters search-scope information such as organization type, personal information, document security level, document purpose, and print time range, together with the keywords for the query. The search-scope settings and keywords are submitted to the search engine; the results it returns are sorted for display, and the resulting print-job list is finally shown to the user. The user can flexibly configure which detail columns the print-job list displays, so that it shows the print-job information the user cares about.
Preferably, the indexer unit takes the extracted text of the printed documents, represents the printed-document information in a form that is convenient to retrieve, and stores it in the index database, generating the index table of the document library. The corresponding printed documents are looked up by index term: alongside the sorted, stored set of printed documents there is a sorted keyword list, which stores the index table mapping keywords to printed documents.
Preferably, the searcher unit, according to the user's query, finds the relevant printed documents in the index database, evaluates the relevance between each printed document and the query, and returns the set of printed documents that meet a set threshold.
In the above scheme, text is extracted from printed documents and an index is built over that text, so that printed documents related to the keywords a user cares about can be searched, counted, and analyzed. This overcomes the lack, in existing methods, of comprehensive retrieval and macro-level analysis of keyword information in printed documents. That lack leaves enterprises and institutions unable to analyze the content of the printed-document database comprehensively, from multiple angles, and in depth; it makes investigating the content of leaked documents very difficult; and, because the document library is not mined in depth, it also leaves future decisions on managing and controlling an organization's classified information without data support or theoretical guidance.
Description of drawings
The accompanying drawings described here provide a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows a schematic diagram of the full-text search method based on printed documents;
Fig. 2 shows a flowchart of the print-content text extraction module;
Fig. 3 shows a structural diagram of the search engine module;
Fig. 4 shows a flowchart of the search UI submodule of the search engine module;
Fig. 5 shows a flowchart of the indexer submodule of the search engine module;
Fig. 6 shows a flowchart of the searcher submodule of the search engine module.
Embodiment
The invention is described in detail below with reference to the drawings and in conjunction with embodiments.
Fig. 1 shows the structure of the full-text search method based on printed documents, which comprises:
A print-content text extraction module S1001, used to extract the text of printed documents as the data source for full-text search;
A search engine module S1002, used to search and analyze the key information the user enters and to return the search results to the user.
Fig. 2 shows the flowchart of the print-content text extraction module.
The print-content text extraction module comprises: a print-job interception unit S2001, used to obtain the information of a print job initiated by the user, including the user account, document title, print-job ID, and print-job content; a text extraction unit S2002, used to extract all the text in the document from the intercepted print-job information; and a text storage unit S2003, which, after the user's print job has printed successfully, saves the extracted text in the form of a file.
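The three steps of Fig. 2 (intercept, extract, save after a successful print) might be sketched as follows. This is only an illustration under stated assumptions: the `PrintJob` class, the JSON file layout, and the whitespace-normalizing `extract_text` are hypothetical, and real interception of print jobs (which the patent does not detail) is replaced by a hand-built record.

```python
import json
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class PrintJob:
    """Fields the description lists for an intercepted print job."""
    user_account: str
    document_title: str
    job_id: str
    content: str  # assumed to be already-decoded text of the job

def extract_text(job):
    """Hypothetical extraction step: normalise whitespace in the job content."""
    return " ".join(job.content.split())

def save_if_printed(job, printed_ok, out_dir):
    """Save the extracted text as a file, but only after a successful print."""
    if not printed_ok:
        return None
    record = asdict(job)
    record["content"] = extract_text(job)
    path = Path(out_dir) / f"{job.job_id}.json"
    path.write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
    return path

# Hypothetical usage: one successful job and one failed job.
out_dir = tempfile.mkdtemp()
job = PrintJob("alice", "Budget.docx", "job-1", "Quarterly   budget\nreport")
saved = save_if_printed(job, printed_ok=True, out_dir=out_dir)
skipped = save_if_printed(job, printed_ok=False, out_dir=out_dir)
```

Gating the save on `printed_ok` mirrors the description's condition that text is stored only once the job has printed successfully, so failed jobs never enter the search corpus.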
Fig. 3 shows the structure of the search engine module.
The search engine module S1002 comprises: a search UI unit S3002, used to interact with the user S3001, receive the search conditions the user enters, and display the search results; an indexer unit S3004, which works from the printed documents and, taking each printed document as a unit, extracts that document's index terms and records them in the index database; and a searcher unit S3005, which works from the index database and, according to the keywords of the user's search, matches the keywords against the index database by relevance and returns the matching printed documents as search results.
As shown in Fig. 4, the search UI unit consists of search-scope setting S4001, keyword setting S4002, keyword search S4003, search-result sorting S4004, and a print-job search-result list display unit S4005. On the UI page the user enters search-scope information such as organization type (unit, department, group), personal information (account name), document security level (internal, non-classified, and two secret grades), document purpose (retention, circulation), and print time range, together with the keywords for the query. The search-scope settings and keywords are submitted to the search engine; the results it returns are sorted for display, and the resulting print-job list is finally shown to the user. The user can flexibly configure which detail columns the print-job list displays, so that it shows the print-job information the user cares about.
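A minimal sketch of how the search-scope settings and the configurable result columns could be modelled is given below. The class and field names (`JobRecord`, `SearchScope`, `in_scope`, `visible_columns`) and the sample values are assumptions for illustration; the patent names the scope fields but not their representation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class JobRecord:
    job_id: str
    org_unit: str
    account: str
    security_level: str   # e.g. "internal", "secret"
    purpose: str          # e.g. "retention", "circulation"
    printed_on: date

@dataclass
class SearchScope:
    """Every field is optional; None means 'do not filter on this field'."""
    org_unit: Optional[str] = None
    account: Optional[str] = None
    security_level: Optional[str] = None
    purpose: Optional[str] = None
    printed_from: Optional[date] = None
    printed_to: Optional[date] = None

def in_scope(rec, scope):
    """True if the record satisfies every scope field that is set."""
    for name in ("org_unit", "account", "security_level", "purpose"):
        wanted = getattr(scope, name)
        if wanted is not None and getattr(rec, name) != wanted:
            return False
    if scope.printed_from and rec.printed_on < scope.printed_from:
        return False
    if scope.printed_to and rec.printed_on > scope.printed_to:
        return False
    return True

def visible_columns(rec, columns):
    """Project a record onto the columns the user chose to display."""
    return {c: getattr(rec, c) for c in columns}

# Hypothetical usage.
records = [
    JobRecord("job-1", "finance", "alice", "secret", "retention", date(2012, 8, 1)),
    JobRecord("job-2", "finance", "bob", "internal", "circulation", date(2012, 8, 20)),
]
scope = SearchScope(org_unit="finance", printed_from=date(2012, 8, 10))
matches = [r for r in records if in_scope(r, scope)]
rows = [visible_columns(r, ["job_id", "account"]) for r in matches]
```

In this sketch the scope filter runs separately from the keyword search, so either step could be applied first; the patent does not say which order its search engine uses.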
As shown in Fig. 5, the indexer unit takes the text extracted from the printed documents S5001, represents the printed-document information in a form that is convenient to retrieve, and stores it in the index database, generating the index table of the document library. The corresponding printed documents are looked up by index term: alongside the sorted, stored set of printed documents there is a sorted keyword list, which stores the index table mapping keywords to printed documents.
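One way to picture the sorted keyword list that maps keywords to printed documents is the sketch below, which keeps the keywords in order and finds them by binary search. The `SortedKeywordIndex` class and the sample data are hypothetical; the patent does not specify the storage structure beyond the list being sorted.

```python
import bisect

class SortedKeywordIndex:
    """Sorted keyword list plus a parallel list of document-id sets."""

    def __init__(self):
        self.keywords = []   # kept in sorted order
        self.postings = []   # postings[i] holds the document ids for keywords[i]

    def add(self, keyword, doc_id):
        """Insert the keyword at its sorted position if new, then record the document."""
        i = bisect.bisect_left(self.keywords, keyword)
        if i == len(self.keywords) or self.keywords[i] != keyword:
            self.keywords.insert(i, keyword)
            self.postings.insert(i, set())
        self.postings[i].add(doc_id)

    def lookup(self, keyword):
        """Binary-search the keyword list; return matching document ids in sorted order."""
        i = bisect.bisect_left(self.keywords, keyword)
        if i < len(self.keywords) and self.keywords[i] == keyword:
            return sorted(self.postings[i])
        return []

# Hypothetical usage: index terms extracted from two printed documents.
idx = SortedKeywordIndex()
for doc_id, words in {"job-1": ["budget", "report"], "job-2": ["report", "audit"]}.items():
    for word in words:
        idx.add(word, doc_id)
```

Keeping the list sorted gives logarithmic-time lookup per keyword; a hash map would also work but would not preserve the ordering the description mentions.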
As shown in Fig. 6, the searcher unit, according to the user's query, finds the relevant printed documents in the index database, evaluates the relevance between each printed document and the query, and returns the set of printed documents that meet a set threshold. Retrieval may be keyword-based, concept-based, or content-based.
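The relevance evaluation and threshold can be sketched as below, assuming the occurrence-count notion of relevance given in the background section. The function names, the default threshold, and the sample token lists are illustrative assumptions, not the patent's implementation; only keyword-based retrieval is shown.

```python
from collections import Counter

def relevance(tokens, query_terms):
    """Score = total number of occurrences of the query terms in the document."""
    counts = Counter(tokens)
    return sum(counts[t] for t in query_terms)

def search(docs, query_terms, threshold=1):
    """Return (doc_id, score) pairs at or above the threshold, highest score first."""
    scored = [(doc_id, relevance(tokens, query_terms)) for doc_id, tokens in docs.items()]
    kept = [(d, s) for d, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical token lists for three printed documents.
docs = {
    "job-1": ["budget", "report", "budget"],
    "job-2": ["budget", "meeting"],
    "job-3": ["holiday", "notice"],
}
results = search(docs, ["budget"], threshold=1)
strict = search(docs, ["budget"], threshold=2)
```

Raising the threshold trades recall for precision: with `threshold=2` only the document that mentions the keyword twice is returned.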
From the above description it can be seen that the embodiments of the invention achieve the following technical effects: the text of printed documents is extracted by technical means, providing a resource library for printed-document data; an index database is built for the printed documents by analyzing their text; and a user interface is provided for querying printed documents. Using the printed-document attributes and keywords the user supplies, together with the printed-document index database, a relevance query is run against the printed-document database. A list of printed documents that satisfy the query conditions is returned, and the parts of each printed document that match the keywords can be marked.
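Marking the parts of a document that match a keyword could look like the following sketch. The bracket markers and the case-insensitive matching are assumptions; the patent does not say how matches are marked.

```python
import re

def mark_matches(text, keyword, left="[", right="]"):
    """Wrap every case-insensitive occurrence of `keyword` in marker strings."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda m: f"{left}{m.group(0)}{right}", text)

marked = mark_matches("Budget report: the budget rose.", "budget")
print(marked)
```

`re.escape` keeps keywords containing punctuation from being interpreted as regular-expression syntax.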
Obviously, the above embodiments are merely examples given to explain the invention clearly and do not limit its implementations. Persons of ordinary skill in the art can make other variations and modifications in different forms on the basis of the above description. It is not possible to list every implementation exhaustively here. Any obvious variation or modification derived from the technical solution of the invention remains within the scope of protection of the invention.
Claims (6)
1. A full-text search method based on printed documents, characterized in that it comprises:
A print-content text extraction module, used to extract the text of printed documents as the data source for full-text search;
A search engine module, used to search and analyze the key information the user enters and to return the search results to the user.
2. The full-text search method based on printed documents according to claim 1, characterized in that the print-content text extraction module comprises:
A print-job interception unit, used to intercept all print jobs in preparation for extracting the printed-document content;
A text extraction unit, used to extract the text content of the printed document;
A text storage unit, used to save the extracted text content to a file so that full-text search can be performed.
3. The full-text search method based on printed documents according to claim 1, characterized in that the search engine module comprises:
A search UI unit, used to interact with the user, receive the search conditions the user enters, and display the search results;
An indexer unit, which works from the printed documents and, taking each printed document as a unit, extracts that document's index terms and records them in the index database;
A searcher unit, which, according to the user's query, finds printed documents in the index database, performs relevance matching, and returns the matching printed documents as search results.
4. The full-text search method based on printed documents according to claim 3, characterized in that the search UI unit comprises search-scope setting, keyword setting, keyword search, search-result sorting, and print-job search-result list display units; on the UI page the user enters search-scope information such as organization type, personal information, document security level, document purpose, and print time range, together with the keywords for the query; the search-scope settings and keywords are submitted to the search engine; the results it returns are sorted for display, and the resulting print-job list is finally shown to the user; the user can flexibly configure which detail columns the print-job list displays, so that it shows the print-job information the user cares about.
5. The full-text search method based on printed documents according to claim 3, characterized in that the indexer unit takes the extracted text of the printed documents, represents the printed-document information in a form that is convenient to retrieve, and stores it in the index database, generating the index table of the document library; the corresponding printed documents are looked up by index term, and alongside the sorted, stored set of printed documents there is a sorted keyword list, which stores the index table mapping keywords to printed documents.
6. The full-text search method based on printed documents according to claim 3, characterized in that the searcher unit, according to the user's query, finds the relevant printed documents in the index database, evaluates the relevance between each printed document and the query, and returns the set of printed documents that meet a set threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012103106954A CN102819612A (en) | 2012-08-29 | 2012-08-29 | Full text search method based on print documents |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102819612A | 2012-12-12 |
Family
ID=47303723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012103106954A Pending CN102819612A (en) | 2012-08-29 | 2012-08-29 | Full text search method based on print documents |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102819612A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020176628A1 (en) * | 2001-05-22 | 2002-11-28 | Starkweather Gary K. | Document imaging and indexing system |
CN1959695A (en) * | 2005-11-04 | 2007-05-09 | 佳能株式会社 | Printing management system and printing management method |
CN101408876A (en) * | 2007-10-09 | 2009-04-15 | 中兴通讯股份有限公司 | Method and system for searching full text of electric document |
CN102509032A (en) * | 2011-09-23 | 2012-06-20 | 国网电力科学研究院 | Implementation method of print security monitoring system based on Windows underlying driver |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107037990A (en) * | 2016-02-03 | 2017-08-11 | 株式会社理光 | Image processing apparatus and image processing system |
CN107037990B (en) * | 2016-02-03 | 2020-04-07 | 株式会社理光 | Image processing apparatus and image processing system |
CN108804553A (en) * | 2018-05-22 | 2018-11-13 | 珠海奔图电子有限公司 | Print document search method and device |
CN108804553B (en) * | 2018-05-22 | 2020-09-15 | 珠海奔图电子有限公司 | Print document searching method and apparatus |
CN111475536A (en) * | 2019-01-23 | 2020-07-31 | 百度在线网络技术(北京)有限公司 | Data analysis method and device based on search engine |
CN111475536B (en) * | 2019-01-23 | 2023-10-17 | 百度在线网络技术(北京)有限公司 | Data analysis method and device based on search engine |
CN110659344A (en) * | 2019-09-10 | 2020-01-07 | 吴生友 | Block method based full text search method |
CN110659344B (en) * | 2019-09-10 | 2022-12-02 | 吴生友 | Block method based full text search method |
CN113779032A (en) * | 2021-09-14 | 2021-12-10 | 广州汇通国信科技有限公司 | Search engine index construction method and device based on recurrent neural network |
CN113779032B (en) * | 2021-09-14 | 2024-03-12 | 广州汇通国信科技有限公司 | Search engine index construction method and device based on cyclic neural network |
CN114185500A (en) * | 2021-12-14 | 2022-03-15 | 深圳市润天智数字设备股份有限公司 | Control method and device for printing operation |
CN114185500B (en) * | 2021-12-14 | 2024-04-02 | 深圳市润天智数字设备股份有限公司 | Control method and device for printing job |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20121212 |