CN102819612A - Full text search method based on print documents - Google Patents


Info

Publication number
CN102819612A
Authority
CN
China
Prior art keywords
search
document printing
document
user
printing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012103106954A
Other languages
Chinese (zh)
Inventor
谷宏兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BIS SOFTWARE INFORMATION TECHNOLOGY Co Ltd
Original Assignee
BIS SOFTWARE INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-08-29
Filing date
2012-08-29
Publication date
2012-12-12
Application filed by BIS SOFTWARE INFORMATION TECHNOLOGY Co Ltd
Priority to CN2012103106954A
Publication of CN102819612A
Current legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a full-text search method for printed documents. It comprises a print-content text extraction module and a search engine module. The text extraction module extracts the textual content of successfully printed documents, and the search engine module analyses the extracted content to build an index database. When a user searches for keywords, the index database is queried and a list of printed documents matching the search criteria is returned.

Description

Full-text search method based on printed documents
Technical field
The present invention relates to the fields of print-related information management and information security, and in particular to a full-text search method based on printed documents.
Background
Print management and print security systems used by government agencies, enterprises, the military, and defence-industry units already handle print authentication, watermarking, log tracking, auditing, statistical analysis, and similar functions, but there is still no mature solution for tracing key information in the printed-document database or for mining that database in depth. Without full-text search, querying classified key information and producing categorised statistics across a massive printed-document library is like looking for a needle in a haystack, which makes managing, monitoring, and tracking printed information very difficult. Likewise, mining and statistically analysing an enterprise's print records in depth, so as to support decision-making for its strategic development, is practically impossible.
Search technology is widely used on the Internet: the data sources to be searched are processed to build an information database and an index database, so that the system can respond to the queries users submit and return the information they need or pointers to it. Users search mainly through free-text full-text retrieval, keyword retrieval, category retrieval, and other specialised retrieval. In short, search technology organises data sources and returns information to users according to their requests. Its work falls into three parts: building the index database, searching and ranking within the index database, and returning the matching records to the user.
Indexing is one of the core technologies of search: the collected information is organised, classified, and indexed to produce the index database. For Chinese search, the core technique is word segmentation, which uses rules and a dictionary to split a sentence into individual words in preparation for searching. Indexing produces an index table that maps keywords to indexed resource units. The index table generally takes the form of an inverted list, in which the indexed resource units are looked up by index term. The index table also records the positions at which index terms occur in each document, so that the searcher can compute adjacency or proximity between index terms, and it is physically stored in a specific data structure.
The searcher retrieves from the inverted list produced by indexing according to the keywords the user enters, evaluates the relevance between each page and the query, sorts the results to be output, and implements some form of user relevance feedback. A search engine often returns hundreds or thousands of results; to surface the useful ones, the usual technique is to grade the search units by importance or relevance and rank them accordingly. Relevance here means the number of times the search keywords occur in a document: the higher the count, the more relevant the document is considered.
Existing search technology is widely used on the Internet, in e-books, and in industry application systems, but it has not yet been applied to print management or print security systems. In the course of making the present invention, the inventor found that existing print systems focus mainly on authentication, printed-document management, closed-loop tracking of printed documents, and statistical analysis of printed documents, but do not support searching printed-document data or statistically analysing the key information it contains.
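The background names dictionary-based word segmentation as the core of Chinese search but does not say which algorithm is used. Forward maximum matching is one common dictionary-based approach; the minimal Python sketch below assumes it, with a hypothetical dictionary and a maximum word length of 4 characters, purely to illustrate how a sentence can be split into index terms.

```python
def fmm_segment(sentence: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink it until a dictionary hit;
        # a single character is always accepted so the loop always advances.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Usage with a tiny, hypothetical dictionary.
vocab = {"打印", "文档", "全文", "检索", "全文检索"}
print(fmm_segment("打印文档全文检索", vocab))
```

Forward maximum matching is greedy, so it can mis-split ambiguous strings; production segmenters usually add statistical disambiguation, which is outside this sketch.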
Summary of the invention
The present invention aims to provide a full-text search method based on printed documents. It addresses the lack of in-depth mining of key information in printed-document libraries in existing systems, which results in incomplete security early warning for classified files, no thorough statistical analysis of the data, insufficiently thorough tracing of leaked documents, and inaccurate statistics on classified files. Implementing full-text search over printed documents also lays the groundwork for making print systems more intelligent.
The full-text search method based on printed documents comprises: a print-content text extraction module, used to extract the text of printed documents as the data source for full-text search; and a search engine module, used to search and analyse the key information entered by the user and return the search results to the user.
Preferably, the print-content text extraction module comprises: a print job interception unit, used to intercept all print jobs in preparation for extracting printed-document content; a text content extraction unit, used to extract the text content of printed documents; and a text content saving unit, used to save the extracted text content to a file for full-text search.
Preferably, the search engine module comprises: a search UI unit, used for user interaction, which receives the search conditions entered by the user and displays the search results; an indexer unit, which works from the printed documents and, treating each printed document as one unit, extracts that document's index terms and records them in the index database; and a searcher unit, which, according to the user's query, finds printed documents in the index database, performs relevance matching, and returns the matching printed documents as the search results.
Preferably, the search UI unit comprises search scope setting, keyword setting, keyword search, search result sorting, and a print job search result list display unit. On the UI page, the user enters search scope information such as organisation type, personal information, document security level, document purpose, and printing time range, together with the keywords for the query. The scope settings and keyword information are submitted to the search engine, the results returned by the search engine are sorted for display, and finally the print job list is shown to the user. The user can flexibly configure which detail columns the print job list displays, so that only the print job information of interest is shown.
Preferably, the indexer unit, based on the extracted text of printed documents, represents printed-document information in a form that is convenient to retrieve and stores it in the index database, generating an index table for the document library. Printed documents are looked up by index term: when the printed-document collection is stored in sorted order, a sorted keyword list is maintained to store the keyword-to-printed-document mapping (the index table).
Preferably, the searcher unit, according to the user's query, finds relevant printed documents in the index database, evaluates the relevance between each printed document and the query, and returns the set of printed documents that meet a set threshold.
In the above scheme, the text of printed documents is extracted and indexed, so that printed documents related to the keywords the user cares about can be searched, counted, and analysed. This overcomes the lack, in existing methods, of comprehensive retrieval and statistical analysis of keyword information in printed documents. That lack has prevented enterprises and institutions from analysing the content of their printed-document libraries comprehensively, from multiple angles, and in depth; it has made investigating the content of leaked documents very difficult; and, because the document library is not mined in depth, it has left future decisions on managing and controlling classified information without data support or theoretical guidance.
  
Brief description of the drawings
The drawings described here are provided for a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows a schematic diagram of the full-text search method based on printed documents;
Fig. 2 shows a flowchart of the print-content text extraction module;
Fig. 3 shows a structural diagram of the search engine module;
Fig. 4 shows a flowchart of the search UI submodule of the search engine module;
Fig. 5 shows a flowchart of the indexer submodule of the search engine module;
Fig. 6 shows a flowchart of the searcher submodule of the search engine module.
  
Detailed description of embodiments
The present invention is described in detail below with reference to the drawings and in combination with embodiments.
Fig. 1 shows the composition of the full-text search method based on printed documents, comprising:
Print-content text extraction module S1001, used to extract the text of printed documents as the data source for full-text search;
Search engine module S1002, used to search and analyse the key information entered by the user and return the search results to the user.
Fig. 2 shows a flowchart of the print-content text extraction module.
The print-content text extraction module comprises: a print job interception unit S2001, used to obtain information on user-initiated print jobs, including the user account, document title, print job ID, and print job content; a text content extraction unit S2002, used to extract all the text in the document from the intercepted print job information; and a text content saving unit S2003, which, after a user-initiated print job has printed successfully, saves the extracted text in the form of a document file.
Fig. 3 shows the composition of the search engine.
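Unit S2001 captures the user account, document title, print job ID, and job content, and unit S2003 saves the extracted text only after the job prints successfully. The sketch below models that record and the save-after-success rule. How the print job is intercepted and how text is pulled from the spool data are not specified in the description, so the `text` field is assumed to hold already-extracted text; the JSON-per-job file layout and the names `PrintJob` and `save_if_printed` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class PrintJob:
    # Fields captured by unit S2001, per the description.
    user_account: str
    document_title: str
    job_id: str
    text: str  # assumption: plain text already extracted from the job content (S2002)


def save_if_printed(job: PrintJob, printed_ok: bool, out_dir: Path) -> Optional[Path]:
    """Unit S2003 sketch: persist the extracted text only after a successful print."""
    if not printed_ok:
        return None  # failed or cancelled jobs are not added to the search corpus
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{job.job_id}.json"
    # ensure_ascii=False keeps Chinese text human-readable in the saved file.
    path.write_text(json.dumps(asdict(job), ensure_ascii=False), encoding="utf-8")
    return path
```

Keying each file by print job ID keeps the saved corpus aligned with the job IDs the indexer and result list use later.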
The search engine module S1002 comprises: a search UI unit S3002, used for interaction with the user S3001, which receives the search conditions entered by the user and displays the search results; an indexer unit S3004, which works from the printed documents and, treating each printed document as one unit, extracts that document's index terms and records them in the index database; and a searcher unit S3005, which, based on the index database and the keywords of the user's search, performs relevance matching between the keywords and the index database and returns the matching printed documents as the search results.
As shown in Fig. 4, the search UI unit consists of search scope setting S4001, keyword setting S4002, keyword search S4003, search result sorting S4004, and the print job search result list display unit S4005. On the UI page, the user enters search scope information such as organisation type (unit, department, group), personal information (account name), document security level (internal, non-classified, confidential, secret), document purpose (retained, circulated), and printing time range, together with the keywords for the query. The scope settings and keyword information are submitted to the search engine, the results returned by the search engine are sorted for display, and finally the print job list is shown to the user. The user can flexibly configure which detail columns the print job list displays, so that only the print job information of interest is shown.
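The scope settings of Fig. 4 act as attribute filters applied before keyword matching, and the configurable detail columns act as a projection over the result list. The sketch below assumes a flat record per print job; the field names, the simplification of organisation type to a single `org_unit` string, and the naive substring keyword test (standing in for the search engine's index lookup) are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PrintRecord:
    # Hypothetical flat record for one print job.
    job_id: str
    account: str
    org_unit: str
    security_level: str
    purpose: str
    printed_at: datetime
    text: str


@dataclass
class SearchScope:
    # None means "no filter on this attribute".
    org_unit: Optional[str] = None
    account: Optional[str] = None
    security_levels: Optional[set[str]] = None
    purpose: Optional[str] = None
    start: Optional[datetime] = None
    end: Optional[datetime] = None


def in_scope(r: PrintRecord, s: SearchScope) -> bool:
    # A record passes only if it satisfies every filter that is set.
    return ((s.org_unit is None or r.org_unit == s.org_unit)
            and (s.account is None or r.account == s.account)
            and (s.security_levels is None or r.security_level in s.security_levels)
            and (s.purpose is None or r.purpose == s.purpose)
            and (s.start is None or r.printed_at >= s.start)
            and (s.end is None or r.printed_at <= s.end))


def search_print_jobs(records, scope, keywords, columns, sort_key="printed_at", descending=True):
    # Scope filter first, then keyword match (a naive substring test here).
    hits = [r for r in records if in_scope(r, scope) and all(k in r.text for k in keywords)]
    hits.sort(key=lambda r: getattr(r, sort_key), reverse=descending)
    # Project each hit onto the columns the user chose to display.
    return [{c: getattr(r, c) for c in columns} for r in hits]
```

Filtering on cheap attributes before any text matching keeps the keyword step small, which matters once the keyword step is a real index query rather than a substring scan.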
As shown in Fig. 5, the indexer unit, based on the extracted text S5001 of printed documents, represents printed-document information in a form that is convenient to retrieve and stores it in the index database, generating an index table for the document library. Printed documents are looked up by index term: when the printed-document collection is stored in sorted order, a sorted keyword list is kept to store the keyword-to-printed-document mapping (the index table).
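An inverted index of the kind described, with a sorted keyword list mapping each keyword to the printed documents containing it, might look like the minimal sketch below. It assumes documents are already segmented into tokens and keyed by print job ID; following the background section, it also records term positions so that the distance between two terms in one document can be computed. The class and method names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Optional


class PrintIndex:
    """Hypothetical inverted index: keyword -> {print job ID -> positions}."""

    def __init__(self) -> None:
        self._postings: dict[str, dict[str, list[int]]] = defaultdict(dict)
        self._sorted_keywords: list[str] = []

    def add_document(self, job_id: str, tokens: list[str]) -> None:
        # Each printed document is one indexing unit; record every position
        # at which each term occurs in it.
        for pos, term in enumerate(tokens):
            self._postings[term].setdefault(job_id, []).append(pos)
        # Rebuild the sorted keyword list (simple, O(K log K) per document).
        self._sorted_keywords = sorted(self._postings)

    def lookup(self, term: str) -> dict[str, list[int]]:
        # Binary search in the sorted keyword list, then return the postings.
        i = bisect_left(self._sorted_keywords, term)
        if i < len(self._sorted_keywords) and self._sorted_keywords[i] == term:
            return self._postings[term]
        return {}

    def min_distance(self, a: str, b: str, job_id: str) -> Optional[int]:
        # Closest pair of occurrences of terms a and b in one document,
        # the kind of proximity measure that stored positions make possible.
        pa = self.lookup(a).get(job_id, [])
        pb = self.lookup(b).get(job_id, [])
        if not pa or not pb:
            return None
        return min(abs(x - y) for x in pa for y in pb)
```

A real index would merge new keywords into the sorted list incrementally (or use an on-disk B-tree or FST) instead of re-sorting after every document.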
As shown in Fig. 6, the searcher unit, according to the user's query, finds relevant printed documents in the index database, evaluates the relevance between each printed document and the query, and returns the set of printed documents that meet a set threshold. The retrieval methods used include keyword-based retrieval, concept-based retrieval, and content-based retrieval.
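The background defines relevance as the number of times the search keywords occur in a document, and the searcher returns only the documents that meet a set threshold. A minimal keyword-based sketch of that rule over term-frequency postings follows; the threshold value, the tie-breaking order, and the function names are assumptions, and the concept-based and content-based retrieval modes are not shown.

```python
from collections import Counter, defaultdict


def build_postings(docs: dict[str, list[str]]) -> dict[str, Counter]:
    # Inverted postings: term -> Counter{print job ID: occurrences}.
    postings: dict[str, Counter] = defaultdict(Counter)
    for job_id, tokens in docs.items():
        for term in tokens:
            postings[term][job_id] += 1
    return postings


def rank_documents(postings: dict[str, Counter], query: list[str],
                   threshold: int = 1) -> list[tuple[str, int]]:
    # Relevance = total occurrences of the query keywords in the document.
    scores: Counter = Counter()
    for term in query:
        for job_id, tf in postings.get(term, {}).items():
            scores[job_id] += tf
    # Keep documents that reach the threshold, most relevant first
    # (ties broken by job ID so the order is deterministic).
    hits = [(job_id, s) for job_id, s in scores.items() if s >= threshold]
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits
```

Only documents that appear in the postings for at least one query term are ever scored, which is the point of going through the inverted index instead of scanning every saved document.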
From the above description it can be seen that the above embodiments of the present invention achieve the following effects: the text of printed documents is extracted, providing a resource library for printed-document data; by analysing the text of the printed documents, an index database is built for every printed document; a user interface is provided for querying printed documents; according to the printed-document attribute information and keyword information supplied by the user, relevance queries are run against the printed-document database using the printed-document index database; a list of printed documents matching the query conditions is returned; and the parts of each printed document that match the keywords can be marked.
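The last effect listed, marking the parts of each returned document that match the keywords, could be done with a simple highlighter such as the one sketched below. The bracket markers and the longest-match-first rule for overlapping keywords are assumptions, not details from the description.

```python
import re


def highlight(text: str, keywords: list[str], open_tag: str = "[", close_tag: str = "]") -> str:
    """Wrap every occurrence of any keyword in markers (longest keywords first)."""
    keywords = [k for k in keywords if k]  # an empty keyword would match everywhere
    if not keywords:
        return text
    # Escape keywords so characters like '.' match literally; sorting by length
    # makes the alternation prefer the longer of two overlapping keywords.
    alternatives = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in alternatives))
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text)


print(highlight("budget report on the budget", ["budget"]))
```

For display in a web UI the markers would typically be HTML tags, in which case the surrounding text must be HTML-escaped first.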
Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its implementation. Persons of ordinary skill in the art can make changes or variations in other forms on the basis of the above description; it is not possible to list every embodiment exhaustively here. Obvious changes or variations derived from the technical solution of the present invention remain within its scope of protection.

Claims (6)

1. A full-text search method based on printed documents, characterised by comprising:
a print-content text extraction module, used to extract the text of printed documents as the data source for full-text search;
a search engine module, used to search and analyse the key information entered by the user and return the search results to the user.
2. The full-text search method based on printed documents according to claim 1, characterised in that the print-content text extraction module comprises:
a print job interception unit, used to intercept all print jobs in preparation for extracting printed-document content;
a text content extraction unit, used to extract the text content of printed documents;
a text content saving unit, used to save the extracted text content to a file for full-text search.
3. The full-text search method based on printed documents according to claim 1, characterised in that the search engine module comprises:
a search UI unit, used for user interaction, which receives the search conditions entered by the user and displays the search results;
an indexer unit, which works from the printed documents and, treating each printed document as one unit, extracts that document's index terms and records them in the index database;
a searcher unit, which, according to the user's query, finds printed documents in the index database, performs relevance matching, and returns the matching printed documents as the search results.
4. The full-text search method based on printed documents according to claim 3, characterised in that the search UI unit comprises search scope setting, keyword setting, keyword search, search result sorting, and a print job search result list display unit; on the UI page the user enters search scope information such as organisation type, personal information, document security level, document purpose, and printing time range, together with the keywords for the query; the scope settings and keyword information are submitted to the search engine; the results returned by the search engine are sorted for display, and finally the print job list is shown to the user; and the user can flexibly configure which detail columns the print job list displays, so that the print job information of interest is shown.
5. The full-text search method based on printed documents according to claim 3, characterised in that the indexer unit, based on the extracted text of printed documents, represents printed-document information in a form that is convenient to retrieve and stores it in the index database, generating an index table for the document library; printed documents are looked up by index term, and when the printed-document collection is stored in sorted order, a sorted keyword list is maintained to store the keyword-to-printed-document mapping (the index table).
6. The full-text search method based on printed documents according to claim 3, characterised in that the searcher unit, according to the user's query, finds relevant printed documents in the index database, evaluates the relevance between each printed document and the query, and returns the set of printed documents that meet a set threshold.
CN2012103106954A 2012-08-29 2012-08-29 Full text search method based on print documents Pending CN102819612A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012103106954A CN102819612A (en) 2012-08-29 2012-08-29 Full text search method based on print documents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012103106954A CN102819612A (en) 2012-08-29 2012-08-29 Full text search method based on print documents

Publications (1)

Publication Number Publication Date
CN102819612A (en) 2012-12-12

Family

ID=47303723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012103106954A Pending CN102819612A (en) 2012-08-29 2012-08-29 Full text search method based on print documents

Country Status (1)

Country Link
CN (1) CN102819612A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020176628A1 (en) * 2001-05-22 2002-11-28 Starkweather Gary K. Document imaging and indexing system
CN1959695A (en) * 2005-11-04 2007-05-09 佳能株式会社 Printing management system and printing management method
CN101408876A (en) * 2007-10-09 2009-04-15 中兴通讯股份有限公司 Method and system for searching full text of electric document
CN102509032A (en) * 2011-09-23 2012-06-20 国网电力科学研究院 Implementation method of print security monitoring system based on Windows underlying driver

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107037990A (en) * 2016-02-03 2017-08-11 株式会社理光 Image processing apparatus and image processing system
CN107037990B (en) * 2016-02-03 2020-04-07 株式会社理光 Image processing apparatus and image processing system
CN108804553A (en) * 2018-05-22 2018-11-13 珠海奔图电子有限公司 Print document search method and device
CN108804553B (en) * 2018-05-22 2020-09-15 珠海奔图电子有限公司 Print document searching method and apparatus
CN111475536A (en) * 2019-01-23 2020-07-31 百度在线网络技术(北京)有限公司 Data analysis method and device based on search engine
CN111475536B (en) * 2019-01-23 2023-10-17 百度在线网络技术(北京)有限公司 Data analysis method and device based on search engine
CN110659344A (en) * 2019-09-10 2020-01-07 吴生友 Block method based full text search method
CN110659344B (en) * 2019-09-10 2022-12-02 吴生友 Block method based full text search method
CN113779032A (en) * 2021-09-14 2021-12-10 广州汇通国信科技有限公司 Search engine index construction method and device based on recurrent neural network
CN113779032B (en) * 2021-09-14 2024-03-12 广州汇通国信科技有限公司 Search engine index construction method and device based on cyclic neural network
CN114185500A (en) * 2021-12-14 2022-03-15 深圳市润天智数字设备股份有限公司 Control method and device for printing operation
CN114185500B (en) * 2021-12-14 2024-04-02 深圳市润天智数字设备股份有限公司 Control method and device for printing job

Similar Documents

Publication Publication Date Title
CN109992645B (en) Data management system and method based on text data
US9864808B2 (en) Knowledge-based entity detection and disambiguation
US9418144B2 (en) Similar document detection and electronic discovery
Mishne et al. Leave a reply: An analysis of weblog comments
Liu et al. Tableseer: automatic table metadata extraction and searching in digital libraries
US8200642B2 (en) System and method for managing electronic documents in a litigation context
US9619571B2 (en) Method for searching related entities through entity co-occurrence
CN102819612A (en) Full text search method based on print documents
Tao et al. Multi-Dimensional, Phrase-Based Summarization in Text Cubes.
CN103838798A (en) Page classification system and method
CN113297457B (en) High-precision intelligent information resource pushing system and pushing method
Bansal et al. Searching the Blogosphere.
KR20150018880A (en) Information aggregation, classification and display method and system
Simitsis et al. Multidimensional content exploration
CN109783599A (en) Knowledge mapping search method and system based on multi storage
Yang et al. Video search reranking via online ordinal reranking
Zhou et al. Evaluating large-scale distributed vertical search
Pereira et al. A generic Web‐based entity resolution framework
US20190026370A1 (en) System and Method for Categorizing Web Search Results
Kopliku et al. Attribute retrieval from relational web tables
Boden et al. FactCrawl: A Fact Retrieval Framework for Full-Text Indices.
CN109388649B (en) Land intelligent recommendation method and system
Muthmann et al. Detecting near-duplicate relations in user generated forum content
CN111259145A (en) Text retrieval classification method, system and storage medium based on intelligence data
CN104951869A (en) Workflow-based public opinion monitoring method and workflow-based public opinion monitoring device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121212