CN101008955A - Method and apparatus for managing information, and computer program product - Google Patents


Info

Publication number: CN101008955A
Authority: CN (China)
Prior art keywords: information, page, search, unit, zone
Legal status (as listed; not a legal conclusion): Granted; later Expired - Fee Related
Application number: CN 200710004337
Other languages: Chinese (zh)
Other versions: CN100489857C (en)
Inventor: 岩崎雅二郎
Current assignee: Ricoh Co Ltd
Original assignee: Ricoh Co Ltd
Events: application filed by Ricoh Co Ltd; publication of CN101008955A; application granted; publication of CN100489857C; expired (fee related)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An area extracting unit extracts area information from a page of document information for each area of different types arranged on the page. A relation extracting unit extracts relation information indicating a relation between the area information and the page of the document information that is an extraction source of the area information, from the page of the document information. A registering unit registers the area information and the relation information in area correspondence information stored in a storage unit in association with each other.

Description

Method and apparatus for managing information, and computer program product
Technical field
The present invention relates to a technology for managing document information.
Background
Cross-reference to related applications
The present application incorporates by reference the entire contents of Japanese priority document 2006-015591, filed in Japan on January 24, 2006, and Japanese priority document 2006-320792, filed in Japan on November 28, 2006.
With recent progress in communication technology and the development of network environments, the computerization of documents has advanced, promoting paperless environments in offices.
Specifically, users create many kinds of documents on computers as electronic documents. The created documents are edited, copied, transmitted, and shared on PCs or servers. When the PC or server storing a document is connected to other PCs over a network, the electronic document can also be browsed and edited from those connected PCs.
In such an office environment, many people create electronic documents on many PCs, so managing these documents in common becomes difficult, which can confuse users. For example, a user who does not know on which PC a needed electronic document is stored may be unable to find it. Several document management systems have therefore been proposed to solve this problem.
For example, Japanese Patent Application No. H11-120202 stores scanned documents, faxed documents, electronic documents generated by application programs, World Wide Web (WWW) documents, and the like by associating the raw data of each document with a thumbnail and the text of each of its pages. Electronic documents can thus be managed centrally regardless of differences in their formats.
Recently, owing to progress in computer-related technologies, a document can carry not only the information held in the electronic document itself but also various attached data such as images and video.
However, the invention described in Japanese Patent Application No. H11-120202 associates only the text and thumbnail of each page with the source document. When data other than text, such as an image, is attached to an electronic document, that data cannot be managed in association with the electronic document, so the user cannot find it.
Summary of the invention
It is an object of the present invention to at least partially solve the problems in the conventional technology.
The above and other objects, features, advantages, and technical and industrial significance of this invention will be better understood by reading the following detailed description of presently preferred embodiments of the invention, when considered in connection with the accompanying drawings.
Brief description of the drawings
Fig. 1 is a block diagram of the configuration of a document management system according to a first embodiment of the present invention;
Fig. 2 shows the table format of the document management table stored in the document metadatabase of the document management server according to the first embodiment;
Fig. 3 shows the table format of the page management table stored in the document metadatabase of the document management server according to the first embodiment;
Fig. 4 shows the table format of the area management table stored in the document metadatabase of the document management server according to the first embodiment;
Fig. 5 is a schematic diagram of an example page included in document data managed by the document management server according to the first embodiment;
Fig. 6 is a schematic diagram of an example screen, displayed on the display of the PC, for searching document images;
Fig. 7 is a schematic diagram of an example screen in which the display of the PC shows a HyperText Markup Language (HTML) document generated by the search result generating unit;
Fig. 8 is a schematic diagram of an example screen in which each area found in the search results is displayed as a thumbnail in a list;
Fig. 9 is a schematic diagram of an example screen showing a detailed description of an area displayed as a search result;
Fig. 10 is a schematic diagram of an example screen in which the display of the PC shows the search results for similar areas when the search button is pressed on the screen shown in Fig. 8;
Fig. 11 is a schematic diagram of an example screen when "tree" is selected as the display format for the search results of similar pages;
Fig. 12 is a schematic diagram of an example screen when the button for moving the display area to the right is pressed on the screen shown in Fig. 11;
Fig. 13 is a schematic diagram of an example screen in which the search results for similar pages are displayed as a time-series tree structure;
Fig. 14 is a flowchart of the procedure in the document management server according to the first embodiment, from receiving a document image to registering it;
Fig. 15 is a flowchart of the procedure performed by the document management server according to the first embodiment, from receiving a request from the PC to search for pages in document images to displaying the search results;
Fig. 16 is a flowchart of the procedure performed by the document management server according to the first embodiment, from receiving a request from the PC to search for areas in document images to displaying the search results;
Fig. 17 is a flowchart of the procedure in the document management server according to the first embodiment, from searching for areas or pages similar to an area or page displayed on the display of the PC to displaying the search results;
Fig. 18 is a block diagram of the configuration of a document management system according to a second embodiment of the present invention;
Fig. 19 shows the table format of the area management table stored in the document metadatabase of the document management server according to the second embodiment;
Fig. 20 is a schematic diagram of an example screen in which the display of the PC shows an HTML document generated by the search result generating unit of the document management server according to the second embodiment;
Fig. 21 is a schematic diagram of an example screen in which the display of the PC shows an HTML document generated by the search result generating unit of the document management server according to a modified example of the second embodiment;
Fig. 22 is a block diagram of the configuration of a document management system according to a fourth embodiment of the present invention;
Fig. 23 is a schematic diagram of an example screen, displayed on the display of the PC, for searching for similar pages according to the fourth embodiment;
Fig. 24 is a schematic diagram of an example screen, displayed by the display processing unit of the PC according to the fourth embodiment, for receiving the selection of a page in a search for similar pages;
Fig. 25 is a schematic diagram of an example screen, displayed on the display of the PC, for searching for similar documents according to the fourth embodiment;
Fig. 26 is a flowchart of the procedure by which the document management server according to the fourth embodiment searches for similar documents and generates an HTML document in which, for each type of search-source area, thumbnails of areas similar to the search-source area are arranged;
Fig. 27 is a schematic diagram of an example screen in which the display of the PC shows an HTML document generated, as the result of a search for similar pages, by the search result generating unit of the document management server according to the fourth embodiment;
Fig. 28 is a flowchart of the procedure by which the document management server according to the fourth embodiment searches for similar documents and generates an HTML document in which thumbnails of pages similar to the search-source page are arranged;
Fig. 29 is a schematic diagram of the concept used when the similarity information search unit of the document management server according to the fourth embodiment calculates similarity;
Fig. 30 is a schematic diagram of an example screen in which the display of the PC shows an HTML document generated, as the result of a search for similar pages, by the search result generating unit of the document management server according to the fourth embodiment;
Fig. 31 is a flowchart of the procedure by which the document management server according to the fourth embodiment searches for similar documents and generates an HTML document in which thumbnails of the pages of documents similar to the search-source document are arranged;
Fig. 32A is a schematic diagram, as another example of modified example 1, of a tree generated by recursively searching for similar areas when no creation/update date is set as a search condition;
Fig. 32B is a schematic diagram of a tree generated by recursively searching for similar areas in modified example 1 when a predetermined creation/update date is set as a search condition;
Fig. 33 is a schematic diagram of a tree generated by recursively searching for similar areas in modified example 2;
Fig. 34 is a hardware configuration diagram of a PC that executes a program implementing the functions of the document management server.
Embodiment
Exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a block diagram of the configuration of the document management system according to the first embodiment of the present invention. In the document management service system according to the first embodiment, a document management server 100 and a PC 150 are interconnected via a network. With this configuration, the document management server 100 can register document data sent from the PC 150, and the PC 150 can find document data by searching the document management server 100. The network may be of any kind, wired or wireless, such as a local area network (LAN) or a public telecommunication network.
The document data managed by the document management service system according to the first embodiment is assumed to include document images, in which characters and the like are represented as an image, and electronic documents generated by document-creation application programs. The processing described below, however, mainly concerns document images. A document image may be in a compound format containing multiple pages, or may be a single page.
Besides document images created by users, document images include documents scanned by a scanner, faxed documents received by a facsimile machine, and the like. Document images managed by the document management server 100 may be in any format; for example, they may be saved in a multi-page format such as TIFF. Electronic documents include WWW documents written in HTML and the like.
The PC 150 shown in Fig. 1 includes a communication processing unit 151, a display processing unit 152, and an operation processing unit 153.
The communication processing unit 151 performs processing such as transmitting data between the PC 150 and other devices connected via the network, such as the document management server 100.
The display processing unit 152 displays, for example, document data on a monitor (not shown). It also displays a screen for searching document data and a search result screen, using a web browser. These screens are obtained through communication between the communication processing unit 151 and the document management server 100.
The operation processing unit 153 processes operation input from the user; for example, search conditions are set on the search screen displayed in the web browser.
The document management server 100 includes a storage unit 101, a communication processing unit 102, a search unit 103, a similarity information search unit 104, a search result generating unit 105, an area extracting unit 106, a relation extracting unit 107, an area feature extracting unit 108, a page feature extracting unit 109, and a registering unit 110, and can thereby register, manage, and search document data.
The document management server 100 extracts areas from each page of the document data to be managed, and stores the document image, its pages, and the extracted areas in association with one another. When it receives a request from the PC 150 or the like, the document management server 100 searches the documents, pages, or areas it holds and sends the search results to the PC 150 or other requester.
The storage unit 101 includes a document metadatabase 121 and a data storage unit 122. The storage unit 101 can be formed of any commonly used storage medium, such as a hard disk drive (HDD), an optical disc, a memory card, or random access memory (RAM).
The document metadatabase 121 contains a document management table, a page management table, and an area management table.
Fig. 2 shows the table format of the document management table. As shown in Fig. 2, the document management table holds, in association with one another, a document ID, a title, a creation/update date, the number of pages, a document format, a document path, and a document name. In the first embodiment, this information is called document meta-information; it represents the attributes and the like of a document.
The document ID is a unique ID assigned to each item of document data so that the document data can be identified. The title is the title of the document data. The creation/update date holds the date on which the document data was created or last updated. The number of pages holds the page count of the document data. The document format holds the format of each item of document data, so it can be determined whether a managed document is a scanned document, a faxed document, an electronic document generated by an application program, or a WWW document.
The document path indicates where the document data is stored. The document name is the file name of the document data.
Fig. 3 shows the table format of the page management table. As shown in Fig. 3, the page management table holds, in association with one another, a page ID, a document ID, a page number, a feature quantity, a text feature quantity, and a thumbnail path. In the first embodiment, this information is called page meta-information.
The page ID is a unique ID assigned to each page constituting the document data, so that any page of the documents managed by the document management server 100 can be uniquely identified by this ID. The document ID identifies the document data containing the page. The page number is the number of the page within that document data. The feature quantity is a feature extracted from the page by treating the entire page as a single image.
The text feature quantity is a feature extracted from the text information contained in the page; it holds, for example, the keywords in the text and their frequencies. When the document data is a document image, the text feature quantity is extracted from text information obtained from the page image with an optical character reader (OCR). The thumbnail path holds the location where a thumbnail representing the entire page image is stored.
Fig. 4 shows the table format of the area management table. As shown in Fig. 4, the area management table stores, in association with one another, an area ID, a document ID, a page ID, area coordinates, a type, a title, text, peripheral text, a feature quantity, and a thumbnail path. In the first embodiment, this information is called area meta-information.
The area ID is a unique ID assigned to each area extracted from the document data, so that any area contained in the document pages managed by the document management server 100 can be uniquely identified by this ID. The document ID and the page ID identify the document data and the page containing the area. The area coordinates specify the area; in the first embodiment, the area is specified by the coordinates of its upper-left and lower-right vertices.
The type holds information specifying the data type of the area. Data types include, for example, text, image, and video. In the first embodiment, images are further classified into charts, tables, and photographs. The data types are not limited to these, however, and other classifications may be used. The title holds a title representing the area. The text holds the text information contained in the area.
When the data type is image, the peripheral text holds text information located around the image. The user can thus search for related images by entering text as a search condition on the search screen.
The feature quantity holds a feature that characterizes the area. For example, when the type is image, an image feature is stored; when the type is text, a text feature is stored. The feature quantity thus holds a different kind of feature depending on the type, so whether two areas are similar can be determined correctly by comparing features of the same type. The feature extraction method is described later. The thumbnail path holds the location where a thumbnail representing the area is stored.
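The following Python sketch models one row of the area management table and the rule that feature quantities are compared only between areas of the same type. The field names, the use of cosine similarity, and returning `None` for mismatched types are assumptions made for illustration; the patent does not specify them.

```python
import math
from dataclasses import dataclass, field

@dataclass
class AreaRecord:
    """One row of the area management table (Fig. 4); field names are illustrative."""
    area_id: int
    doc_id: int
    page_id: int
    coords: tuple  # (x1, y1, x2, y2): upper-left and lower-right vertices
    area_type: str  # e.g. "text", "chart", "table", "photo", "video"
    title: str = ""
    text: str = ""
    peripheral_text: str = ""
    feature: list = field(default_factory=list)
    thumbnail_path: str = ""

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def area_similarity(a, b):
    """Compare feature quantities only when both areas have the same type."""
    if a.area_type != b.area_type:
        return None  # features of different kinds are not comparable
    return cosine(a.feature, b.feature)
```

Returning `None` rather than 0.0 keeps "not comparable" distinct from "comparable but dissimilar".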
The data storage unit 122 stores document data, the data of each area extracted from the document data, and thumbnails representing each page or area. The data of each area is, for example, image data, video data, or text data contained in a page of the document data.
The communication processing unit 102 transfers data between the document management server 100 and devices connected via the network, such as the PC 150. Data received by the communication processing unit 102 includes, for example, document data to be registered from the PC 150 and search conditions for searching document data. Data transmitted includes, for example, managed document data and the data of the search screen or of a screen representing the search results.
The registering unit 110 registers document data received by the communication processing unit 102. It stores the received document data in the data storage unit 122 of the storage unit 101, and stores the meta-information of that document data in the document management table of the document metadatabase 121. Specifically, the registering unit 110 registers the extracted meta-information in the document management table in association with the document ID: the document name of the document data, the document format indicated by the suffix of the document name, and the document path where the document data is stored. The document ID is generated automatically at registration.
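A minimal sketch of registering document meta-information, assuming SQLite as the metadatabase, an `AUTOINCREMENT` key for the automatically generated document ID, and a hypothetical suffix-to-format mapping. The patent names neither a database engine nor a concrete mapping; the table and column names below are illustrative.

```python
import sqlite3
from pathlib import PurePath

# Hypothetical mapping from file-name suffix to document format.
SUFFIX_TO_FORMAT = {
    ".tif": "image", ".tiff": "image", ".pdf": "pdf",
    ".htm": "html", ".html": "html",
}

def create_document_table(conn):
    # Columns follow the document management table of Fig. 2 (names are illustrative).
    conn.execute(
        """CREATE TABLE IF NOT EXISTS document_management (
               doc_id INTEGER PRIMARY KEY AUTOINCREMENT,
               title TEXT,
               updated TEXT,
               page_count INTEGER,
               doc_format TEXT,
               doc_path TEXT,
               doc_name TEXT)"""
    )

def register_document(conn, doc_name, doc_path, title="", updated="", page_count=0):
    """Insert one row and return the automatically generated document ID."""
    suffix = PurePath(doc_name).suffix.lower()
    doc_format = SUFFIX_TO_FORMAT.get(suffix, "unknown")
    cur = conn.execute(
        "INSERT INTO document_management "
        "(title, updated, page_count, doc_format, doc_path, doc_name) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (title, updated, page_count, doc_format, doc_path, doc_name),
    )
    return cur.lastrowid
```

For example, `register_document(conn, "scan01.TIFF", "/store/scan01.TIFF", page_count=3)` on an in-memory database (`sqlite3.connect(":memory:")`) stores the format `"image"` and returns the new document ID.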
Besides registering document data, the registering unit 110 also registers data in the page management table and the area management table. Registration of each page and each area is described later.
The page feature extracting unit 109 extracts a feature quantity from each page of the document data received from the PC 150 as a target to be managed. The page feature extracting unit 109 according to the first embodiment treats each page as image data and extracts the feature quantity from that image data. When the document data to be processed is not a document image but an electronic document generated by a document-creation application program, the page feature extracting unit 109 converts the electronic document into image data before extracting the feature quantity. As a result, it can extract feature quantities from any document data, regardless of format. Any method may be used to extract the feature quantity from the image data.
Fig. 5 is a schematic diagram of an example page image included in document data managed by the document management server 100. The page image shown in Fig. 5 consists of two image areas and body text corresponding to each image. The page feature extracting unit 109 extracts a feature quantity from the page image representing the entire page 505.
In addition to the image feature quantity, the page feature extracting unit 109 extracts a page number and a text feature quantity from each page. When the document data is a document image, the page feature extracting unit 109 extracts text information from the page images in the document image by OCR or the like, and then extracts the text feature quantity from that text information.
The text feature quantity in the first embodiment is assumed to be vector (array) data generated from the text contained in the page. That is, the page feature extracting unit 109 performs lexical analysis on the text data in the page to extract words, and then calculates a weight for each extracted word, generating vector data that represents the importance of each keyword.
Any method may be used to weight the extracted words; in the first embodiment, the weights are calculated by the tf-idf method. The tf-idf method weights a word based on how many times it appears in the page (the more occurrences, the more important) and on how many pages among all managed documents contain it (the fewer pages, the more important).
Equation (1) gives the tf-idf weighting formula.
$$w_{i,j} = \mathrm{tf}_{i,j} \times \log\left(\frac{N}{\mathrm{df}_i}\right) \qquad (1)$$
where $w_{i,j}$ is the weight of word $i$ in page $D_j$ of the document data, $\mathrm{tf}_{i,j}$ is the number of occurrences of word $i$ in page $D_j$, $\mathrm{df}_i$ is the number of pages in all the document data in which word $i$ appears, and $N$ is the total number of pages in the managed document data. The page feature extracting unit 109 can thus extract, for each page, a text feature quantity consisting of an array of words and their weights.
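Equation (1) can be computed directly. The sketch below assumes each page has already been reduced to a list of words by lexical analysis, and it uses the natural logarithm, since the source does not state the base.

```python
import math
from collections import Counter

def tfidf_vectors(pages):
    """pages: one list of words per page. Returns one {word: w_ij} dict per page."""
    n_pages = len(pages)                 # N in Equation (1)
    df = Counter()
    for words in pages:
        df.update(set(words))            # df_i: count each word once per page
    vectors = []
    for words in pages:
        tf = Counter(words)              # tf_ij: occurrences of word i in page j
        vectors.append({w: tf[w] * math.log(n_pages / df[w]) for w in tf})
    return vectors
```

A word that appears on every page gets weight 0, because log(N/N) = 0; this is how the formula discounts words that do not distinguish one page from another.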
The page feature extracting unit 109 also generates a thumbnail representing the page image. The generated thumbnail is stored in the data storage unit 122.
The registering unit 110 registers the meta-information extracted by the page feature extracting unit 109 in the page management table. That is, the registering unit 110 stores in the page management table, in association with a page ID and a document ID, the page number, feature quantity, and text feature quantity extracted by the page feature extracting unit 109, together with the storage location of the thumbnail (thumbnail path). The document ID is the one generated when the document data containing the page was registered in the document management table. The page ID is generated automatically when the page is registered in the page management table.
For each page of the document data sent from the PC 150, the area extracting unit 106 extracts, for each area arranged on the page, data representing that area. For example, if the page contains an image area, the area extracting unit 106 extracts it as image data; if the page contains a text area, it extracts it as text data. Any method may be used to extract text data; OCR, for example, may be used. Other areas are extracted by similar processing. When extracting a text area, the area extracting unit 106 may extract it column by column.
In the example shown in Fig. 5, the area extracting unit 106 extracts image areas 501 and 502 from the page, as well as text areas 503 and 504. Text areas 503 and 504 may be extracted as text, or as image data to preserve the layout of the document.
Any method may be used by the area extracting unit 106 to extract each type of area. For example, when the target is a document image scanned by a scanner, the area extracting unit 106 detects edges in the image and determines the extent of each text area or image area, thereby extracting each area. At this time, the area extracting unit 106 also identifies the type of each area.
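The patent leaves the segmentation method open and mentions edge detection. As a simpler stand-in, and explicitly a different technique from edge detection, the sketch below uses a horizontal projection profile: runs of rows containing no ink split a binarized page into horizontal bands, which a real system would then divide further and classify by type. The list-of-rows page representation and the `min_gap` parameter are assumptions.

```python
def split_into_bands(binary, min_gap=1):
    """binary: list of rows, each a list of 0/1 pixels (1 = ink).
    Returns (top, bottom) row ranges of ink bands separated by at least
    min_gap consecutive blank rows; bottom is exclusive."""
    bands = []
    start = None      # first row of the band being built, if any
    blank_run = 0     # consecutive blank rows seen inside the current band
    for y, row in enumerate(binary):
        if any(row):
            if start is None:
                start = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # Close the band just before the blank run began.
                bands.append((start, y - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        # Page ended inside a band; trim any trailing blank rows.
        bands.append((start, len(binary) - blank_run))
    return bands
```

Raising `min_gap` merges bands separated by narrow white space, such as lines of the same paragraph, into a single block.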
The relation extracting unit 107 extracts the relation between the data of each area extracted by the area extracting unit 106 and the document data and page that contain it. The relation extracting unit 107 according to the first embodiment extracts the coordinates of each area on its page, the page ID of the page containing the area's data, and the document ID of the document containing that page. For the data of each extracted area, it can thus be determined at which position on which page of which document the area lies. In other words, the information needed to form a tree structure of the pages and areas contained in the document data is extracted.
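Because each area row carries its document ID, page ID, and coordinates, the document, page, and area tree mentioned above can be rebuilt by simple grouping. The tuple layout and the nested-dictionary result are assumptions for illustration.

```python
from collections import defaultdict

def build_tree(rows):
    """rows: iterable of (area_id, doc_id, page_id, coords) tuples, as in the area table.
    Returns {doc_id: {page_id: [(area_id, coords), ...]}} preserving input order."""
    tree = defaultdict(lambda: defaultdict(list))
    for area_id, doc_id, page_id, coords in rows:
        tree[doc_id][page_id].append((area_id, coords))
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {doc: dict(pages) for doc, pages in tree.items()}
```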
The area feature extracting unit 108 extracts a feature quantity from each area extracted by the area extracting unit 106, using a different kind of feature for each type of area. For example, for an image area it extracts an image feature quantity; for a text area it extracts a text feature quantity from the text information in the area. For video or audio data it extracts a feature quantity suited to that format. As a result, a feature quantity corresponding to the type of each area is registered in the area management table.
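Per-type feature extraction can be organized as a dispatch table keyed by area type. The two extractors below (a coarse grey-level histogram for images and word counts for text) are hypothetical placeholders, not the features used in the patent; extractors for video or audio would be registered in the same table.

```python
from collections import Counter

def image_feature(pixels, bins=4):
    """Placeholder image feature: normalized grey-level histogram of 0-255 pixel values."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]

def text_feature(text):
    """Placeholder text feature: lower-cased word counts."""
    return dict(Counter(text.lower().split()))

# Dispatch table: one extractor per area type (keys are assumptions).
EXTRACTORS = {"image": image_feature, "text": text_feature}

def extract_area_feature(area_type, data):
    extractor = EXTRACTORS.get(area_type)
    if extractor is None:
        raise ValueError(f"no feature extractor registered for type {area_type!r}")
    return extractor(data)
```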
When the document data is a document image, the area feature extracting unit 108 first obtains the text data in a text area by OCR, and then extracts the feature quantity from the obtained text data.
Where possible, the area feature extracting unit 108 also extracts a title and text for each extracted area. When the extracted area is an image, it also extracts peripheral text where possible. Any method may be used to extract the title, text, and peripheral text of an area; the first embodiment uses the methods described below.
When the area is an image, the area feature extracting unit 108 takes as its title either text contained in the image area or a character string in a text area around the image.
In example shown in Figure 5, " autumn " in the zone of provincial characteristics extraction unit 108 extraction image-regions 502 belows is as the title corresponding to image-region 502.If character string " autumn " is not at lower zone, " season of colored leaf " that 108 extractions of provincial characteristics extraction unit are extracted from image is as title.If character string " season of colored leaf " is not included in the image-region 502, provincial characteristics extraction unit 108 from corresponding to image-region 502 text filed 504 extract suitable character string.Can use any method as definite method of text filed 504 corresponding to image.
When the area is text, the area feature extracting unit 108 extracts a suitable character string as the title, taking word weights and the like into account.
When the area is image data, the area feature extracting unit 108 extracts character string information from the area by OCR and treats it as the text of the area. When the area is text data, the text contained in the area becomes the text of the area.
In the example shown in Fig. 5, the area feature extracting unit 108 extracts "Mountain in winter" as the title of image area 501, and extracts "Season of colored leaves" as the text of image area 502.
When the area is an image, the area feature extracting unit 108 also extracts peripheral text. In the example shown in Fig. 5, it extracts "Autumn" or the text in text area 504 as the peripheral text of image area 502.
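One way to decide which text counts as peripheral is bounding-box proximity. The sketch below assumes boxes in (x1, y1, x2, y2) form with (x1, y1) the upper-left vertex, as in the area coordinates of Fig. 4, and uses an arbitrary 20-pixel threshold; the patent does not specify a distance rule.

```python
def box_gap(a, b):
    """Gap between boxes (x1, y1, x2, y2); 0 if they touch or overlap.
    Uses the larger of the horizontal and vertical gaps."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return max(dx, dy)

def peripheral_text(image_box, text_areas, max_gap=20):
    """text_areas: list of (box, text). Returns the texts within max_gap of the image, nearest first."""
    nearby = [(box_gap(image_box, box), text) for box, text in text_areas]
    nearby = [(gap, text) for gap, text in nearby if gap <= max_gap]
    nearby.sort(key=lambda item: item[0])
    return [text for _, text in nearby]
```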
Provincial characteristics extraction unit 108 generates this regional thumbnail of expression.The thumbnail storage that generates is in data storage cell 122.
Thereafter, the registering unit 110 registers in the area management table the relation extracted by the relation extracting unit 107, the type of each area specified by the area extracting unit 106, and the feature quantities extracted by the area feature extracting unit 108. That is, the registering unit 110 registers in the area management table, in association with the area ID, the document ID, page ID, and area coordinates extracted by the relation extracting unit 107; the type specified by the area extracting unit 106; and the title, text, surrounding text, feature quantity, and thumbnail obtained by the area feature extracting unit 108. The area ID is generated automatically upon registration in the area management table.
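As a rough sketch of the registration described above, the area management table could be held in SQLite, with the area ID generated automatically on insertion. The table name, column names, the comma-separated coordinate format, and JSON serialization of the feature quantity are all illustrative assumptions, not the patent's schema.

```python
import json
import sqlite3


def create_area_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical area management table (column names are illustrative)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS area_management (
               area_id          INTEGER PRIMARY KEY AUTOINCREMENT, -- assigned on registration
               document_id      INTEGER NOT NULL,  -- relation (unit 107)
               page_id          INTEGER NOT NULL,  -- relation (unit 107)
               coordinates      TEXT NOT NULL,     -- "x,y,width,height" within the page
               area_type        TEXT NOT NULL,     -- type (unit 106): text, image, ...
               title            TEXT,              -- this and the following: unit 108
               body_text        TEXT,
               surrounding_text TEXT,
               feature          TEXT,              -- feature quantity serialized as JSON
               thumbnail_path   TEXT
           )"""
    )


def register_area(conn: sqlite3.Connection, *, document_id, page_id, coordinates,
                  area_type, title=None, body_text=None, surrounding_text=None,
                  feature=None, thumbnail_path=None) -> int:
    """Insert one area record and return the automatically generated area ID."""
    cur = conn.execute(
        "INSERT INTO area_management (document_id, page_id, coordinates, area_type, "
        "title, body_text, surrounding_text, feature, thumbnail_path) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (document_id, page_id, ",".join(map(str, coordinates)), area_type, title,
         body_text, surrounding_text,
         json.dumps(feature) if feature is not None else None, thumbnail_path),
    )
    conn.commit()
    return cur.lastrowid
```

Keeping every field in one row keyed by the area ID is what later allows a single lookup to recover the page and document an area belongs to.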
Because the registering unit 110 registers this information in the area management table, the document management server 100 can manage it in a searchable form regardless of the data type of each area contained in the document data. Because the registering unit 110 also registers the feature quantities, similarity searches that use these feature quantities are also possible.
The registering unit 110 also registers text extracted from image data and the like. Because the search unit 103 can therefore search areas or pages of image data by character string, the user can locate the desired image data efficiently.
Based on a search request from PC 150 or the like, the search unit 103 searches the document management table, the page management table, and the area management table in the document metadata database 121. The search is described in detail below with reference to the search screen displayed on the display of PC 150.
Fig. 6 is a schematic diagram of an example search screen for searching documents, displayed on the display of PC 150. The search screen is displayed when the user wants to search documents from PC 150, and shows items for setting search conditions. Search target 601 is an item for selecting "Document", "Page", or "Area" as the search target; in Fig. 6, "Area" is selected. Display format 604 is an item for selecting "Normal", "Thumbnail", or "Tree"; in Fig. 6, "Normal" is selected. The operation processing unit 153 of PC 150 sets each search condition based on user input. When the operation processing unit 153 receives a press of search button 602 from the user, the communication processing unit 151 of PC 150 sends the set search conditions to the document management server 100. Fig. 6 shows an example in which "feature" is entered in text 603 as the search condition.
After the communication processing unit 102 of the document management server 100 finishes receiving the search conditions from PC 150, the search unit 103 searches the table corresponding to the received search conditions. Specifically, when "Document" is selected in search target 601 shown in Fig. 6, the search unit 103 searches the document management table; when "Page" is selected, it searches the page management table; and when "Area" is selected, it searches the area management table. The search unit 103 uses the received search conditions as search keys. The search unit 103 can thus obtain the document desired by the user, or a page or area contained in that document. As a result, area or page information can be found efficiently in response to a request from a user of PC 150 or the like.
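The table selection described above amounts to a dispatch on the value of search target 601. The sketch below uses plain lists of dictionaries as hypothetical stand-ins for the three management tables, and assumes a simple case-insensitive substring match on text fields as the search-key comparison.

```python
from typing import Dict, List

# Which management table each value of search target 601 maps to.
TARGET_TO_TABLE = {
    "Document": "document_management",
    "Page": "page_management",
    "Area": "area_management",
}


def search_by_target(tables: Dict[str, List[dict]], target: str, keyword: str) -> List[dict]:
    """Search only the table selected by the search target. The keyword is matched
    as a case-insensitive substring of any text field (a simplifying assumption)."""
    if target not in TARGET_TO_TABLE:
        raise ValueError(f"unknown search target: {target!r}")
    needle = keyword.lower()
    return [
        record for record in tables[TARGET_TO_TABLE[target]]
        if any(isinstance(value, str) and needle in value.lower() for value in record.values())
    ]
```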
The search result generating unit 105 includes the tree structure generating unit 111 and generates HTML documents representing the search results obtained by the search unit 103 and by the similarity information search unit 104, which is described later. The search result generating unit 105 also generates HTML documents showing the details of a page or area. The generated HTML document is sent through the communication processing unit 102 to the PC 150 that requested the search. When the communication processing unit 151 of PC 150 receives the HTML document, the display processing unit 152 displays it. The processing of the tree structure generating unit 111 is described later.
Fig. 7 is a schematic diagram of an example screen in which this HTML document is displayed on the display of PC 150. This search result screen is an example of the results obtained when "Area" is set as the search target and "feature" as the text on the search screen shown in Fig. 6. The display format in this example is "Normal". Any items may be displayed as search results; in the first embodiment, the area ID, area name (title), type, and text are displayed. When the search result screen shown in Fig. 7 is displayed and the user clicks an area title, a screen showing the details of that area is displayed; this screen is described later. When button 701 is pressed, the display processing unit 152 of PC 150 displays, as thumbnails, the search results for each area obtained under the same conditions. In other words, the display format can be changed easily.
Fig. 8 is a schematic diagram of an example screen, displayed when button 701 on the screen of Fig. 7 is pressed or when "Thumbnail" is selected as the display format in Fig. 6, in which each area in the document search results is shown as a thumbnail. On this search result screen, a "Search" button and a "Reference" button are displayed for each area. When the user presses the "Search" button, a search for similar areas is performed. When the user presses the "Reference" button, the details of the area are displayed. When the user presses button 803, the screen shown in Fig. 7 is displayed again. Because the screen shown in Fig. 8 displays thumbnails, the user can easily grasp the content of each area.
When button 701 is pressed on the screen shown in Fig. 7, the communication processing unit 151 of PC 150 sends the search conditions and a flag requesting thumbnail display to the document management server 100. Upon receiving this information, the search unit 103 of the document management server 100 performs the search under the received search conditions. This search differs from the one described above in that, because of the thumbnail-display flag, the "thumbnail path" field is also obtained when the area management table is searched. The search result generating unit 105 generates an HTML document based on the search results; in this case, it writes for each area the uniform resource locator (URL) of the thumbnail, generated from the thumbnail path. The generated HTML document is sent to PC 150. As a result, PC 150 can display the search results with a thumbnail for each area.
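Turning a stored thumbnail path into a URL that the browser can fetch might look like the following sketch. The base URL `http://docserver.example/thumbnails/` and the `<figure>` markup are hypothetical; the standard-library `urljoin`, `quote`, and `html.escape` functions handle path joining, percent-encoding, and attribute escaping.

```python
import html
from urllib.parse import quote, urljoin

# Hypothetical base URL under which the server publishes data storage unit 122.
THUMBNAIL_BASE_URL = "http://docserver.example/thumbnails/"


def thumbnail_url(thumbnail_path: str, base_url: str = THUMBNAIL_BASE_URL) -> str:
    """Turn a stored (relative) thumbnail path into a URL the browser can fetch."""
    # Normalize Windows-style separators and drop any leading slash so that
    # urljoin appends the path to base_url instead of replacing its path.
    relative = thumbnail_path.replace("\\", "/").lstrip("/")
    # Percent-encode unsafe characters (spaces, non-ASCII) but keep "/" separators.
    return urljoin(base_url, quote(relative))


def thumbnail_cell(area_id: int, title: str, thumbnail_path: str) -> str:
    """Render one search-result cell showing an area's thumbnail and title."""
    url = thumbnail_url(thumbnail_path)
    return (f'<figure data-area-id="{area_id}">'
            f'<img src="{html.escape(url)}" alt="{html.escape(title)}">'
            f"<figcaption>{html.escape(title)}</figcaption></figure>")
```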
Fig. 9 is a schematic diagram of an example screen, displayed when the "Reference" button is pressed on the screen shown in Fig. 8, that gives a detailed description of the corresponding area. The detail screen shows the metadata of the area stored in the area management table of the document management server 100. As a result, the user can understand the area.
When a "Reference" button is pressed on the screen shown in Fig. 8, the communication processing unit 151 of PC 150 sends to the document management server 100 the area ID of the area corresponding to the pressed "Reference" button, together with information requesting a detail display. After the document management server 100 receives this information, its search unit 103 searches the area management table using the received area ID as the key. The search unit 103 then obtains all fields needed to display the record that matches the search condition. The search result generating unit 105 generates an HTML document describing the details based on the obtained information. PC 150 then receives the generated HTML document and displays the details of the area.
The detail screen for the area shown in Fig. 9 can display not only the metadata of the area but also the metadata of the document or page containing the area. This is possible because the correspondence among the area, the page, and the document is stored in the area management table.
When the user presses execute button 901 on the screen shown in Fig. 9, a screen is displayed that includes a thumbnail of the page containing the area, together with the metadata of that page. This is possible because the link between the area ID and the page ID is stored in the area management table of the document management server. In other words, after obtaining the page ID of the area, the search unit 103 searches the page management table using the page ID as the key, and thereby obtains the information needed for the display.
When the user presses the "Open original" button on the screen shown in Fig. 9, the document data containing the area is displayed. This is possible because the link between the area ID and the page ID is stored in the area management table of the document management server 100. In other words, after obtaining the page ID of the area, the search unit 103 searches the document management table using the page ID as the key, and thereby obtains the path of the storage destination of the document.
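The two lookups above (area ID to page, and area ID through page to the stored document) can be sketched as SQL joins. The minimal schema below is hypothetical; it assumes the page management table records the document ID of each page, so that the page ID leads to the document's storage path.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical minimal versions of the three management tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE document_management (document_id INTEGER PRIMARY KEY, path TEXT);
        CREATE TABLE page_management (page_id INTEGER PRIMARY KEY, document_id INTEGER,
                                      page_number INTEGER, thumbnail_path TEXT);
        CREATE TABLE area_management (area_id INTEGER PRIMARY KEY, page_id INTEGER, title TEXT);
        INSERT INTO document_management VALUES (1, '/store/report.pdf');
        INSERT INTO page_management VALUES (10, 1, 3, 'thumbs/p10.png');
        INSERT INTO area_management VALUES (100, 10, 'Autumn');
    """)
    return conn


def page_for_area(conn: sqlite3.Connection, area_id: int):
    """Area ID -> page record (for the page view opened by the execute button)."""
    return conn.execute(
        "SELECT p.page_id, p.page_number, p.thumbnail_path "
        "FROM area_management a JOIN page_management p ON p.page_id = a.page_id "
        "WHERE a.area_id = ?", (area_id,)).fetchone()


def original_path_for_area(conn: sqlite3.Connection, area_id: int):
    """Area ID -> page ID -> storage path of the original document."""
    row = conn.execute(
        "SELECT d.path FROM area_management a "
        "JOIN page_management p ON p.page_id = a.page_id "
        "JOIN document_management d ON d.document_id = p.document_id "
        "WHERE a.area_id = ?", (area_id,)).fetchone()
    return row[0] if row else None
```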
Further, pressing search button 903 searches for areas similar to this area. The similar areas can also be displayed in time series. Details are described later.
Returning to Fig. 1, the similarity information search unit 104 searches for areas similar to an area displayed on the display of PC 150. The similarity information search unit 104 also searches for similar pages. Any method may be used to search for similar areas or pages. In the first embodiment, however, the search uses the feature quantities stored in the page management table and in the area management table. The detailed procedure of the similar-image search is described later.
The search result generating unit 105 generates an HTML document based on the search results of the similarity information search unit 104. The generated HTML document is sent to PC 150 through the communication processing unit 102. As a result, the similar-image search results are displayed on the display of PC 150.
Fig. 10 is a schematic diagram of an example screen showing the similar-area search results displayed on the display of the PC when search button 801 is pressed on the screen shown in Fig. 8. As shown in Fig. 10, the area serving as the search source is displayed in the upper part of the web browser, and the areas determined to be similar are displayed in the lower part. In the upper part, the weighting and the display format of the similar images can be changed. Either "Thumbnail" or "Tree" can be selected as the display format; in Fig. 10, "Thumbnail" is selected.
Fig. 11 is a schematic diagram of an example screen displayed when "Tree" is selected as the display format for similar-page search results. In the example shown in Fig. 11, similar pages have been searched for. The document shown at the top of Fig. 11 contains the page serving as the search source. Documents containing pages with high similarity to the search-source page are displayed in rectangle 1102, with similarity decreasing toward the bottom.
The tree structure included in the HTML document is generated by the tree structure generating unit 111. That is, after the similarity information search unit 104 obtains the similar-page search results, the tree structure generating unit 111 uses the metadata of the obtained similar pages as keys to search the document management table and the area management table, thereby obtaining the metadata of the documents containing the similar pages and of the areas contained in the similar pages. The tree structure generating unit 111 then associates the obtained documents with the similar pages and areas to generate the tree structure. Thumbnails of the pages and areas displayed in the tree structure can be shown by using the thumbnail paths stored in the metadata. Through the tree structure, the user can thus easily understand the document data.
The search result generating unit 105 generates an HTML document based on the generated tree structure. The similar-page search results are thus displayed on PC 150 as a tree structure. Although the similar-page search results have been described with reference to Fig. 11, a similar-area search can be realized by the same processing. In addition, when the user presses button 1103 shown in Fig. 11, more of the areas contained in each page can be displayed.
Fig. 12 is a schematic diagram of an example screen displayed when button 1103 shown in Fig. 11 is pressed. The screen shown in Fig. 12 displays three areas. Any method may be used to display such a screen, for example, having the document management server 100 search again. Pressing button 1201 displays the screen shown in Fig. 11 again.
Based on the search results of the similarity information search unit 104, the search result generating unit 105 can generate an HTML document in which the image data are arranged in order of creation or update time. For example, pressing search button 903 on the screen shown in Fig. 9 could display, in time series, the document data containing areas similar to this area.
Fig. 13 is a schematic diagram of an example screen in which similar-page search results are displayed as a time-series tree structure. Range 1301 in the middle of the figure shows the search-source page and the areas contained in that page. Each page is displayed at the left, and the areas it contains are displayed to the right of it. The similar pages and areas are connected to one another by line segments. The vertical direction in Fig. 13 is a time axis representing the creation date or the last update date.
The similarity information search unit 104 of the document management server 100 compares the feature quantity of the search-source page with the feature quantity stored in each record of the page management table to calculate a page similarity. When the calculated similarity satisfies a predetermined criterion, the similarity information search unit 104 determines that the record is similar to the search-source page and obtains the information of that record, whose stored feature quantity was used in the calculation, as a similar page. Similar areas can be searched for by performing the same processing on the area management table. As an example of the predetermined criterion, when the similarity takes a value from 0 to 1, a page whose similarity value is 0.3 or less can be determined to be similar to the search-source page. Because similar areas are searched for by the same procedure, their description is omitted.
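A minimal sketch of threshold-based similar-page search follows. The text does not name the measure; this example swaps in the total-variation distance (half the L1 distance) between normalized, non-negative feature histograms, which lies in [0, 1] with 0 meaning identical. That reading of the 0-to-1 value as a distance, under which "0.3 or less" means similar, is an assumption, as is the optional per-bin weighting standing in for the adjustable weighting on the Fig. 10 screen.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def feature_distance(a: Sequence[float], b: Sequence[float],
                     weights: Optional[Sequence[float]] = None) -> float:
    """Weighted total-variation distance between two non-negative feature
    histograms: 0.0 means identical, 1.0 means completely disjoint."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    w = list(weights) if weights is not None else [1.0] * len(a)
    if len(w) != len(a):
        raise ValueError("weights must match the feature length")
    wa = [wi * ai for wi, ai in zip(w, a)]
    wb = [wi * bi for wi, bi in zip(w, b)]
    sa, sb = sum(wa), sum(wb)
    if sa <= 0 or sb <= 0:
        raise ValueError("weighted features must have a positive sum")
    # Normalize both histograms to sum to 1, then take half the L1 distance.
    return 0.5 * sum(abs(x / sa - y / sb) for x, y in zip(wa, wb))


def find_similar(query: Sequence[float], records: Dict[str, Sequence[float]],
                 threshold: float = 0.3,
                 weights: Optional[Sequence[float]] = None) -> List[Tuple[str, float]]:
    """Return (record ID, distance) for every record within the threshold,
    most similar first."""
    scored = [(rid, feature_distance(query, feat, weights)) for rid, feat in records.items()]
    hits = [(rid, d) for rid, d in scored if d <= threshold]
    return sorted(hits, key=lambda item: (item[1], item[0]))
```

The same function applies unchanged to the area management table, since only the feature vectors differ.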
Based on the search results, the tree structure generating unit 111 associates the groups of pages and areas determined to be similar with one another in time series. The search result generating unit 105 then generates an HTML document in which the page and area groups associated by the tree structure generating unit 111 are arranged in time-series order.
The same document data may be managed per version (that is, per update time). In such a case, because the document management server of the first embodiment can display document data in time series, the user can confirm in the tree structure which pages or areas were updated as the version changed. As a result, the user can easily identify the update history in units of pages or areas.
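The time-series arrangement and the per-area update check can be illustrated as follows. The record layout (a page ID, an update date, and a mapping from hypothetical area names to area content) is assumed for the example; comparing area content between consecutive versions is one simple way to show which areas changed.

```python
from datetime import date
from typing import List, Tuple


def build_time_series(pages: List[dict]) -> Tuple[List[dict], List[Tuple[str, str]]]:
    """Order similar pages by update date and link consecutive versions, like the
    line segments along the time axis of a time-series tree."""
    ordered = sorted(pages, key=lambda p: (p["updated"], p["page_id"]))
    links = [(older["page_id"], newer["page_id"]) for older, newer in zip(ordered, ordered[1:])]
    return ordered, links


def changed_areas(older: dict, newer: dict) -> List[str]:
    """Names of areas whose content differs between two versions of a page."""
    names = set(older["areas"]) | set(newer["areas"])
    return sorted(n for n in names if older["areas"].get(n) != newer["areas"].get(n))


pages = [
    {"page_id": "v2-p1", "updated": date(2006, 5, 1),
     "areas": {"heading": "Plan", "figure": "chart-b"}},
    {"page_id": "v1-p1", "updated": date(2006, 1, 10),
     "areas": {"heading": "Plan", "figure": "chart-a"}},
]
ordered, links = build_time_series(pages)
print(links)                                   # [('v1-p1', 'v2-p1')]
print(changed_areas(ordered[0], ordered[1]))   # ['figure']
```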
Fig. 14 is a flowchart of the process performed by the document management server 100 of the first embodiment. The communication processing unit 102 receives the document data to be managed from PC 150 or the like (step S1401). The registering unit 110 stores the received document data in the data storage unit 122, extracts metadata from the document data, and registers the extracted metadata, together with the path at which the document data is stored, in the document management table (step S1402).
The page feature extracting unit 109 extracts, from the registered document data, metadata, a feature quantity of the page image, and a text feature quantity (step S1403). The registering unit 110 then registers the metadata, feature quantity, and text feature quantity extracted by the page feature extracting unit 109 in the page management table (step S1404).
The area extracting unit 106 then extracts the information of each area from the page of the registered document data, based on the type of data contained in the page and the like (step S1405).
The area feature extracting unit 108 extracts the feature quantities of each extracted area (step S1406). The extracted feature quantities differ according to the data type of each area.
The relation extracting unit 107 then extracts the relation between the area and the document data and page that contain it (step S1407). Examples of the extracted information include the document ID, the page ID, and the coordinates of the area within the page.
The registering unit 110 associates the feature quantities extracted by the area feature extracting unit 108 with the relation extracted by the relation extracting unit 107, and registers them in the area management table in association with each other (step S1408).
The registering unit 110 determines whether all pages have been processed (step S1409). If not (No at step S1409), the registering unit 110 sets the next page as the registration target (step S1410), and the page feature extracting unit 109 extracts metadata and feature quantities from that page (step S1411).
When it determines that all pages have been processed (Yes at step S1409), the registering unit 110 ends the processing.
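The registration flow of Fig. 14 can be condensed into a loop over pages and their areas. In this sketch, in-memory lists stand in for the three management tables, the page ID format is hypothetical, and word counts are only a placeholder for the real image and text feature quantities; area extraction (step S1405) is assumed to have already produced the `Area` objects.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Area:
    area_type: str                           # e.g. "text" or "image" (unit 106)
    coordinates: Tuple[int, int, int, int]   # x, y, width, height within the page
    content: str                             # text, or OCR text for image data


@dataclass
class Page:
    number: int
    text: str
    areas: List[Area]


@dataclass
class Registry:
    """In-memory stand-ins for the document, page, and area management tables."""
    documents: List[dict] = field(default_factory=list)
    pages: List[dict] = field(default_factory=list)
    areas: List[dict] = field(default_factory=list)


def text_feature(text: str) -> Counter:
    """Placeholder feature quantity: lower-cased word counts."""
    return Counter(text.lower().split())


def register_document(registry: Registry, document_id: str, path: str, pages: List[Page]) -> None:
    # S1402: register document-level metadata and the storage path.
    registry.documents.append({"document_id": document_id, "path": path})
    # S1409 to S1411: repeat until every page has been processed.
    for page in pages:
        page_id = f"{document_id}-p{page.number}"
        # S1403 and S1404: page features go into the page management table.
        registry.pages.append({"page_id": page_id, "document_id": document_id,
                               "page_number": page.number,
                               "text_feature": text_feature(page.text)})
        # S1405 to S1408: each area's features and relation are registered together.
        for area in page.areas:
            registry.areas.append({
                "area_id": len(registry.areas) + 1,  # generated on registration
                "document_id": document_id,
                "page_id": page_id,
                "coordinates": area.coordinates,
                "area_type": area.area_type,
                "feature": text_feature(area.content),
            })
```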
By performing the processing described above, the document management server 100 can manage not only the document data but also, as separate entries, the pages and areas contained in the document data.
Fig. 15 is a flowchart of the page search process performed by the document management system of the first embodiment.
The display processing unit 152 of PC 150 displays the search screen in the web browser (step S1501). The operation processing unit 153 accepts the page search conditions that the user enters with an input device (step S1502). In the example shown in Fig. 6, search target 601 is set to "Page" to select pages as the search target.
The communication processing unit 151 sends the entered page search conditions to the document management server 100 (step S1503). Together with the search conditions, the communication processing unit 151 also sends the display conditions (for example, the display format and the number of items to display). The document management server then performs the search.
The communication processing unit 102 of the document management server 100 receives the page search conditions and the display conditions from PC 150 (step S1511). The search unit 103 searches the page management table using the received page search conditions as keys (step S1512).
When the search is complete, the search result generating unit 105 determines from the received display conditions whether to generate a tree structure (step S1513). If the search result generating unit 105 determines not to generate a tree structure (No at step S1513), the processing of the tree structure generating unit 111 is not performed. To select a tree structure as the display condition, the user sets display format 604 to "Tree" in the example shown in Fig. 6.
If the search result generating unit 105 determines to generate a tree structure (Yes at step S1513), the tree structure generating unit 111 generates it based on the search results (step S1514). For each document containing a page that satisfies the search conditions, the tree generated by the tree structure generating unit 111 contains a designated page of the document (for example, the first page), the page satisfying the search conditions, and the areas contained in that page.
The structure generated by the tree structure generating unit 111 can be determined from the document IDs and page IDs obtained from the search results in step S1512. That is, the first page can be obtained by searching the page management table with the document ID and a page number of 1. In addition, the areas contained in a page can be obtained by searching with its page ID as the search condition.
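The tree described in steps S1513 and S1514 can be assembled from the matching page IDs alone, as in the sketch below. The page and area records are hypothetical in-memory rows; the first-page lookup mirrors "document ID and page number 1", and the area lookup filters by page ID.

```python
from typing import Dict, List, Optional


def first_page_id(page_table: List[dict], document_id: str) -> Optional[str]:
    """Equivalent of searching the page table with the document ID and page number 1."""
    return next((p["page_id"] for p in page_table
                 if p["document_id"] == document_id and p["page_number"] == 1), None)


def build_result_tree(hit_page_ids: List[str], page_table: List[dict],
                      area_table: List[dict]) -> Dict[str, dict]:
    """Group matching pages by document: document -> designated (first) page and
    matching pages, each matching page -> the areas it contains."""
    pages_by_id = {p["page_id"]: p for p in page_table}
    tree: Dict[str, dict] = {}
    for page_id in hit_page_ids:
        doc_id = pages_by_id[page_id]["document_id"]
        if doc_id not in tree:
            tree[doc_id] = {"first_page": first_page_id(page_table, doc_id), "hits": []}
        areas = [a["area_id"] for a in area_table if a["page_id"] == page_id]
        tree[doc_id]["hits"].append({"page_id": page_id, "areas": areas})
    return tree
```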
The search result generating unit 105 generates an HTML document representing the search results of the search unit 103 (step S1515). If the tree structure generating unit 111 has generated a tree structure, the search result generating unit 105 generates an HTML document that includes it.
The communication processing unit 102 sends the generated HTML document to PC 150 (step S1516).
The communication processing unit 151 of PC 150 receives the HTML document describing the search results from the document management server 100 (step S1504). The display processing unit 152 displays the received HTML document in the web browser (step S1505).
In this way, pages contained in document data can be searched for according to the conditions set by the user.
Fig. 16 is a flowchart of the area search process performed by the document management system of the first embodiment.
The area search flowchart shown in Fig. 16 is basically the same as the page search flowchart shown in Fig. 15. The differences are that the page search conditions of step S1502 in Fig. 15 are replaced by the area search conditions of step S1602, and the page management table search of step S1512 in Fig. 15 is replaced by the area management table search of step S1612. Because the document ID and page ID can be obtained from the search results in step S1612, the tree structure generated in step S1614 can be obtained by the same procedure as in Fig. 15. Because the other points are the same as in Fig. 15, their description is omitted.
Fig. 17 is a flowchart of the similar page or area search process performed by the document management system of the first embodiment.
The display processing unit 152 of PC 150 displays at least one page or area in the web browser (step S1701). For example, the screens shown in Fig. 8, 9, or 10 can be used as the displayed screen.
The operation processing unit 153 accepts the page or area that the user selects with the input device as the search source, together with a request to search for similar pages or areas (step S1702). In the example shown in Fig. 8, pressing the "Search" button of an area sets that area as the search source and requests a search for similar areas.
The communication processing unit 151 sends the page ID or area ID of the search source, together with the request to search for similar pages or areas, to the document management server 100 (step S1703). The document management server 100 then starts searching for similar pages or areas.
The communication processing unit 102 of the document management server 100 receives the request to search for similar pages or areas, together with the page ID or area ID, from PC 150 (step S1711).
Upon receiving the request to search for similar pages or areas, the similarity information search unit 104 obtains the feature quantity associated with the received page ID or area ID and sets the obtained feature quantity as the search condition (step S1712). For an area ID, the similarity information search unit 104 searches the area management table with the area ID to obtain the associated feature quantity. Likewise, the feature quantity associated with a page ID can be obtained from the page management table. Although an area ID is used here for convenience of explanation, the same processing applies to a page ID.
Any method may be used to set the obtained feature quantity as the search condition. When the feature quantity is set as the search condition, its weighting can be changed as a parameter; for example, the weighting can be changed on the example screen shown in Fig. 10. In addition to known methods, any method may be used to search with changed weighting.
The similarity information search unit 104 searches for similar areas or pages according to the set search condition (step S1713). It calculates a similarity from the feature quantity in the search condition and the feature quantity in each record, and obtains similar areas or pages based on that similarity.
When the search is complete, the search result generating unit 105 determines from the received display conditions whether to generate a tree structure (step S1714). If the search result generating unit 105 determines not to generate a tree structure (No at step S1714), the processing of the tree structure generating unit 111 is not performed. One example in which a tree is generated is a search performed through "Time-series display" on the example screen shown in Fig. 9.
If the search result generating unit 105 determines to generate a tree structure (Yes at step S1714), the tree structure generating unit 111 generates it based on the search results (step S1715). The tree generated by the tree structure generating unit 111 may be either a tree per document, as shown in Fig. 11, or a tree associated in time series, as shown in Fig. 13.
The search result generating unit 105 generates an HTML document representing the search results of the similarity information search unit 104 (step S1716). If the tree structure generating unit 111 has generated a tree structure, the search result generating unit 105 generates an HTML document that includes it.
The communication processing unit 102 sends the generated HTML document to PC 150 (step S1717).
The communication processing unit 151 of PC 150 receives the HTML document describing the search results from the document management server 100 (step S1704). The display processing unit 152 displays the received HTML document in the web browser (step S1705).
As a result, the document management server of the first embodiment can search for similar pages or areas.
In the first embodiment, the information is stored in tables of a database in which each document, page, and area are associated with one another. However, the storage method is not limited to this form; for example, the metadata of the document data may be described in XML and stored in an XML database.
The first embodiment describes a system that includes PC 150, operated by the user, and the document management server 100, which performs document management and search. With this configuration, document management and search can be realized by a commonly used client-server system.
Alternatively, the functions of PC 150 and the document management server 100 may be realized by a single stand-alone configuration, rather than by a configuration including multiple devices as in the first embodiment.
With the document management server of the first embodiment, searches can be performed in units of areas or pages even when a large amount of document data is managed, and the desired information can be obtained easily.
When searching for images and the like contained in document data, areas or pages similar to an image can be searched for by using the feature quantities corresponding to that image. When searching for similar areas or pages, the search can also combine multiple conditions other than feature quantities, such as metadata.
When the search results are output, an HTML document describing a tree that contains pages and areas can be generated, so the user can easily understand the relation between pages and areas.
In the first embodiment, a thumbnail is prepared as the image of each page. However, displaying a page is not limited to showing a single image such as a thumbnail. Accordingly, as the second embodiment of the present invention, a case is described in which a page is displayed by combining its areas.
Fig. 18 is a block diagram of the configuration of the document management system of the second embodiment. The document management server 1900 of the second embodiment differs from the document management server 100 of the first embodiment in that the search result generating unit 105 is replaced by the search result generating unit 1902, which performs different processing, and the document metadata database 121 is replaced by the document metadata database 1911, which stores different tables. Like reference numerals denote like parts and elements, and their description is omitted.
The page management table and the area management table in the document metadata database 1911 of the storage unit 101 differ from those of the first embodiment as follows: the area management table has a different field structure, and the page management table has the same field structure except that the thumbnail path field has been deleted.
Fig. 19 shows the table format of the area management table. As shown in Fig. 19, in addition to the fields of the area management table of the first embodiment, this area management table stores the font size, the font name, and the line writing direction in association with one another. Storing the font size, font name, and line writing direction allows the structure of a text area to be reproduced substantially as in the source document.
Unlike the search result generating unit 105 of the first embodiment, the search result generating unit 1902 generates search results that include a page, or a detailed display of a page, by combining the areas contained in that page. Because the other points are the same as for the search result generating unit 105, their description is omitted.
Fig. 20 is a schematic diagram of an example screen in which the HTML document generated by the search result generating unit 1902 is displayed on the display of PC 150. As shown in Fig. 20, page 2106 is realized by combining image 2101, image 2102, text area 2103, text area 2104, and text area 2105. The search result generating unit 1902 generates an HTML document in which these areas are arranged according to the area coordinates stored in the area management table. For a text area, the search result generating unit 1902 arranges the text within the area defined by the area coordinates, according to the font size, font name, and line writing direction stored in the area management table. As a result, the search result generating unit 1902 can reproduce the original page layout. Although not shown, each area may be displayed surrounded by a thick frame or the like to improve its visibility.
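One way to reproduce the page layout in HTML is to position each area absolutely at its stored coordinates and map the stored font attributes to CSS. This is an illustrative sketch: the record fields, the use of pixels for coordinates and points for font size, and the mapping of the line writing direction to the CSS `writing-mode` values `horizontal-tb` and `vertical-rl` are assumptions.

```python
import html
from typing import List

# Assumed mapping from the stored line writing direction to CSS writing-mode.
WRITING_MODES = {"horizontal": "horizontal-tb", "vertical": "vertical-rl"}


def render_page(areas: List[dict], width: int, height: int) -> str:
    """Rebuild a page from its areas by absolutely positioning each one at its
    stored coordinates (x, y, width, height, in CSS pixels)."""
    parts = [f'<div class="page" style="position:relative;width:{width}px;height:{height}px">']
    for area in areas:
        x, y, w, h = area["coordinates"]
        style = f"position:absolute;left:{x}px;top:{y}px;width:{w}px;height:{h}px"
        if area["type"] == "text":
            # Font size, font name, and writing direction come from the area table.
            style += (f";font-size:{area['font_size']}pt"
                      f";font-family:'{area['font_name']}'"
                      f";writing-mode:{WRITING_MODES[area['direction']]}")
            parts.append(f'<div style="{html.escape(style)}">{html.escape(area["text"])}</div>')
        else:
            parts.append(f'<img src="{html.escape(area["src"])}" '
                         f'style="{html.escape(style)}" alt="">')
    parts.append("</div>")
    return "\n".join(parts)
```

Because only coordinates, fonts, and area content are needed, no per-page thumbnail has to be stored to show the page.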
Thus, because image data such as a thumbnail need not be stored for each page, the amount of data stored in the storage unit 101 can be reduced.
The present invention is not limited to the above embodiments, and various modifications are possible. For example, in the second embodiment, text is arranged in the text areas. However, image data extracted from the text areas of the page may be arranged there instead. Accordingly, as a modified example of the second embodiment, an example is described in which images are combined and displayed when a page is displayed, regardless of whether an area is a text area. The other configurations and processing are the same as in the second embodiment, so their description is omitted.
The area extracting unit 106 extracts the image data of each area from each page of a document image. When the document data is something other than a document image, the processing described in the third embodiment of the present invention is performed. The area extracting unit 106 corrects the extracted image data; for example, it performs image correction that increases contrast and saturation. As a result, image data with colors close to those of the original electronic document is generated.
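The correction step is not specified beyond increasing contrast and saturation. A minimal pure-Python sketch, assuming 8-bit RGB pixels, a linear contrast stretch around mid-grey, and a saturation boost in HLS space via the standard `colorsys` module, could look like this; the factors 1.2 and 1.3 are arbitrary placeholders, not values from the text.

```python
import colorsys


def enhance_pixel(rgb, contrast=1.2, saturation=1.3):
    """Boost contrast and saturation of one 8-bit RGB pixel.

    Contrast is stretched linearly around mid-grey (128) and clipped to
    [0, 255]; saturation is then scaled in HLS space and capped at 1.
    """
    r, g, b = (min(255.0, max(0.0, 128 + (c - 128) * contrast)) for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    s = min(1.0, s * saturation)
    return tuple(round(c * 255) for c in colorsys.hls_to_rgb(h, l, s))


def enhance_image(pixels, **kw):
    """Apply enhance_pixel to a 2-D list (rows) of RGB tuples."""
    return [[enhance_pixel(p, **kw) for p in row] for row in pixels]
```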
The search result generating unit 1902 in the modified example differs from the search result generating unit 1902 according to the second embodiment in that, when generating an HTML document that displays a search result containing a page or the details of a page, it builds the HTML document only from the images extracted from each area, regardless of whether each area in the page is a text area. When placing the image of a text area in the HTML document, the search result generating unit 1902 in the modified example embeds the text information extracted from that text area as an attribute of the text image.
Thus, when the PC 150 displays the HTML document and the user points at a text area with a pointing device, the text information embedded in that area can be displayed in a pop-up window.
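One plausible way to embed the extracted text as an attribute of the text image is the standard HTML `title` attribute, which browsers typically show as a tooltip on hover. This is an assumption: the patent says only that the text is embedded as an attribute and shown in a pop-up window. The function name `text_area_img` is hypothetical.

```python
import html


def text_area_img(image_url, extracted_text, left, top, width, height):
    """Emit an <img> for a text area with its extracted text embedded.

    The text goes into the HTML `title` attribute (hover tooltip) and is
    duplicated in `alt` for accessibility; both are attribute-escaped.
    """
    attr = html.escape(extracted_text, quote=True)
    return (f'<img src="{html.escape(image_url)}" alt="{attr}" title="{attr}" '
            f'style="position:absolute;left:{left}px;top:{top}px;'
            f'width:{width}px;height:{height}px">')
```

A richer pop-up than a plain tooltip would need client-side script, which this sketch does not attempt.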
Figure 21 is a schematic diagram of an example screen in which an HTML document generated by the search result generating unit 1902 is displayed on the display of the PC. As shown in Figure 21, a page 2114 is reproduced by combining the image 2101, the image 2102, and text areas 2111, 2112, and 2113. When a text image representing text (for example, the text area 2112) is pointed at with the pointing device, the PC 150 displays the text information embedded as an attribute of the image in a pop-up window. In the pop-up display 2215, the embedded text information is rendered using font data. This improves visibility compared with reading an image that contains the character string, so the user can easily understand the content of the text.
In the second embodiment, when the user points at a text area with the pointing device, the PC 150 displays the text contained in the text area in a pop-up window using character codes. However, the text display is not limited to this method; any method may be used as long as the text contained in a text area is displayed using font data while the image of the text area is being displayed. For example, when the PC 150 receives the user's selection of the image of a text area, it may request the document management server 1900 to send the text information contained in that text area. After the document management server 1900 sends the text information to the PC 150, the PC 150 may display the received text information using font data in a separate window or the like.
The first and second embodiments mainly described examples in which document images are used as the document data. The third embodiment therefore describes an example in which document data other than document images is handled. The configuration of the document management server according to the third embodiment is the same as that of the document management server according to the first embodiment, so its description is omitted.
As the document data managed by the document management server according to the third embodiment, an electronic file generated by, for example, a document creation application can be used. The electronic files used in the third embodiment are not limited to those produced by a document creation application; any data containing text information encoded as character codes (for example, JIS codes or Unicode) may be used.
When the document data sent from the PC 150 is an electronic file, the area extracting unit 106 converts the electronic file into image data for each page and extracts the image data representing each area from the page image data. Converting electronic files into image data in this way allows the subsequent processing to be performed in the same way as for document images.
In addition, the area extracting unit 106 extracts the text information of text areas directly from the electronic file. Extracting text information directly from the electronic file is more accurate than extracting it from image data by OCR or the like.
Because processing is performed after each page of the electronic file has been converted into image data, the document management server according to the third embodiment can process and manage electronic files together with document images (including scanned paper documents and data received by facsimile).
The first embodiment described only the case in which the search source of a similarity search is an area. The fourth embodiment of the present invention therefore describes the case in which the search source of a similarity search is a page or a document.
Figure 22 is a block diagram of the configuration of a document management system according to the fourth embodiment of the present invention. The document management server 2200 according to the fourth embodiment differs from the document management server 1900 according to the second embodiment in that the similar-information search unit 104 is replaced by a similar-information search unit 2201 with different processing, and the search result generating unit 1902 is replaced by a search result generating unit 2202 with different processing. In the following description, like reference numerals refer to like elements of the second embodiment, and their description is omitted.
The similar-information search unit 2201 searches the document management table, the page management table, and the area management table in the document metadata database based on a search request for document data from the PC 150 or the like. The similar-information search unit 2201 differs from the similar-information search unit 104 in that it can also search for similar pages or similar documents.
Figure 23 is a schematic diagram of an example screen, displayed on the display of the PC 150, for searching for similar pages. This search screen is displayed when a search for similar pages is requested on the PC 150. In the fourth embodiment, searching for similar pages means either searching for pages similar to the page the user has selected for the search, or searching, for each area included in the selected page, for areas similar to it.
As shown in Figure 23, the choice between page and area is received in "Unit of display" 2301. When page is selected, the document management server 2200 searches for similar pages. When area is selected, the document management server 2200 searches, for each area included in the page, for areas similar to it.
On this search screen, when area is selected in "Unit of display" 2301, the types of areas to be searched are selected in "Type of area to display" 2302. On the search screen according to the fourth embodiment, any of text, chart, table, and photo can be selected as the area type. The document management server 2200 searches for similar areas only for the area types selected in "Type of area to display" 2302.
In addition, on the search screen shown in Figure 23, when the user enters a document title in the search source field 2303, the operation processing unit 153 of the PC 150 identifies that document as the one containing the page to be used for the search.
Figure 24 is a schematic diagram of an example screen, displayed by the display processing unit 152 of the PC 150, for receiving the selection of a page in a similar-page search. The similar-page search screen shown in Figure 24 is displayed after the document has been identified on the screen of Figure 23. On this screen, the pages included in the document are shown as a thumbnail 2401. When the user presses an arrow button on the screen, the display processing unit 152 changes the page shown in the thumbnail 2401. The page shown in the thumbnail 2401 becomes the search source page of the similarity search. When the operation processing unit 153 detects that the user has pressed the search button 2402, the communication processing unit 151 sends to the document management server 2200 information indicating a similar-page search, the selected "Unit of display", the selected "Type of area to display", and information on the page shown in the thumbnail 2401. The document management server 2200 then performs the similar-page search, whose detailed steps are described later. Although this differs from the fourth embodiment, the user may also be allowed to select the area to be searched directly on the thumbnail 2401.
In a similar-page search, the similar-information search unit 2201 calculates the similarity between each area included in the page selected by the user and each area stored in the area management table in the document metadata database 1911. Based on the calculated similarities, the similar-information search unit 2201 then detects the areas judged similar to those of the search source page, or the pages containing such areas. The detailed steps are described later.
The similar-information search unit 2201 also searches for documents similar to a document specified by the user. Figure 25 is a schematic diagram of an example screen, displayed on the display of the PC, for searching for similar documents. In a similar-document search, the user selects a document, and documents similar to the selected document are searched for.
On the search screen shown in Figure 25, when the user enters a document title in the search source field 2501, the operation processing unit 153 of the PC 150 identifies the document to search with. When the operation processing unit 153 detects that the user has pressed the search button 2502, the communication processing unit 151 sends information on the selected document to the document management server 2200 together with a request to perform a similar-document search. The document management server 2200 then performs the similar-document search, whose detailed steps are described later.
The search result generating unit 2202 generates HTML documents representing the search results of the search unit 103 and of the similar-information search unit 2201. The search result generating unit 2202 differs from the search result generating unit 1902 according to the second embodiment in that it also generates HTML documents representing the results of similar-page searches and similar-document searches. Examples of these HTML documents are described later.
Figure 26 is a flowchart of a process performed by the document management server 2200 according to the fourth embodiment.
The communication processing unit 102 receives a request to perform a similar-page search together with information on the search source page (step S2601). In the fourth embodiment, along with the page information and the request to search for similar pages, the communication processing unit 102 receives the "Unit of display" and "Type of area to display" selected by the user on the screen shown in Figure 24. This flowchart illustrates an example in which the selected "Unit of display" is area and the selected "Type of area to display" is "chart", "table", and "text". That is, in this flowchart, similar areas are searched for each "chart", "table", and "text" area included in the page selected by the user, and an HTML document is generated in which thumbnails of the areas found are arranged separately for "chart", "table", and "text".
The area extracting unit 106 extracts the areas of each data type contained in the search source page (step S2602).
The area feature extracting unit 108 extracts feature values from each extracted area (step S2603). The feature values extracted differ according to the data type of each area.
For each "chart", "table", and "text" area extracted from the search source page, the similar-information search unit 2201 calculates the similarity to each area stored in the area management table (step S2604). The similarity is calculated by comparing the feature values of the areas. It takes a value from 0 to 1, where smaller values indicate greater similarity, and two areas can be judged similar when the value is 0.3 or less. The similarity between areas of different types is 1.
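This "similarity" behaves like a distance: 0 means identical and larger values mean less similar. A hedged sketch of the scoring rule follows. The 0.3 threshold and the value 1 for mismatched types come from the text; the feature comparison (mean absolute difference of equal-length vectors with entries in [0, 1]) is an assumption, since the metric is left open.

```python
from typing import Sequence

SIMILAR_THRESHOLD = 0.3  # scores <= 0.3 count as "similar" (from the text)


def area_score(type_a: str, feat_a: Sequence[float],
               type_b: str, feat_b: Sequence[float]) -> float:
    """Dissimilarity score in [0, 1]; 0 means identical, 1 means unrelated.

    Areas of different types always score 1. The comparison of same-type
    areas (mean absolute difference of features in [0, 1]) is assumed.
    """
    if type_a != type_b:
        return 1.0
    if len(feat_a) != len(feat_b) or not feat_a:
        raise ValueError("feature vectors must be non-empty and equal length")
    return sum(abs(x - y) for x, y in zip(feat_a, feat_b)) / len(feat_a)


def is_similar(score: float) -> bool:
    return score <= SIMILAR_THRESHOLD
```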
The search result generating unit 2202 generates an HTML document in which, for each "chart", "table", and "text" area included in the search source page, thumbnails of the stored areas in the area management table that are judged highly similar are arranged in descending order of similarity, that is, most similar first (step S2605).
The communication processing unit 102 sends the generated HTML document to the PC 150 (step S2606). The PC 150 can thus display similar areas for each area included in the search source page.
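Steps S2604 to S2605 can be outlined as a loop over the requested area types. In this sketch, areas are hypothetical dictionaries with `id`, `type`, and `feat` keys, the scoring function is an assumed mean absolute difference of feature vectors, and "descending order of similarity" is implemented as ascending score, so the most similar area comes first.

```python
from collections import defaultdict

THRESHOLD = 0.3  # "similar" cut-off from the text


def score(a, b):
    """Hypothetical dissimilarity: 1.0 across types, else mean abs feature diff."""
    if a["type"] != b["type"]:
        return 1.0
    return sum(abs(x - y) for x, y in zip(a["feat"], b["feat"])) / len(a["feat"])


def similar_areas_by_type(source_areas, stored_areas, wanted_types):
    """Outline of steps S2604-S2605: for each wanted type, list the ids of
    stored areas judged similar to some source area, lowest score first."""
    best = defaultdict(dict)  # area type -> {stored id: best score seen}
    for src in source_areas:
        if src["type"] not in wanted_types:
            continue  # only the types chosen in "Type of area to display"
        for cand in stored_areas:
            s = score(src, cand)
            if s <= THRESHOLD:
                prev = best[src["type"]].get(cand["id"], 1.0)
                best[src["type"]][cand["id"]] = min(prev, s)
    return {t: sorted(best[t], key=best[t].get) for t in wanted_types}
```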
Figure 27 is a schematic diagram of an example screen in which the HTML document generated by the search result generating unit 2202 in step S2605 is displayed on the display of the PC 150. As shown in Figure 27, on a page 2701, thumbnails of similar areas are arranged for each of "chart", "table", and "text".
Figure 28 is a flowchart of a process performed by the document management server 2200 according to the fourth embodiment.
The communication processing unit 102 first receives a request to perform a similar-page search and information on the search source page (step S2801). In this flowchart, the selected "Unit of display" is assumed to be page. That is, pages similar to the page selected by the user are searched for, and an HTML document is generated in which thumbnails of the pages judged similar are sorted in descending order of similarity.
The area extracting unit 106 extracts the areas of each data type contained in the search source page (step S2802).
The area feature extracting unit 108 extracts feature values from each extracted area (step S2803). The feature values extracted differ according to the data type of each area.
The area feature extracting unit 108 also corrects the image data representing each extracted area. For example, image data of areas extracted from scanned document data is color-corrected to increase contrast and saturation. As a result, image data with colors close to those of the original electronic document is generated. Because this improves the fidelity of the image data, the similarity can be calculated more accurately.
The similar-information search unit 2201 sets one of the pages stored in the page management table in the document metadata database 1911 as the search target page and identifies the areas included in that page (step S2804). The similar-information search unit 2201 obtains information on this page (for example, feature values) from the page management table in the document metadata database 1911.
The similar-information search unit 2201 calculates the similarity between each area in the page obtained as the search target and each area included in the search source page (step S2805).
Figure 29 is a schematic diagram illustrating the concept of similarity calculation by the similar-information search unit 2201. As shown in Figure 29, the similar-information search unit 2201 calculates the similarity between each area included in each page obtained as a search target and each area extracted from the search source page. When it determines that a page contains a plurality of text areas, the similar-information search unit 2201 combines them into a single text area and then calculates the similarity with that combined text area.
The similarity takes a value from 0 to 1, and areas can be judged similar when the value is 0.3 or less. The similarity between areas of different types is 1. The similar-information search unit 2201 judges the area with the smallest calculated value to be the area similar to the search source area. In the example shown in Figure 29, the similarity between chart α, the search source area, and each area in the page obtained from the document metadata database 1911 is calculated; assume the results are 0.6 with chart A, 0.25 with chart B, 1 with table A, and 1 with text A. In this example, the similar-information search unit 2201 judges chart B to be the area similar to chart α, with a similarity of 0.25 between them. Following this procedure, the similar-information search unit 2201 determines the similar area and the inter-area similarity for each search source area. When the search target page contains no area of the same type as the search source area, the similar-information search unit 2201 assumes there is no similar area and sets the similarity to 1.
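The per-area matching of Figure 29, including the merging of multiple text areas, might be sketched as follows. Averaging the feature vectors of merged text areas and the feature metric itself are assumptions; the rules "lowest value wins" and "1 when no same-type area exists" are taken from the text. All function names are hypothetical.

```python
def area_score(a, b):
    """Hypothetical dissimilarity in [0, 1]: 1.0 across types, else the mean
    absolute difference of equal-length feature vectors."""
    if a["type"] != b["type"]:
        return 1.0
    return sum(abs(x - y) for x, y in zip(a["feat"], b["feat"])) / len(a["feat"])


def merge_text_areas(areas):
    """Combine all text areas of a page into one before comparison.
    Averaging their feature vectors is an assumption, not from the text."""
    texts = [a for a in areas if a["type"] == "text"]
    if len(texts) <= 1:
        return list(areas)
    merged = {"name": "+".join(a["name"] for a in texts), "type": "text",
              "feat": [sum(col) / len(texts)
                       for col in zip(*(a["feat"] for a in texts))]}
    return [a for a in areas if a["type"] != "text"] + [merged]


def best_matches(source_areas, target_page_areas):
    """Map each source area name to (best target area name or None, score).
    The lowest score wins; with no same-type target the score is 1.0."""
    targets = merge_text_areas(target_page_areas)
    result = {}
    for src in source_areas:
        same_type = [t for t in targets if t["type"] == src["type"]]
        if not same_type:
            result[src["name"]] = (None, 1.0)
            continue
        best = min(same_type, key=lambda t: area_score(src, t))
        result[src["name"]] = (best["name"], area_score(src, best))
    return result
```

For example, with one-element features of 0.8 for chart α, 0.2 for chart A, and 0.55 for chart B, chart B is chosen with a score of 0.25, matching the Figure 29 example.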
In the fourth embodiment, the similarity is calculated by the above procedure; however, it may also be calculated by other procedures.
Returning to Figure 28, the similar-information search unit 2201 calculates the similarity between pages based on the area similarities calculated in step S2805 (step S2806). In the fourth embodiment, the similar-information search unit 2201 takes the mean of the calculated area similarities as the similarity between pages. The similarity between pages is not limited to the mean; other values such as the sum may also be used.
The similar-information search unit 2201 determines whether the page management table contains another page whose similarity has not yet been calculated (step S2807).
When it determines that such a page exists (Yes at step S2807), the similar-information search unit 2201 sets that page as the next similarity calculation target page (step S2808) and again identifies the areas included in that page (step S2804).
When the similar-information search unit 2201 has calculated the similarities of all pages stored in the page management table and determines that no such page remains (No at step S2807), the search result generating unit 2202 generates an HTML document in which thumbnails of the pages stored in the page management table are sorted in descending order of similarity (step S2809).
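Steps S2804 to S2809 amount to scoring every stored page and sorting. The sketch below assumes each page's areas are available as dictionaries with `type` and `feat` keys, combines per-area scores with the mean as in the fourth embodiment, and omits the text-area merging for brevity; all function names are hypothetical.

```python
from statistics import mean


def area_score(a, b):
    """Hypothetical dissimilarity: 1.0 across types, else mean abs feature diff."""
    if a["type"] != b["type"]:
        return 1.0
    return sum(abs(x - y) for x, y in zip(a["feat"], b["feat"])) / len(a["feat"])


def page_score(source_areas, target_areas):
    """Step S2806: mean over source areas of the best (lowest) score against
    any target area; an area with nothing to compare against scores 1.0."""
    if not source_areas:
        return 1.0
    return mean(min((area_score(s, t) for t in target_areas), default=1.0)
                for s in source_areas)


def rank_pages(source_areas, page_table):
    """Steps S2804-S2809: score every stored page, most similar first.
    page_table maps a page id to that page's list of areas."""
    scores = {pid: page_score(source_areas, areas)
              for pid, areas in page_table.items()}
    return sorted(scores, key=scores.get)
```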
The communication processing unit 102 sends the generated HTML document to the PC 150 (step S2810). The PC 150 can thus display pages similar to the search source page.
Figure 30 is a schematic diagram of an example screen in which the HTML document generated by the search result generating unit 2202 in step S2809 is displayed on the display of the PC 150. As shown in Figure 30, on a page 3001, thumbnails of the pages stored in the document metadata database 1911 are sorted in descending order of similarity.
Figure 31 is a flowchart of a process performed by the document management server 2200 according to the fourth embodiment.
The communication processing unit 102 receives a request to perform a similar-document search and information on the search source document (step S3101).
The page feature extracting unit 109 extracts the feature values of each page included in the search source document (step S3102).
The similar-information search unit 2201 sets one of the documents stored in the document management table in the document metadata database 1911 as the search target and identifies the pages included in that document (step S3103). The pages can be identified using the document management table and the page management table. The similar-information search unit 2201 obtains information on the pages included in the document from the page management table.
The similar-information search unit 2201 calculates the similarity between each page included in the search source document and the pages in the document obtained as the search target (step S3104).
The similarity is calculated by comparing the page feature values of a given page in the search source document with those of each page in the search target document. The similarity takes a value from 0 to 1, and pages can be judged similar when the value is 0.3 or less. The similar-information search unit 2201 calculates the similarity for each page and judges the page with the smallest value to be the page similar to the search source page. The similar-information search unit 2201 performs this processing for every search source page. In the fourth embodiment, the similarity is calculated from page feature values; however, the similarity of each page may instead be derived from the similarities of the areas it contains.
The similar-information search unit 2201 calculates the similarity between documents based on the page similarities (step S3105). In the fourth embodiment, the similar-information search unit 2201 takes the mean of the calculated page similarities as the similarity between documents. The similarity between documents is not limited to the mean; the sum or the like may also be used.
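The document-level procedure (steps S3103 to S3108) follows the same pattern one level up. In this sketch each page is reduced to a hypothetical feature vector, the page metric is an assumed mean absolute difference, and page scores are combined with the mean as in the fourth embodiment.

```python
from statistics import mean


def page_feature_score(fa, fb):
    """Hypothetical page-level dissimilarity: mean absolute difference."""
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)


def document_score(source_pages, target_pages):
    """Steps S3104-S3105: for every source page take the lowest score against
    any page of the target document, then average those minima."""
    if not source_pages or not target_pages:
        return 1.0
    return mean(min(page_feature_score(s, t) for t in target_pages)
                for s in source_pages)


def rank_documents(source_pages, document_table):
    """Steps S3103-S3108: document_table maps a document id to its list of
    page feature vectors; returns document ids, most similar first."""
    scores = {doc: document_score(source_pages, pages)
              for doc, pages in document_table.items()}
    return sorted(scores, key=scores.get)
```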
The similar-information search unit 2201 determines whether the document management table contains another document whose similarity has not yet been calculated (step S3106).
When it determines that such a document exists (Yes at step S3106), the similar-information search unit 2201 sets that document as the next similarity calculation target document (step S3107) and again identifies the pages included in that document (step S3103).
When the similar-information search unit 2201 has calculated the similarities of all documents and determines that no other document remains (No at step S3106), the search result generating unit 2202 generates an HTML document in which, for the documents stored in the document management table, thumbnails of each document's first page are sorted in descending order of similarity (step S3108).
The communication processing unit 102 sends the generated HTML document to the PC 150 (step S3109). The PC 150 can thus display documents similar to the search source document.
The document management server according to the fourth embodiment improves convenience by supporting searches for areas similar to the areas in a page, for similar pages, and for similar documents. Even when the document management server manages a large amount of document data, the user can easily obtain the desired information.
The present invention is not limited to the above embodiments, and various modifications such as those exemplified below are possible.
In the fourth embodiment, similar pages or areas are searched for using the feature values of the search source page or area as keys. However, the present invention is not limited to this form of similar-information search; a further search may be performed using, as keys, the feature values of the pages or areas detected by the similarity search.
Modified example 1 describes an example in which similar pages or areas are searched for using the feature values of the pages or areas detected by a similarity search, and an HTML document arranged in time series is generated. Note that the present invention is not limited to a single further search using the feature values of the detected pages or areas as keys; the search may be repeated recursively several times. Descriptions of parts identical to the fourth embodiment are omitted. Searching recursively produces a tree structure that expands around the search source page or area.
In modified example 1, when similar pages or areas are searched for using as keys the feature values of a page or area whose creation/update date is earlier than that of the first search source page or area, a search condition is set so that only areas or pages created or updated before the creation/update date of that page or area are detected. When similar pages or areas are searched for using as keys the feature values of a page or area whose creation/update date is later than that of the first search source page or area, a search condition is set so that only areas or pages created or updated after the creation/update date of that page or area are detected.
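The date rule can be read as: branches that go back in time keep going back, and branches that go forward keep going forward. A minimal recursive sketch under that reading follows; the item layout (`id`, `date`), the caller-supplied `score` function, the 0.3 threshold, and the shared `seen` set that stops the recursion from revisiting items are assumptions.

```python
from datetime import date


def find_similar(item, items, score, threshold=0.3):
    """One non-recursive similarity search: every other item within threshold."""
    return [c for c in items if c is not item and score(item, c) <= threshold]


def history_tree(item, items, score, root_date=None, seen=None):
    """Recursively expand similar items into a tree (modified example 1).

    Items older than the first search source only expand to still-older
    items, newer ones only to still-newer items. Items are dicts with
    "id" and "date"; `seen` prevents an item from being visited twice.
    """
    root_date = item["date"] if root_date is None else root_date
    seen = set() if seen is None else seen
    seen.add(item["id"])
    children = []
    for c in find_similar(item, items, score):
        if c["id"] in seen:
            continue
        if item["date"] < root_date and not c["date"] < item["date"]:
            continue  # older branch: only go further back in time
        if item["date"] > root_date and not c["date"] > item["date"]:
            continue  # newer branch: only go further forward in time
        children.append(history_tree(c, items, score, root_date, seen))
    return {"id": item["id"], "children": children}
```

Because each branch is monotonic in time, a chain of descendants reads directly as an edit history, which is the kind of display Figure 32B illustrates.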
Figure 32A is a schematic diagram, given as another example for modified example 1, of the tree generated by recursively searching for similar areas when no search condition on the creation/update date is set. Part (A) of Figure 32A shows the tree formed by the search source area and the areas detected by the similar-information search unit using the feature values of the search source area as keys. Part (B) of Figure 32A shows the tree when the similar-information search unit also searches using the feature values of the detected areas. As this shows, many areas are detected when no condition on the creation/update date is set. For this reason, in this modified example, the creation/update date is set as a search condition when similar areas or pages are searched for recursively. The search condition is as described above.
Figure 32B is a schematic diagram of the tree generated by recursively searching for similar areas in modified example 1 when the predetermined search condition on the creation/update date is set. Part (A) of Figure 32B is identical to part (A) of Figure 32A, so its description is omitted.
Part (B) of Figure 32B shows the result of the recursive search displayed as a time-series table. Such a display is effective for managing the history of document images. In other words, when a plurality of users edit a document image and thereby produce a plurality of document images, the history of the edited document images appears as shown in part (B) of Figure 32B. The document management server in this modified example can therefore manage the history of document images edited by many people and display that history so that users can easily understand it. This recursive search can be applied not only to areas and pages but also to documents.
Modified example 1 described the case in which, after similar areas or pages have been searched for recursively, an HTML document displaying the similar areas or pages in time series is generated. However, the present invention is not limited to displaying the results in time-series order after a recursive search.
Modified example 2 describes the case in which the areas detected by a recursive similarity search are displayed according to their similarity. Any method, known or otherwise, may be used to calculate the similarity from the feature values.
Figure 33 is a schematic diagram of the tree generated by recursively searching for similar areas in modified example 2. In part (A) of Figure 33, the areas form a tree structure in descending order of similarity to the search source area.
In part (B) of Figure 33, each area detected using the feature values of a previously detected area as keys is linked to that area, which acts as its search source. The recursively detected areas are likewise arranged in order of similarity. The search result generating unit generates the HTML document shown in part (B) of Figure 33.
Specifically, when searching for similar areas or pages, the similar-information search unit according to modified example 2 obtains, based on the feature values, the similarity of each detected item to the search source page or area. The similar-information search unit then searches for similar pages or areas using the feature values of the detected areas or pages as keys, and obtains the similarity of each newly detected item to its own search source. When similar areas are searched for recursively, each detected area is linked to the search source that found it. Thus, even when similar pages or areas are searched for recursively, the search result generating unit generates an HTML document in which each search source is linked to the areas or pages it detected.
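A possible sketch of the recursive, similarity-ordered search of modified example 2: each detected item becomes a child of the item that found it, children are ordered most similar first, and a depth limit bounds the recursion. The depth limit, the 0.3 threshold, the caller-supplied `score` function, and marking items as seen before recursing (so that one item does not appear under several parents) are assumptions, not details from the text.

```python
def similarity_tree(item, items, score, depth, threshold=0.3, seen=None):
    """Recursively link each search source to the items it detects.

    Children are ordered most similar first (lowest score) and carry their
    score relative to their parent. `depth` bounds the recursion; items are
    dicts with an "id" key.
    """
    seen = {item["id"]} if seen is None else seen
    node = {"id": item["id"], "children": []}
    if depth == 0:
        return node
    hits = sorted(((score(item, c), c) for c in items
                   if c["id"] not in seen and score(item, c) <= threshold),
                  key=lambda pair: pair[0])
    # Claim all hits before recursing so siblings are not re-found below.
    seen.update(c["id"] for _, c in hits)
    for s, c in hits:
        child = similarity_tree(c, items, score, depth - 1, threshold, seen)
        child["score"] = s
        node["children"].append(child)
    return node
```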
According to modified example 2, the user can locate the areas or pages that describe the desired information in a document management server that manages a large number of electronic files. Because an HTML document depicting a tree of interlinked similar pages or areas is generated, the user can easily understand the relationships between objects such as areas and pages.
Figure 34 is a diagram of the hardware configuration of a PC that executes a program realizing the functions of the document management server. The document management server in this embodiment has the hardware configuration of an ordinary computer, including a controller such as a central processing unit (CPU) 2001, memories such as a read-only memory (ROM) 2002 and a RAM 2003, an external storage device 2004 such as a hard disk drive (HDD) or a compact disc (CD) drive, a display device 2005, an input device 2006 such as a keyboard and mouse, a communication interface 2007, and a bus 2008 connecting these devices.
The document management program executed by the document management server in this embodiment is provided as a file in an installable or executable format recorded on a computer-readable recording medium such as a compact disc read-only memory (CD-ROM), a floppy disk (FD), a compact disc recordable (CD-R), or a digital versatile disc (DVD).
The document management program executed by the document management server in this embodiment may also be stored on a computer connected to a network such as the Internet and provided by being downloaded over the network. Furthermore, the document management program executed by the document management server in this embodiment may be provided or distributed via a network such as the Internet.
The document management program in this embodiment may also be provided preinstalled in a ROM or similar storage.
The document management program executed by the document management server in this embodiment has a modular structure comprising the units described above. These are the communication processing unit, the search unit, the similar-information search unit, the search-result generating unit, the area extracting unit 106, the relation extracting unit, the area feature extracting unit, the page feature extracting unit 109, and the registering unit. In actual hardware, the CPU reads the document management program from the storage medium and executes it, which loads each unit into main memory. All of these units are thus generated in main memory.
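The registration path through these units (area extraction, relation extraction, area feature extraction, registration) can be sketched as a small pipeline. This is a hypothetical illustration, not the program described. Layout analysis is stubbed out: pages arrive already segmented into typed areas. The type labels, the bounding-box format, and the toy geometric features are all assumptions. `AreaStore` stands in for the storage unit holding area correspondence information.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    kind: str                        # hypothetical labels: "text", "figure", "table"
    bbox: tuple[int, int, int, int]  # (x, y, width, height) in page pixels
    text: str = ""                   # OCR text for text areas; empty otherwise


@dataclass
class Page:
    doc_id: str
    page_no: int
    width: int
    height: int
    areas: list[Area]  # assumed output of an upstream layout analysis


@dataclass
class AreaRecord:
    area_id: int
    area: Area
    relation: dict        # which document, page, and position the area came from
    features: list[float]


class AreaStore:
    """Stands in for the storage unit holding area correspondence information."""

    def __init__(self):
        self.records: dict[int, AreaRecord] = {}
        self._next_id = 0

    def register(self, area, relation, features):
        """Registering unit: store area, relation, and features together."""
        rec = AreaRecord(self._next_id, area, relation, features)
        self.records[rec.area_id] = rec
        self._next_id += 1
        return rec.area_id


def extract_areas(page):
    """Area extracting unit (segmentation itself is out of scope here)."""
    return list(page.areas)


def extract_relation(page, area):
    """Relation extracting unit: link the area back to its source page."""
    return {"doc_id": page.doc_id, "page_no": page.page_no, "bbox": area.bbox}


def extract_features(page, area):
    """Area feature extracting unit: toy size/position features in [0, 1]."""
    x, y, w, h = area.bbox
    return [w / page.width, h / page.height, x / page.width, y / page.height]


def process_page(store, page):
    """Run one page through extraction and registration; return new area ids."""
    return [store.register(a, extract_relation(page, a), extract_features(page, a))
            for a in extract_areas(page)]
```

Keeping each unit as a separate function mirrors the modular structure: a real layout analyser or feature extractor could replace one stub without touching the registering logic.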
As described above, the information management apparatus, information management method, and computer program product according to the present invention are suited to techniques for searching for pages or areas in document images.
Additional advantages and modifications will readily occur to those skilled in the art. The invention in its broader aspects is therefore not limited to the specific embodiments described above. Various modifications may be made without departing from the scope defined by the appended claims and their equivalents.
Although the invention has been described with respect to specific embodiments for a complete and clear disclosure, the appended claims are not to be thus limited. They are to be construed as embodying all modifications and alternative constructions that may occur to one skilled in the art and that fairly fall within the basic teaching set forth herein.

Claims (20)

1. An apparatus for managing information, comprising:
A storage unit that stores area correspondence information, in which area information contained in each area constituting a page of document information is associated with relation information indicating the relation between that area information and the page of the document information;
An area extracting unit that extracts area information from a page of document information for each area of a different type arranged on the page;
A relation extracting unit that extracts, from the page of the document information, relation information indicating the relation between the area information extracted by the area extracting unit and the page of the document information that is the extraction source of the area information; and
A registering unit that registers the area information extracted by the area extracting unit and the relation information extracted by the relation extracting unit in the area correspondence information in association with each other.
2. The apparatus according to claim 1, further comprising:
A feature extracting unit that extracts, from the area information extracted by the area extracting unit, feature information indicating a feature of the area information, wherein
The storage unit stores the feature information, the area information, and the relation information in association with one another as the area correspondence information, and
The registering unit registers the area information extracted by the area extracting unit, the relation information extracted by the relation extracting unit, and the feature information extracted by the feature extracting unit in the area correspondence information in association with one another.
3. The apparatus according to claim 2, further comprising:
A search unit that searches for area information in the area correspondence information stored in the storage unit.
4. The apparatus according to claim 2, further comprising:
A similar-information search unit that, within the area correspondence information stored in the storage unit, compares the feature information associated with the area information serving as the search source against the feature information held in the area correspondence information. When a predetermined condition is satisfied, the unit detects the area information associated with the held feature information.
5. The apparatus according to claim 1, further comprising:
A character information extracting unit that extracts, from the area information extracted by the area extracting unit, character information indicating the characters contained in the area displayed based on that area information, wherein
The storage unit stores the area correspondence information and the character information in association with each other, and
The registering unit registers the character information extracted by the character information extracting unit in association with the area correspondence information.
6. The apparatus according to claim 5, wherein:
The storage unit stores, as the relation information, position information of image information within the page,
The relation extracting unit extracts the position information of the image information contained in an area constituting the page of the document information that is the extraction source, and
The apparatus for managing information further comprises a page information generating unit. This unit generates page information in which the image information stored in the storage unit is arranged according to the position information associated with that image information. It also adds the character information to the area of the page information occupied by the image information from which that character information was extracted.
7. The apparatus according to claim 5, wherein:
The search unit uses a character string input by a user as a keyword to search the character information registered by the registering unit in association with the area correspondence information. It then detects the image information associated with the character information matched in the search.
8. The apparatus according to claim 1, wherein:
The storage unit stores page correspondence information, in which page information representing a page of document information is associated with the document information. The storage unit holds the page information as relation information associated with the area information in the area correspondence information;
The registering unit registers the page information representing the page of the document information in association with the document information in the page correspondence information stored in the storage unit. It also registers the area information, the relation information, and the page information in the area correspondence information in association with one another, and
The apparatus for managing information further comprises an output processing unit. This unit outputs the area information together with at least one of the document information and the page information specified by the relation information associated with that area information in the area correspondence information stored in the storage unit.
9. The apparatus according to claim 8, further comprising:
A tree structure generating unit that generates a tree structure. The tree consists of the area information together with the document information and page information specified by the relation information associated with that area information in the area correspondence information stored in the storage unit, wherein
The output processing unit outputs the document information, page information, and area information in the tree structure generated by the tree structure generating unit. When outputting a plurality of pieces of document information, it outputs the document information, page information, and area information in chronological order of the generation or update of the document information.
10. the method for a management information, it comprises:
Extracted region, it extracts area information for each dissimilar zone of arranging on the page from the page of fileinfo;
Relation is extracted, and it extracts relation information from the page of fileinfo, and described relation information is represented area information that the extracted region unit extracts and as the relation between the page of the fileinfo of the extraction source of area information;
The relation information that extracts when area information that extracts during with extracted region and relation are extracted is registered in the regional corresponding informance of storing in the storage unit interrelatedly.
11. The method according to claim 10, further comprising:
Feature extraction, which extracts, from the area information extracted in the area extraction, feature information indicating a feature of the area information, wherein
The registration includes registering the area information extracted in the area extraction, the relation information extracted in the relation extraction, and the feature information extracted in the feature extraction in association with one another as the area correspondence information.
12. The method according to claim 11, further comprising:
Searching for area information in the area correspondence information stored in the storage unit.
13. The method according to claim 11, further comprising:
Comparing, within the area correspondence information stored in the storage unit, the feature information associated with the area information serving as the search source against the feature information held in the area correspondence information; and
Detecting, when a predetermined condition is satisfied, the area information associated with the held feature information.
14. The method according to claim 10, further comprising:
Character information extraction, which extracts, from the area information extracted in the area extraction, character information indicating the characters contained in the area displayed based on that area information, wherein
The registration includes registering the character information extracted in the character information extraction in association with the area correspondence information.
15. The method according to claim 14, wherein:
The relation extraction includes extracting, as information included in the relation information, the position within the page of the image information contained in an area constituting the page of the document information that is the extraction source;
The method of managing information further comprises page information generation, which includes generating page information in which the image information stored in the storage unit is arranged according to the position information associated with that image information, and
Adding the character information to the area of the page information occupied by the image information from which that character information was extracted.
16. The method according to claim 14, wherein searching includes:
Searching, using a character string input by a user as a keyword, the character information registered in association with the area correspondence information; and
Detecting the image information associated with the character information matched in the search.
17. The method according to claim 10, wherein:
The storage unit stores page correspondence information, in which page information representing a page of document information is associated with the document information. The storage unit holds the page information as relation information associated with the area information in the area correspondence information,
The registration includes
Registering the page information representing the page of the document information in association with the document information in the page correspondence information stored in the storage unit, and
Registering the area information, the relation information, and the page information in association with one another in the area correspondence information, and
The method of managing information further comprises output processing. This includes outputting the area information together with at least one of the document information and the page information specified by the relation information associated with that area information in the area correspondence information stored in the storage unit.
18. The method according to claim 17, further comprising:
Generating a tree structure consisting of the area information together with the document information and page information specified by the relation information associated with that area information in the area correspondence information stored in the storage unit, wherein
The output processing includes
Outputting the document information, page information, and area information in the generated tree structure, and
When outputting a plurality of pieces of document information, outputting the document information, page information, and area information in chronological order of the generation or update of the document information.
19. A computer program product comprising a computer-usable medium having computer-readable program code embodied in the medium. When executed, the program code causes a computer to execute:
Area extraction, which extracts area information from a page of document information for each area of a different type arranged on the page;
Relation extraction, which extracts, from the page of the document information, relation information indicating the relation between the area information extracted in the area extraction and the page of the document information that is the extraction source of the area information; and
Registration of the area information extracted in the area extraction and the relation information extracted in the relation extraction, in association with each other, in area correspondence information stored in a storage unit.
20. The computer program product according to claim 19, wherein:
The area extraction includes extracting area information from a page of document information for each area of a different type arranged on the page;
The program code further causes the computer to execute character information extraction, which includes extracting, from the area information extracted in the area extraction, character information indicating the characters contained in the area displayed based on that area information;
The registration includes registering the character information extracted in the character information extraction in association with the area correspondence information; and
When image information is searched for, the program code further causes the computer to search the character information registered in the area correspondence information stored in the storage unit, using a character string input by a user as a keyword. This obtains the image information associated with the character information found in the search.
CNB200710004337XA 2006-01-24 2007-01-23 Method and apparatus for managing information Expired - Fee Related CN100489857C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2006015591 2006-01-24
JP2006015591 2006-01-24
JP2006320792 2006-11-28

Publications (2)

Publication Number Publication Date
CN101008955A 2007-08-01
CN100489857C 2009-05-20

Family

ID=38697388

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200710004337XA Expired - Fee Related CN100489857C (en) 2006-01-24 2007-01-23 Method and apparatus for managing information

Country Status (1)

Country Link
CN (1) CN100489857C (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674103A (en) * 2018-06-15 2020-01-10 华为技术有限公司 Data management method and device
CN113610603A (en) * 2021-08-09 2021-11-05 京东科技控股股份有限公司 Page information processing method and device, electronic equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674103A (en) * 2018-06-15 2020-01-10 华为技术有限公司 Data management method and device
US11706107B2 (en) 2018-06-15 2023-07-18 Huawei Technologies Co., Ltd. Data management method and apparatus
CN113610603A (en) * 2021-08-09 2021-11-05 京东科技控股股份有限公司 Page information processing method and device, electronic equipment and storage medium
CN113610603B (en) * 2021-08-09 2024-04-16 京东科技控股股份有限公司 Page information processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN100489857C (en) 2009-05-20

Similar Documents

Publication Publication Date Title
JP4977452B2 (en) Information management apparatus, information management method, information management program, recording medium, and information management system
CN100515078C (en) Media asset management system for managing video news segments and associated methods
Leydesdorff et al. Mapping the geography of science: Distribution patterns and networks of relations among cities and institutes
US9069855B2 (en) Modifying a hierarchical data structure according to a pseudo-rendering of a structured document by annotating and merging nodes
CN100514278C (en) Media asset management system for managing video segments from fixed-area security cameras and associated methods
CN100476827C (en) Information processing apparatus and information processing method
US8707167B2 (en) High precision data extraction
CN100444173C (en) Method and apparatus for composing document collection and computer manipulation method
CN101297319B (en) Embedding hot spots in electronic documents
US20090052804A1 (en) Method process and apparatus for automated document scanning and management system
Martins et al. Extracting and exploring the geo-temporal semantics of textual resources
JP2001527246A (en) Convert and display publication files
JP2006120125A (en) Document image information management apparatus and document image information management program
JP2000339350A (en) Multi-mode information access
US20130262968A1 (en) Apparatus and method for efficiently reviewing patent documents
GB2401215A (en) Digital Library System
JP2000222394A (en) Document managing device and method and recording medium for recording its control program
Lee et al. An integrated approach to metadata interoperability
CN100489857C (en) Method and apparatus for managing information
US8447748B2 (en) Processing digitally hosted volumes
KR100616152B1 (en) Control method for automatically sending to other web site news automatically classified on internet
US7418653B1 (en) System and method for data publication through web pages
US20050289185A1 (en) Apparatus and methods for accessing information in database trees
KR20110074423A (en) Egf file searching system service and method therefor
Titinen et al. User needs for electronic document management in public administration: a study of two cases

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20090520; termination date: 20180123)