CN101354717B - Document extracting method and document extracting apparatus - Google Patents

Document extracting method and document extracting apparatus

Publication number
CN101354717B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008101316932A
Other languages
Chinese (zh)
Other versions
CN101354717A (en)
Inventor
广畑仁志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sharp Corp
Original Assignee
Sharp Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sharp Corp
Publication of CN101354717A
Application granted
Publication of CN101354717B

Abstract

Document data corresponding to each page included in a document is stored, and furthermore, feature data indicative of a feature of the document data and a document index indicating the document are associated with the document data. A document extracting apparatus obtains input document data, calculates feature data from the input document data, judges similarity between the input document data and the document data based on the feature data, obtains a document index associated with document data similar to the input document data, and extracts a plurality of pieces of document data associated with the document index. Thus, document data concerning the document including a page corresponding to the document data similar to the input document data is extracted for a plurality of pages.

Description

Document extracting method and document extracting apparatus
Technical field
The present invention relates to a technique for retrieving a specific document from a database of documents. More specifically, it relates to a document extracting method and a document extracting apparatus that, based on document data such as an image obtained by reading a document with a scanner, retrieve from a database the document data corresponding to the document that was read.
Background art
Conventionally, techniques have been used in which data obtained by reading a document composed of text, photographs, or the like with a scanner, or document data generated electronically with a personal computer (PC) or the like, is stored in a database; a document is then newly read, and the document data corresponding to the read document is extracted from the database. Proposed methods of extracting document data include, for example, a method that uses OCR (Optical Character Reader) to extract keywords from the read document and judges the similarity of documents based on the keywords, and a method that restricts the documents to forms containing ruled lines and judges the similarity of documents based on features of the extracted ruled lines.
Japanese Patent Application Laid-Open No. 7-282088 discloses a technique in which descriptors that characterize a document (text) are associated with a list of the documents characterized by each descriptor, descriptors are generated from a read document (input text), and the document is matched using the generated descriptors. The descriptors are defined so as to be invariant to distortions and the like that arise when a document is read. A plurality of descriptors is generated for one document, a vote is cast for each document associated with a descriptor generated from the read document, and the document that obtains the highest number of votes, or a document whose number of votes exceeds a prescribed threshold, is selected.
Japanese Patent Application Laid-Open No. 5-37748 discloses a technique in which image data of documents is stored in advance and pattern matching is performed bit by bit between the bitmap data of a read document and the bitmap data of the documents stored in advance, thereby retrieving a document. The same publication also describes that, for a document composed of multiple pages, only the cover page may be read as the page used for retrieval, and the document is retrieved by comparing the image data of the read page with the image data of the first page of each stored document.
Japanese Patent Application Laid-Open No. 2006-31181 discloses a technique in which text images are stored in advance, the feature amount of a read document image is compared with the feature amounts of all pages of the stored text images to obtain a similarity, and text images whose similarity is higher than a threshold are extracted, thereby retrieving a text image. In this technique, when a plurality of text images become candidates, the text images are displayed and a selection by the user is accepted; when the average similarity of the pages included in a text image is lower than a threshold, that text image is removed from the candidates to narrow them down.
In general, documents such as texts often consist of multiple pages. Conventional techniques, including the one disclosed in Japanese Patent Application Laid-Open No. 7-282088, can match a document read with a scanner and extract the required document data from a database, but for a document composed of multiple pages, every page must be matched in order to extract the document data. Consequently, when pages are missing from the document used as the basis for matching because of loss, soiling, or the like, the document data for all pages of the multi-page document cannot be extracted. Japanese Patent Application Laid-Open No. 7-282088 discloses no solution to this problem.
In the technique described in Japanese Patent Application Laid-Open No. 5-37748, which compares bitmap data for documents composed of multiple pages, the comparison is performed page by page, so the more pages each document contains and the more documents there are, the longer the comparison takes. In addition, comparing bitmap data requires precise alignment of the two pieces of image data being compared. In practice, accurate alignment is difficult, and as a result it is difficult to retrieve documents with high accuracy.
Furthermore, the technique described in Japanese Patent Application Laid-Open No. 2006-31181 uses character codes extracted by OCR as the feature amount of the character regions of a text image, so the accuracy of the similarity judgment can fall depending on the character codes that are extracted. Extracting a plurality of character codes could compensate for this loss of accuracy, but that increases the memory capacity needed to store the character codes and lengthens processing because a large amount of data is used for retrieval. Moreover, neither Japanese Patent Application Laid-Open No. 5-37748 nor No. 2006-31181 considers the retrieval of documents that contain confidential information, so there is a concern that such documents could be output easily.
Summary of the invention
The present invention has been made in view of these circumstances. One object is to provide a document extracting method and a document extracting apparatus that make it possible to extract the data of the other parts of a document based on a part of that document, so that the document data for a document composed of multiple pages can be extracted easily from a database.
Another object of the present invention is to provide a document extracting method and a document extracting apparatus that, when extracting document data, can avoid erroneously extracting document data other than the intended document data.
A further object of the present invention is to provide a document extracting method and a document extracting apparatus that can protect confidential information by prescribing a condition for outputting a document.
A document extracting apparatus according to the present invention includes a document storage section for storing document data and extracts specific document data from the document data stored in the document storage section. The apparatus is characterized by comprising: a section that stores a document index, which indicates a document composed of multiple pages, in association with the document data corresponding to each page included in that document; a feature data storage section that stores feature data, which is calculated from feature points extracted from the document data and represents a feature of that document data, in association with the document data; an obtaining section that obtains input document data as new document data; a section that extracts feature points from the input document data obtained by the obtaining section; a generating section that generates, based on the extracted feature points, feature data representing a feature of the input document data; a judging section that compares the feature data generated by the generating section with the feature data stored in the feature data storage section, thereby judging the similarity between the input document data and the document data associated with the stored feature data; a section that obtains the document index associated with the document data judged by the judging section to be highly similar to the input document data; and an extracting section that extracts the plurality of pieces of document data corresponding to the multiple pages included in the document indicated by the obtained document index.
In the present invention, the document data corresponding to each page included in a document is stored in advance, and both the feature data, which is calculated from feature points extracted from the document data and represents a feature of the document data, and the document index indicating the document are stored in association with the document data. When the document extracting apparatus obtains input document data, it generates feature data from the input document data, judges the similarity to the stored document data based on the feature data, obtains the document index associated with the document data that is highly similar to the input document data, and extracts the plurality of pieces of document data associated with the obtained document index. In this way, the document that contains the page corresponding to the document data judged similar to the input document data is identified, and the document data corresponding to all pages included in the identified document is extracted.
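The flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Page` record, the function name `extract_document`, the use of integer hash values as feature data, and the simple "count of matching feature data" similarity are all hypothetical choices, not details taken from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    page_id: str         # identifies one stored page of document data (hypothetical)
    doc_index: str       # document index of the document the page belongs to
    features: frozenset  # feature data registered for this page (assumed hash values)


def extract_document(input_features, pages, min_matches=1):
    """Return every stored page of the document that contains the page
    most similar to the input; return [] if nothing is similar enough."""
    query = frozenset(input_features)
    best, best_score = None, 0
    for page in pages:
        # Assumed similarity measure: number of matching feature data.
        score = len(query & page.features)
        if score > best_score:
            best, best_score = page, score
    if best is None or best_score < min_matches:
        return []
    # One matching page is enough to pull out all pages sharing its document index.
    return [p for p in pages if p.doc_index == best.doc_index]
```

With three stored pages, two of which share the index `Doc1`, scanning a single page of `Doc1` returns both of its pages, which is the behaviour the paragraph above describes.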
In the present invention, the document data corresponding to all pages of a document can be extracted based on input document data that corresponds to only a part of a document composed of multiple pages. Therefore, even when pages of a multi-page document are missing because of loss, soiling, or the like, the document data for all pages can be extracted easily from the database in which the document data was stored in advance.
In the document extracting apparatus of the present invention, the feature data storage section is configured to store, in association with one piece of document data, a plurality of pieces of feature data representing features of that document data, and the generating section is configured to generate a plurality of pieces of feature data representing features of the input document data. The judging section has: a section that, for each piece of feature data generated by the generating section, votes for the document data associated with feature data that matches it; and a section that judges, among the document data stored in the document storage section, the document data with the largest number of votes, or document data whose number of votes is equal to or greater than a prescribed amount, to be document data highly similar to the input document data.
In the present invention, in order to judge the similarity of document data, the document extracting apparatus stores a plurality of pieces of feature data in advance for each piece of document data. For each piece of feature data generated from the input document data, it votes for the document data associated with the same feature data, and it treats the document data that obtains the largest number of votes, or a number of votes equal to or greater than a prescribed amount, as document data highly similar to the input document data. Because document data is judged to be highly similar when several of its feature data match, the similarity judgment is more accurate. This suppresses cases in which document data that does not resemble the input document data is erroneously judged to be highly similar.
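The voting scheme can be illustrated as follows. This is a minimal sketch: the table layout (feature value mapped to page IDs), the function names, and the tie handling are assumptions made for the example, and the `threshold` parameter stands in for the "prescribed amount" mentioned in the text.

```python
from collections import Counter, defaultdict


def build_feature_table(pages):
    """Map each feature value to the set of page IDs registered with it.
    `pages` is a dict of page_id -> iterable of feature values (assumed layout)."""
    table = defaultdict(set)
    for page_id, features in pages.items():
        for feature in features:
            table[feature].add(page_id)
    return table


def vote(input_features, table, threshold=None):
    """Cast one vote per matching feature and return the similar page IDs.
    With threshold=None the pages with the most votes win; otherwise every
    page with at least `threshold` votes is returned."""
    votes = Counter()
    for feature in input_features:
        for page_id in table.get(feature, ()):
            votes[page_id] += 1
    if not votes:
        return []
    if threshold is None:
        top = max(votes.values())
        return sorted(pid for pid, n in votes.items() if n == top)
    return sorted(pid for pid, n in votes.items() if n >= threshold)
```

Requiring several matching votes, rather than a single match, is what makes a chance collision of one feature value unlikely to produce a false hit.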
In the document extracting apparatus of the present invention, the obtaining section has a section that obtains a plurality of pieces of input document data; the judging section has a section that judges, for each of the pieces of input document data, the similarity between the document data stored in the document storage section and that input document data; and the extracting section has a section that, when the document indexes associated with the document data highly similar to each of the pieces of input document data coincide with one another, extracts the plurality of pieces of document data corresponding to the multiple pages included in the document indicated by that document index.
In the present invention, the document extracting apparatus obtains a plurality of pieces of input document data and, when the document indexes associated with the document data highly similar to each piece of input document data coincide, extracts the plurality of pieces of document data associated with the coinciding document index. A single document can thus be extracted based on multiple pages, which further reduces the possibility of erroneously extracting document data other than the intended document data. For example, even when documents that resemble one another exist, the intended document data can be extracted.
The document extracting apparatus of the present invention further has a section that requests more input document data either when a plurality of document indexes associated with document data highly similar to the input document data are obtained, or when, among the document indexes associated with the document data highly similar to each of a plurality of pieces of input document data, a plurality of document indexes common to those pieces of input document data are obtained.
In the present invention, when there are a plurality of document indexes associated with document data highly similar to the input document data, the document extracting apparatus further requests input document data corresponding to another page of the document. Input document data corresponding to another page is thereby obtained, and that page is also used to narrow down the document indexes. Using multiple pages allows a more accurate similarity judgment, so the required document data can be extracted with high accuracy.
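The narrowing-down logic of the preceding paragraphs can be sketched as a set intersection. The function name `resolve_document` and the three result labels are hypothetical; the sketch assumes the similarity judgment has already produced, for each scanned page, the set of candidate document indexes.

```python
def resolve_document(candidates_per_input):
    """Decide what to do given one set of candidate document indexes per scanned page.

    Returns ("extract", index) when exactly one index is common to all inputs,
    ("need_more_input", [indexes]) when several remain, and ("not_found", None)
    when no index is common to all inputs.
    """
    if not candidates_per_input:
        return ("not_found", None)
    common = set.intersection(*(set(c) for c in candidates_per_input))
    if len(common) == 1:
        return ("extract", next(iter(common)))
    if len(common) > 1:
        # Ambiguous: ask the user to scan another page of the document.
        return ("need_more_input", sorted(common))
    return ("not_found", None)
```

Each additional scanned page adds one more set to the intersection, so the candidate list can only shrink as more pages are supplied.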
In the document extracting apparatus of the present invention, the obtaining section is configured to obtain the input document data by optically reading a document.
In the present invention, the document extracting apparatus has, as the obtaining section that obtains the input document data, a scanner that optically reads a document, and it extracts document data by reading a part of the document with the scanner. By reading a part of a document composed of photographs, text, or the like with the scanner, it is possible, for example, to extract document data stored in a server apparatus connected via a communication network, and to obtain the data of the entire document easily from a part of the document.
The document extracting apparatus of the present invention further comprises: a section that stores, in association with a document index, a prescribed output condition required for outputting the document data corresponding to each page included in the document indicated by that document index; a section that judges whether the output condition associated with the document index is satisfied, the document index being the one associated with the document data extracted by the extracting section; a section that, when the output condition is judged to be satisfied, outputs the plurality of pieces of document data corresponding to the multiple pages included in the document indicated by the document index; and a section that, when the output condition is judged not to be satisfied, prohibits output of the plurality of pieces of document data corresponding to the multiple pages included in the document indicated by the document index.
In the present invention, the document extracting apparatus predetermines an output condition for each document index, outputs the document data when the output condition is satisfied, and prohibits output of the document data when the output condition is not satisfied, so that only documents corresponding to document indexes whose output conditions are satisfied are output. Because a document can be output only when its output condition is satisfied, setting output conditions for documents of high importance prevents such documents from being output easily and protects the confidential information they contain.
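A minimal sketch of this output gate follows. The text above does not say what form an output condition takes, so the example assumes, purely for illustration, that each condition is a predicate over a request dictionary (here, whether the user has been authenticated) and that a document index with no registered condition may be output freely. The names `output_pages`, `output_conditions`, and `request` are hypothetical.

```python
def output_pages(doc_index, extracted_pages, output_conditions, request):
    """Return the extracted pages if the document's output condition is met,
    or None when output is prohibited.

    output_conditions: dict of doc_index -> predicate(request) (assumed form).
    """
    condition = output_conditions.get(doc_index)
    if condition is not None and not condition(request):
        return None  # output condition not satisfied: prohibit output
    return list(extracted_pages)
```

For instance, registering `{"Doc1": lambda req: req.get("authenticated", False)}` makes `Doc1` available only to requests that carry `"authenticated": True`, while other documents are unaffected.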
The document extracting apparatus of the present invention further comprises a section that forms a plurality of images based on the plurality of pieces of document data extracted by the extracting section.
In the present invention, the document extracting apparatus includes a section that forms images based on document data, and can therefore form images based on the extracted document data. Using an image forming apparatus such as a digital copier or a multifunction machine equipped with a scanner, images can be formed based on document data extracted from the document data stored in the image forming apparatus or in a server apparatus connected to the image forming apparatus via a communication network, so a document composed of photographs, text, or the like can be obtained easily by image formation.
A computer program of the present invention is a control program for implementing the above document extracting apparatus on a computer.
A computer-readable recording medium of the present invention records the above computer program.
Brief description of the drawings
Fig. 1 is a block diagram showing the internal functional structure of the document extracting apparatus.
Fig. 2 is a block diagram showing the structure of the document extraction processing unit.
Fig. 3 is a block diagram showing the structure of the feature point extraction unit.
Fig. 4 is an explanatory diagram showing an example of the spatial filter used by the filter processing unit.
Fig. 5 is an explanatory diagram showing an example of a feature point of a connected region.
Fig. 6 is an explanatory diagram showing an example of the result of extracting feature points for a character string.
Fig. 7 is an explanatory diagram showing a feature point of interest and the extracted feature points.
Figs. 8A to 8D are explanatory diagrams showing an example in which three surrounding feature points are extracted for the feature point of interest P1 and feature data is calculated.
Figs. 9A to 9D are explanatory diagrams showing an example in which three surrounding feature points are extracted for the feature point of interest P2 and feature data is calculated.
Fig. 10 is a conceptual diagram showing the document data stored in the storage unit.
Fig. 11 is a conceptual diagram showing an example of the contents of the document table, stored in the storage unit, that associates document data with documents.
Fig. 12 is a conceptual diagram showing an example of the contents of the table, stored in the storage unit, that associates document data with feature data.
Fig. 13 is a flowchart showing the steps of the process of registering document data.
Fig. 14 is a flowchart showing the steps of the process of extracting document data.
Fig. 15 is a flowchart showing the steps of the process of extracting document data.
Fig. 16 is a flowchart showing the steps of the process of extracting document data.
Fig. 17 is a conceptual diagram showing an example of the contents of the document table, stored in the storage unit, that associates document data with documents.
Fig. 18 is a flowchart showing the steps of the document output process.
Fig. 19 is a block diagram showing the internal functional structure of the document extracting apparatus.
Fig. 20 is a block diagram showing the internal structure of the document extracting apparatus.
Embodiments
The present invention is described below in detail with reference to the drawings that show its embodiments.
(Embodiment 1)
Embodiment 1 shows a form in which the document extracting apparatus of the present invention is an image forming apparatus that forms color images. Fig. 1 is a block diagram showing the internal functional structure of the document extracting apparatus 100 of Embodiment 1. The document extracting apparatus 100 includes a control unit 11 that controls the operation of each part of the apparatus, a storage unit 12 composed of semiconductor memory, a hard disk, or the like, and a color image input unit 13 that optically reads color images. A color image processing unit 2 is connected to the color image input unit 13 and performs the processing that generates image data corresponding to the color image that was read. The color image input unit 13 reads a document composed of photographs, text, or the like as a color image, and the storage unit 12 stores, as document data, the image data generated by the color image processing unit 2 after the document is read by the color image input unit 13. The storage unit 12 functions as the document storage section of the present invention, and the color image input unit 13 functions as the document data obtaining section of the present invention. A color image forming unit 14 is also connected to the color image processing unit 2 and forms a color image from the image data generated by the color image processing unit 2. An operation panel 15 for accepting operations from the user is connected to the color image input unit 13, the color image processing unit 2, and the color image forming unit 14.
The color image input unit 13 is a scanner equipped with a CCD (Charge Coupled Device). It separates the reflected light image from a document, that is, a color image formed on a recording medium such as paper, into R (red), G (green), and B (blue), reads it with the CCD, converts it into RGB analog signals, and outputs them to the color image processing unit 2. The color image processing unit 2 applies the image processing described later to the RGB analog signals input from the color image input unit 13 to generate digital image data, then generates image data composed of digital C (cyan), M (magenta), Y (yellow), and K (black) signals and outputs it to the color image forming unit 14. The color image forming unit 14 forms a color image from the image data input from the color image processing unit 2 by a method such as thermal transfer, electrophotography, or inkjet. The operation panel 15 includes a display unit, such as a liquid crystal display, that shows the information needed to operate the document extracting apparatus 100, and an accepting unit, such as a touch panel or numeric keypad, that accepts instructions for controlling the operation of the document extracting apparatus 100 through user operation.
The color image processing unit 2 converts the analog signals input from the color image input unit 13 into digital signals in an A/D conversion unit 20, passes them in order through a shading correction unit 21, an input tone correction unit 22, a region separation processing unit 23, a document extraction processing unit 24, a color correction unit 25, a black generation and under color removal unit 26, a spatial filtering unit 27, an output tone correction unit 28, and a tone reproduction processing unit 29, and outputs image data composed of digital CMYK signals to the color image forming unit 14.
The A/D conversion unit 20 receives the RGB analog signals input from the color image input unit 13 to the color image processing unit 2, converts them into digital RGB signals, and outputs the RGB signals to the shading correction unit 21. The shading correction unit 21 processes the RGB signals input from the A/D conversion unit 20 to remove the various distortions produced in the illumination system, imaging system, and image pickup system of the color image input unit 13. The shading correction unit 21 outputs the RGB signals, with the distortions removed, to the input tone correction unit 22.
The input tone correction unit 22 adjusts the color balance of the RGB signals input from the shading correction unit 21. The RGB signals input from the shading correction unit 21 are RGB reflectance signals, and the input tone correction unit 22 converts them into signals, such as density (pixel value) signals, that the color image processing unit 2 can process easily. The input tone correction unit 22 outputs the processed RGB signals to the region separation processing unit 23.
The region separation processing unit 23 classifies each pixel in the image represented by the RGB signals input from the input tone correction unit 22 as belonging to a character region, a halftone dot region, or a photograph (continuous tone) region. Based on the result, it outputs a region identification signal, which indicates the region to which each pixel belongs, to the black generation and under color removal unit 26, the spatial filtering unit 27, and the tone reproduction processing unit 29. The region separation processing unit 23 also outputs the RGB signals input from the input tone correction unit 22, unchanged, to the document extraction processing unit 24.
The document extraction processing unit 24 is connected to the storage unit 12 and performs two kinds of processing: inputting and outputting document data, that is, image data composed of RGB signals, to and from the storage unit 12; and the processing related to the document extracting method of the present invention, described later. The document extraction processing unit 24 outputs to the color correction unit 25 either the image data composed of the RGB signals input from the region separation processing unit 23 or the image data that is document data input from the storage unit 12. Alternatively, in the document extracting apparatus 100 the document extraction processing unit 24 need not be placed downstream of the region separation processing unit 23; it may instead be provided in parallel with the input tone correction unit 22.
The color correction unit 25 converts the RGB signals input from the document extraction processing unit 24 into CMY signals and, to achieve faithful color reproduction, removes from the CMY signals the color impurity caused by the spectral characteristics of the CMY color materials, which contain unnecessary absorption components. The color correction unit 25 then outputs the color-corrected CMY signals to the black generation and under color removal unit 26.
The black generation and under color removal unit 26 performs black generation, which generates a K (black) signal from the three CMY color signals input from the color correction unit 25, and subtracts the K signal obtained by black generation from the original CMY signals, thereby converting the three-color CMY signals into four-color CMYK signals. One example of black generation is the skeleton black method. In this method, with the input-output characteristic of the skeleton curve written as y = f(x), the data before conversion as C, M, Y, and the UCR (Under Color Removal) rate as α (0 < α < 1), the converted data C', M', Y', K' are expressed by the following formulas.
K’=f(min(C,M,Y))
C’=C-αK’
M’=M-αK’
Y’=Y-αK’
Here, the UCR rate α (0<α<1) indicates how much CMY is reduced when the portion where C, M, and Y overlap is replaced with K. The first formula above shows that the K signal is generated from the smallest of the C, M, and Y signal intensities. The black print generation background color removal unit 26 outputs the CMYK signals obtained from the CMY signals to the spatial filtering processing unit 27.
The spatial filtering processing unit 27 applies spatial filtering with a digital filter to the image represented by the CMYK signals input from the black print generation background color removal unit 26, according to the regional identification signal input from the regional separation processing unit 23, thereby reducing blur and graininess in the image. For example, for a region separated as character by the regional separation processing unit 23, the spatial filtering processing unit 27 uses a filter with strong high-frequency emphasis to improve character reproducibility. For a region separated as halftone dots by the regional separation processing unit 23, the spatial filtering processing unit 27 applies low-pass filtering to remove the input halftone dot component. The spatial filtering processing unit 27 then outputs the processed CMYK signals to the output levels correcting unit 28.
The output levels correcting unit 28 performs output tone correction, which converts the CMYK signals input from the spatial filtering processing unit 27 into dot area percentages, the characteristic values of the coloured image forming unit 14, and outputs the corrected CMYK signals to the color range reproduction processing unit 29.
The color range reproduction processing unit 29 processes the CMYK signals input from the output levels correcting unit 28, based on the regional identification signal input from the regional separation processing unit 23, so that the tone appropriate to each region can be reproduced. For example, for a region separated as character by the regional separation processing unit 23, the unit performs binarization or multi-level dithering with a high-resolution screen suited to reproducing high-frequency components. For a region separated as halftone dots by the regional separation processing unit 23, the unit performs tone reproduction processing in which the image is finally separated into pixels so that each tone can be reproduced. The color range reproduction processing unit 29 outputs the processed image data to the coloured image forming unit 14.
The coloured image forming unit 14 forms a CMYK color image on a recording medium such as paper, based on the image data composed of CMYK signals input from the color image processing unit 2. By forming an image from the master copy data (image data), the coloured image forming unit 14 outputs an original copy consisting of photographs, text, and the like.
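The four formulas above can be illustrated with a short Python sketch. The skeleton curve used here (no black below a start level, linear above it), the name `skeleton_black`, and the default values of `start` and `alpha` are assumptions chosen for illustration only; the source specifies only y=f(x) and 0<α<1.

```python
def skeleton_black(x, start=0.3):
    """Hypothetical skeleton curve y = f(x): zero up to `start`, linear above it."""
    if x <= start:
        return 0.0
    return (x - start) / (1.0 - start)


def black_generation_ucr(c, m, y, alpha=0.5, f=skeleton_black):
    """Convert CMY values (each in 0..1) to (C', M', Y', K')."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    k = f(min(c, m, y))          # K' = f(min(C, M, Y))
    return (c - alpha * k,       # C' = C - alpha * K'
            m - alpha * k,       # M' = M - alpha * K'
            y - alpha * k,       # Y' = Y - alpha * K'
            k)
```

With this assumed curve, a color whose smallest component is at or below `start` receives no black at all, and its C, M, Y values pass through unchanged.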
Next, the structure of the original copy extraction processing unit 24 and the processing it performs are described. Fig. 2 is a block diagram showing the structure of the original copy extraction processing unit 24. The original copy extraction processing unit 24 comprises: a feature point extraction unit 241, which extracts unique points (feature points) corresponding to characters, figures, and the like on the original copy represented by the input master copy data; a characteristic (feature vector) computing unit 242, which calculates from the unique points the characteristic data representing the features of the master copy data; a ballot processing unit 243, which votes for the master copy data stored in the storage unit 12 based on the characteristic data; a similar degree determination processing unit 244, which judges the similarity of the master copy data based on the voting results; and an original copy extraction unit 245, which extracts specific master copy data from the storage unit 12.
Fig. 3 is a block diagram showing the structure of the feature point extraction unit 241. The feature point extraction unit 241 comprises a signal transformation processing unit 2410, which makes the master copy data achromatic; a resolution conversion unit 2411, which converts the resolution of the master copy data to a prescribed resolution; a filter processing unit 2412, which corrects the spatial frequency characteristic of the master copy data; a binary conversion treatment unit 2413, which binarizes the master copy data; and a center of gravity extraction unit 2414, which extracts the centers of gravity of characters and the like.
When the input master copy data is color image data, the signal transformation processing unit 2410 makes the color image achromatic by converting it into a luminance signal or a lightness signal, and outputs the converted master copy data to the resolution conversion unit 2411. For example, with the intensities of the R, G, and B color components of each pixel written as Rj, Gj, Bj and the luminance signal of each pixel as Yj, the luminance signal can be expressed as Yj=0.30×Rj+0.59×Gj+0.11×Bj. Alternatively, the color image may be made achromatic by converting the RGB signals into CIE (Commission Internationale de l'Eclairage) 1976 L*a*b* signals.
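As a minimal sketch of the luminance conversion, the following Python function applies Yj=0.30×Rj+0.59×Gj+0.11×Bj to every pixel. The representation of the image as a list of rows of (R, G, B) tuples and the function name `to_luminance` are assumptions for illustration.

```python
def to_luminance(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of luminance values Yj."""
    return [
        [0.30 * r + 0.59 * g + 0.11 * b for (r, g, b) in row]
        for row in rgb_image
    ]
```

Because the three weights sum to 1, a neutral gray pixel keeps its level after conversion.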
The resolution conversion unit 2411 scales the input master copy data so that its resolution becomes the prescribed resolution, and outputs the master copy data to the filter processing unit 2412. Thus, even when the coloured image input block 13 has optically scaled the original copy and thereby changed the resolution of the master copy data, unique points can be extracted without being affected. In addition, the resolution conversion unit 2411 converts the data to a resolution lower than that at which the coloured image input block 13 reads at unity magnification. For example, master copy data read at 600 dpi (dots per inch) by the coloured image input block 13 is converted to 300 dpi. This reduces the processing load of the subsequent stages.
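The 600 dpi to 300 dpi example corresponds to halving the pixel count in each direction. The source does not state how the scaling is computed; the sketch below assumes simple averaging of each 2×2 block of a grayscale image, and the name `halve_resolution` is hypothetical.

```python
def halve_resolution(gray):
    """Halve the resolution of a grayscale image (list of rows) by 2x2 averaging.

    A trailing odd row or column is dropped.
    """
    height = len(gray) // 2 * 2
    width = len(gray[0]) // 2 * 2
    return [
        [
            (gray[y][x] + gray[y][x + 1] + gray[y + 1][x] + gray[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]
```

The output holds one quarter of the pixels of the input, which is where the reduction in later processing comes from.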
The filter processing unit 2412 corrects the spatial frequency characteristic of the input master copy data by image emphasis processing, smoothing processing, and the like, and outputs the corrected image to the binary conversion treatment unit 2413. The processing in the filter processing unit 2412 is performed to absorb differences among machine models in the spatial frequency characteristic of the coloured image input block 13. The image signal output by the CCD included in the coloured image input block 13 suffers degradation such as image blur caused by optical components such as lenses and mirrors, the aperture of the CCD's light receiving surface, transfer efficiency, image lag, the integration effect of physical scanning, and scanning unevenness. The filter processing unit 2412 repairs the degradation in the master copy data by emphasizing boundaries, edges, and the like. The filter processing unit 2412 also performs smoothing to suppress high-frequency components that are unnecessary for the unique point extraction performed in the subsequent stage.
Fig. 4 is an explanatory diagram showing an example of the spatial filter used by the filter processing unit 2412. As shown in the figure, the spatial filter has a size of, for example, 7×7 and is a mixed filter that performs both emphasis and smoothing. The pixels of the input master copy data are scanned, and the spatial filter is applied to every pixel. The size of the spatial filter is not limited to 7×7; it may be 3×3, 5×5, or another size. The filter coefficient values shown are also only an example and are not limiting; they can be set as appropriate for the model or characteristics of the coloured image input block 13.
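Applying a spatial filter to every pixel can be sketched as below. The coefficients of Fig. 4 are not reproduced in the text, so the 3×3 emphasis kernel `EXAMPLE_KERNEL` is a hypothetical stand-in, and clamping coordinates at the image border is likewise an assumption.

```python
# Hypothetical 3x3 emphasis kernel (coefficients sum to 1); not the filter of Fig. 4.
EXAMPLE_KERNEL = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def apply_spatial_filter(gray, kernel=EXAMPLE_KERNEL):
    """Apply `kernel` at every pixel of a grayscale image (list of rows)."""
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    # Clamp coordinates so border pixels reuse the nearest edge value.
                    yy = min(max(y + j - oy, 0), h - 1)
                    xx = min(max(x + i - ox, 0), w - 1)
                    acc += kernel[j][i] * gray[yy][xx]
            out[y][x] = acc
    return out
```

A 5×5 or 7×7 kernel can be passed in the same way, since the function derives the offsets from the kernel size.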
The binary conversion treatment unit 2413 binarizes the master copy data by comparing the luminance value or lightness value of each pixel in the input master copy data with a prescribed threshold, and outputs the binarized master copy data to the center of gravity extraction unit 2414.
The center of gravity extraction unit 2414 performs labeling, attaching to each pixel of the master copy data input from the binary conversion treatment unit 2413 a label corresponding to its binarized pixel value. That is, there are two kinds of labels: when pixel values are expressed as 0 or 1, one label is attached to pixels of value 0 and the other to pixels of value 1. The center of gravity extraction unit 2414 then identifies connected regions, in which pixels with the same label are connected, extracts the center of gravity of each identified connected region as a unique point, and outputs the extracted unique points to the characteristic computing unit 242. A unique point can be expressed as coordinate values on the binary image represented by the master copy data.
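Binarization followed by center-of-gravity extraction can be sketched in Python as follows. Several details are assumptions made for illustration: pixels darker than the threshold are given the value 1, only regions of value 1 are traced, and pixels are treated as connected when they touch horizontally or vertically (4-connectivity). The function names are hypothetical.

```python
from collections import deque


def binarize(gray, threshold):
    """Return 1 for pixels darker than `threshold`, otherwise 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def extract_centroids(binary, target=1):
    """Return the center of gravity (x, y) of each 4-connected region of `target`."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] != target or seen[y0][x0]:
                continue
            # Breadth-first traversal collects one connected region.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            sum_x = sum_y = count = 0
            while queue:
                x, y = queue.popleft()
                sum_x += x
                sum_y += y
                count += 1
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx] and binary[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            centroids.append((sum_x / count, sum_y / count))
    return centroids
```

Each returned coordinate pair plays the role of one unique point on the binary image.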
Fig. 5 is an explanatory diagram showing an example of the unique point of a connected region. In Fig. 5, the identified connected region is the character "A", identified as a set of pixels with the same label. The center of gravity of this character "A" is at the position marked by the black dot in Fig. 5, and this center of gravity is the unique point. Fig. 6 is an explanatory diagram showing an example of the unique points extracted from a character string. For a character string made up of several characters, unique points are extracted at positions that differ according to the kind of character. Unique points can be extracted not only from characters but equally from figures and photograph portions. The extraction method shown here is only an example, and other methods may be used. For example, a character string may be broken into words and the center of gravity of each word extracted as a unique point.
The characteristic computing unit 242 calculates, from the unique points input from the feature point extraction unit 241, the characteristic data representing the features of the input master copy data. An example of this calculation follows. The characteristic computing unit 242 takes each unique point input from the feature point extraction unit 241 in turn as the unique point of interest and extracts four other unique points near it.
Fig. 7 is an explanatory diagram showing a unique point of interest and the unique points extracted around it. As shown in Fig. 7, the characteristic computing unit 242 takes one unique point as the unique point of interest and extracts a prescribed number of surrounding unique points (four here) as peripheral unique points, in order of increasing distance from the unique point of interest. In the example of Fig. 7, when unique point a is the unique point of interest P1, the four unique points b, c, d, e enclosed by the closed curve C1 are extracted as peripheral unique points; when unique point b is the unique point of interest P2, the four unique points a, c, e, f enclosed by the closed curve C2 are extracted as peripheral unique points.
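Selecting the peripheral unique points amounts to a nearest-neighbor search around the unique point of interest. The sketch below is one straightforward way to do it; the function name and the tie-breaking by list position are assumptions, and the coordinates used in any call are illustrative rather than taken from Fig. 7.

```python
import math


def nearest_peripheral_points(points, index, count=4):
    """Return indices of the `count` points nearest to points[index], nearest first."""
    px, py = points[index]
    others = [i for i in range(len(points)) if i != index]
    # Sort by Euclidean distance; ties are broken by position in the list.
    others.sort(key=lambda i: (math.hypot(points[i][0] - px, points[i][1] - py), i))
    return others[:count]
```

Calling the function once per unique point reproduces the step in which every unique point in turn becomes the unique point of interest.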
The characteristic computing unit 242 then extracts combinations of three points from the four extracted peripheral unique points. Figs. 8A to 8D are explanatory diagrams showing examples of extracting three peripheral unique points for the unique point of interest P1 and calculating characteristic data. As shown in Figs. 8A to 8D, when unique point a of Fig. 7 is the unique point of interest P1, all combinations of three points chosen from the peripheral unique points b, c, d, e are extracted, namely (b, c, d), (b, c, e), (b, d, e), and (c, d, e).
Next, for each extracted combination the characteristic computing unit 242 calculates an invariant Hij (one kind of feature quantity) that is invariant to geometric distortion. Here, i is the number of the unique point of interest (an integer of 1 or more), and j is the number of the combination of three peripheral unique points (an integer of 1 or more). In the present embodiment, the invariant Hij is the ratio of the lengths of two of the line segments connecting the peripheral unique points. The length of each line segment can be calculated from the coordinate values of the peripheral unique points.
For example, in Fig. 8A, if the length of the line segment connecting unique points b and c is A11 and the length of the line segment connecting unique points b and d is B11, the invariant H11 is given by H11=A11/B11. In Fig. 8B, if the length of the line segment connecting unique points b and c is A12 and the length of the line segment connecting unique points b and e is B12, the invariant H12 is given by H12=A12/B12. In Fig. 8C, if the length of the line segment connecting unique points b and d is A13 and the length of the line segment connecting unique points b and e is B13, the invariant H13 is given by H13=A13/B13. In Fig. 8D, if the length of the line segment connecting unique points c and d is A14 and the length of the line segment connecting unique points c and e is B14, the invariant H14 is given by H14=A14/B14.
In this way, the invariants H11, H12, H13, H14 are calculated in the example of Figs. 8A to 8D. In the above example, the combination of the 1st, 2nd, and 3rd nearest peripheral unique points to the unique point of interest is j=1; the 1st, 2nd, and 4th nearest is j=2; the 1st, 3rd, and 4th nearest is j=3; and the 2nd, 3rd, and 4th nearest is j=4. Within each combination of three peripheral unique points, Aij is the line segment connecting the point nearest the unique point of interest to the 2nd nearest, and Bij is the line segment connecting the nearest point to the 3rd nearest. The order of the combinations and the choice of line segments used to calculate the invariant Hij are not limited to the method of this example; any method may be used, such as selecting the line segments between peripheral unique points on the basis of their length.
Next, the characteristic computing unit 242 calculates the remainder of the following formula as a hash value (characteristic data) Hi and stores it in the storage unit 12. D in the formula is a constant set in advance according to how wide a range of values the remainder is to take.
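The ordering of the combinations j=1 to j=4 and the definition Hij=Aij/Bij can be sketched as follows. The function assumes its argument holds the four peripheral unique points already ordered from nearest to farthest from the unique point of interest, and that no two of them coincide; the name `invariants` is hypothetical.

```python
import math
from itertools import combinations


def invariants(peripheral_points):
    """Return [Hi1, Hi2, Hi3, Hi4] for four (x, y) points ordered nearest first."""
    result = []
    # combinations() yields (1st,2nd,3rd), (1st,2nd,4th), (1st,3rd,4th), (2nd,3rd,4th),
    # which matches j = 1, 2, 3, 4 in the text.
    for nearest, second, third in combinations(peripheral_points, 3):
        a = math.dist(nearest, second)   # Aij: nearest to 2nd nearest
        b = math.dist(nearest, third)    # Bij: nearest to 3rd nearest
        result.append(a / b)
    return result
```

Because each value is a ratio of two lengths, uniformly enlarging or reducing the page leaves it unchanged, which is the sense in which it tolerates scaling.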
(Hi1×10^3+Hi2×10^2+Hi3×10^1+Hi4×10^0)/D
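The remainder calculation can be sketched as below. The invariants are ratios, so the sketch first quantizes each one to a single decimal digit; this quantizer, its upper bound `h_max`, and the default D=127 are assumptions for illustration, since the text does not state how the invariants are turned into integers.

```python
def quantize(h, levels=10, h_max=2.0):
    """Hypothetical quantizer: map a ratio in [0, h_max) to an integer 0..levels-1."""
    return min(int(h / h_max * levels), levels - 1)


def hash_value(invariant_list, d=127):
    """Return Hi, the remainder of (Hi1*10^3 + Hi2*10^2 + Hi3*10 + Hi4) divided by D."""
    h1, h2, h3, h4 = (quantize(h) for h in invariant_list)
    return (h1 * 1000 + h2 * 100 + h3 * 10 + h4) % d
```

The remainder always lies between 0 and D-1, so D directly controls how many distinct hash values the table has to hold.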
After finishing the extraction of peripheral unique points and the calculation of the hash value Hi for one unique point of interest, the characteristic computing unit 242 takes another unique point as the next unique point of interest, extracts its peripheral unique points, and calculates its hash value; in this way it calculates a hash value with each unique point as the unique point of interest.
In the example of Fig. 7, after finishing the extraction of peripheral unique points and the calculation of the hash value H1 with unique point a as the unique point of interest P1, the characteristic computing unit 242 extracts peripheral unique points and calculates the hash value H2 with unique point b as the unique point of interest P2. As shown in Fig. 7, when unique point b is the unique point of interest P2, the four unique points a, c, e, f are extracted as peripheral unique points.
Figs. 9A to 9D are explanatory diagrams showing examples of extracting three peripheral unique points for the unique point of interest P2 and calculating characteristic data. As shown in Figs. 9A to 9D, the characteristic computing unit 242 extracts the combinations of three points from the peripheral unique points a, c, e, f, namely (a, e, f), (a, c, e), (a, f, c), and (e, f, c), and calculates the invariant Hij for each combination.
As with the unique point of interest P1 shown in Figs. 8A to 8D, for the unique point of interest P2 the invariant H21 is calculated as H21=A21/B21 as shown in Fig. 9A, the invariant H22 as H22=A22/B22 as shown in Fig. 9B, the invariant H23 as H23=A23/B23 as shown in Fig. 9C, and the invariant H24 as H24=A24/B24 as shown in Fig. 9D. The characteristic computing unit 242 then calculates the hash value H2 from the invariants H21, H22, H23, H24 and stores it in the storage unit 12. The characteristic computing unit 242 repeats the same processing with each unique point as the unique point of interest, obtains the hash value Hi for each, and stores it in the storage unit 12.
As described above, the characteristic computing unit 242 calculates characteristic data, in the form of a hash value Hi, for each unique point, and treats the set of calculated characteristic data as the characteristic data of the master copy data. The characteristic computing unit 242 functions as the generation unit of the present invention.
The method of calculating characteristic data shown here is only an example, and other methods may be used. For example, another prescribed hash function may be used. The number of unique points extracted around the unique point of interest may also be other than four, such as five or six. It is also possible to calculate several items of characteristic data for one unique point of interest, for example by extracting three unique points from five extracted unique points, calculating characteristic data from the distances between the three points, and doing so for each combination of three points that can be chosen from the five.
The characteristic data calculated by the characteristic computing unit 242 is stored in the storage unit 12 in association with the master copy data. For each original copy made up of multiple pages, the storage unit 12 stores the master copy data corresponding to each page, and also stores an original copy table, which associates master copy data with original copies, and a feature table, which associates master copy data with characteristic data. The storage unit 12 functions as the characteristic storage unit of the present invention.
Fig. 10 is a conceptual diagram of the master copy data stored in the storage unit 12. Multiple items of master copy data, one for each page of an original copy, are stored, and each item is given a page index (ID1, ID2, ...) that individually identifies it. Fig. 11 is a conceptual diagram showing an example of the contents of the original copy table, stored in the storage unit 12, that associates master copy data with original copies. The table records original copy indexes (Doc1, Doc2, ...) that individually identify original copies, and records, in association with each original copy index, the page indexes of the master copy data corresponding to the pages of that original copy. The table also records the number of pages of each original copy, and as many page indexes as there are pages are associated with the original copy index. Because page indexes are associated with original copy indexes, the storage unit 12 stores the original copy indexes and the master copy data shown in Fig. 10 in association with each other.
Fig. 12 is a conceptual diagram showing an example of the contents of the feature table, stored in the storage unit 12, that associates master copy data with characteristic data. The figure shows the case where the characteristic data, as hash values, is calculated with D=127. Each characteristic data value from 0 to 126 is recorded, and the page index of a master copy data item is recorded in association with the characteristic data calculated from that item. Because the same characteristic data is sometimes calculated from several master copy data items, several page indexes may be associated with one item of characteristic data. Conversely, because multiple items of characteristic data are calculated from one master copy data item, the page index of one master copy data item is associated with multiple items of characteristic data. Because page indexes are associated with characteristic data, the storage unit 12 stores the characteristic data and the master copy data in association with each other.
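The two tables can be pictured as a pair of dictionaries, as in the following sketch. The function name `register_document`, the use of Python dictionaries, and the choice to record each page index at most once per hash value are assumptions for illustration; the index strings follow the Doc1/ID1 style of Figs. 10 to 12.

```python
def register_document(doc_index, pages, document_table, feature_table):
    """Record one multi-page original copy in both tables.

    pages: dict mapping page index -> list of hash values calculated for that page.
    document_table: original copy index -> list of page indexes (its length is the page count).
    feature_table: hash value -> list of page indexes whose data produced that hash.
    """
    document_table[doc_index] = list(pages)
    for page_index, hashes in pages.items():
        for h in sorted(set(hashes)):
            feature_table.setdefault(h, []).append(page_index)
```

One hash value can end up pointing at several pages, and one page appears under several hash values, which is the many-to-many relation the feature table describes.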
Based on the characteristic data calculated by the characteristic computing unit 242, the ballot processing unit 243 searches the feature table stored in the storage unit 12 and votes for the master copy data indicated by the page indexes associated with characteristic data matching the calculated characteristic data. When several page indexes are associated with one item of characteristic data, a vote is cast for every master copy data item associated with it. Because the characteristic computing unit 242 calculates multiple items of characteristic data for the input master copy data, a vote is cast for each item, and master copy data similar to the input master copy data receives many votes. The ballot processing unit 243 outputs the result of voting for all the characteristic data calculated by the characteristic computing unit 242 to the similar degree determination processing unit 244.
Based on the voting results input from the ballot processing unit 243, the similar degree determination processing unit 244 judges which of the master copy data stored in the storage unit 12 the input master copy data is similar to, and outputs the result to the original copy extraction unit 245. Specifically, the similar degree determination processing unit 244 examines the number of votes obtained by each master copy data item stored in the storage unit 12 and judges the item with the most votes to be similar to the input master copy data. Alternatively, the similar degree determination processing unit 244 may normalize the vote count of each item by dividing it by the number of characteristic data items calculated by the characteristic computing unit 242 (the maximum possible number of votes) and judge every item whose normalized vote count is at or above a prescribed threshold to be similar to the input master copy data. When master copy data similar to the input master copy data exists, the result output by the similar degree determination processing unit 244 contains the page index of the similar master copy data. The ballot processing unit 243 and the similar degree determination processing unit 244 together function as the identifying unit of the present invention.
Based on the page index contained in the result input from the similar degree determination processing unit 244, the original copy extraction unit 245 searches the original copy table stored in the storage unit 12 and obtains the original copy index associated with that page index. This identifies the original copy containing the page that corresponds to the master copy data judged similar to the input master copy data. The original copy extraction unit 245 then extracts the master copy data items indicated by the page indexes associated with the obtained original copy index and outputs them to the color correction unit 25. Thus the master copy data for all pages of the identified original copy is extracted. The original copy extraction unit 245 functions as the extraction unit of the present invention.
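Voting and the two ways of judging similarity described above can be sketched as follows. The function names, the dictionary form of the feature table, and the threshold value are assumptions for illustration.

```python
from collections import Counter


def vote(input_hashes, feature_table):
    """Cast one vote per input hash for every page index associated with that hash."""
    votes = Counter()
    for h in input_hashes:
        for page_index in feature_table.get(h, ()):
            votes[page_index] += 1
    return votes


def most_voted(votes):
    """Return the page index with the most votes, or None if nothing was voted for."""
    return max(votes, key=votes.get) if votes else None


def similar_pages(votes, num_input_hashes, threshold=0.8):
    """Return page indexes whose vote count, divided by the maximum possible
    number of votes (the number of input hashes), reaches `threshold`."""
    return [p for p, n in votes.items() if n / num_input_hashes >= threshold]
```

Normalizing by the number of input hash values makes the threshold independent of how many unique points the scanned page happens to contain.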
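The lookup from one similar page to every page of its original copy can be sketched as below. The dictionary `page_store`, which stands in for the master copy data held in the storage unit 12, and the function name are hypothetical.

```python
def extract_document(similar_page_index, document_table, page_store):
    """Return (original copy index, master copy data of all its pages).

    document_table: original copy index -> list of page indexes.
    page_store: page index -> stored master copy data.
    Returns (None, []) if the page index belongs to no registered original copy.
    """
    for doc_index, page_indexes in document_table.items():
        if similar_page_index in page_indexes:
            return doc_index, [page_store[p] for p in page_indexes]
    return None, []
```

A single matching page is therefore enough to retrieve the whole original copy in page order.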
Next, the processing performed by the original copy extraction element 100 of the present invention, configured as above, is described. The original copy extraction element 100 performs two processes: reading an original copy made up of multiple pages and registering its master copy data, and reading part of an original copy and extracting the master copy data corresponding to all pages of that original copy. The latter, extracting the master copy data for all pages from a part of the original copy, is the processing of the original copy extracting method of the present invention. Fig. 13 is a flowchart showing the procedure for registering master copy data.
The control module 11 of the original copy extraction element 100 waits for a registration instruction for master copy data, issued by the user operating the operation panel 15 (S11). If no registration instruction has been received (S11: NO), the control module 11 continues to wait. When a registration instruction has been accepted (S11: YES), the user sets an original copy made up of multiple pages in the original copy extraction element 100, and the coloured image input block 13 optically reads each page, thereby obtaining multiple items of master copy data, that is, image data composed of RGB signals (S12). The coloured image input block 13 outputs the master copy data to the color image processing unit 2, where it is processed in order by the A/D converter unit 20, the shading correction unit 21, the input levels correcting unit 22, and the regional separation processing unit 23, and the control module 11 stores the master copy data in the storage unit 12 (S13).
In the original copy extraction processing unit 24, the feature point extraction unit 241 extracts multiple unique points from one master copy data item by the processing described above (S14), and the characteristic computing unit 242 calculates characteristic data for each unique point by the processing described above, thereby calculating multiple items of characteristic data representing the features of that master copy data item (S15). Next, the control module 11 generates a page index identifying the master copy data item and attaches it to the master copy data stored in the storage unit 12, thereby setting the page index (S16). The control module 11 generates a unique page index from the order in which the master copy data was input, the date and time, or the like. The control module 11 then associates the characteristic data calculated by the characteristic computing unit 242 with the page index of the master copy data, thereby updating the feature table as shown in Fig. 12 (S17).
The control module 11 then judges whether characteristic data has been associated with all of the input master copy data (S18). If master copy data remains for which characteristic data has not yet been associated (S18: NO), the control module 11 returns the processing to step S14, and the feature point extraction unit 241 extracts unique points from master copy data that has not yet been processed. When the processing has finished for all master copy data (S18: YES), the control module 11 generates an original copy index representing the multi-page original copy that corresponds to the obtained master copy data items, thereby setting the original copy index (S19). Here, the control module 11 generates the original copy index from the date and time or the like. The control module 11 may also accept, through the operation panel 15, an original copy index chosen by the user.
The control module 11 then associates the generated original copy index with the page indexes of the master copy data, thereby updating the original copy table stored in the storage unit 12 (S20), and ends the processing. Through the above processing, the master copy data of an original copy made up of multiple pages is stored in the storage unit 12.
Figure 14 is the process flow diagram of the step of the expression processing that is used to extract master copy data.The control module 11 of original copy extraction element 100 is waited for the extraction indication (S31) of the master copy data of accepting user's operating operation panel 15 and producing at any time.Receiving that not (S31: not), control module 11 continuation waits are received and extracted indication under the situation of extracting indication.Under the situation of the extraction indication of having accepted view data (S31: be), the part page or leaf that comprises in the original copy of user with the multipage formation is set in the original copy extraction element 100, coloured image input block 13 is promptly imported master copy data (S32) by the page or leaf of optically read setting thereby obtain the view data that is made of rgb signal.
Coloured image input block 13 will be imported master copy data and output to color image processing unit 2, in color image processing unit 2, order according to A/D converter unit 20, shading correction unit 21, input levels correcting unit 22 and regional separation processing unit 23 is handled the input master copy data, extract in the processing unit 24 at original copy, feature point extraction unit 241 pairs of inputs master copy data extracts a plurality of unique points (S33).Characteristic computing unit 242 each unique point calculated characteristics data by feature point extraction unit 241 is extracted, thus a plurality of characteristics (S34) that the feature of master copy data is imported in expression calculated.
Ballot processing unit 243 then searches the feature table stored in storage unit 12 for each feature data calculated by characteristic computing unit 242, and performs voting processing in which a vote is cast for the master copy data indicated by the page index associated with the calculated feature data (S35). Based on the voting results of ballot processing unit 243, similar degree determination processing unit 244 judges which of the master copy data stored in storage unit 12 is similar to the input master copy data (S36). At this time, similar degree determination processing unit 244 judges the following to be master copy data highly similar to the input master copy data: the master copy data with the largest number of votes among those that obtained at least a minimum number of votes, or the master copy data whose normalized number of votes is equal to or greater than a prescribed threshold.
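The voting of step S35 and the judgment of step S36 can be sketched as follows. This is an illustration under stated assumptions, not the apparatus's actual implementation: the feature table is modelled as a dictionary from a feature data value to the page indexes registered with it, the function names are hypothetical, and the normalization (votes divided by the number of input feature data) is an assumption, since this passage does not define it.

```python
from collections import Counter


def vote(input_features, feature_table):
    """S35: cast one vote per matching feature data for each associated page index."""
    votes = Counter()
    for feature in input_features:
        for page_index in feature_table.get(feature, ()):
            votes[page_index] += 1
    return votes


def judge_similar_pages(input_features, feature_table, min_votes=1, threshold=None):
    """S36: return the page indexes judged highly similar to the input.

    threshold=None -> the top-voted page, provided it reaches min_votes.
    threshold=x    -> every page whose normalized vote count is >= x
                      (normalizing by len(input_features) is an assumption).
    """
    votes = vote(input_features, feature_table)
    if not votes:
        return []
    if threshold is not None:
        total = len(input_features)
        return sorted(p for p, v in votes.items() if v / total >= threshold)
    best_page, best_votes = max(votes.items(), key=lambda item: item[1])
    return [best_page] if best_votes >= min_votes else []
```

With the threshold rule, more than one page index may be returned, which corresponds to the case of several highly similar master copy data.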
Control module 11 then judges whether the determination result of similar degree determination processing unit 244 indicates that highly similar master copy data exists (S37). If the determination result indicates that no highly similar master copy data exists (S37: NO), control module 11 outputs information indicating that there is no original copy similar to the one that the user had coloured image input block 13 read (S38). Specifically, control module 11 displays character information stating that there is no similar original copy on the display unit of operation panel 15, or has coloured image forming unit 14 form an image stating in characters that there is no similar original copy. After step S38 finishes, original copy extraction element 100 ends the processing of extracting master copy data.
If the determination result in step S37 indicates that highly similar master copy data exists (S37: YES), original copy extraction unit 245 searches the original copy table stored in storage unit 12 and obtains the original copy index associated with the page index of the master copy data judged by similar degree determination processing unit 244 to be highly similar to the input master copy data (S39). Control module 11 then judges whether a plurality of input master copy data corresponding to multiple pages has been obtained (S40). If the input master copy data obtained corresponds to a single page (S40: NO), original copy extraction unit 245 extracts the plurality of master copy data indicated by the page indexes that the original copy table associates with the obtained original copy index (S43). As a result, all of the master copy data belonging to the original copy that contains the page corresponding to the highly similar master copy data is extracted.
Original copy extraction unit 245 outputs the extracted master copy data to color correction unit 25. The master copy data is processed in order by color correction unit 25, black generation and under color removal unit 26, spatial filtering processing unit 27, output levels correcting unit 28 and color range reproduction processing unit 29, and color image processing unit 2 outputs the master copy data to coloured image forming unit 14. Coloured image forming unit 14 forms images based on the plurality of master copy data, which are image data, and thereby performs original copy output processing, that is, outputs the original copy composed of the multiple pages corresponding to the plurality of master copy data (S44). After step S44 finishes, original copy extraction element 100 ends the processing of extracting master copy data.
If a plurality of input master copy data corresponding to multiple pages has been obtained in step S40 (S40: YES), control module 11 judges whether the original copy indexes obtained for the respective input master copy data are consistent (S41). If the original copy indexes are not consistent (S41: NO), control module 11 advances the processing to step S38 and outputs the fact that there is no similar original copy.
If the original copy indexes are consistent in step S41 (S41: YES), control module 11 judges whether the similarity determination has finished for all of the input master copy data (S42). If input master copy data remains for which similarity has not yet been judged (S42: NO), control module 11 returns the processing to step S33, and feature point extraction unit 241 extracts feature points from the input master copy data whose feature points have not yet been extracted. If the processing has finished for all of the input master copy data (S42: YES), original copy extraction element 100 advances the processing to step S43, extracts the master copy data of the original copy that contains the pages corresponding to the master copy data highly similar to the input master copy data, and outputs the original copy.
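The branch structure of steps S37 to S43 in Figure 14 can be summarized in a short sketch. It assumes that the similarity determination has already produced, for each input page, the page index of the most similar stored master copy data (or `None` when nothing similar was found), and that the original copy table is a dictionary from original copy index to page indexes. The function name is hypothetical.

```python
def extract_original_copy(matched_page_indexes, original_copy_table):
    """Return (original copy index, all of its page indexes), or None when
    no similar original copy can be reported (the S38 outcome)."""
    # Reverse lookup: page index -> original copy index (used in S39).
    page_to_doc = {
        page: doc
        for doc, pages in original_copy_table.items()
        for page in pages
    }
    doc_indexes = set()
    for page in matched_page_indexes:
        if page is None or page not in page_to_doc:
            return None  # S37: NO
        doc_indexes.add(page_to_doc[page])
    if len(doc_indexes) != 1:
        return None  # S41: NO (indexes disagree) or no input pages at all
    doc = doc_indexes.pop()
    # S43: extract every page associated with the agreed original copy index.
    return doc, list(original_copy_table[doc])
```

For a single input page the consistency check is trivially satisfied, which matches the S40: NO path that goes straight to step S43.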
The above processing assumes that only one master copy data is highly similar to the input master copy data. However, when a plurality of master copy data have normalized vote counts at or above the prescribed threshold, original copy extraction element 100 may judge all of them to be highly similar to the input master copy data. In this case, original copy extraction element 100 may output together the original copies relating to each of the plurality of master copy data, or may display on the display unit of operation panel 15 the page images corresponding to each master copy data judged highly similar and let the user select the appropriate master copy data.
As described in detail above, in the present invention, original copy extraction element 100 stores in storage unit 12 the master copy data corresponding to each page of an original copy, and further stores feature data representing the features of the master copy data and an original copy index representing the original copy in association with the master copy data. When original copy extraction element 100 obtains input master copy data, it generates feature data from the input master copy data, judges the similarity to the stored master copy data based on the feature data, obtains the original copy index associated with the master copy data highly similar to the input master copy data, and extracts the plurality of master copy data associated with the obtained original copy index. In this way, the original copy containing the page that corresponds to the master copy data judged similar to the input master copy data is identified, and the master copy data corresponding to all pages of the identified original copy is extracted. That is, the master copy data for all pages of an original copy can be extracted from input master copy data corresponding to only part of a multi-page original copy. Therefore, even when pages of a multi-page original copy are missing because of loss, soiling or the like, the master copy data for all pages of the original copy can easily be extracted from a database in which the master copy data was stored in advance.
In addition, to judge the similarity of master copy data, original copy extraction element 100 of the present invention stores a plurality of feature data for each master copy data in advance. For each feature data generated from the input master copy data, it casts a vote for the master copy data associated with the same feature data, and it treats the master copy data that obtained the most votes, or at least a prescribed number of votes, as master copy data highly similar to the input master copy data. Because master copy data for which most of the plurality of feature data match is judged highly similar, similarity can be judged more accurately. This avoids, as far as possible, the situation in which master copy data other than the intended data is extracted because dissimilar master copy data was mistakenly judged highly similar.
In addition, the original copy extraction element of the present invention obtains a plurality of input master copy data and, when the original copy indexes associated with the master copy data highly similar to each input master copy data agree, extracts the plurality of master copy data associated with the agreed original copy index. An original copy can thus be extracted on the basis of multiple pages, which further reduces the possibility of mistakenly extracting unintended master copy data. For example, even when original copies that resemble one another exist, the intended master copy data can be extracted reliably.
In addition, in the present invention, feature points corresponding to the centroids of characters, figures, photographs and the like on the original copy represented by the master copy data are extracted from the master copy data, and feature data expressed as numerical values are calculated from the relative positional relationships among the extracted feature points. Because master copy data is retrieved by comparing feature data calculated in this way, the amount of data needed for retrieval is greatly reduced compared with conventional retrieval that compares bitmap data or compares feature quantities consisting of a plurality of character codes extracted from the original copy. Consequently, the present invention shortens the time required to retrieve master copy data compared with conventional techniques. Moreover, because retrieval is performed by comparing feature data obtained from the relative positional relationships of a plurality of feature points, images need not be aligned between master copy data. The present invention can therefore retrieve master copy data more accurately than conventional techniques.
The present embodiment shows a mode in which the master copy data handled is color image data, but the invention is not limited to this; original copy extraction element 100 of the present invention may also handle monochrome master copy data.
The present embodiment also shows a mode in which a scanner, namely coloured image input block 13, is used as the master copy data obtaining unit of the present invention, but the invention is not limited to this; original copy extraction element 100 may instead include, as the master copy data obtaining unit, an interface that receives master copy data from an external scanner or PC. Furthermore, the master copy data of the present invention is not limited to image data obtained by optically reading an original copy; it may be application data, such as text data, generated on a PC using an application program. In this case, original copy extraction element 100 accepts the application data as master copy data through the interface serving as the master copy data obtaining unit, and carries out the processing of the present invention.
The present embodiment also shows a mode in which the obtained master copy data is registered and the necessary master copy data is extracted from the registered master copy data, but the invention is not limited to this; original copy extraction element 100 may extract master copy data without performing registration processing, for example by being fitted with a storage unit 12 in which master copy data has been stored in advance. Furthermore, the present embodiment shows a mode in which the necessary master copy data is extracted from the master copy data stored in storage unit 12 built into original copy extraction element 100, but the invention is not limited to this; original copy extraction element 100 may extract the necessary master copy data from master copy data stored in an external original copy storage unit, such as a storage device or server device connected via a communication network.
(embodiment 2)
Embodiment 2 describes a mode in which, when there are a plurality of original copies highly similar to the input image data, further input image data is obtained in order to narrow down the candidates. The internal structure of the original copy extraction element of the present embodiment is the same as in embodiment 1 described using Fig. 1~Fig. 3. The contents stored in storage unit 12 of the present embodiment are the same as in embodiment 1 described using Figure 11 and Figure 12. The processing by which the original copy extraction element of the present embodiment registers master copy data is also the same as in embodiment 1 described using the flowchart of Figure 13.
Figure 15 and Figure 16 are flowcharts showing the steps of the processing for extracting master copy data performed by the original copy extraction element of embodiment 2. Control module 11 of original copy extraction element 100 waits for an extraction instruction for master copy data, which is issued when the user operates operation panel 15 (S501). If no extraction instruction has been accepted (S501: NO), control module 11 continues to wait. If an extraction instruction has been accepted (S501: YES), the user sets in original copy extraction element 100 some of the pages of an original copy composed of multiple pages, and coloured image input block 13 optically reads one of the set pages to obtain image data composed of RGB signals, namely the input master copy data (S502).
Coloured image input block 13 outputs the input master copy data to color image processing unit 2, where it is processed in order by A/D converter unit 20, shading correction unit 21, input levels correcting unit 22 and regional separation processing unit 23. In original copy extraction processing unit 24, feature point extraction unit 241 extracts a plurality of feature points from the input master copy data (S503). Characteristic computing unit 242 calculates feature data for each feature point extracted by feature point extraction unit 241, thereby calculating a plurality of feature data representing the features of the input master copy data (S504).
Ballot processing unit 243 then searches the feature table stored in storage unit 12 for each feature data calculated by characteristic computing unit 242, and performs voting processing in which a vote is cast for the master copy data indicated by the page index associated with the calculated feature data (S505). Based on the voting results of ballot processing unit 243, similar degree determination processing unit 244 judges which of the master copy data stored in storage unit 12 is similar to the input master copy data (S506). In step S506, similar degree determination processing unit 244 judges master copy data whose normalized number of votes is equal to or greater than a prescribed threshold to be highly similar to the input master copy data.
Control module 11 then judges whether the determination result of similar degree determination processing unit 244 indicates that master copy data highly similar to the input master copy data exists (S507). If the determination result indicates that no highly similar master copy data exists (S507: NO), control module 11 outputs information indicating that there is no original copy similar to the one that the user had coloured image input block 13 read (S508). After step S508 finishes, original copy extraction element 100 ends the processing of extracting master copy data.
If the determination result in step S507 indicates that master copy data highly similar to the input master copy data exists (S507: YES), original copy extraction unit 245 searches the original copy table stored in storage unit 12 and obtains the original copy index associated with the page index of the master copy data judged by similar degree determination processing unit 244 to be highly similar to the input master copy data (S509). When a plurality of master copy data are highly similar to the input master copy data, a plurality of original copy indexes are obtained in step S509. Control module 11 then judges whether the input master copy data currently being processed was obtained by reading the second or a later page of the multi-page original copy (S510). If the input master copy data currently being processed was obtained by reading the first page of the original copy (S510: NO), control module 11 judges whether a plurality of original copy indexes were obtained in step S509 (S515). If only one original copy index was obtained in step S509 (S515: NO), original copy extraction unit 245 extracts the plurality of master copy data indicated by the page indexes that the original copy table associates with the obtained original copy index (S516).
Original copy extraction unit 245 outputs the extracted master copy data to color correction unit 25. The master copy data is processed in order by color correction unit 25, black generation and under color removal unit 26, spatial filtering processing unit 27, output levels correcting unit 28 and color range reproduction processing unit 29, and color image processing unit 2 outputs the master copy data to coloured image forming unit 14. Coloured image forming unit 14 forms images based on the plurality of master copy data, which are image data, and thereby performs original copy output processing, that is, outputs the original copy composed of the multiple pages corresponding to the plurality of master copy data (S517). After step S517 finishes, original copy extraction element 100 ends the processing of extracting master copy data.
If in step S510 the input master copy data currently being processed was obtained by reading the second or a later page of the original copy (S510: YES), control module 11 judges whether, among the original copy indexes obtained for the input master copy data of the pages read so far, there is an original copy index common to all of those pages (S511). If no original copy index is common to all of the pages (S511: NO), control module 11 advances the processing to step S508 and outputs the fact that there is no similar original copy.
If an original copy index common to all of the pages read so far exists (S511: YES), control module 11 judges whether there are a plurality of original copy indexes common to all of the pages (S512). If only one original copy index is common to all of the pages (S512: NO), control module 11 advances the processing to step S516. Original copy extraction unit 245 extracts the plurality of master copy data indicated by the page indexes associated with the obtained original copy index (S516), coloured image forming unit 14 performs the original copy output processing of outputting the original copy composed of the multiple pages corresponding to the plurality of master copy data (S517), and original copy extraction element 100 ends the processing.
If a plurality of original copy indexes were obtained (S515: YES), or if a plurality of original copy indexes are common to all of the pages read so far (S512: YES), control module 11 outputs information requesting that another page of the original copy be provided (S513). Specifically, control module 11 displays on the display unit of operation panel 15 character information requesting that a new page of the original copy be read.
Control module 11 then judges whether the user has set another page of the original copy in original copy extraction element 100 (S514). If another page of the original copy has been set in original copy extraction element 100 (S514: YES), control module 11 returns the processing to step S502, and coloured image input block 13 obtains the input master copy data corresponding to the other page of the original copy.
If another page of the original copy has not been set in original copy extraction element 100 (S514: NO), control module 11 advances the processing to step S516. In step S514, control module 11 may judge that no other page has been set when no other page of the original copy is set even after a prescribed time has elapsed since the processing of step S513 finished, or when an instruction to end reading of the original copy is accepted through the user's operation of operation panel 15. When control module 11 advances the processing to step S516, original copy extraction unit 245 extracts the master copy data indicated by the page indexes associated with each of the plurality of original copy indexes common to all of the pages read so far (S516), and coloured image forming unit 14 performs the original copy output processing of outputting the original copies corresponding to the extracted master copy data (S517). Original copy extraction element 100 thus outputs a plurality of original copies corresponding to the plurality of original copy indexes. After step S517 finishes, original copy extraction element 100 ends the processing.
As described in detail above, when there are a plurality of original copy indexes associated with master copy data highly similar to the input master copy data of a page read from the original copy, the original copy extraction element of the present embodiment requests input master copy data corresponding to another page of the original copy and obtains the input image data produced by reading that page. The original copy extraction element of the present embodiment then obtains the original copy index that is obtained in common for all of the pages read, as the original copy index associated with the master copy data highly similar to the input master copy data, and extracts the plurality of master copy data associated with the obtained original copy index. Thus, even when there are a plurality of original copy indexes for master copy data judged similar to the input master copy data, the original copy indexes are narrowed down using other pages of the original copy, and the narrowing is repeated until the original copy index of the master copy data similar to the input master copy data is determined. By using multiple pages in this way, similarity can be judged more accurately and the required master copy data can be extracted with high accuracy.
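The narrowing of embodiment 2 amounts to intersecting, page by page, the sets of candidate original copy indexes. The following sketch illustrates the idea under assumptions: each scanned page is represented by the set of original copy indexes obtained for it in step S509, and the function name and the status strings are hypothetical.

```python
def narrow_original_copy_indexes(candidates_per_page):
    """Intersect candidate original copy indexes over the pages read so far.

    candidates_per_page: list of sets, one per scanned page, in scan order.
    Returns (status, indexes), where status is
      "no_similar" -> nothing similar, or no index common to all pages (S508),
      "unique"     -> exactly one common index remains (go to S516),
      "ambiguous"  -> several remain after the last page supplied; the
                      description outputs all of them if no further page is set.
    """
    common = None
    for candidates in candidates_per_page:
        if not candidates:
            return "no_similar", set()          # S507: NO
        if common is None:
            common = set(candidates)            # first page (S510: NO)
        else:
            common &= set(candidates)           # later pages (S511)
        if not common:
            return "no_similar", set()          # S511: NO
        if len(common) == 1:
            return "unique", common             # S515 / S512: NO
    if common is None:
        return "no_similar", set()              # no page was supplied
    return "ambiguous", common                  # S514: NO with several left
```

The loop stops as soon as one index remains, mirroring the flowchart, which extracts the original copy without asking for further pages once the candidate is unique.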
(embodiment 3)
Embodiments 1 and 2 showed modes in which any original copy can be output on the basis of input master copy data corresponding to a single page. Embodiment 3 describes a mode in which stricter output conditions are imposed on specific original copies. The internal structure of the original copy extraction element of the present embodiment is the same as in embodiment 1 described using Fig. 1~Fig. 3.
Figure 17 is a conceptual diagram showing an example of the contents of the original copy table, stored in storage unit 12 of embodiment 3, that associates master copy data with original copies. Page indexes and page counts are recorded in association with the original copy indexes Doc1, Doc2, ..., each of which identifies an individual original copy, and the output condition required for outputting an original copy is recorded in association with its original copy index. In the example shown in Figure 17, no output condition is associated with the original copy indexes Doc1~Doc4, whereas output conditions are associated with the original copy indexes Doc21 and Doc51. For the original copy index Doc21, the associated output condition is that the master copy data corresponding to ID21 and ID25, among the page indexes ID21~ID28 associated with that original copy index, are both similar to input master copy data. For the original copy index Doc51, the associated output condition is that the master copy data corresponding to three or more of the page indexes ID51~ID55 associated with that original copy index are similar to input master copy data. The contents of the feature table stored in storage unit 12 of the present embodiment, which associates master copy data with feature data, are the same as in embodiment 1 described using Figure 12.
The processing by which the original copy extraction element of the present embodiment registers master copy data is the same as in embodiment 1 described using the flowchart of Figure 13. The processing for extracting master copy data performed by the original copy extraction element of the present embodiment is roughly the same as in embodiment 1 described using the flowchart of Figure 14, or as in embodiment 2 described using Figure 15 and Figure 16, except that the content of the original copy output processing in step S44 or step S517 differs from that of embodiments 1 and 2.
Figure 18 is a flowchart showing the steps of the original copy output processing performed by the original copy extraction element of embodiment 3. In the processing for extracting master copy data, original copy extraction element 100 of the present embodiment executes steps S31~S43 shown in Figure 14, or steps S501~S516 shown in Figure 15 and Figure 16. In the original copy output processing of step S44 or step S517, control module 11 first selects one of the original copy indexes associated with the master copy data extracted by original copy extraction unit 245 in step S43 or step S516 (S61). Control module 11 then searches the original copy table stored in storage unit 12 and judges whether an output condition is associated with the selected original copy index (S62). If an output condition is associated with the selected original copy index (S62: YES), control module 11 judges whether the output condition associated with the original copy index is satisfied (S63).
For example, when the original copy index Doc21 shown in Figure 17 has been selected, the output condition is judged to be satisfied if the master copy data corresponding to ID21 and ID25 were both judged in step S37 or step S507 to be similar to input master copy data. If the master copy data corresponding to either ID21 or ID25 was not judged to be similar to input master copy data, the output condition is judged not to be satisfied. Likewise, when the original copy index Doc51 has been selected, the output condition is judged to be satisfied if the master copy data corresponding to three or more of the page indexes ID51~ID55 were judged in step S37 or step S507 to be similar to input master copy data. If the master copy data corresponding to fewer than three page indexes were judged to be similar to input master copy data, the output condition is judged not to be satisfied.
If no output condition is associated with the original copy index in step S62 (S62: NO), or if the output condition associated with the original copy index is satisfied in step S63 (S63: YES), coloured image forming unit 14 forms images based on the master copy data indicated by each page index associated with the selected original copy index, thereby outputting the original copy corresponding to the selected original copy index (S64). For example, the original copies corresponding to the original copy indexes Doc1~Doc4 shown in Figure 17 have no output condition set, so they are output unconditionally. The original copies corresponding to the original copy indexes Doc21 and Doc51 are output only when their output conditions are satisfied. After step S64 finishes, control module 11 advances the processing to the next step S65. If the output condition associated with the original copy index is not satisfied in step S63 (S63: NO), control module 11 does not output the original copy corresponding to the selected original copy index and advances the processing to the next step S65. In this way, control module 11 prohibits the output of master copy data that does not satisfy its output condition.
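The output-condition check of steps S62 and S63 can be sketched with the two examples of Figure 17. The dictionary encoding of the conditions (`all_of`, `at_least`, `among`) and the function name `may_output` are hypothetical; the description only states what the conditions require, not how they are stored.

```python
def may_output(doc_index, output_conditions, matched_page_indexes):
    """Return True when the original copy may be output (S62, S63).

    output_conditions: dict mapping an original copy index to a condition;
    an index without an entry has no output condition.
    matched_page_indexes: page indexes whose master copy data were judged
    similar to input master copy data.
    """
    condition = output_conditions.get(doc_index)
    if condition is None:
        return True  # S62: NO -> output unconditionally
    matched = set(matched_page_indexes)
    if "all_of" in condition:
        # Every listed page must have been matched (the Doc21 example).
        return set(condition["all_of"]) <= matched
    if "at_least" in condition:
        # A minimum number of the listed pages must have been matched (Doc51).
        return len(matched & set(condition["among"])) >= condition["at_least"]
    raise ValueError(f"unknown output condition for {doc_index}")


# The Figure 17 examples expressed in the assumed encoding.
FIGURE_17_CONDITIONS = {
    "Doc21": {"all_of": ["ID21", "ID25"]},
    "Doc51": {"at_least": 3, "among": ["ID51", "ID52", "ID53", "ID54", "ID55"]},
}
```

Under this encoding, scanning only the page that matches ID21 is not enough to output Doc21, whereas Doc1 to Doc4 are output regardless of which page matched.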
Control module 11 then judges whether the processing has finished for all of the master copy data extracted in step S43 or step S516 (S65). If master copy data remains for which the processing has not yet finished (S65: NO), control module 11 returns the processing to step S61 and selects an original copy index not yet selected from among the original copy indexes associated with the master copy data extracted in step S43 or step S516. If the processing has finished for all of the master copy data extracted in step S43 or step S516 (S65: YES), control module 11 ends the original copy output processing and returns to the processing of extracting master copy data. After the original copy output processing finishes, original copy extraction element 100 ends the processing of extracting master copy data.
As described in detail above, the original copy extraction element of the present embodiment determines an output condition in advance for each original copy index and, when performing the original copy output processing, outputs only the original copies whose original copy indexes satisfy their output conditions. In embodiments 1 and 2, an original copy can be output on the basis of input master copy data corresponding to a single page, so even a highly important original copy, such as one containing secret information, could easily be output in full from just one of its pages. In the present embodiment, an original copy for which an output condition has been determined is output only when the output condition is satisfied, so determining output conditions for highly important original copies prevents them from being output too easily.
For example, the output condition may be that the input master copy data is judged similar to the stored master copy data for multiple pages, which prevents all pages of an original copy of high importance from being output on the basis of a single page. Alternatively, the output condition may be that the input master copy data is judged similar to specific master copy data, so that a user who does not hold the specific page cannot extract the original copy from the original copy extraction element. As the specific master copy data, it suffices to register master copy data representing check content that has no relation to the main content of the multi-page original copy. If the main content of the original copy is in Japanese, for example, the check content is preferably in a completely different form, such as English.
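The two kinds of output condition described here can be sketched as simple predicates. The function names `require_min_matches` and `require_check_page`, and the idea of passing the set of stored page identifiers judged similar to the input, are assumptions made for illustration; the embodiment does not specify how a condition is encoded.

```python
from typing import Callable, Iterable, Set

# A condition receives the set of stored page IDs that the input pages matched.
Condition = Callable[[Set[str]], bool]


def require_min_matches(doc_pages: Iterable[str], minimum: int) -> Condition:
    """Condition: at least `minimum` pages of the original copy were matched."""
    pages = set(doc_pages)

    def check(matched: Set[str]) -> bool:
        return len(pages & matched) >= minimum

    return check


def require_check_page(check_page_id: str) -> Condition:
    """Condition: the specific page registered for checking was matched."""

    def check(matched: Set[str]) -> bool:
        return check_page_id in matched

    return check
```

With `require_min_matches(pages, 2)`, a single scanned page is not enough to release the whole original copy; with `require_check_page`, only a user who can present the check page satisfies the condition.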
As a result, the original copy extraction element of the present embodiment allows a specific user who holds the specific master copy data used for checking to extract an original copy for which an output condition has been determined, while other users who do not hold that master copy data cannot output the original copy of high importance. Thus, in the present embodiment, determining an output condition in advance for an original copy of high importance that contains secret information protects the secret information contained in the original copy.
(embodiment 4)
Embodiments 1 to 3 showed the original copy extraction element of the present invention in the form of an image processing system; embodiment 4 shows it in the form of a scanner device. Figure 19 is a block diagram showing the internal functional structure of the original copy extraction element 300 of embodiment 4. The original copy extraction element 300 comprises a control module 31 that controls the operation of each part of the original copy extraction element 300, a storage unit 32 made up of semiconductor memory, a hard disk or the like, and a coloured image input block 33 that optically reads colour images. An A/D converter unit 34 is connected to the coloured image input block 33, a shading correction unit 35 is connected to the A/D converter unit 34, and an original copy extraction processing unit 36 is connected to the shading correction unit 35. A transmitting element 37, which sends master copy data to the outside, is connected to the original copy extraction processing unit 36. The storage unit 32, coloured image input block 33, A/D converter unit 34, shading correction unit 35, original copy extraction processing unit 36 and transmitting element 37 are connected to the control module 31, and an operating unit 38 for accepting operations from the user is also connected to the control module 31.
Like the storage unit 12 of the original copy extraction element 100 described in embodiments 1 to 3, the storage unit 32 stores, for each original copy made up of multiple pages, the master copy data corresponding to each page, together with an original copy table that associates master copy data with original copies and a feature table that associates master copy data with characteristics. An external PC, image processing system or the like is connected to the transmitting element 37.
The coloured image input block 33 consists of a scanner equipped with a CCD. The CCD reads the reflected-light image from the original copy, separated into RGB, and the result is converted into RGB analogue signals and output to the A/D converter unit 34. The A/D converter unit 34 converts the RGB analogue signals into digital RGB signals and outputs them to the shading correction unit 35.
The shading correction unit 35 processes the RGB signals input from the A/D converter unit 34 to remove the various distortions produced in the illumination system, imaging system and image pickup system of the coloured image input block 33. The shading correction unit 35 also adjusts the colour balance of the RGB signals and converts the RGB reflectance signals into density signals. It then outputs the image data made up of the processed RGB signals to the original copy extraction processing unit 36 as master copy data.
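The embodiment does not state how reflectance signals are converted into density signals. A minimal sketch, assuming the conventional optical density relation D = -log10(R) applied to 8-bit samples, might look as follows; the function name `reflectance_to_density` and the clipping value `max_density` are hypothetical.

```python
import math


def reflectance_to_density(value: int, max_density: float = 2.0) -> int:
    """Map an 8-bit reflectance sample (0..255) to an 8-bit density sample.

    Assumes D = -log10(R) with R in (0, 1]; densities above max_density are
    clipped so that the result fits the 0..255 range.
    """
    reflectance = max(value, 1) / 255.0        # avoid log10(0) for black pixels
    density = -math.log10(reflectance)         # optical density
    scaled = min(density, max_density) / max_density
    return round(scaled * 255)
```

Under these assumptions a fully reflective sample (255) maps to density 0, and darker samples map to progressively larger density values.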
The original copy extraction processing unit 36 has the same structure as the original copy extraction processing unit 24 of the original copy extraction element 100 described in embodiments 1 to 3, and performs the same processing. That is, the original copy extraction processing unit 36 treats the master copy data input from the shading correction unit 35 as input master copy data, performs the same processing as that shown in the flowchart of Figure 14 or of Figures 15 and 16, and extracts from the storage unit 32 the multiple master copy data of the original copy that contains the page corresponding to the master copy data with a high degree of similarity to the input master copy data.
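The extraction described here, from one input page to all pages of the matching original copy, can be sketched with the voting scheme recited in claim 3. All names below (`extract_document`, `feature_table`, `page_to_doc`, `doc_to_pages`, `min_votes`) are hypothetical, and characteristics are modelled as plain hashable values, which is an assumption about their representation.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def extract_document(
    input_features: Iterable[Hashable],
    feature_table: Dict[Hashable, List[str]],   # characteristic -> stored page IDs
    page_to_doc: Dict[str, str],                # page ID -> original copy index
    doc_to_pages: Dict[str, List[str]],         # original copy index -> all page IDs
    min_votes: int = 1,
) -> List[str]:
    """Return every page ID of the original copy whose page best matches the input."""
    votes: Counter = Counter()
    for feature in input_features:
        # Vote for each stored page associated with a matching characteristic.
        for page_id in feature_table.get(feature, ()):
            votes[page_id] += 1
    if not votes:
        return []
    best_page, best_votes = votes.most_common(1)[0]
    if best_votes < min_votes:
        return []                                # no page is similar enough
    doc_index = page_to_doc[best_page]           # obtain the original copy index
    return list(doc_to_pages[doc_index])         # extract all pages of that original copy
```

The point of the sketch is that the result is keyed by the original copy index, so a single matching page is enough to retrieve the master copy data for every page.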
The control module 31 outputs the extracted master copy data by sending the multiple master copy data extracted by the original copy extraction processing unit 36 to the outside through the transmitting element 37. The transmitting element 37 sends the multiple master copy data to an external device such as a PC or image processing system, and the external device performs processing such as forming images based on the multiple master copy data.
As described in detail above, in the present embodiment as in embodiments 1 to 3, the master copy data corresponding to all pages of an original copy can be extracted on the basis of input master copy data corresponding to part of the multi-page original copy. Thus, in the present embodiment, even when a multi-page original copy has missing portions because of loss, soiling or the like, the master copy data for all pages of the original copy can easily be extracted from the database in which the master copy data was stored in advance.
(embodiment 5)
Embodiment 5 shows a mode in which the original copy extraction element of the present invention is realised with a general-purpose computer. Figure 20 is a block diagram showing the internal structure of the original copy extraction element 400 of embodiment 5. The original copy extraction element 400 of the present embodiment is built from a general-purpose computer such as a PC, and comprises a CPU41 that performs computation, a RAM42 that stores temporary information produced during computation, a driver element 43, such as a CD-ROM drive, that reads information from the recording medium 5 of the present invention, such as an optical disc, and a storage unit 44 such as a hard disk. The CPU41 causes the driver element 43 to read the computer program 51 of the present invention from the recording medium 5 and stores the program in the storage unit 44. The computer program 51 is loaded from the storage unit 44 into the RAM42 as required, and the CPU41 performs the processing required of the original copy extraction element 400 on the basis of the loaded computer program 51.
The original copy extraction element 400 also comprises an input block 45, such as a keyboard or pointing device, through which information such as various processing instructions is entered by user operation, and a display unit 46, such as an LCD, that displays various information. It further comprises a transmitting element 47 connected to an external output unit 61, such as an image processing system, that outputs original copies, and a receiving element 48 connected to an external input device 62, such as a scanner device, that inputs master copy data. The transmitting element 47 sends master copy data to the output unit 61, and the output unit 61 outputs the original copy based on the master copy data. The input device 62 optically reads an original copy to generate master copy data and sends the generated master copy data to the original copy extraction element 400, where the receiving element 48 receives it. The receiving element 48 functions as the master copy data obtaining unit of the present invention.
Like the storage unit 12 of the original copy extraction element 100 described in embodiments 1 to 3, the storage unit 44 stores, for each original copy made up of multiple pages, the master copy data corresponding to each page, together with an original copy table that associates master copy data with original copies and a feature table that associates master copy data with characteristics.
The CPU41 loads the computer program 51 of the present invention into the RAM42 and performs the processing of the original copy extracting method of the present invention in accordance with the loaded computer program 51. That is, when master copy data is input from the input device 62 through the receiving element 48, the CPU41 treats it as input master copy data, performs the same processing as that shown in the flowchart of Figure 14 or of Figures 15 and 16, and extracts from the storage unit 44 the multiple master copy data of the original copy that contains the page corresponding to the master copy data with a high degree of similarity to the input master copy data. The CPU41 sends the extracted master copy data from the transmitting element 47 to the output unit 61, and the output unit 61 outputs the multi-page original copy based on the master copy data. The CPU41 may also process, as master copy data, application data such as text data generated with an application program.
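When several input pages are obtained, claims 4 and 5 describe extracting the original copy whose index is common to all inputs and requesting further input when more than one index remains. A rough sketch of that decision is given below; the function name `resolve_common_index` and the representation of candidates as one set of original copy indexes per input page are assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple


def resolve_common_index(
    candidates_per_input: List[Set[str]],
) -> Tuple[Optional[str], bool]:
    """Return (original copy index, need_more_input) for a batch of input pages.

    Each set holds the indexes associated with the stored pages judged similar
    to one input page.
    """
    if not candidates_per_input:
        return None, True                      # nothing scanned yet: ask for input
    common = set.intersection(*candidates_per_input)
    if len(common) == 1:
        return next(iter(common)), False       # exactly one common index: extract it
    # Several common indexes: still ambiguous, request more input pages.
    # No common index: nothing to extract and nothing further to request.
    return None, len(common) > 1
```

For example, if one page matches DocA and DocB and a second page matches only DocA, the common index DocA is selected and no further input is requested.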
As described in detail above, in the present embodiment as in embodiments 1 to 4, the master copy data corresponding to all pages of an original copy can be extracted on the basis of input master copy data corresponding to part of the multi-page original copy. Thus, in the present embodiment, even when a multi-page original copy has missing portions because of loss, soiling or the like, the master copy data for all pages of the original copy can easily be extracted from the database in which the master copy data was stored in advance.
In addition, the present embodiment showed a mode in which the necessary master copy data is extracted from the master copy data stored in the storage unit 44 built into the original copy extraction element 400, but the invention is not limited to this. The original copy extraction element 400 of the present invention may instead extract the necessary master copy data from master copy data stored in an external original copy storage unit (not shown), such as a storage device or server unit connected through a communication network.
In addition, the recording medium 5 of the present invention on which the computer program 51 of the present invention is recorded may take any form: magnetic tape, a magnetic disk, a removable hard disk, an optical disc such as CD-ROM/MO/MD/DVD, or a card-type recording medium such as an IC card (including a memory card) or optical card. The recording medium 5 may also be a semiconductor memory that is mounted in the original copy extraction element 400 and whose recorded content the CPU41 can read, such as a mask ROM, EPROM (Erasable Programmable Read Only Memory), EEPROM (Electrically Erasable Programmable Read Only Memory) or flash ROM.
In addition, the computer program 51 of the present invention may be downloaded to the original copy extraction element 400 from an external server unit (not shown) connected to it through a communication network such as the Internet or a LAN, and stored in the storage unit 44. In this mode, the program needed to download the computer program 51 is either stored in the storage unit 44 in advance, or read from a prescribed recording medium using the driver element 43 and then stored in the storage unit 44, and is loaded into the RAM42 as required.

Claims (8)

1. An original copy extracting method for extracting specific master copy data from stored master copy data, characterized by comprising:
a step of storing an original copy index, which indicates an original copy made up of multiple pages, in association with the master copy data corresponding to each page included in the original copy;
a step of storing characteristics, which are calculated from feature points extracted from master copy data and represent the features of that master copy data, in association with the master copy data;
a step of obtaining input master copy data as new master copy data;
a step of extracting feature points from the obtained input master copy data;
a step of generating, based on the extracted feature points, characteristics representing the features of the input master copy data;
a step of judging the similarity between the input master copy data and the master copy data associated with the stored characteristics by comparing the generated characteristics with the stored characteristics;
a step of obtaining the original copy index associated with master copy data judged to have a high similarity to the input master copy data; and
a step of extracting multiple master copy data corresponding to the multiple pages included in the original copy indicated by the obtained original copy index.
2. An original copy extraction element that comprises an original copy storage unit for storing master copy data and extracts specific master copy data from the master copy data stored in the original copy storage unit, characterized by comprising:
an original copy index storage unit that stores an original copy index, which indicates an original copy made up of multiple pages, in association with the master copy data corresponding to each page included in the original copy;
a characteristic storage unit that stores characteristics, which are calculated from feature points extracted from master copy data and represent the features of that master copy data, in association with the master copy data;
a master copy data obtaining unit that obtains input master copy data as new master copy data;
a feature point extraction unit that extracts feature points from the input master copy data obtained by the master copy data obtaining unit;
a generation unit that generates, based on the feature points extracted by the feature point extraction unit, characteristics representing the features of the input master copy data;
an identifying unit that judges the similarity between the input master copy data and the master copy data associated with the characteristics stored in the characteristic storage unit by comparing the characteristics generated by the generation unit with the characteristics stored in the characteristic storage unit;
an original copy index obtaining unit that obtains the original copy index associated with master copy data judged by the identifying unit to have a high similarity to the input master copy data; and
a master copy data extraction unit that extracts multiple master copy data corresponding to the multiple pages included in the original copy indicated by the original copy index obtained by the original copy index obtaining unit.
3. The original copy extraction element as claimed in claim 2, characterized in that
the characteristic storage unit stores, in association with master copy data, a plurality of characteristics representing the features of that master copy data,
the generation unit generates a plurality of characteristics representing the features of the input master copy data, and
the identifying unit has:
a voting unit that, for each of the plurality of characteristics generated by the generation unit, votes for the master copy data associated with a characteristic matching that characteristic; and
a judging unit that judges, among the master copy data stored in the original copy storage unit, the master copy data with the largest number of votes, or master copy data whose number of votes is at or above a prescribed amount, to have a high similarity to the input master copy data.
4. The original copy extraction element as claimed in claim 2 or 3, characterized in that
the master copy data obtaining unit obtains a plurality of input master copy data,
the identifying unit judges, for each of the plurality of input master copy data, the similarity between the master copy data stored in the original copy storage unit and the input master copy data, and
when the original copy indexes associated with the master copy data having a high similarity to each of the plurality of input master copy data coincide with one another, the master copy data extraction unit extracts multiple master copy data corresponding to the multiple pages included in the original copy indicated by that original copy index.
5. The original copy extraction element as claimed in claim 4, characterized by
further comprising a request unit that requests further input master copy data when a plurality of original copy indexes associated with master copy data having a high similarity to the input master copy data have been obtained, or when, among the original copy indexes associated with the master copy data having a high similarity to each of the plurality of input master copy data, a plurality of original copy indexes common to the plurality of input master copy data have been obtained.
6. The original copy extraction element as claimed in claim 2 or 3, characterized in that
the master copy data obtaining unit obtains the input master copy data by optically reading an original copy.
7. The original copy extraction element as claimed in claim 2, characterized by further comprising:
an output condition storage unit that stores, in association with an original copy index, a prescribed output condition required for outputting the master copy data corresponding to each page included in the original copy indicated by that original copy index;
an output condition identifying unit that judges whether the output condition associated with the original copy index is satisfied, the original copy index being the one associated with the master copy data extracted by the master copy data extraction unit;
an output unit that, when the output condition is judged to be satisfied, outputs the multiple master copy data corresponding to the multiple pages included in the original copy indicated by the original copy index; and
a prohibition unit that, when the output condition is judged not to be satisfied, prohibits output of the multiple master copy data corresponding to the multiple pages included in the original copy indicated by the original copy index.
8. The original copy extraction element as claimed in claim 2, characterized by
further comprising an image formation unit that forms a plurality of images based on the plurality of master copy data extracted by the master copy data extraction unit.
CN2008101316932A 2007-07-24 2008-07-23 Document extracting method and document extracting apparatus Expired - Fee Related CN101354717B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2007192192 2007-07-24
JP192192/07 2007-07-24
JP162324/08 2008-06-20
JP2008162324A JP4340714B2 (en) 2007-07-24 2008-06-20 Document extraction method, document extraction apparatus, computer program, and recording medium

Publications (2)

Publication Number Publication Date
CN101354717A CN101354717A (en) 2009-01-28
CN101354717B true CN101354717B (en) 2010-09-29

Family

ID=40307526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101316932A Expired - Fee Related CN101354717B (en) 2007-07-24 2008-07-23 Document extracting method and document extracting apparatus

Country Status (2)

Country Link
JP (1) JP4340714B2 (en)
CN (1) CN101354717B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440622A (en) * 2013-07-31 2013-12-11 北京中科金财科技股份有限公司 Image data optimization method and device
CN109284787B (en) * 2018-08-02 2022-02-25 广东南天司法鉴定所 Method and device for automatically collecting handwritten ink mark color level

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1190218A (en) * 1997-02-07 1998-08-12 松下电器产业株式会社 Filing apparatus
CN1684493A (en) * 2004-04-13 2005-10-19 富士施乐株式会社 Image forming apparatus, program therefor, storage medium, and image forming method

Also Published As

Publication number Publication date
JP4340714B2 (en) 2009-10-07
JP2009048618A (en) 2009-03-05
CN101354717A (en) 2009-01-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100929
