WO2012050379A2 - Fingerprint extraction method for a publication, fingerprint extraction apparatus for a publication, publication identification system using a fingerprint, and publication identification method using a fingerprint - Google Patents
- Publication number
- WO2012050379A2 (PCT/KR2011/007633)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- fingerprint
- publication
- text
- image
- electronic document
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/105—Arrangements for software license management or administration, e.g. for managing licenses at corporate level
Definitions
- the present invention relates to content identification, and more particularly, to a fingerprint extraction method of a publication, a fingerprint extraction apparatus of a publication, a publication identification system using a fingerprint, and a publication identification method using a fingerprint.
- DRM: Digital Rights Management
- DPP: Digital Property Protection
- FIG. 1 schematically shows a general content protection method to which a protection device such as DRM is applied.
- Content providers encrypt and package the original content using an encryption key and provide it to users. A user accesses the corresponding DRM server and goes through a purchase authentication process; only after a legitimate purchase does the user obtain the key for decrypting the content and a license to use it, so that the content can be played.
- The conventional copyright protection method protects the copyright of content by using an encryption or packaging method.
- the contents may be illegally distributed.
- the DRM applied to a specific e-book reader device is hacked, and the electronic publications for the e-book reader device are illegally distributed.
- content identification technology can be used to determine whether a publication infringes copyright or has been illegally distributed. Effective copyright protection is needed.
- An object of the present invention, which addresses the above disadvantages, is to provide a fingerprint extraction method for a publication that makes it easy to identify the publication, so that copyright infringement can be determined and copyright can be protected effectively.
- Another object of the present invention is to provide a fingerprint extraction apparatus for performing a fingerprint extraction method of the publication.
- Another object of the present invention is to provide a publication identification system using a fingerprint that can easily identify a publication and effectively protect copyrights.
- Another object of the present invention is to provide a method of operating a publication identification system using the fingerprint.
- A fingerprint extraction method for achieving the above object of the present invention includes extracting text from an input electronic document in text form and extracting a text fingerprint from the extracted text.
- the text may be extracted from the input electronic document in text form after preprocessing has been performed on that document.
- the preprocessing of the input electronic document in text form may include typo correction or character restoration.
- a fingerprint extraction method for achieving the object of the present invention includes inputting an electronic document in the form of an image; if the input electronic document is a text-based electronic document, converting the electronic document in image form into an electronic document in text form; extracting text from the converted electronic document in text form; and extracting a text fingerprint from the extracted text.
- the step of inputting the electronic document in the form of an image may include performing preprocessing on the electronic document in the form of an image after the electronic document in the form of an image is input.
- the performing of the preprocessing on the electronic document in the image form may perform at least one of noise removal, page separation, image rotation, and tilt adjustment of the image included in the electronic document in the image form.
- when the input electronic document in the form of an image is an image-based electronic document, the method may further include performing preprocessing on the input electronic document in image form and extracting an image fingerprint from the preprocessed electronic document in image form.
- the fingerprint extraction apparatus for achieving another object of the present invention is an image text conversion unit for converting an electronic document of the input image form into an electronic document of the text form, and the electronic form of the text
- the apparatus for extracting a fingerprint of the publication may further include an image preprocessor configured to perform at least one of noise reduction, page separation, image rotation, and tilt adjustment of an image included in the electronic document in the input image form.
- the fingerprint extractor may extract an image fingerprint from a preprocessed image provided from the image preprocessor.
- the fingerprint extraction apparatus may further include a text preprocessing unit that preprocesses the electronic document in text form provided from the image text converting unit, or the input electronic document in text form, and provides the result to the text extracting unit.
- a publication identification system using a fingerprint for achieving another object of the present invention includes the fingerprint extraction device for extracting the fingerprint of the original publication, a publication information construction device for storing the fingerprint of the original publication provided from the fingerprint extraction device together with additional information of the original publication, and a DBMS (DataBase Management System).
- the fingerprint extracting apparatus extracts text from the electronic document in text format, and then extracts a text fingerprint from the extracted text.
- when the query publication is an electronic document in the form of an image, the electronic document in image form may be converted into an electronic document in text form, the text may then be extracted from the converted document, and the text fingerprint may be extracted from the extracted text.
- the fingerprint extracting apparatus may extract an image fingerprint from the electronic document in the form of an image after performing the preprocessing on that document.
- the additional information of the original publication may include at least one of a producer, a publisher, a title, a summary, a publication date, an ISBN, an address, a telephone number, and a fax number of the original publication.
- a publication identification system using a fingerprint includes the fingerprint extraction apparatus for extracting a fingerprint of a query publication collected for identification, a fingerprint query device for querying for the fingerprint of an original publication corresponding to the fingerprint of the query publication provided from the fingerprint extraction device, and a DBMS (DataBase Management System) holding the fingerprint extracted from the original publication and additional information of the original publication.
- the candidate group verification apparatus may compare the search result candidate group and the fingerprint of the query publication, and identify the query publication based on a comparison result.
- the candidate group verification apparatus may obtain and provide additional information corresponding to the query publication from the DBMS.
- the publication identification method using a fingerprint includes extracting a fingerprint from a collected query publication, retrieving from the DBMS the fingerprint of the original publication corresponding to the fingerprint extracted from the collected query publication, and identifying whether the collected query publication is infringing based on at least one search result.
- Identifying the collected query publications based on the at least one search result may identify the query publications based on a comparison result of comparing the at least one search result with a fingerprint of the query publication.
- the publication identification method using the fingerprint may further include obtaining additional information corresponding to the query publication from the DBMS when, as a result of identifying the collected query publication, the query publication is determined to be identical to the original publication.
- according to the fingerprint extraction apparatus of the publication, the publication identification system using the fingerprint, and the publication identification method using the fingerprint, the fingerprint of a publication is extracted from the original publication and managed in association with the publication's information, and fingerprints of query publications are extracted to identify unknown publications.
- the information of the identified publication is used to determine whether the publication has been illegally distributed or infringes copyright.
- a publication identification system using a fingerprint may be used for retrieving the information of the original publication by inputting some information of the publication (for example, a publication of several pages).
- FIG. 1 schematically shows a general content protection method to which a protection device such as DRM is applied.
- FIG. 3 is a flowchart illustrating a method of extracting a text fingerprint in the form of an electronic document.
- FIG. 4 is a flowchart illustrating a method of extracting a text fingerprint from a publication in the form of an image.
- FIG. 5 is a flowchart illustrating a method of extracting an image fingerprint from a publication in the form of an image.
- FIG. 6 is a flowchart illustrating a fingerprinting extraction method of a publication according to an embodiment of the present invention.
- FIG. 7 is a block diagram showing a configuration of a fingerprint extraction apparatus of a publication according to an embodiment of the present invention.
- FIG. 8 is a block diagram showing the configuration of a publication identification system according to an embodiment of the present invention.
- FIG. 9 is a block diagram showing the configuration of a publication identification system according to another embodiment of the present invention.
- FIG. 10 is a flowchart illustrating a publication identification method of a publication identification system according to an exemplary embodiment of the present invention.
- the first case is one in which the original content itself is leaked: the creator of the publication loses or neglects the storage medium in which the publication was stored, the publication file provided to the publisher in the form of a digital file is leaked, or the file is leaked because its DRM has been removed.
- the second case is a case in which a user directly digitizes a publication printed in a book form.
- since the printed publication is converted into an electronic document form, pirated copies of good quality can be mass-produced through mass printing.
- the third is a case where the user digitizes a publication printed in the form of a novel, magazine or comic book by scanning with a scanner.
- the user can digitize the publication by dismantling the printed publication and using the scanner's automatic input device, by using a device that automatically turns the pages, or by turning the pages by hand, scanning the printed publication and storing it as an image.
- the fourth is a case where the printed publication is digitized by the user using a camera.
- the digitized file may be stored in the form of an image, and a difference in quality may occur according to the skill of the capturing user.
- Text is a major means of conveying information in publications such as novels
- images are a major means of conveying information in publications such as magazines and comic books.
- the third and fourth of the digitization methods described above, which are used to distribute publications illegally, produce publications digitized in the form of images.
- a text fingerprint based publication identification technique is required for text-based publications, such as novels, that are in the form of an image file.
- if the publication digitized in the form of an image is an image-based publication such as a magazine or a comic book, an image fingerprint based publication identification technique is required for the image file.
- the fingerprint represents characteristic information unique to the content or publication and may also be called a feature point or DNA.
- FIG. 3 is a flowchart illustrating a method of extracting a text fingerprint in the form of an electronic document.
- an electronic document form is a document file that is created using various document creation programs in an information processing apparatus including a computer and stored in a text format (for example, TXT, Korean file, word file, and PDF files stored in text format).
- the fingerprint extraction apparatus performs text preprocessing so that text can be extracted from the input text documents as intended (step 320).
- the input text document may be electronic documents created using various document creation programs as described above.
- the text preprocessing process may include typo correction or restoration of characters whose shape is abnormal because of an error. It is not mandatory and may be performed selectively, only when necessary.
- the fingerprint extraction apparatus extracts only the text, which is the information transmitting means of the publication, from the text documents that have undergone text preprocessing (step 330).
- the fingerprint extraction apparatus extracts a fingerprint for the text extracted through the execution of step 330, thereby extracting a fingerprint for the publication in the form of a text-based electronic document (step 340).
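Steps 330 and 340 can be illustrated with a minimal Python sketch. The source does not specify a fingerprinting algorithm, so everything here is an assumption made for illustration: the function names `extract_text` and `text_fingerprint`, the use of k-word shingles hashed with SHA-1, and the choice to keep the smallest hashes as the fingerprint.

```python
import hashlib
import re


def extract_text(document: str) -> str:
    """Step 330 (illustrative): keep only the text, lowercased, whitespace collapsed."""
    return re.sub(r"\s+", " ", document).strip().lower()


def text_fingerprint(text: str, k: int = 5, keep: int = 64) -> list[int]:
    """Step 340 (illustrative): hash every k-word shingle, keep the `keep` smallest hashes."""
    words = text.split()
    if len(words) < k:
        # Very short texts are treated as a single shingle.
        shingles = [" ".join(words)] if words else []
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    hashes = {
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles
    }
    return sorted(hashes)[:keep]
```

Because the fingerprint depends only on word sequences, two copies of the same text that differ in line breaks or letter case produce the same fingerprint under this sketch.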
- FIG. 4 is a flowchart illustrating a method of extracting a text fingerprint from a publication in the form of an image.
- the fingerprint extractor performs image preprocessing on the document in the form of an input image file to improve OCR (Optical Character Recognition) performance (step 420).
- the image file type refers to an image file that can be displayed through a commercial image viewer
- image preprocessing is a process of processing elements that may degrade text recognition performance when OCR is applied to an image type document. This can include processing such as noise reduction, page separation, rotation, and tilt adjustment.
- the fingerprint extracting apparatus performs OCR on the document in the form of the image file which has been preprocessed to convert the document in the form of the image file into the form of the electronic document in the form of text (step 430).
- an electronic document converted to text through OCR may include abnormally shaped characters (or noise) that were misrecognized because of the limits of OCR performance.
- the fingerprint extraction apparatus performs a preprocessing process to remove the abnormally shaped characters or noise as described above for the electronic document of the text form converted through the execution of step 430 (step 440).
- the fingerprint extraction apparatus extracts text from the preprocessed text-type electronic document (step 450), and extracts a text fingerprint of the extracted text (step 460).
- the text preprocessing process, the text extraction process, and the text fingerprint extraction process of steps 440 to 460 are preferably performed corresponding to the recognition algorithm and the performance of the OCR performed in step 430.
- steps 320 to 340 shown in FIG. 3 perform the same functions as those of steps 440 to 460 shown in FIG. 4, but the fingerprint extraction process shown in FIG.
- the fingerprint extraction process illustrated in FIG. 4 converts an input image file type document into an electronic document in a text form via OCR, and then extracts a fingerprint. This increases the probability that the converted electronic document will contain noise.
- the fingerprint extraction apparatus for performing the fingerprint extraction method illustrated in FIG. 4 is preferably a fingerprint extraction apparatus that is more robust to noise than the fingerprint extraction apparatus for performing the fingerprint extraction method illustrated in FIG. 3.
- the fingerprint extraction process illustrated in FIG. 3 may be included in FIG. 4.
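The text preprocessing of step 440 could look roughly like the following Python sketch. The cleaning rules are assumptions chosen for illustration (the source only says that abnormally shaped characters or noise are removed), and the function name `clean_ocr_text` is hypothetical.

```python
import re
import unicodedata


def clean_ocr_text(raw: str) -> str:
    """Remove characters that OCR commonly emits as noise (illustrative rules only)."""
    # Fold compatibility forms such as ligatures into their plain equivalents.
    text = unicodedata.normalize("NFKC", raw)
    # Drop control characters and the Unicode replacement character; keep newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or (unicodedata.category(ch)[0] != "C" and ch != "\ufffd")
    )
    # Treat runs of three or more symbols as speckle noise and replace them with a space.
    text = re.sub(r"[^\w\s]{3,}", " ", text)
    # Collapse repeated spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

A fingerprint that tolerates some remaining noise is still needed afterwards, since rules like these cannot catch characters that were misread as other valid characters.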
- FIG. 5 is a flowchart illustrating a method of extracting an image fingerprint from a publication in the form of an image.
- an image is a main means of transmitting information. Therefore, for the publication in which the image is used as a means of transmitting information as described above, the image fingerprint is extracted for copyright protection.
- the fingerprint extracting apparatus may perform preprocessing to effectively extract a fingerprint from a document in the form of an input image.
- the preprocessing may include removing elements that may interfere with image fingerprint extraction.
- the preprocessing may include processing such as noise removal, page separation, rotation, and tilt adjustment.
- the fingerprint extraction apparatus extracts an image fingerprint from the preprocessed image (step 530).
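As an illustration of step 530, the sketch below computes an average hash, one well-known kind of perceptual image fingerprint. The source does not name an algorithm, so this choice is an assumption, as are the function names and the representation of the image as rows of grayscale values from 0 to 255.

```python
def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """Return a size*size-bit average hash of a grayscale image (rows of 0-255 values)."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    # Downsample: average the pixels that fall into each cell of a size x size grid.
    cells = []
    for gy in range(size):
        rows = pixels[gy * height // size:(gy + 1) * height // size]
        for gx in range(size):
            block = [
                v for row in rows
                for v in row[gx * width // size:(gx + 1) * width // size]
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes; small values suggest similar images."""
    return bin(a ^ b).count("1")
```

Two scans of the same page that differ only in overall brightness tend to give the same or nearly the same hash, which is why the comparison uses Hamming distance rather than equality.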
- FIG. 6 is a flowchart illustrating a method for extracting a fingerprint of a publication according to an embodiment of the present invention. It combines the contents of FIGS. 2 to 5.
- the fingerprint extraction apparatus determines whether the inputted digitized publication is an image file or a text file (step 610). In the case of an image file, preprocessing of the image is performed (step 620).
- image preprocessing removes elements that may impair text recognition performance when OCR is applied to the image type document, or that may impair image fingerprint extraction. It may include processing such as noise removal, page separation, rotation, and tilt adjustment.
- the fingerprint extraction apparatus determines whether the preprocessed image is text in the form of an image (step 630), and if it is determined that the text is in the form of an image, performs OCR to convert the text in the form of an image into an electronic document in a text form (step 640).
- an electronic document converted into text through OCR may include abnormally shaped characters (or noise) that were misrecognized because of the limits of recognition performance in the OCR process.
- the fingerprint extracting apparatus performs a text preprocessing operation to remove the abnormally shaped characters or noise of the converted electronic document in the text form (step 650).
- the fingerprint extraction apparatus extracts text from the preprocessed text-type electronic document (step 660), and extracts a text fingerprint of the extracted text (step 670).
- step 610 of FIG. 6 determines whether the inputted digitized publication is a text document. If it is, the fingerprint extracting apparatus skips steps 620 to 640, proceeds to step 650, and sequentially performs steps 650 to 670.
- if the preprocessed image is determined in step 630 not to be text in the form of an image, the fingerprint extraction apparatus does not perform steps 640 to 670 and proceeds to step 680, where the image fingerprint is extracted from the preprocessed image (step 680).
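The branching of FIG. 6 can be summarized in a short Python sketch. The individual processing steps are passed in as callables because the source leaves their implementation open; the function name `extract_fingerprint` and all parameter names are hypothetical and exist only for illustration.

```python
from typing import Any, Callable


def extract_fingerprint(
    publication: Any,
    *,
    is_image_file: Callable[[Any], bool],
    preprocess_image: Callable[[Any], Any],
    is_text_image: Callable[[Any], bool],
    run_ocr: Callable[[Any], str],
    preprocess_text: Callable[[str], str],
    extract_text: Callable[[str], str],
    text_fingerprint: Callable[[str], Any],
    image_fingerprint: Callable[[Any], Any],
) -> tuple[str, Any]:
    """Return ("text", fingerprint) or ("image", fingerprint), following FIG. 6."""
    if is_image_file(publication):                    # step 610
        image = preprocess_image(publication)         # step 620
        if not is_text_image(image):                  # step 630
            return "image", image_fingerprint(image)  # step 680
        document = run_ocr(image)                     # step 640
    else:
        document = publication                        # text document: skip 620 to 640
    cleaned = preprocess_text(document)               # step 650
    text = extract_text(cleaned)                      # step 660
    return "text", text_fingerprint(text)             # step 670
```

Returning the fingerprint type together with the fingerprint lets a caller store or query text and image fingerprints separately.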
- FIG. 7 is a block diagram showing a configuration of a fingerprint extraction apparatus of a publication according to an embodiment of the present invention.
- the fingerprint extracting apparatus 700 may include a controller 710, an image preprocessor 720, an image-text converter 730, a text preprocessor 740, a text extractor 750, and a fingerprint extractor 760.
- the controller 710 determines the type of the inputted digitized publication, and provides the inputted digitized publication to the image preprocessor 720 or the text preprocessor 740 according to the determination result.
- the controller 710 provides the input publication to the image preprocessor 720 when it is an electronic document in the form of an image scanned by a scanner or captured by a camera, and to the text preprocessor 740 when it is an electronic document in a text form.
- controller 710 may control operations of other components constituting the fingerprint extraction apparatus.
- the image preprocessor 720 performs preprocessing such as noise removal, page separation, rotation, and tilt adjustment on the electronic document in the image form provided from the controller 710 so as to improve the recognition performance of the OCR. If the preprocessed image is an electronic document in the form of an image composed of text, it is provided to the image-text converter 730; if the preprocessed image is composed of images, as in a magazine or a cartoon, it is provided to the fingerprint extractor 760.
- the image-text converter 730 may be configured as an OCR. It converts the preprocessed image provided from the image preprocessor 720 into an electronic document in a text form, and then provides the converted text document to the text extractor 750.
- the text preprocessor 740 performs a preprocessing process to remove abnormal characters or noise from the text type electronic document provided from the image-text converter 730 or the controller 710, and then provides the preprocessed electronic document in text form to the text extractor 750.
- the text extractor 750 receives the electronic document in the form of preprocessed text from the text preprocessor 740, extracts the text, which is the information transmission means of the publication, from the provided electronic document, and provides the extracted text to the fingerprint extractor 760.
- the fingerprint extractor 760 extracts an image fingerprint from a preprocessed image provided from the image preprocessor 720 or extracts a text fingerprint from text provided from the text extractor 750.
- the fingerprint extractor 760 may extract a fingerprint from an image or text using a known fingerprint extraction technique.
- the fingerprint extractor 760 may include an image fingerprint extractor module 761 and a text fingerprint extractor module 763. The image fingerprint extractor module 761 extracts the image fingerprint from the preprocessed image provided from the image preprocessor 720, and the text fingerprint extraction module 763 extracts the fingerprint from the text provided from the text extraction unit 750.
- the method and apparatus for extracting a fingerprint of a publication according to an embodiment of the present invention shown in FIGS. 6 and 7 may be used to extract a fingerprint of an original publication, a fingerprint of illegally distributed publications retrieved or collected through the Internet, or a fingerprint of any publication for which information is desired.
- the fingerprint extraction method and apparatus of a publication according to an embodiment of the present invention may be used for extracting a fingerprint of a query publication.
- FIG. 8 is a block diagram illustrating a configuration of a publication identification system according to an exemplary embodiment of the present invention. It illustrates, by way of example, a system in which a database is constructed using the fingerprint of a publication when the original of the publication is provided for copyright protection by a publication copyright holder or a publication provider.
- a publication identification system may include a fingerprint extraction apparatus 700, a publication information construction apparatus 810, and a database management system (DBMS) 830.
- the fingerprint extracting apparatus 700 has the same configuration as shown in FIG. 7. It extracts the fingerprint of the original publication by executing the fingerprint extraction method shown in FIG. 6, and then provides the extracted fingerprint of the original publication to the publication information construction device 810.
- the publication information building device 810 receives the fingerprint of the original publication from the fingerprint extraction apparatus 700 and the information of the original publication from the publication copyright holder or the publication provider. It then links the fingerprint of the original publication to the information of the original publication, provides them to the DBMS 830, and manages them.
- the information of the original publication may include various information related to the original publication, such as the author, publisher, title, summary, publication date, ISBN (International Standard Book Number), address, telephone number, and fax number of the original publication.
- the publication information building device 810 may store the original publication in the DBMS 830 for management of the publication, or may encrypt and store all or part of the publication in the DBMS 830 when security is required.
- the DBMS 830 stores the fingerprint of the original publication provided from the publication information building apparatus 810 and the publication information associated with it. In addition, the DBMS 830 may store the original publication as provided to the publication information building apparatus 810.
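The linkage between a fingerprint and the publication information can be sketched with Python's built-in `sqlite3` module standing in for the DBMS 830. The table layout, the column names, the JSON encoding of the fingerprint, and the function names `create_store` and `register_original` are all assumptions made for illustration.

```python
import json
import sqlite3


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the (hypothetical) publications table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS publications (
               id INTEGER PRIMARY KEY,
               isbn TEXT, title TEXT, author TEXT, publisher TEXT,
               kind TEXT NOT NULL,          -- 'text' or 'image'
               fingerprint TEXT NOT NULL    -- JSON-encoded fingerprint
           )"""
    )
    return conn


def register_original(conn: sqlite3.Connection, fingerprint, kind: str, **info) -> int:
    """Store one original publication's fingerprint together with its information."""
    cur = conn.execute(
        "INSERT INTO publications (isbn, title, author, publisher, kind, fingerprint) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            info.get("isbn"), info.get("title"), info.get("author"),
            info.get("publisher"), kind, json.dumps(fingerprint),
        ),
    )
    conn.commit()
    return cur.lastrowid
```

Only a few of the information fields listed above are modeled here; a fuller schema would add the summary, publication date, and contact details in the same way.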
- FIG. 9 is a block diagram showing the configuration of a publication identification system according to another embodiment of the present invention.
- Files of digital publications or digitized publication files can be easily distributed through the Internet or the like.
- publication files can be distributed through various internet channels such as peer-to-peer, torrent, webhard, cafe, blog, and the like.
- digital publications or digitized publications can be easily distributed and transferred through portable storage devices or portable terminals.
- the publication identification system according to another embodiment of the present invention shown in FIG. 9 is used to identify a publication that needs to be identified, or to detect publications that are illegally distributed or infringe copyright through the various channels described above.
- a publication identification system may include a fingerprint extracting apparatus 700, a fingerprint querying apparatus 820, a DBMS 830, and a candidate group verifying apparatus 840.
- the fingerprint extraction apparatus 700 has the same configuration as shown in FIG. 7 and executes the fingerprint extraction method shown in FIG.
- the fingerprint extracting apparatus 700 extracts fingerprints of the query publications searched and collected through various paths, in order to identify whether a publication is illegally distributed or infringes copyright, and then provides the extracted fingerprints to the fingerprint query apparatus 820.
- the fingerprint query apparatus 820 queries the DBMS 830 for fingerprints of query publications provided from the fingerprint extraction apparatus 700.
- the fingerprint query apparatus 820 provides the candidate group verification apparatus 840 with a fingerprint of the query publication provided from the fingerprint extraction apparatus 700.
- the DBMS 830 receives the fingerprint of the query publication from the fingerprint query device 820, searches for a corresponding fingerprint in the database, and provides the candidate group verification device 840 with at least one search result candidate group searched for.
- the search result candidate group may include a fingerprint of at least one original publication similar to the fingerprint of the query publication and information of the publication.
- the candidate group verification apparatus 840 verifies the search result candidate group provided by the DBMS 830 to determine whether the query publication is illegally distributed or infringed on copyright.
- the candidate group verification apparatus 840 may compare the fingerprint of the query publication provided from the fingerprint query apparatus 820 with the search result candidate group provided from the DBMS 830 to determine whether the query publication is illegally distributed or infringes copyright.
- the candidate group verifying apparatus 840 may obtain information of illegally distributed or infringed copyrighted publications from the DBMS 830 and provide the information to a relevant institution or administrator.
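The comparison performed by the candidate group verification apparatus 840 might be sketched as follows in Python. The source does not define a similarity measure, so the Jaccard similarity over fingerprint elements, the threshold of 0.5, and the names `jaccard` and `verify_candidates` are assumptions for illustration.

```python
from typing import Any, Iterable, Optional


def jaccard(a: Iterable[int], b: Iterable[int]) -> float:
    """Share of fingerprint elements common to both fingerprints (0.0 to 1.0)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def verify_candidates(
    query_fp: Iterable[int],
    candidates: Iterable[tuple[Any, Iterable[int]]],
    threshold: float = 0.5,
) -> Optional[tuple[Any, float]]:
    """Return (info, score) for the best candidate at or above threshold, else None.

    `candidates` holds (publication info, fingerprint) pairs, standing in for the
    search result candidate group returned by the DBMS.
    """
    query = list(query_fp)
    best = None
    for info, fingerprint in candidates:
        score = jaccard(query, fingerprint)
        if score >= threshold and (best is None or score > best[1]):
            best = (info, score)
    return best
```

The threshold controls the trade-off between missing altered copies and flagging unrelated publications, and would need tuning on real data.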
- because the fingerprint extraction apparatus requires a large processing time to extract the fingerprint of a publication, it may be configured in a distributed manner using the cloud computing concept to reduce the load on the system. In addition, to improve the performance of the publication identification system and reduce the overall load, a file that has already been retrieved once may be handled separately, for example by using a hash, so that it is not processed again.
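One way to avoid reprocessing a file that was already retrieved is to key results by a hash of the file contents, as in the Python sketch below. The class name `ProcessedFileCache`, the use of SHA-256, and the in-memory dictionary are assumptions; the source only mentions "a hash or the like".

```python
import hashlib
from typing import Any, Callable


class ProcessedFileCache:
    """Remember identification results by content hash (illustrative, in-memory only)."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}

    @staticmethod
    def digest(data: bytes) -> str:
        """Hex SHA-256 digest of the file contents."""
        return hashlib.sha256(data).hexdigest()

    def identify(self, data: bytes, identify_fn: Callable[[bytes], Any]) -> Any:
        """Run identify_fn only the first time a given file content is seen."""
        key = self.digest(data)
        if key not in self._results:
            self._results[key] = identify_fn(data)
        return self._results[key]
```

An exact content hash only recognizes byte-identical files; a re-scanned or re-encoded copy still goes through the full fingerprint extraction.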
- FIG. 10 is a flowchart illustrating a publication identification method of a publication identification system according to an exemplary embodiment of the present invention.
- a publication identification system first searches for and collects a publication that is suspected of being illegally distributed or infringing a copyright as a query publication (step 1010), and extracts a fingerprint of the collected query publication (step 1020).
- the publication identification system queries the DBMS for the publication corresponding to the extracted fingerprint (step 1030) and obtains a corresponding search result candidate group from the DBMS (step 1040).
- the search result candidate group obtained from the DBMS may include fingerprints of at least one publication corresponding to the fingerprint of the query publication.
- the publication identification system verifies the acquired search result candidate group to identify the corresponding publications that are determined to be illegally distributed or to infringe a copyright (step 1050).
- the publication identification system may identify the publication based on a result of comparing the fingerprint extracted by performing step 1020 with the fingerprint provided from the DBMS.
- the publication identification system acquires from the DBMS the information on the publication identified as illegally distributed or copyright-infringing, and provides that information (step 1060).
- for a publication for which copyright protection has been requested in advance, the publication identification system extracts a fingerprint from the original publication and manages it in association with the metadata of that publication. Copyright can then be protected by building a system that uses the fingerprints of publications to identify illegally distributed or infringing publications.
- the present invention uses fingerprints to block illegal distribution even when encryption and packaging protections have been removed, and to allow appropriate protective measures to be taken when unauthorized publications are distributed online.
- the publication identification system using a fingerprint may also be used to look up information on the original publication from partial input (e.g., a few pages of the publication).
- the publication identification system according to an embodiment of the present invention uses a fingerprint composed of feature points that represent content-specific information.
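The flow of steps 1020 to 1060, together with the hash-based skipping of files that were already retrieved, can be sketched as follows. This is a minimal illustration and not the implementation described in the patent: it assumes a fingerprint can be modelled as a set of features, uses an in-memory index in place of the DBMS, and verifies candidates with Jaccard similarity against an arbitrary threshold of 0.5. The names `Record`, `FingerprintDB`, `verify`, and `Identifier` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One original publication: title metadata plus its fingerprint (hypothetical schema)."""
    title: str
    fingerprint: frozenset


class FingerprintDB:
    """In-memory stand-in for the DBMS: maps each feature to the records containing it."""

    def __init__(self, records):
        self.index = {}
        for rec in records:
            for feature in rec.fingerprint:
                self.index.setdefault(feature, []).append(rec)

    def candidates(self, query_fp):
        """Steps 1030-1040: return every record sharing at least one feature with the query."""
        seen = {}
        for feature in query_fp:
            for rec in self.index.get(feature, []):
                seen[rec.title] = rec
        return list(seen.values())


def jaccard(a, b):
    """Similarity of two feature sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def verify(query_fp, candidates, threshold=0.5):
    """Step 1050: return the most similar candidate if it reaches the (assumed) threshold."""
    best = max(candidates, key=lambda r: jaccard(query_fp, r.fingerprint), default=None)
    if best is not None and jaccard(query_fp, best.fingerprint) >= threshold:
        return best
    return None


class Identifier:
    """Runs extraction, candidate search, and verification, skipping files seen before."""

    def __init__(self, db, extract):
        self.db = db
        self.extract = extract  # callable: bytes -> frozenset of features
        self.cache = {}         # SHA-256 of the file bytes -> earlier result

    def identify(self, data):
        key = hashlib.sha256(data).hexdigest()
        if key not in self.cache:  # fingerprint only files not processed before
            fp = self.extract(data)                                # step 1020
            self.cache[key] = verify(fp, self.db.candidates(fp))   # steps 1030-1050
        return self.cache[key]     # step 1060: matched record, or None
```

Keying the cache on a hash of the raw file bytes means that an identical file collected a second time is answered without repeating the costly fingerprint extraction; a file that differs by even one byte is processed normally.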
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Technology Law (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Collating Specific Patterns (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
- Storage Device Security (AREA)
Claims (20)
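- 1. A method for extracting a fingerprint of a publication, the method comprising: extracting text from an input electronic document in text format; and extracting a text fingerprint from the extracted text.
- 2. The method of claim 1, wherein the extracting of text from the input electronic document in text format comprises performing preprocessing on the input electronic document in text format and then extracting the text from the input electronic document in text format.
- 3. The method of claim 2, wherein the preprocessing of the input electronic document in text format includes typo correction or character restoration.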
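- 4. A method for extracting a fingerprint of a publication, the method comprising: receiving an electronic document in image form; converting the input electronic document in image form into an electronic document in text form when the input electronic document in image form is a text-based electronic document; extracting text from the converted electronic document in text form; and extracting a text fingerprint from the extracted text.
- 5. The method of claim 4, wherein the receiving of the electronic document in image form comprises performing preprocessing on the electronic document in image form after the electronic document in image form is input.
- 6. The method of claim 5, wherein the performing of preprocessing on the electronic document in image form comprises performing at least one of noise removal, page separation, image rotation, and image skew adjustment on the electronic document in image form.
- 7. The method of claim 4, further comprising, when the input electronic document in image form is an image-based electronic document: performing preprocessing on the input electronic document in image form; and extracting an image fingerprint from the preprocessed electronic document in image form.
- 8. The method of claim 4, wherein the extracting of text from the converted electronic document in text form comprises performing preprocessing on the converted electronic document in text format and then extracting the text from the converted electronic document in text format.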
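- 9. An apparatus for extracting a fingerprint of a publication, the apparatus comprising: an image-to-text conversion unit that converts an input electronic document in image form into an electronic document in text form; a text extraction unit that extracts text from the electronic document in text form; and a fingerprint extraction unit that extracts a text fingerprint from the extracted text.
- 10. The apparatus of claim 9, further comprising an image preprocessing unit that performs at least one of noise removal, page separation, image rotation, and image skew adjustment on the input electronic document in image form.
- 11. The apparatus of claim 10, wherein the fingerprint extraction unit extracts an image fingerprint from the preprocessed image provided by the image preprocessing unit.
- 12. The apparatus of claim 9, wherein the fingerprint extraction unit further comprises a text preprocessing unit that performs preprocessing on the electronic document in text form provided by the image-to-text conversion unit, or on an input electronic document in text form, and then provides it to the text extraction unit.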
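- 13. A system for identifying a publication using a fingerprint, the system comprising: a fingerprint extraction apparatus that extracts a fingerprint of an original publication; a publication information construction apparatus that stores the fingerprint of the original publication provided by the fingerprint extraction apparatus in association with additional information of the original publication; and a DBMS (DataBase Management System) in which the fingerprint extracted from the original publication and the additional information of the original publication are stored.
- 14. The system of claim 13, wherein the fingerprint extraction apparatus, when the original publication or the query publication is an electronic document in text format, extracts text from the electronic document in text format and then extracts a text fingerprint from the extracted text, and, when the original publication or the query publication is an electronic document in image form, converts the electronic document in image form into an electronic document in text form, extracts text from the converted electronic document in text form, and extracts a text fingerprint from the extracted text.
- 15. The system of claim 14, wherein, when the original publication or the query publication is an electronic document in image form, the fingerprint extraction apparatus performs preprocessing on the electronic document in image form and then extracts an image fingerprint from the preprocessed electronic document in image form.
- 16. The system of claim 13, wherein the additional information of the original publication includes at least one of the creator, publisher, title, summary, publication date, ISBN, address, telephone number, and fax number of the original publication.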
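- 17. A system for identifying a publication using a fingerprint, the system comprising: a fingerprint extraction apparatus that extracts a fingerprint of a query publication collected in order to identify copyright infringement; a fingerprint query apparatus that queries for the fingerprint of an original publication corresponding to the fingerprint of the query publication provided by the fingerprint extraction apparatus; a DBMS (DataBase Management System) that stores fingerprints extracted from original publications and additional information of the original publications, and that, in response to a query from the fingerprint query apparatus, provides a search result candidate group consisting of the fingerprint of at least one original publication; and a candidate group verification apparatus that verifies the search result candidate group provided by the DBMS to determine whether the query publication infringes a copyright.
- 18. The system of claim 17, wherein the candidate group verification apparatus compares the search result candidate group with the fingerprint of the query publication, identifies the query publication based on the comparison result, and, when the query publication is determined to be a publication present in the DBMS, obtains the additional information corresponding to the query publication from the DBMS and provides it.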
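- 19. A method for identifying a publication using a fingerprint, the method comprising: extracting a fingerprint of a collected query publication; searching a DBMS for the fingerprint of an original publication corresponding to the fingerprint extracted from the collected query publication; and identifying the collected query publication based on at least one search result.
- 20. The method of claim 19, wherein the identifying of the collected query publication based on the at least one search result comprises identifying the query publication based on a result of comparing the at least one search result with the fingerprint of the query publication, and further comprises obtaining, from the DBMS, additional information corresponding to the query publication when the query publication is determined, as a result of the identification, to be identical to an original publication.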
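The claims leave the text fingerprint algorithm itself open. As one possible illustration of the text path (preprocess, extract text, derive a text fingerprint), the sketch below hashes overlapping runs of `k` words, a technique commonly called shingling. It is an assumption chosen for illustration, not the method claimed: the preprocessing here is limited to Unicode normalization, lower-casing, and punctuation and whitespace cleanup rather than the typo correction or character restoration named in the claims, and `preprocess`, `text_fingerprint`, and the default `k=3` are hypothetical.

```python
import hashlib
import re
import unicodedata


def preprocess(text):
    """Normalize text so that trivial formatting changes do not alter the fingerprint."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)       # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


def text_fingerprint(text, k=3):
    """Hash every run of k consecutive words; the set of hashes is the fingerprint."""
    words = preprocess(text).split()
    if len(words) < k:
        # Too short for a full shingle: use the whole text as a single feature.
        shingles = [" ".join(words)] if words else []
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    # Keep the first 8 bytes of each SHA-1 digest as a compact integer feature.
    return frozenset(
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles
    )
```

Because the fingerprint is a set of local features, a query consisting of only a few pages still shares features with the full original, which is what makes partial-input lookup plausible under this assumption.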
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
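CN2011800494631A CN103154957A (zh) | 2010-10-14 | 2011-10-13 | Method for extracting fingerprint of publication, apparatus for extracting fingerprint of publication, system for identifying publication using fingerprint, and method for identifying publication using fingerprint |
US13/879,398 US20130290330A1 (en) | 2010-10-14 | 2011-10-13 | Method for extracting fingerprint of publication, apparatus for extracting fingerprint of publication, system for identifying publication using fingerprint, and method for identifying publication using fingerprint |
JP2013533773A JP2013543178A (ja) | 2010-10-14 | 2011-10-13 | Method for extracting fingerprint of publication, apparatus for extracting fingerprint of publication, system for identifying publication using fingerprint, and method for identifying publication using fingerprint |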
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2010-0100508 | 2010-10-14 | ||
KR20100100508 | 2010-10-14 | ||
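KR20110023069A KR101491446B1 (ko) | 2010-10-14 | 2011-03-15 | Method for extracting fingerprint of publication, apparatus for extracting fingerprint of publication, system for identifying publication using fingerprint, and method for identifying publication using fingerprint |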
KR10-2011-0023069 | 2011-03-15 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2012050379A2 (ko) | 2012-04-19 |
WO2012050379A3 (ko) | 2012-06-14 |
Family
ID=45938813
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
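PCT/KR2011/007633 WO2012050379A2 (ko) | Method for extracting fingerprint of publication, apparatus for extracting fingerprint of publication, system for identifying publication using fingerprint, and method for identifying publication using fingerprint | 2010-10-14 | 2011-10-13 |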
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2012050379A2 (ko) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
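CN103164698A (zh) * | 2013-03-29 | 2013-06-19 | Huawei Technologies Co., Ltd. | Method and device for generating a fingerprint library, and method and device for matching fingerprints of text to be tested |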
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
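KR20060032886A (ko) * | 2004-10-13 | 2006-04-18 | Electronics and Telecommunications Research Institute | Fingerprint-based system and method for tracking illegally copied content |
KR20070032504A (ko) * | 2005-09-16 | 2007-03-22 | Samsung Electronics Co., Ltd. | Host device having a text extraction function and text extraction method thereof |
KR20070106475A (ko) * | 2007-08-27 | 2007-11-01 | Coin Media Lab Co., Ltd. | Method for detecting text duplication |
KR20100080458A (ko) * | 2008-12-30 | 2010-07-08 | Irdeto Access B.V. | Data object fingerprinting |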
- 2011-10-13 WO PCT/KR2011/007633 patent/WO2012050379A2/ko active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2012050379A3 (ko) | 2012-06-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
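KR101491446B1 (ko) | | Method for extracting fingerprint of publication, apparatus for extracting fingerprint of publication, system for identifying publication using fingerprint, and method for identifying publication using fingerprint |
JP3542678B2 (ja) | | Encoding and decoding method using the length of the blank space between words of an electronic document, method for embedding signature information in an electronic document, and method for encrypting a confidential document |
WO2015034175A1 (ko) | | Method, system, and apparatus for strengthening internal corporate information security |
US8873863B2 (en) | | System and method for fingerprinting for comics |
KR20010043172A (ko) | | Digital authentication of analog documents |
US8695061B2 (en) | | Document process system, image formation device, document process method and recording medium storing program |
CN104517045B (zh) | | Digital document protection method and system |
KR101803066B1 (ko) | | Integrated identification system and method for illegally copied books |
JP2008083910A (ja) | | Software management system and software management program |
WO2021172668A1 (ko) | | System and method for authenticating the original copyright holder using blockchain |
WO2020222475A1 (ko) | | Document authentication method and document authentication system with authentication function enhanced by inquiry history information and document authentication information |
WO2020222476A1 (ko) | | Document authentication method and document authentication system with authentication function enhanced by inquiry history notification |
WO2015122620A1 (ko) | | Digital content monitoring system for ensuring the integrity of digital content |
US8570547B2 (en) | | Image registration device, image registration system, image registration method and computer readable medium that register the associated image acquired by the associated image acquisition unit with the associated image being assigned to the predetermined process |
WO2012050379A2 (ko) | | Method for extracting fingerprint of publication, apparatus for extracting fingerprint of publication, system for identifying publication using fingerprint, and method for identifying publication using fingerprint |
JP2012182737A (ja) | | Confidential material leakage prevention system, determination device, confidential material leakage prevention method, and program |
JP4733310B2 (ja) | | Distributed copyright protection method, and content publishing device, monitoring server, and system capable of using the method |
JP2004185312A (ja) | | Document management device |
WO2014027870A1 (ko) | | Method for managing copyrighted works |
JP3840580B1 (ja) | | Software management system and software management program |
CN116226885B (zh) | | Copier confidentiality inspection and evidence collection system and method |
WO2017115885A1 (ko) | | Content search and history tracking monitoring system that embeds encrypted information in various content to facilitate search and tracking |
WO2017115884A1 (ko) | | Method and apparatus for compressing and restoring unit files for EPUB file encryption |
JP4993588B2 (ja) | | Image processing device, image processing method, image processing program, and computer-readable recording medium |
JP2011028349A (ja) | | Document processing device, document processing system, and program |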
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | WWE | Wipo information: entry into national phase | Ref document number: 201180049463.1; Country of ref document: CN |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11832766; Country of ref document: EP; Kind code of ref document: A2 |
 | ENP | Entry into the national phase in: | Ref document number: 2013533773; Country of ref document: JP; Kind code of ref document: A |
 | WWE | Wipo information: entry into national phase | Ref document number: 13879398; Country of ref document: US |
 | NENP | Non-entry into the national phase in: | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 11832766; Country of ref document: EP; Kind code of ref document: A2 |