WO2012169841A2

WO2012169841A2 - Electronic book system, electronic book data formation, searching device, and method for same

Info

Publication number: WO2012169841A2
Application number: PCT/KR2012/004567
Authority: WO
Inventors: 이해성
Original assignee: 주식회사 내일이비즈
Priority date: 2011-06-08
Filing date: 2012-06-08
Publication date: 2012-12-13
Also published as: WO2012169841A9; WO2012169841A3

Abstract

One embodiment of the present invention generates electronic book data comprising groups formed by clustering the objects included in an electronic document on the basis of the objects and their coordinate values. The generated electronic book data is loaded into a search engine and a database management system (DBMS). Because the search engine and the database management system are linked, a user can search bibliographic information, such as purchase or loan transaction history, as well as the body text of the electronic book data.

Description

E-book system and e-book data generation, retrieval device and method thereof

The present invention relates to an apparatus and method for generating, storing and retrieving e-book data.

E-books and electronic documents can generally be entered directly into a search engine. However, only e-books or electronic documents with a simple single-column layout can be input directly. E-books or electronic documents with very complicated layouts, such as multi-column layouts or newspapers, could not be loaded directly into search engines, because a computer algorithm alone cannot yet accurately determine the reading order of the paragraphs in an e-book or electronic document with a complicated layout.

In addition, a search engine linked with a conventional e-book system cannot search content such as tables, figures, formulas, and chemical formulas. Such objects are generally most convenient to express in an e-book as images, but objects represented as images could not be searched because they contain no text.

In addition, in a conventional e-book system, it was possible to search the full text of all e-books in e-book stores that sell directly to consumers, but it was not possible to limit a full-text search to the e-books belonging to a specific e-book library, or to the e-books a person has already purchased and owns.

An object of the present invention is to solve the above problems of the conventional e-book system.

That is, the present invention provides an e-book system in which an e-book or electronic document having a complicated layout can be searched by a search engine.

In addition, the present invention provides an e-book system in which formulas, chemical formulas, or pictures in an e-book can be searched.

In addition, the present invention provides an e-book system that can search individuals' purchase/loan histories and preferences in addition to performing a general full-text search.

The present invention generates, from an e-book or electronic document having a complicated layout, data in a form that can be loaded into a search engine.

In data of this form, each page of a book is organized into one or more groups.

Data of this form may include, for a picture object, a text object containing text that describes the picture.

Data in a form that can be loaded into the search engine may include hidden groups that are displayed on the screen but ignored when searching in the search engine.

The e-book system of the present invention may include, together with a search engine capable of full-text search, a database management system (DBMS) that stores individuals' purchase/loan histories and preferences.

Search engines and DBMSs can interoperate through e-book unique numbers (for example, using GUIDs such as DC32CC1FD0604859A96CCB103E2F7C1C, 1F6CB702AFFE4bdfB130937D90C51F59).
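The sample identifiers above are 32 hexadecimal digits, which is the shape of a UUID written without hyphens. A minimal sketch, assuming a random (version 4) UUID is an acceptable unique number; `new_ebook_id` and the two dictionaries standing in for the search engine and the DBMS are hypothetical:

```python
import uuid

def new_ebook_id() -> str:
    """Return a random (version 4) UUID as 32 uppercase hex digits, without hyphens."""
    return uuid.uuid4().hex.upper()

# Hypothetical stand-ins for the search engine and the DBMS, both keyed by the same ID.
search_index: dict[str, str] = {}
bibliography: dict[str, dict] = {}

book_id = new_ebook_id()
search_index[book_id] = "body text of the e-book"
bibliography[book_id] = {"title": "Sample Book", "publisher": "Sample Press"}
```

Because both stores use the same key, a hit in one can be looked up directly in the other.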

In one embodiment, the smallest unit of the e-book data is an object. Objects are divided into three categories according to their properties: "text", "picture", and "table text". Objects also have coordinate values indicating their position and size on the monitor screen.

In one embodiment, objects gather to form a line, lines come together to form a group, groups form a page, and pages form a book. For example, if the character "가" appears on the screen, the information related to it is organized as follows: [Page: 10] -> [Group: 2] -> [Line: 3] -> [Object order: 4, object property 1 (type): text, object property 2 (code): '가' = AC00, object property 3 (bold): no, object property 4 (underline): no, object property 5 (italic): yes, object property 6 (font size): 12, object property 7 (width): 32, object property 8 (height): 36, object property 9 (upper-left coordinate value): (59, 60)]. A "picture" is represented in the same way.
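A minimal sketch of this book > page > group > line > object hierarchy as Python dataclasses. The class and field names are hypothetical, chosen to mirror the object properties listed above; the example rebuilds the path to the character with code AC00 and resolves it again.

```python
from dataclasses import dataclass, field
from enum import Enum

class Kind(Enum):
    TEXT = "text"
    PICTURE = "picture"
    TABLE_TEXT = "table text"   # hidden, describes a picture

@dataclass
class Obj:
    kind: Kind
    code: str                   # character(s); empty for a picture
    x: int                      # upper-left coordinate value
    y: int
    width: int
    height: int
    bold: bool = False
    underline: bool = False
    italic: bool = False
    font_size: int = 0

@dataclass
class Line:
    objects: list[Obj] = field(default_factory=list)

@dataclass
class Group:
    lines: list[Line] = field(default_factory=list)
    hidden: bool = False        # shown on screen, skipped by search

@dataclass
class Page:
    groups: list[Group] = field(default_factory=list)

@dataclass
class Book:
    ebook_id: str               # unique number shared with the DBMS
    pages: list[Page] = field(default_factory=list)

def locate(book: Book, page: int, group: int, line: int, order: int) -> Obj:
    """Follow a 1-based [Page] -> [Group] -> [Line] -> [Object order] path."""
    return book.pages[page - 1].groups[group - 1].lines[line - 1].objects[order - 1]

# Rebuild the example: page 10, group 2, line 3, 4th object has code AC00.
ga = Obj(Kind.TEXT, "\uac00", x=59, y=60, width=32, height=36, italic=True, font_size=12)
filler = [Obj(Kind.TEXT, "?", 0, 0, 1, 1) for _ in range(3)]
page10 = Page([Group(), Group([Line(), Line(), Line(filler + [ga])])])
book = Book("DC32CC1FD0604859A96CCB103E2F7C1C", [Page() for _ in range(9)] + [page10])
assert locate(book, 10, 2, 3, 4) is ga
```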

In one embodiment, "table text" is not displayed on the screen but is associated with a picture that is displayed. Thus, when a user searches for a picture by entering text that describes it as a keyword, the "table text" associated with the picture is matched.
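A minimal sketch of a table-text lookup, assuming (hypothetically) that each table-text object carries the same coordinates as the picture it describes, so a keyword hit also tells where the picture sits on the page. `find_pictures` and the case-insensitive substring match are illustrative choices, not the search engine's actual behavior.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    kind: str        # "text", "picture" or "table text"
    code: str        # characters for text/table text, "" for a picture
    x: int
    y: int
    width: int
    height: int

def find_pictures(page_objects: list[Obj], keyword: str) -> list[tuple[int, int, int, int]]:
    """Return the boxes of table-text objects whose description contains the keyword."""
    kw = keyword.casefold()
    return [(o.x, o.y, o.width, o.height)
            for o in page_objects
            if o.kind == "table text" and kw in o.code.casefold()]

objs = [
    Obj("picture", "", 100, 200, 300, 240),
    Obj("table text", "Millet's The Angelus, a Barbizon school painting", 100, 200, 300, 240),
    Obj("text", "A", 10, 10, 8, 12),
]
print(find_pictures(objs, "the angelus"))  # [(100, 200, 300, 240)]
```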

In one embodiment, the e-book data may further include key bibliographic information about the e-book or electronic document. The e-book data may also include a unique number that distinguishes the e-book.

In one embodiment, in addition to the search engine that retrieves the body text of the e-book, the e-book system may further include a DBMS that stores the bibliographic information of the e-book or electronic document, its buyers (individuals and libraries), and its loan information (borrowers and loan terms). In one embodiment, the search engine and the DBMS may interoperate through the unique number included in the e-book data.

According to an embodiment of the present invention, e-book data having an organized structure of objects, lines, groups, and pages can be generated from an electronic document.

According to an embodiment of the present invention, generating e-book data makes it possible to input into a search engine e-books with complicated layouts that cannot be handled in electronic documents such as PDF, for example, e-books with very complicated layouts such as newspapers.

According to an embodiment of the present invention, a search engine can search not only plain text but also sentences containing various symbols, such as formulas and chemical formulas, which was not possible in conventional e-book formats (for example, PDF or ePub).

According to an embodiment of the present invention, both a full-text search across all e-books in an e-book store that sells directly to consumers and a search limited to the e-books belonging to a specific e-book library are possible.

According to an embodiment of the present invention, a full-text search can also be limited to the e-books that each person has already purchased and owns, or to the e-books that each person uses.

According to an embodiment of the present invention, full-text search results, whether over the entire library or limited to a personalization area, may be filtered further by combining them with various sales statistics, usage statistics, or bibliographic information.

FIG. 1 is a block diagram of an e-book providing and retrieval system according to an embodiment of the present invention.

FIG. 2 is a block diagram of an e-book searching apparatus according to an embodiment of the present invention.

FIG. 3 is a detailed block diagram of an e-book data configuration unit according to an embodiment of the present invention.

FIG. 4 is an example of a group constituting a page of e-book data according to an embodiment of the present invention.

FIG. 5 is another example of a group constituting a page of e-book data according to an embodiment of the present invention.

FIG. 6 is a flowchart of a method for forming an internal text area of e-book data according to an embodiment of the present invention.

FIG. 7 is a flowchart of a method for retrieving e-book data according to an embodiment of the present invention.

FIG. 1 is a block diagram of an e-book providing and retrieval system according to an embodiment of the present invention. In one embodiment, the e-book providing and retrieval system 100 may include an e-book retrieval device 110 that retrieves e-books in cooperation with the search engine 130 and the DBMS 140 via the network 190. In one embodiment, the network 190 may include both wired communication media, such as a LAN or WAN, and wireless communication media, such as Wi-Fi or a mobile communication system.

In one embodiment, the search engine 130 may be configured to receive a full-text search word and perform a full-text search of the e-books. In one embodiment, the DBMS 140 may store each individual's loan history, purchase history, or fields of interest, and may search this personalization area using non-full-text search words such as bibliographic information.

As shown in FIG. 1, a user may retrieve e-book data through the e-book data retrieval device 110 using a mobile communication device 152 such as a mobile phone, smartphone, or tablet PC, a notebook 154, a PC 156, or any other computing device capable of accessing the network 190.

In one embodiment, a user may limit the search scope to the e-books they have borrowed or purchased, or to their fields of interest (the personalization area). In this case, the e-book data retrieval apparatus 110 may first perform a search in the DBMS 140, and then run a full-text search in the search engine 130 over the e-books obtained as a result.

In one embodiment, the user may search the entire e-book library (the non-personalized area) without limiting the search scope. In this case, the e-book data retrieval apparatus 110 may perform a search in the search engine 130 and then check the resulting e-books against the DBMS 140 to extract highly relevant results.

According to an embodiment of the present disclosure, the user may input a keyword through the e-book providing and search system 100 to perform a text search and obtain a result. The search engine 130 may search the body of the e-book data stored therein based on the user's keyword input and output the result.

However, the search engine 130 does not store the bibliographic information of the e-book or information other than the body text (for example, information related to the commerce of the e-book). In contrast, the DBMS 140 may not store the full contents of the e-book, but may store commerce-related information about it. For example, the DBMS 140 may contain information not stored in the search engine 130, such as whether a particular e-book can be sold to individual customers, which libraries it has been supplied to, its sales volume, which individual customers have purchased it, and which library members have borrowed it.

According to an embodiment of the present invention, the inconvenience of conventional e-book systems, in which the full-text search and the bibliographic search are not linked and must be performed separately, can be eliminated, and the bibliographic search and the full-text search can be integrated into one.

For example, according to an embodiment of the present invention, an online store that sells e-books to individuals can support full-text search through the search engine 130 and, in conjunction with the DBMS 140, narrow the search scope by bibliographic category (for example, literature, practical books, languages, etc.). It can also filter the full-text search results derived from the search engine 130 by re-sorting them in order of sales based on the sales history.
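A minimal sketch of this filtering step, using SQLite as a stand-in for the DBMS 140. The `books` table, its columns, and the sample rows are hypothetical; the list of hit IDs plays the role of the unique numbers returned by the search engine 130.

```python
import sqlite3

def filter_and_rank(conn: sqlite3.Connection, hit_ids: list[str], category: str) -> list[str]:
    """Keep only the full-text hits in the given category, best sellers first."""
    if not hit_ids:
        return []
    # Only "?" placeholders are interpolated; all values are bound parameters.
    placeholders = ",".join("?" * len(hit_ids))
    rows = conn.execute(
        f"SELECT ebook_id FROM books WHERE category = ? AND ebook_id IN ({placeholders}) "
        "ORDER BY sales_count DESC",
        [category, *hit_ids],
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (ebook_id TEXT PRIMARY KEY, category TEXT, sales_count INTEGER)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?)", [
    ("B1", "literature", 120), ("B2", "language", 900),
    ("B3", "literature", 450), ("B4", "literature", 30),
])
hits = ["B1", "B2", "B3"]   # pretend these came back from the search engine
print(filter_and_rank(conn, hits, "literature"))  # ['B3', 'B1']
```

The same query shape covers the library and personal variants below by ordering on a loan count or reading time column instead.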

For example, according to an embodiment of the present invention, a full-text search may be performed through the search engine 130 after the scope has been limited, through the DBMS 140, to the e-books supplied to each library. The full-text search results of the search engine 130 can further be narrowed by category (for example, literature, practical books, languages, etc.) in conjunction with the DBMS 140, and filtered in order of loan count based on the loan history.

For example, according to an embodiment of the present invention, the body text may be searched through the search engine 130 after the scope has been limited, through the DBMS 140, to the e-books purchased by each individual. The full-text search results of the search engine 130 can further be narrowed by category (for example, literature, practical books, languages, etc.) in conjunction with the DBMS 140, and filtered in order of the number of readings and reading time.

For example, according to an embodiment of the present invention, a full-text search may be performed through the search engine 130 after the scope has been limited, through the DBMS 140, to the e-books borrowed by each individual. The full-text search results of the search engine 130 can further be narrowed by category (for example, literature, practical books, languages, etc.) in conjunction with the DBMS 140, and filtered in order of the number of loans and reading time.

FIG. 2 is a block diagram of an e-book searching apparatus according to an embodiment of the present invention. In one embodiment, the e-book retrieval apparatus 110 may include an e-book data configuration unit 210 that receives an electronic document and configures e-book data.

In one embodiment, the e-book data configuration unit 210 may receive an electronic document such as a Word file (doc), a PowerPoint document (ppt), or a PDF file. In one embodiment, the e-book data configuration unit 210 may generate, from the input electronic document, e-book data including the internal text area that the user views, bibliographic information describing the basic details of the e-book data, a unique number identifying the e-book data, and the like.

In one embodiment, the e-book retrieval apparatus 110 may further include a server-side component 220 that receives the e-book data from the e-book data configuration unit 210 and records it in the search engine 130 and the DBMS 140.

In one embodiment, the server-side component 220 may include a data analysis unit 221 that analyzes the structure of the e-book data to distinguish the internal text area, the bibliographic information, and the unique number, and stores each in a separate file.

In one embodiment, the data analysis unit 221 may store the internal text area, the bibliographic information, and the unique number in a file format used by the search engine 130 or the DBMS 140. For example, when the search engine 130 or the DBMS 140 accepts XML files, the data analysis unit 221 may store the internal text area, the bibliographic information, and the unique number as XML files. In one embodiment, instead of storing the unique number in a separate file, the data analysis unit 221 may record it together with the internal text area and the bibliographic information.
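A minimal sketch of writing the internal text area as XML with the unique number recorded alongside it, using the standard-library `xml.etree.ElementTree`. The element and attribute names (`ebook`, `page`, `group`, `id`, `number`, `order`) are hypothetical; a real schema would be whatever the search engine 130 accepts.

```python
import xml.etree.ElementTree as ET

def to_search_xml(ebook_id: str, pages: list[list[str]]) -> str:
    """Serialize the internal text area as XML, tagged with the unique number.

    pages[i] holds the reading-order text of each group on page i + 1.
    """
    root = ET.Element("ebook", id=ebook_id)
    for page_no, groups in enumerate(pages, start=1):
        page_el = ET.SubElement(root, "page", number=str(page_no))
        for group_no, text in enumerate(groups, start=1):
            group_el = ET.SubElement(page_el, "group", order=str(group_no))
            group_el.text = text
    return ET.tostring(root, encoding="unicode")

xml_text = to_search_xml("DC32CC1FD0604859A96CCB103E2F7C1C",
                         [["Chapter 1", "It was a bright cold day."]])
print(xml_text)
```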

In one embodiment, the server-side component 220 may further include a data mounting unit 222 that registers the internal text area and unique number of the e-book data stored by the data analysis unit 221 in the search engine 130, and the bibliographic information and unique number in the DBMS 140. In this embodiment, the search engine 130 and the DBMS 140 share the unique number, so they can exchange information with each other.

In one embodiment, the data mounting unit 222 may also record the bibliographic information in the search engine 130. In one embodiment, the data mounting unit 222 may store each item of the bibliographic information in the corresponding column of a predetermined schema table of the DBMS 140.
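A minimal sketch of mounting bibliographic information into a predetermined schema table, with SQLite standing in for the DBMS 140. The table name and columns are hypothetical, based on the fields named for the bibliographic information input unit 302 (author, publisher, year of publication, subject); the unique number is the primary key shared with the search engine.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS bibliography (
    ebook_id   TEXT PRIMARY KEY,   -- unique number shared with the search engine
    title      TEXT NOT NULL,
    author     TEXT,
    publisher  TEXT,
    pub_year   INTEGER,
    subject    TEXT
)
"""

def mount_bibliography(conn: sqlite3.Connection, ebook_id: str, info: dict) -> None:
    """Write each bibliographic item into its column of the schema table."""
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT OR REPLACE INTO bibliography VALUES (?, ?, ?, ?, ?, ?)",
        (ebook_id, info["title"], info.get("author"), info.get("publisher"),
         info.get("pub_year"), info.get("subject")),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
mount_bibliography(conn, "1F6CB702AFFE4bdfB130937D90C51F59",
                   {"title": "Sample Book", "author": "Hong Gil-dong", "pub_year": 2011})
print(conn.execute("SELECT title, pub_year FROM bibliography").fetchone())  # ('Sample Book', 2011)
```

`INSERT OR REPLACE` makes re-mounting the same e-book overwrite its row rather than duplicate it.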

As illustrated in FIG. 2, the e-book search apparatus 110 may further include an e-book search unit 240 that receives a keyword from a user and performs a search in cooperation with the search engine 130 and the DBMS 140.

In one embodiment, the e-book search unit 240 may receive from the user a keyword and a search range indicating either the personalized area or the non-personalized area. In this embodiment, the personalized area may include each individual's history of purchased books and of books borrowed from libraries, and the non-personalized area may include bookstores where individuals can purchase e-books and the books held by each e-book library. In one embodiment, the keyword may include a full-text search word or a non-full-text search word (for example, bibliographic information).

In one embodiment, when the search range is the personalized area, the e-book search unit 240 performs a search in the DBMS 140 using the non-full-text search word, and then inputs the unique numbers of the e-books extracted as a result, together with the full-text search word, into the search engine 130 to extract the unique numbers of the matching e-books.

In this embodiment, based on an input indicating an additional instruction from the user, for example a selection of one of the extracted e-books, the e-book search unit 240 may input the unique number of the selected e-book and the full-text search word into the search engine 130 to extract page-by-page search results for that e-book.

In one embodiment, when the search range is the non-personalized area, the e-book search unit 240 performs a search in the search engine 130 using the full-text search word, and then inputs the unique numbers of the e-books extracted as a result, together with the non-full-text search word, into the DBMS 140 to extract the unique numbers of the e-books that satisfy both the full-text and the bibliographic criteria.
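The two search orders described above can be sketched with in-memory dictionaries standing in for the search engine 130 and the DBMS 140. The data, the `owners` field used to model the personalized area, and the substring matching are hypothetical simplifications, not the actual search engine or DBMS behavior.

```python
# Hypothetical stand-ins for the search engine 130 and the DBMS 140.
FULL_TEXT = {
    "B1": "the pythagorean theorem relates the sides of a right triangle",
    "B2": "a history of the barbizon school of painting",
    "B3": "triangle geometry for beginners",
}
BIBLIO = {
    "B1": {"subject": "math", "owners": {"alice"}},
    "B2": {"subject": "art", "owners": {"alice", "bob"}},
    "B3": {"subject": "math", "owners": {"bob"}},
}

def engine_search(text_word, ids=None):
    """Full-text search, optionally restricted to the given unique numbers."""
    pool = FULL_TEXT if ids is None else {i: FULL_TEXT[i] for i in ids}
    return [i for i, body in pool.items() if text_word in body]

def dbms_search(subject, user=None, ids=None):
    """Bibliographic search; `user` restricts it to that person's books."""
    pool = BIBLIO if ids is None else {i: BIBLIO[i] for i in ids}
    return [i for i, row in pool.items()
            if row["subject"] == subject and (user is None or user in row["owners"])]

def search(text_word, subject, user=None):
    if user is not None:
        # Personalized area: narrow with the DBMS first, then search the text.
        return engine_search(text_word, ids=dbms_search(subject, user=user))
    # Non-personalized area: search the text first, then check the DBMS.
    return dbms_search(subject, ids=engine_search(text_word))

print(search("triangle", "math", user="alice"))  # ['B1']
print(search("triangle", "math"))                # ['B1', 'B3']
```

Running the smaller, cheaper filter first keeps the second query small in each case: a person's library is usually far smaller than the full-text hit list, and vice versa for the whole store.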

In this embodiment, the e-book search unit 240 may divide the e-books extracted by the search engine 130 into sets of a predetermined size and search the DBMS 140 for each set in order. For example, if the predetermined size is 100 and the search engine 130 extracts 952 e-books, the results may be divided into 10 sets, the tenth of which contains the unique numbers of 52 e-books. In this embodiment, the DBMS 140 search uses the non-full-text search word.
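A minimal sketch of this set-by-set processing. `chunks` and `filter_in_sets` are hypothetical helpers, and `dbms_filter` stands for whatever bibliographic query is run against the DBMS 140 for one set; the example reproduces the 952-result case.

```python
from typing import Callable, Iterator, Sequence

def chunks(ids: Sequence[str], size: int = 100) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` unique numbers, keeping their order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def filter_in_sets(ids: Sequence[str],
                   dbms_filter: Callable[[Sequence[str]], list[str]],
                   size: int = 100) -> Iterator[str]:
    """Run a (hypothetical) DBMS filter one set at a time, so results from the
    first, most relevant set can be shown before later sets are checked."""
    for batch in chunks(ids, size):
        yield from dbms_filter(batch)

hits = [f"ID{n:04d}" for n in range(952)]   # stand-in for 952 search-engine results
sets = list(chunks(hits))
print(len(sets), len(sets[-1]))  # 10 52
```

Because `filter_in_sets` is a generator, a caller can display the first matches as soon as the first set has been checked.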

Dividing the search results of the search engine 130 into sets and then searching the bibliographic information in the DBMS 140 set by set allows results to be returned to the user quickly. In one embodiment, the search engine 130 may be configured to output results in order of relevance to the input full-text search word. In this case, when the results of the search engine 130 are divided into sets and the bibliographic information of each set is searched in the DBMS 140 in order, the final results are output substantially in descending order of relevance.

In a search of the non-personalized area, as in a search of the personalized area, based on an input indicating an additional instruction from the user, for example a selection of one of the extracted e-books, the e-book search unit 240 may input the unique number of that e-book and the full-text search word into the search engine 130 to extract page-by-page search results for the e-book.

FIG. 3 is a detailed block diagram of an e-book data configuration unit according to an embodiment of the present invention. In one embodiment, the e-book data configuration unit 210 may include an internal text area configuration unit 301, a bibliographic information input unit 302, and a unique number determination unit 303.

In one embodiment, the bibliographic information input unit 302 may receive basic information related to the corresponding e-book, such as the author, publisher, year of publication, and subject of the e-book. In one embodiment, the unique number determination unit 303 may generate a unique number for identifying the corresponding e-book data and record it in the e-book data. By using the unique number, the search engine 130 for searching the text of the e-book and the DBMS 140 for searching bibliographic information can be linked.

In one embodiment, the internal text area configuration unit 301 may configure the data included in a picture area representing an entire page. That is, it constructs an internal text area that stores the code value and on-screen position of each character on each page. The picture area may be a picture or may include a picture; alternatively, it may contain a set of instructions that allow a computer to generate a picture.

In one embodiment, the internal text area is logically structured as book > page > group > line > object. An e-book is a collection of pages, a page is a collection of groups, a group is a collection of lines, and a line is a collection of objects.

In one embodiment, the object may include "text", "picture" or "table text". "Text" includes common letters. The term "picture" includes not only a general picture but also a table and a formula. "Table text" includes hidden characters that are not displayed on the screen, describing "picture".

In one embodiment, each object includes a "text code value" and a "coordinate value on the screen". Thus, by using "table text", a "picture" can be found by searching for text that describes it. In addition, since "table text" also has a "coordinate value on the screen" as an object, the location of the picture on the page can be found.

As illustrated in FIG. 3, the internal text area configuration unit 301 may include an electronic document input unit 310 that receives an electronic document such as a Word file (doc), a PowerPoint document (ppt), or a PDF file. The input electronic document generally includes page information, object codes, and the coordinate values of the objects.

In one embodiment, the internal text area configuration unit 301 may further include a data extraction unit 320 that extracts the page information, object codes, and object coordinate values from the input electronic document.

In one embodiment, the internal text area configuration unit 301 may further include a group setting unit 330 that sets groups based on the coordinate values of all objects on the same page. For example, the group setting unit 330 analyzes the coordinate values of the objects on the page, determines their cluster distribution, and clusters the objects accordingly. In this embodiment, the group setting unit 330 may set each resulting cluster as a group and determine the order of the groups.

Example of clustering

In one embodiment of the present invention, clustering may be performed as follows. First, the basic constants are defined:

NumberOfClusters: Number of clusters initially set

MinimumNumberOfElement: The minimum number of elements needed to maintain a cluster.

MinimumDistance: The minimum distance between cluster center points; two clusters whose center points are closer than this are merged into one.

SplittingSize: A parameter used to decide whether a cluster should be divided (see step [9]).

MaxNumIter1: The maximum number of iterations performed in the first part of this clustering method (steps [2] to [4]).

MaxNumIter2: The maximum number of iterations performed in the later part of this clustering method.

MaxNumMerge: The maximum number of clusters that can be merged in one iteration.

Clustering basically divides the points (elements) located in a space into groups. In one embodiment of the present invention, this space corresponds to the page surface, and each element occupying a position is a geometric figure such as a letter or a picture.

Letters and pictures are generally stored as the coordinates of their bounding rectangles so that digital devices such as computers can handle them. In practice, however, a letter or picture does not use all of the rectangular space in which it is drawn: points of a certain color are stamped on specific areas of the rectangle, and the rest is left empty. Therefore, clustering the letters and pictures in the two-dimensional page space exactly would require tracking the positions of all the component points (pixels) of every letter and picture on the page, which takes considerable computation time.

Therefore, in an embodiment of the present invention, instead of the above method, the system represents the position of each letter or picture by its bounding rectangle and assumes that the entire inside of the rectangle is filled with black dots. This saves considerable computation time, and in practice the clustering results are excellent.
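A minimal sketch of the filled-rectangle simplification: each (x, y, width, height) bounding box is turned into every pixel position inside it, with no rendering of the actual glyph or image. The `step` sub-sampling parameter and both function names are hypothetical additions for controlling point count.

```python
def box_points(x, y, width, height, step=1):
    """Assume the bounding box is completely filled: return every `step`-th
    pixel position inside it, with (x, y) as the upper-left corner."""
    return [(px, py)
            for py in range(y, y + height, step)
            for px in range(x, x + width, step)]

def page_points(boxes, step=1):
    """Concatenate the filled-box points of all (x, y, width, height) boxes on a page."""
    return [p for box in boxes for p in box_points(*box, step=step)]

pts = box_points(59, 60, 32, 36)
print(len(pts), pts[0], pts[-1])  # 1152 (59, 60) (90, 95)
```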

In one embodiment of the invention clustering is performed as follows.

[1] Generate NumberOfClusters points with arbitrary coordinates and set them as the initial cluster center points.

[2] For every point in the space, calculate the distance to each cluster center point, and assign the point to the cluster whose center point is closest.

[3] For each cluster, calculate the average of the coordinate values of all points in the cluster and set it as the new cluster center point.

[4] If the cluster of any point changed and the number of iterations of steps [2] to [4] is less than MaxNumIter1, go back to step [2].

[5] If the number of points in a cluster is less than MinimumNumberOfElement, subsequent steps ignore this cluster and all of its points. In other words, this cluster is finalized.

[6] If the number of clusters is greater than (constant 1) * NumberOfClusters, or the number of iterations is even, go to step [7]; otherwise, go to step [8]. Here, (constant 1) is a value that can be tuned to improve performance.

[7] Calculate the distances between the center points of all clusters. If two clusters whose distance is less than MinimumDistance are found, merge them into one, calculate and assign a new center point, and repeat step [7]. When no such pair remains, or when the number of iterations reaches MaxNumIter2, go to step [8].

[8] If the number of clusters is smaller than NumberOfClusters / (constant 2) or the number of iterations is odd, go to step [9]. Otherwise, go to step [10].

[9] Let STD-All be the standard deviation of the coordinates of all points in the space. For each cluster, calculate the standard deviation of only the points belonging to it, and find the clusters for which this value is larger than SplittingSize * STD-All. If no such cluster is found, go to step [10]. If one is found, divide it into two around the center point of its points. Next, find the center points of the two divided clusters and the distance between them; if that distance is greater than (constant 3) * MinimumDistance, replace the original cluster with the two divided clusters. Otherwise, keep the original cluster as is.

[10] If the number of iterations of this step exceeds MaxNumIter2, or if the clusters have not changed since this step was last completed, terminate. Otherwise, take the center points of the current clusters as the initial values and return to step [2].
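Steps [1] to [10] resemble the ISODATA family of clustering algorithms. The following is a simplified pure-Python sketch, not the patent's implementation. Its choices are assumptions: initial centers are sampled from the points with a fixed seed; the scalar standard deviation in step [9] is taken as sqrt(var_x + var_y); a cluster is split at its center along its wider axis; step [9] is applied to every qualifying cluster; MaxNumMerge caps the merges per pass of step [7]; and the default parameter values are placeholders.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Params:
    number_of_clusters: int = 4          # NumberOfClusters
    minimum_number_of_element: int = 3   # MinimumNumberOfElement
    minimum_distance: float = 20.0       # MinimumDistance
    splitting_size: float = 0.5          # SplittingSize
    max_num_iter1: int = 20              # MaxNumIter1
    max_num_iter2: int = 10              # MaxNumIter2
    max_num_merge: int = 2               # MaxNumMerge
    c1: float = 2.0                      # (constant 1)
    c2: float = 2.0                      # (constant 2)
    c3: float = 1.0                      # (constant 3)

def mean(pts):
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def variances(pts):
    cx, cy = mean(pts)
    n = len(pts)
    return (sum((x - cx) ** 2 for x, _ in pts) / n, sum((y - cy) ** 2 for _, y in pts) / n)

def spread(pts):
    # Assumed scalar standard deviation of 2-D points: sqrt(var_x + var_y).
    return math.sqrt(sum(variances(pts)))

def assign(points, centers):
    # Step [2]: every point joins the cluster whose center is nearest.
    groups = [[] for _ in centers]
    for p in points:
        groups[min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))].append(p)
    return groups

def split(g, prm, std_all):
    # Step [9]: cut a widely spread cluster at its center along its wider axis.
    if len(g) < 2 or spread(g) <= prm.splitting_size * std_all:
        return [g]
    vx, vy = variances(g)
    axis = 0 if vx >= vy else 1
    cut = mean(g)[axis]
    a = [p for p in g if p[axis] <= cut]
    b = [p for p in g if p[axis] > cut]
    if a and b and math.dist(mean(a), mean(b)) > prm.c3 * prm.minimum_distance:
        return [a, b]
    return [g]  # keep the original cluster as is

def cluster(points, prm=None, seed=0):
    prm = prm or Params()
    points = list(points)
    if not points:
        return []
    std_all = spread(points)  # STD-All
    centers = random.Random(seed).sample(points, min(prm.number_of_clusters, len(points)))  # [1]
    final, prev, groups = [], None, []
    for it in range(1, prm.max_num_iter2 + 1):
        groups = assign(points, centers)           # [2]
        for _ in range(prm.max_num_iter1):         # [3]-[4]
            groups = [g for g in groups if g]
            centers = [mean(g) for g in groups]
            new_groups = assign(points, centers)
            if new_groups == groups:
                break
            groups = new_groups
        groups = [g for g in groups if g]
        # [5] Finalize clusters that are too small and drop their points.
        small = [g for g in groups if len(g) < prm.minimum_number_of_element]
        final += small
        groups = [g for g in groups if len(g) >= prm.minimum_number_of_element]
        frozen = {p for g in small for p in g}
        points = [p for p in points if p not in frozen]
        # [6]-[7] Merge the closest pairs that are nearer than MinimumDistance.
        if len(groups) > prm.c1 * prm.number_of_clusters or it % 2 == 0:
            for _ in range(prm.max_num_merge):
                if len(groups) < 2:
                    break
                d, i, j = min((math.dist(mean(groups[i]), mean(groups[j])), i, j)
                              for i in range(len(groups)) for j in range(i + 1, len(groups)))
                if d >= prm.minimum_distance:
                    break
                groups[i] = groups[i] + groups.pop(j)  # j > i, so index i stays valid
        # [8]-[9] Split clusters that are too spread out.
        if len(groups) < prm.number_of_clusters / prm.c2 or it % 2 == 1:
            groups = [part for g in groups for part in split(g, prm, std_all)]
        # [10] Stop when nothing changed; otherwise restart from the current centers.
        centers = [mean(g) for g in groups]
        if sorted(centers) == prev:
            break
        prev = sorted(centers)
    return final + groups
```

For the group setting unit 330, the input points would come from the filled bounding boxes of the objects on a page, and each returned cluster is a candidate group.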

In one embodiment, the internal text area configuration unit 301 may further include a line distinguishing unit 340 that analyzes the coordinate values of the objects in a group to identify the objects belonging to the same line. The line distinguishing unit 340 may identify these objects according to the writing direction of the e-book, for example, "horizontal writing" or "vertical writing". In this embodiment, the line distinguishing unit 340 may detect the writing direction automatically from the coordinate values of the objects in the group. In another embodiment, the line distinguishing unit 340 may determine the writing direction based on user input.
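A minimal sketch of line distinction, assuming that objects whose extents overlap across the writing direction (vertical extents for horizontal writing, horizontal extents for vertical writing) belong to the same line. The overlap rule and the `split_lines` name are assumptions; the source does not say how the line distinguishing unit 340 decides. The writing direction is passed in rather than detected.

```python
def split_lines(boxes, vertical=False):
    """Group (x, y, width, height) boxes into lines.

    Horizontal writing: boxes whose vertical extents overlap share a line.
    Vertical writing: the same test on horizontal extents (columns).
    """
    # Coordinate and size across the lines: y/height for horizontal text, x/width for vertical.
    lo = (lambda b: b[0]) if vertical else (lambda b: b[1])
    size = (lambda b: b[2]) if vertical else (lambda b: b[3])
    lines, current, end = [], [], None
    for b in sorted(boxes, key=lo):
        if current and lo(b) < end:     # overlaps the running span of the current line
            current.append(b)
            end = max(end, lo(b) + size(b))
        else:
            if current:
                lines.append(current)
            current, end = [b], lo(b) + size(b)
    if current:
        lines.append(current)
    return lines

boxes = [(10, 0, 8, 12), (20, 1, 8, 12), (10, 20, 8, 12), (30, 0, 8, 12)]
print(len(split_lines(boxes)))  # 2
```

Objects within each line are not yet in reading order; that is left to the object order determiner.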

In one embodiment, the internal text area configuration unit 301 may further include an object order determiner 350 that determines the order of the objects belonging to the same line. The object order determiner 350 may determine the order of the objects in a line according to the writing direction of the e-book, for example, "horizontal writing" or "vertical writing". The object order determiner 350 may also order the objects differently depending on the start position, for example, whether horizontal writing starts from the "left" or the "right", or whether vertical writing starts from the "top" or the "bottom".

In this embodiment, the object order determiner 350 may detect the writing direction and start position automatically from the coordinate values of the objects in the group. In another embodiment, the object order determiner 350 may determine the writing direction and start position based on user input.
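Ordering within a line can be sketched as a sort on the coordinate along the writing direction, reversed when writing starts from the right or the bottom. `order_objects` is a hypothetical helper that takes (x, y, width, height) boxes, and it assumes the direction and start position are already known, whether detected or entered by the user.

```python
def order_objects(line, vertical=False, start="left"):
    """Sort the (x, y, width, height) boxes of one line into reading order.

    Horizontal writing starts from "left" or "right";
    vertical writing starts from "top" or "bottom".
    """
    if vertical:
        if start not in ("top", "bottom"):
            raise ValueError("vertical writing starts from 'top' or 'bottom'")
        return sorted(line, key=lambda b: b[1], reverse=(start == "bottom"))
    if start not in ("left", "right"):
        raise ValueError("horizontal writing starts from 'left' or 'right'")
    return sorted(line, key=lambda b: b[0], reverse=(start == "right"))

line = [(30, 0, 8, 12), (10, 0, 8, 12), (20, 1, 8, 12)]
print([b[0] for b in order_objects(line)])                 # [10, 20, 30]
print([b[0] for b in order_objects(line, start="right")])  # [30, 20, 10]
```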

In one embodiment, the internal text area configuration unit 301 may further include an editing unit 360 for correcting errors in group setting, line setting, or object order. In one embodiment, the editing unit 360 may receive from the user an input specifying a group boundary, so that errors in groups automatically set by the group setting unit 330 can be corrected. Similarly, the editing unit 360 may receive inputs specifying line boundaries and object order from the user to correct errors that may occur in the line distinguishing unit 340 or the object order determiner 350.

In one embodiment, the editing unit 360 may receive from the user table text describing a picture. For example, the editing unit 360 may receive, as the table text of a formula that is a picture object, text that reads the formula aloud.

For example, if the formula $a^2 + b^2 = c^2$ is set as a picture object, the user may input "a squared plus b squared equals c squared" through the editing unit 360 as the table text of that picture object.

When table text is entered in this way, the formula can be found by a full-text search through the search engine 130. Therefore, complex formulas, chemical formulas, and even pictures and photographs can be searched.

For example, if Millet's painting "The Angelus" is set as a picture object, the following table text may be entered for it through the editing unit 360: "Millet's The Angelus. Jean-François Millet was a leading figure of the Barbizon school. In addition to inheriting realism, he painted nature and is called a naturalist. In particular, The Angelus on this page can be called Millet's representative work." In this case, the search engine 130 can find this text with the keyword "Millet's The Angelus" and can accurately extract the position of the corresponding picture.

In one embodiment, the editing unit 360 may designate a "hidden group" that is displayed on the screen but ignored by the search engine during searches. For example, the page number of a book or the title repeated on every page appears on each page but does not need to be searched. The editing unit 360 may designate such parts as hidden groups so that the search engine ignores them.
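A minimal sketch of how hidden groups could be kept out of a search index while still being part of the page. Here each page is a list of (text, hidden) pairs in group order; this representation, the `build_index` name, and the whitespace tokenization (no punctuation handling) are hypothetical simplifications.

```python
from collections import defaultdict

def build_index(pages):
    """pages: list of pages; each page is a list of (text, hidden) groups in order.
    Returns word -> sorted list of (page_no, group_no), skipping hidden groups."""
    index = defaultdict(set)
    for page_no, groups in enumerate(pages, start=1):
        for group_no, (text, hidden) in enumerate(groups, start=1):
            if hidden:
                continue  # shown on screen, but never searchable
            for word in text.lower().split():
                index[word].add((page_no, group_no))
    return {word: sorted(locs) for word, locs in index.items()}

pages = [
    [("Chapter one begins", False), ("12", True)],
    [("The chapter continues", False), ("13", True)],
]
idx = build_index(pages)
print(idx["chapter"])  # [(1, 1), (2, 1)]
```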

FIG. 4 illustrates an example of a group constituting a page of e-book data according to an embodiment of the present invention. As shown in FIG. 4, page 400 may include five groups 410-450.

Groups 410-450 are sets of lines whose content connects naturally when read. Page 400 has a multi-column layout, so some text lies on the same horizontal line but does not connect if read straight across. For example, group 430 and group 440 lie on the same lines, but their content is not connected: the text "the oppositely distorted state" in group 430 and the text "with respect to family and life" in group 440 are on the same line without forming a continuous sentence. Thus, groups 430 and 440 are separate groups.

Groups resemble paragraphs in that the content within each group reads continuously. Unlike FIG. 4, a single group may make up an entire page; in that case the page and the group coincide, which corresponds to a "single-column document". When a page has a multi-column layout, as page 400 does, it includes a plurality of groups.

Depending on whether the content connects when the lines are read in sequence, a page may consist of one or more groups. Likewise, a group may consist of one or more lines, and a line may consist of one or more objects. For example, group 410 is a case in which a single picture object forms one line and one group.

In one embodiment, the groups 410-450 of page 400 may be ordered 410 > 420 > 430 > 440 > 450, following the order of the content. In this embodiment, reading the groups in this order reads the entire page in sequence. In other words, reading a page means reading its groups in order and, within each group, its lines in order.
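The page/group/line/object hierarchy and the reading rule above can be sketched as follows. The class names and the integer `order` field are assumptions made for illustration; the example uses two side-by-side columns like groups 430 and 440 of FIG. 4, deliberately stored out of order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Line:
    objects: List[str]  # object contents, already in reading order


@dataclass
class Group:
    order: int  # reading order of the group on its page
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    groups: List[Group] = field(default_factory=list)


def read_page(page: Page) -> str:
    """Read groups in their assigned order, then the lines inside each group."""
    out = []
    for group in sorted(page.groups, key=lambda g: g.order):
        for line in group.lines:
            out.append(" ".join(line.objects))
    return "\n".join(out)


# Two columns stored out of order; the group order restores the reading order.
page = Page(groups=[
    Group(order=2, lines=[Line(["with", "respect", "to"]), Line(["family", "and", "life"])]),
    Group(order=1, lines=[Line(["the", "oppositely"]), Line(["distorted", "state."])]),
])
print(read_page(page))
```

Reading line by line across the page would interleave the two columns; reading group by group keeps each column intact.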

In one embodiment, group 450, which is part of the book title repeated identically on every page, may be designated as a "hidden group" by the editing unit 360. In this case, the text of group 450 appears on the screen but is ignored when the search engine 130 performs a text search.

FIG. 5 shows another example of groups constituting a page of e-book data according to an embodiment of the present invention. As shown in FIG. 5, page 500 may include three groups 510-530.

In one embodiment, the groups 510-530 of page 500 may be ordered 510 > 520 > 530. In one embodiment, group 530, which corresponds to the page number, may be designated as a "hidden group" by the editing unit 360. In this case, the text of group 530 appears on the screen but is ignored when the search engine 130 performs a body-text search.

FIG. 6 is a flowchart of a method of forming the internal text area of e-book data according to an embodiment of the present invention.

The method 600 includes a step 610 of receiving an electronic document, such as a Word file (doc), a PowerPoint document (ppt), or a PDF file. In an embodiment, step 610 may be performed by the electronic document input unit 310. The input electronic document generally includes page information, object codes, and the coordinate values of the objects.

Next, the method 600 includes a step 620 of extracting data, such as page information, object code, and coordinate values of the object, from the input electronic document. In an embodiment, step 620 may be performed by the data extractor 320.

Next, the method 600 includes a step 630 of setting groups based on the coordinate values of all objects on the same page. In an embodiment, step 630 may be performed by the group setting unit 330. The group setting step 630 may include analyzing the coordinate values of the objects on the page to determine how they are distributed into clusters, and then clustering the objects accordingly. In this embodiment, the group setting step 630 may further include setting each resulting cluster as a group and ordering the groups.
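The patent does not name a clustering algorithm for step 630. As one possible sketch, the code below uses single-linkage clustering with a union-find structure: two objects fall into the same group when the gaps between their bounding boxes are at most a hypothetical threshold `gap` both horizontally and vertically. The top-then-left ordering of groups is likewise an assumed heuristic, and y is assumed to grow downward; mistakes such a heuristic makes are the kind of error the editing step 660 is meant to correct.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Object bounding box; y is assumed to grow downward."""
    x0: float
    y0: float
    x1: float
    y1: float


def near(a: Box, b: Box, gap: float) -> bool:
    """True if the horizontal and vertical gaps between a and b are both <= gap."""
    return (a.x0 - gap <= b.x1 and b.x0 - gap <= a.x1
            and a.y0 - gap <= b.y1 and b.y0 - gap <= a.y1)


def cluster_groups(boxes: List[Box], gap: float) -> List[List[int]]:
    """Single-linkage clustering of object indices, then a top-left ordering."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if near(boxes[i], boxes[j], gap):
                parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        clusters.setdefault(find(i), []).append(i)

    groups = list(clusters.values())
    # Heuristic group order: topmost first, then leftmost.
    groups.sort(key=lambda g: (min(boxes[i].y0 for i in g),
                               min(boxes[i].x0 for i in g)))
    return groups


# Two columns of two lines each, 50 units apart horizontally.
boxes = [Box(0, 0, 100, 10), Box(0, 12, 100, 22),
         Box(150, 0, 250, 10), Box(150, 12, 250, 22)]
print(cluster_groups(boxes, gap=5))  # [[0, 1], [2, 3]]
```

With `gap=5` the two columns stay apart; raising `gap` above the 50-unit column spacing merges them into one group, which shows how sensitive the result is to the threshold.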

Next, the method 600 includes a step 640 of identifying objects that belong to the same line by analyzing the coordinate values of each object in the group. In one embodiment, step 640 may be performed by the line discriminator 340. Step 640 may identify objects on the same line according to the writing direction of the e-book, for example "horizontal writing" or "vertical writing". The writing direction may be determined automatically from the coordinate values of the objects in the group, or based on user input.
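For step 640, one simple approach (an assumption, not a method stated in the patent) is to sweep the objects along the axis perpendicular to the writing direction and merge objects whose extents overlap into one line. Boxes are assumed to be `(x0, y0, x1, y1)` tuples with y growing downward, and `split_lines` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward


def split_lines(boxes: List[Box], vertical: bool = False) -> List[List[int]]:
    """Group object indices into lines by overlapping extents.

    Horizontal writing: objects whose vertical extents overlap share a line.
    Vertical writing: objects whose horizontal extents overlap share a line.
    Lines come out in ascending coordinate order; reversing them (for example
    for right-to-left vertical text) is left to the ordering step.
    """
    lo, hi = (0, 2) if vertical else (1, 3)
    lines: List[List[int]] = []
    line_end: Optional[float] = None
    for i in sorted(range(len(boxes)), key=lambda k: boxes[k][lo]):
        if lines and boxes[i][lo] < line_end:
            lines[-1].append(i)
            line_end = max(line_end, boxes[i][hi])
        else:
            lines.append([i])
            line_end = boxes[i][hi]
    return lines


boxes = [(50, 0, 90, 10), (0, 1, 40, 11), (0, 15, 40, 25)]
print(split_lines(boxes))                 # [[0, 1], [2]]
print(split_lines(boxes, vertical=True))  # [[1, 2], [0]]
```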

Next, the method 600 includes a step 650 of determining the order of objects belonging to the same line. In an embodiment, step 650 may be performed by the object order determiner 350. Step 650 may order the objects in a line according to the writing direction of the e-book, such as "horizontal writing" or "vertical writing". Step 650 may also order the objects differently depending on the starting position: for example, whether horizontal text starts from the "left" or the "right", or whether vertical text starts from the "top" or the "bottom".
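Step 650 can be sketched as a sort along the writing axis, reversed when the line starts from the right (horizontal writing) or from the bottom (vertical writing). The function name, the box tuple layout, and the string values accepted for `start` are hypothetical choices for this illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward


def order_in_line(boxes: List[Box], line: List[int],
                  vertical: bool = False, start: Optional[str] = None) -> List[int]:
    """Order the objects of one line by writing direction and starting position.

    Horizontal writing starts at "left" (default) or "right";
    vertical writing starts at "top" (default) or "bottom".
    """
    allowed = ("top", "bottom") if vertical else ("left", "right")
    if start is None:
        start = allowed[0]
    if start not in allowed:
        raise ValueError(f"start must be one of {allowed}, got {start!r}")
    axis = 1 if vertical else 0  # sort on y0 for vertical text, x0 otherwise
    reverse = start in ("right", "bottom")
    return sorted(line, key=lambda i: boxes[i][axis], reverse=reverse)


boxes = [(50, 0, 90, 10), (0, 1, 40, 11)]
print(order_in_line(boxes, [0, 1]))                 # [1, 0]
print(order_in_line(boxes, [0, 1], start="right"))  # [0, 1]
```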

In this embodiment, step 650 may automatically determine the writing direction and starting position from the coordinate values of the objects in the group. In another embodiment, step 650 may determine the writing direction and starting position based on user input.

In one embodiment, the method 600 may further include an editing step 660 that corrects errors that may occur in the group setting step 630, the line discrimination step 640, or the object ordering step 650. In one embodiment, the editing step 660 may include receiving input from a user specifying a group boundary, so that an error in a group set in step 630 can be corrected. Similarly, the editing step 660 may include receiving input from a user specifying a line boundary or an object order, so that an error in step 640 or step 650 can be corrected.

In an embodiment, the editing step 660 may include receiving from a user table text that describes a picture. For example, for a formula stored as a picture object, text that reads the formula aloud may be entered as table text: if the formula is set as a picture object, the user may enter "A squared plus B squared equals C squared" as its table text.

For example, if Millet's painting "The Angelus" (known in Korean as "The Evening Bell") is set as a picture object, the following table text may be entered for it in the editing step 660: "Millet's evening bell. Jean-François Millet was a leading painter of the Barbizon school. Carrying on the realist tradition, he painted nature and is called a naturalist. In particular, the evening bell shown on this page is Millet's representative work." In this case, the search engine 130 can find the text with the keyword "Millet's evening bell" and accurately locate the corresponding picture.

In one embodiment, the editing step 660 may include designating a "hidden group" that is displayed on the screen but ignored by the search engine during a search. For example, the page number of a book or the running title on every page appears on each page but does not need to be searched. Designating such parts as hidden groups in the editing step 660 causes the search engine to ignore them.

It is noted that each step shown in FIG. 6 is exemplary, and depending on the embodiment, the steps may be integrated or subdivided into detailed steps. It is also possible for some steps to be omitted or repeated.

FIG. 7 is a flowchart of a method for retrieving e-book data according to an embodiment of the present invention. In one embodiment, the method 700 may be performed by the e-book search unit 240 of the e-book search device shown in FIG. 2, in conjunction with the search engine 130 and the DBMS 140.

The method 700 includes a step 710 of receiving a search range from a user. In one embodiment, the search range indicates either a personalized area or a non-personalized area. For example, the personalized area may include the list of books each individual has purchased and the books borrowed from a library, while the non-personalized area may include bookstores where individuals can purchase e-books and the holdings of each e-book library.

Next, the method 700 includes a step 720 of receiving a keyword from a user. In one embodiment, the keyword may include a word for searching the body text or a word for searching outside the body text (for example, bibliographic information).

The method 700 includes a step 730 of determining whether the search range entered in step 710 indicates the personalized area or the non-personalized area. If step 730 determines that the search range is the personalized area, the DBMS 140 performs a search (step 742). In an embodiment, step 742 may include searching the bibliographic information in the DBMS 140 using the keywords entered in step 720 other than the body-text search word.

After step 742, the method 700 includes a step 744 of further searching, in the search engine 130, the e-books extracted in step 742. In one embodiment, step 744 may include inputting the unique numbers of the e-books extracted in step 742, together with the body-text search word among the keywords received in step 720, into the search engine 130 to extract the unique numbers of the matching e-books.
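The personalized branch (steps 742 and 744) can be sketched with Python's built-in `sqlite3` module standing in for the DBMS 140 and a plain dictionary from unique number to body text standing in for the search engine 130. The table layout, the `owner` column used to model the personalized area, and the substring matching are all assumptions for illustration.

```python
import sqlite3
from typing import Dict, List


def personalized_search(db: sqlite3.Connection, engine: Dict[int, str],
                        user: str, biblio_word: str, body_word: str) -> List[int]:
    # Step 742 (sketch): bibliographic search in the DBMS, limited to the user's books.
    pattern = f"%{biblio_word}%"
    rows = db.execute(
        "SELECT book_id FROM books WHERE owner = ? AND (title LIKE ? OR author LIKE ?) "
        "ORDER BY book_id",
        (user, pattern, pattern),
    ).fetchall()
    candidates = [row[0] for row in rows]
    # Step 744 (sketch): body-text search restricted to those unique numbers.
    return [b for b in candidates if body_word in engine.get(b, "")]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT, author TEXT, owner TEXT)")
db.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", [
    (1, "Art History", "Kim", "alice"),
    (2, "Art of Cooking", "Lee", "alice"),
    (3, "Art History II", "Park", "bob"),
])
engine = {1: "a page about the evening bell by Millet", 2: "recipes", 3: "the evening bell"}
print(personalized_search(db, engine, "alice", "Art", "evening bell"))  # [1]
```

Running the DBMS query first keeps the candidate set small, since a personal library is usually far smaller than the full catalog.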

In one embodiment, after step 744, the method 700 may further include, upon receiving an input indicating a further instruction from the user (for example, the selection of one of the resulting e-books), inputting the unique number of the selected e-book and the body-text search word into the search engine 130 to extract per-page search results for that e-book.

If step 730 determines that the search range is the non-personalized area, the search engine 130 performs a search (step 752). According to an embodiment, step 752 may include performing a body-text search in the search engine 130 using the body-text search word among the keywords entered in step 720.

After step 752, the method 700 includes a step 756 of further searching, in the DBMS 140, the e-books extracted in step 752. In step 756, the unique numbers of the e-books found in step 752 and the non-body-text search word may be input to the DBMS 140 to extract the unique numbers of e-books that satisfy both the body-text and the bibliographic conditions.
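The non-personalized branch reverses the order: body-text search first, then a bibliographic filter in the DBMS. In the sketch below an in-memory `sqlite3` database stands in for the DBMS 140 and a dictionary stands in for the search engine 130; counting keyword occurrences as a relevance score, the schema, and the function name are all illustrative assumptions.

```python
import sqlite3
from typing import Dict, List


def nonpersonalized_search(db: sqlite3.Connection, engine: Dict[int, str],
                           biblio_word: str, body_word: str) -> List[int]:
    # Step 752 (sketch): body-text search over all books; relevance is a toy
    # score (number of occurrences), highest first.
    hits = [b for b, text in engine.items() if body_word in text]
    ranked = sorted(hits, key=lambda b: engine[b].count(body_word), reverse=True)
    if not ranked:
        return []
    # Step 756 (sketch): keep only books whose bibliographic data also matches.
    placeholders = ",".join("?" * len(ranked))
    rows = db.execute(
        f"SELECT book_id FROM books WHERE book_id IN ({placeholders}) AND title LIKE ?",
        (*ranked, f"%{biblio_word}%"),
    ).fetchall()
    matched = {row[0] for row in rows}
    return [b for b in ranked if b in matched]  # preserve the engine's ranking


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT)")
db.executemany("INSERT INTO books VALUES (?, ?)",
               [(1, "Art"), (2, "Art II"), (3, "Art III")])
engine = {1: "bell", 2: "bell bell", 3: "drum"}
print(nonpersonalized_search(db, engine, "Art", "bell"))  # [2, 1]
```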

In one embodiment, the method 700 may further include, after step 752 and before step 756, a step 754 of dividing the e-books extracted by the search engine 130 into sets of a predetermined size. In this embodiment, step 756 may include performing the DBMS 140 search on each set divided in step 754, in order.

For example, if the predetermined size is 100 and the search engine 130 returns 952 e-books, the results may be divided into 10 sets in step 754: nine sets of 100 unique numbers and a tenth set containing the unique numbers of the remaining 52 e-books.
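The arithmetic of step 754 is ordinary fixed-size chunking. A minimal sketch, with a hypothetical helper name and hypothetical unique numbers 1 to 952, reproduces the example:

```python
from typing import List, TypeVar

T = TypeVar("T")


def split_into_sets(ids: List[T], size: int) -> List[List[T]]:
    """Divide ranked unique numbers into consecutive sets of at most `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [ids[i:i + size] for i in range(0, len(ids), size)]


sets = split_into_sets(list(range(1, 953)), 100)  # 952 hypothetical unique numbers
print(len(sets), len(sets[-1]))                   # 10 52
```

Slicing keeps the search engine's ranking intact, so the first set always holds the most relevant unique numbers.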

Dividing the search results of the search engine 130 into sets in step 754 allows the DBMS 140 search in step 756 to return results to the user quickly. In an embodiment, the search engine 130 may output its results in descending order of relevance to the body-text search word. Because the sets are then searched by the DBMS 140 one after another in step 756, the final results are also output substantially in descending order of relevance.
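The benefit of searching the sets in order can be sketched with a Python generator: each batch can be shown to the user as soon as its DBMS query finishes, and the relevance order from the search engine is kept. The `sqlite3` database, the table schema, and the `title LIKE` filter are assumptions standing in for the DBMS 140 and its bibliographic search.

```python
import sqlite3
from typing import Iterator, List


def search_sets_in_order(db: sqlite3.Connection, ranked_ids: List[int],
                         biblio_word: str, size: int) -> Iterator[List[int]]:
    """Query the DBMS one set at a time, yielding each set's matches in rank order."""
    for start in range(0, len(ranked_ids), size):
        chunk = ranked_ids[start:start + size]
        placeholders = ",".join("?" * len(chunk))
        rows = db.execute(
            f"SELECT book_id FROM books WHERE book_id IN ({placeholders}) AND title LIKE ?",
            (*chunk, f"%{biblio_word}%"),
        ).fetchall()
        matched = {row[0] for row in rows}
        # SQL returns rows in no guaranteed order, so re-apply the engine's ranking.
        yield [b for b in chunk if b in matched]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT)")
db.executemany("INSERT INTO books VALUES (?, ?)",
               [(i, "Poetry" if i % 2 else "Prose") for i in range(1, 6)])
for batch in search_sets_in_order(db, [5, 3, 1, 4, 2], "Poetry", size=2):
    print(batch)  # [5, 3], then [1], then []
```

Small sets also keep each `IN (...)` list well within SQLite's limit on bound parameters.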

After step 756, the method 700 may further include, upon receiving an input (not shown) indicating a further instruction from the user, for example the selection of one of the extracted e-books, inputting the unique number of the selected e-book and the body-text search word into the search engine 130 to extract per-page search results for that e-book.
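The per-page search after a user selects an e-book can be sketched as a lookup over page-level index entries keyed by (unique number, page number). This keying scheme and the function name are assumptions; the patent only states that per-page results are extracted by inputting the unique number and the body-text search word into the search engine 130.

```python
from typing import Dict, List, Tuple


def pages_containing(pages: Dict[Tuple[int, int], str],
                     book_id: int, body_word: str) -> List[int]:
    """Page numbers of one e-book whose indexed body text contains the word."""
    word = body_word.casefold()
    return sorted(page for (book, page), text in pages.items()
                  if book == book_id and word in text.casefold())


pages = {
    (1, 1): "The Evening Bell was painted by Millet.",
    (1, 2): "Barbizon school landscapes.",
    (1, 5): "Another look at the evening bell.",
    (2, 1): "The evening bell, in a different book.",
}
print(pages_containing(pages, 1, "evening bell"))  # [1, 5]
```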

Note that the steps shown in FIG. 7 are exemplary and may be added, repeated, omitted, or separated depending on the embodiment. For example, although in step 730 steps 742 and 744 are performed for the personalized area and steps 752-756 for the non-personalized area according to the search range, the DBMS 140 may instead be searched with the search limited by classification. Similarly, the search engine 130 may be searched again by inputting a body-text search keyword against the results derived in step 756.

In the device embodiments disclosed herein, the arrangement of the components shown may vary depending on the environment or requirements on which the invention is implemented. For example, some components may be omitted or several components may be integrated and implemented as one. In addition, the arrangement order and connection of some components may be changed.

In the method embodiments disclosed herein, the arrangement of the steps shown may vary depending on the environment or requirements on which the invention is implemented. For example, some steps may be omitted or some steps may be combined and implemented as one. In addition, the arrangement order of some steps may be changed.

While the invention and its various functional components have been described in particular embodiments, it should be understood that the invention may be implemented in hardware, software, firmware, middleware, or a combination thereof, and may be used in a system, subsystem, component, or sub-component thereof. When implemented in software, the elements of the invention are instructions or code segments that perform the necessary tasks. The program or code segments may be stored in a machine-readable medium, such as a processor-readable medium or a computer program product, or may be transmitted over a transmission medium or communication link as a computer data signal embodied in a carrier wave or as a signal modulated by a carrier. A machine-readable or processor-readable medium may include any medium that can store or transmit information in a form readable and executable by a machine (e.g., a processor or computer).

Although the present invention has been described and illustrated with reference to embodiments, those skilled in the art will appreciate that various modifications and other equivalent embodiments are possible without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims

1. An e-book data generation method, comprising:

receiving an electronic document;

extracting an object included in the electronic document and a coordinate value of the object; and

clustering objects based on the coordinate values of the objects to form a group.
2. The method of claim 1, further comprising:

identifying objects corresponding to the same line by analyzing the coordinate values of the objects in the group.
3. The method of claim 2, further comprising:

determining an order of the objects corresponding to the same line based on the direction in which the line proceeds.
4. The method of claim 3, further comprising:

correcting an error in at least one of the extent of the group, the objects corresponding to the same line, and the order of the objects.
5. The method of claim 1, wherein the objects include text and a picture, the method further comprising:

generating table text as a text object describing the picture.
6. A method of searching for e-books in an e-book retrieval device that interoperates over a network with a database management system (DBMS) storing bibliographic information and unique numbers of e-book data, the e-book data including groups formed by clustering objects based on the coordinate values of the objects, the method comprising:

receiving from a user a search range indicating a personalized area or a non-personalized area;

receiving from a user, as a keyword, a body-text search word or a non-body-text search word;

requesting a search for the keyword from the DBMS if the search range is the personalized area; and

receiving a search result from the DBMS.
7. The method of claim 6, wherein the e-book retrieval device also interoperates over the network with a search engine that stores the body text and unique numbers of the e-book data,

requesting the search for the keyword from the DBMS includes requesting the DBMS to perform a search with the non-body-text search word, and

the search result includes a unique number of an e-book,

the method further comprising:

requesting a search by inputting the unique number of the e-book and the body-text search word into the search engine; and

receiving a search result from the search engine.
8. The method of claim 6, wherein the e-book retrieval device also interoperates over the network with a search engine that stores the body text and unique numbers of the e-book data, the method further comprising:

requesting a search for the keyword from the search engine if the search range is the non-personalized area; and

receiving a search result from the search engine.
9. The method of claim 8, wherein requesting the search for the keyword from the search engine includes requesting the search engine to search with the body-text search word, and

the search result received from the search engine includes a unique number of at least one e-book,

the method further comprising:

requesting a search by inputting the unique number of the at least one e-book and the non-body-text search word into the DBMS; and

receiving a search result from the DBMS.
10. The method of claim 9, wherein requesting a search by inputting the unique number of the at least one e-book and the non-body-text search word into the DBMS comprises:

dividing the unique numbers of the at least one e-book into sets each having a predetermined number; and

requesting a search by inputting the unique numbers of the e-books belonging to each set into the DBMS in order.
11. An e-book data generating device, comprising:

an electronic document input unit for receiving an electronic document;

a data extraction unit for extracting an object included in the electronic document and a coordinate value of the object; and

a group setting unit for forming a group by clustering objects based on the coordinate values of the objects.
12. The device of claim 11, further comprising:

a line distinguishing unit that identifies objects corresponding to the same line by analyzing the coordinate values of the objects in the group.
13. The device of claim 12, further comprising:

an object order determiner that determines an order of the objects corresponding to the same line based on the direction in which the line proceeds.
14. The device of claim 13, further comprising:

an editing unit that corrects an error in at least one of the extent of the group, the objects corresponding to the same line, and the order of the objects.
15. The device of claim 11, wherein the objects include text and a picture, the device further comprising:

an editing unit that generates table text as a text object describing the picture.
16. An e-book system that provides an e-book search service over a network, comprising:

an e-book retrieval device that receives from a user a search range indicating a personalized area or a non-personalized area and, as a keyword, a body-text search word or a non-body-text search word; and

a database management system (DBMS) that stores bibliographic information and unique numbers of e-book data including groups formed by clustering objects based on the coordinate values of the objects,

wherein, if the search range is the personalized area, the e-book retrieval device requests a search for the keyword from the DBMS,

and the DBMS, in response to receiving the search request from the e-book retrieval device, searches the bibliographic information with the non-body-text search word and extracts the unique number of the corresponding e-book.
17. The system of claim 16, further comprising a search engine that stores the unique numbers and body text of the e-book data,

wherein the e-book retrieval device receives from the DBMS a search result including the unique number of an e-book, and requests a search by inputting the received unique number and the body-text search word into the search engine,

and the search engine, in response to receiving the search request from the e-book retrieval device, searches the body text with the body-text search word and extracts the unique number of the corresponding e-book.
18. The system of claim 16, further comprising a search engine that stores the unique numbers and body text of the e-book data,

wherein, if the search range is the non-personalized area, the e-book retrieval device requests a search for the keyword from the search engine,

and the search engine, in response to receiving the search request from the e-book retrieval device, searches the body text with the body-text search word and extracts the unique number of the corresponding e-book.
19. The system of claim 18, wherein the e-book retrieval device receives from the search engine a search result including the unique number of at least one e-book, and requests a search by inputting the received unique number of the at least one e-book and the non-body-text search word into the DBMS,

and the DBMS, in response to receiving the search request from the e-book retrieval device, searches the bibliographic information with the non-body-text search word and extracts the unique number of the corresponding e-book.
20. The system of claim 19, wherein, in response to receiving the unique number of the at least one e-book from the search engine, the e-book retrieval device divides the unique numbers of the at least one e-book into sets each having a predetermined number and requests a search by inputting the unique numbers of the e-books belonging to each set into the DBMS in order.