WO2012169841A2 - E-book system, e-book data formation, search device and method thereof

Publication number
WO2012169841A2
Authority
WO
WIPO (PCT)
Prior art keywords
search
book
text
unique number
dbms
Prior art date
Application number
PCT/KR2012/004567
Other languages
English (en)
Korean (ko)
Other versions
WO2012169841A3 (fr)
WO2012169841A9 (fr)
Inventor
이해성
Original Assignee
주식회사 내일이비즈
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 내일이비즈
Priority claimed from KR1020120061536A (KR101364178B1)
Publication of WO2012169841A2
Publication of WO2012169841A3
Publication of WO2012169841A9

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing

Definitions

  • The present invention relates to an apparatus and method for generating, storing, and retrieving e-book data.
  • E-books and electronic documents can generally be loaded directly into a search engine.
  • However, an e-book or electronic document that can be loaded directly into a search engine is limited to a simple single-column layout.
  • E-books or electronic documents with very complicated layouts, such as multi-column layouts or newspapers, cannot be loaded directly into search engines. This is because a computer algorithm alone cannot yet accurately determine the order in which the paragraphs of an e-book or electronic document with a complicated layout should be read.
  • An object of the present invention is to solve the above problems of the conventional e-book system.
  • The present invention provides an e-book system in which an e-book or electronic document having a complicated layout can be searched by a search engine.
  • The present invention provides an e-book system in which mathematical formulas, chemical formulas, or pictures in an e-book can be searched.
  • The present invention provides an e-book system that can search the purchase/loan history and preferences of individuals in addition to the general text search.
  • The present invention generates, from an e-book or electronic document having a complicated layout, data in a form that can be loaded into a search engine.
  • Data in a form that can be loaded into a search engine organizes each page of a book into one or more groups.
  • For a picture object, data in a form that can be loaded into the search engine may include a text object containing text that describes the picture.
  • Data in a form that can be loaded into the search engine may include hidden groups that are displayed on the screen but ignored when searching in the search engine.
  • The e-book system of the present invention may include, together with a search engine capable of searching text, a database management system (DBMS) that stores the purchase/loan history and preferences of individuals.
  • Search engines and DBMSs can interoperate through e-book unique numbers (for example, using GUIDs such as DC32CC1FD0604859A96CCB103E2F7C1C, 1F6CB702AFFE4bdfB130937D90C51F59).
  • the smallest unit that makes up the e-book data is an object.
  • Objects are divided into three categories according to their properties: "Text", "Picture", and "Table Text". Another property of these objects is the coordinate values that indicate their position and size on the monitor screen.
  • the objects gather to form a line.
  • the lines come together to form a group.
  • Groups form a page. Pages come together to form a book.
  • Object order: 4
  • Object property 1: type (Text)
  • Object property 2: code ('가', AC00)
  • Object property 3: bold
  • Object property 6: font size
  • Object property 7: width
  • Object property 8: height
  • Object property 9: upper-left coordinate value
  • "Table text" is not displayed on the screen but is associated with a picture displayed on the screen.
  • When searching for a picture, if text describing the picture is entered as a keyword, the "table text" associated with the picture can be matched by the search.
  • the e-book data may further include information about an important bibliography of the e-book or electronic document.
  • the e-book data may further include a unique number for distinguishing the e-book.
  • The e-book system may further include, along with a search engine that searches the text of the e-book, a DBMS that stores bibliographic information of the e-book, buyer information (individuals, libraries), and loan information (lenders and loan terms) for the e-book or electronic document. In one embodiment, the search engine and the DBMS may interoperate through a unique number included in the e-book data.
  • E-book data having an organized structure of objects, lines, groups, and pages may be generated from an electronic document.
  • In addition to ordinary text, sentences including various symbols such as mathematical and chemical formulas can be searched through a search engine, which was not possible in conventional e-book formats (for example, PDF or ePub).
  • Both a text search across all e-books in an e-book store that sells directly to consumers and a search limited to the e-books belonging to a specific e-book library are possible.
  • the text search results limited to the entire library or personalization area may be filtered again by combining with various sales statistics, usage statistics, or bibliographic information.
  • FIG. 1 is a block diagram of an e-book providing and retrieval system according to an embodiment of the present invention.
  • FIG. 2 is a block diagram of an e-book searching apparatus according to an embodiment of the present invention.
  • FIG. 3 is a detailed block diagram of an e-book data configuration unit according to an embodiment of the present invention.
  • FIG. 4 is an example of a group constituting a page of e-book data according to an embodiment of the present invention.
  • FIG. 5 is another example of a group constituting a page of e-book data according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of a method for forming an internal text area of e-book data according to an embodiment of the present invention.
  • FIG. 7 is a flowchart of a method for retrieving e-book data in accordance with one embodiment of the present invention.
  • The e-book providing and retrieval system 100 may include an e-book retrieval device 110 that performs retrieval of an e-book in association with the search engine 130 and the DBMS 140 via the network 190.
  • the network 190 may include both a wired communication medium such as a LAN or WAN and a wireless communication medium such as a Wi-Fi or mobile communication system.
  • the search engine 130 may be configured to receive a text search word and perform a text search of the e-book.
  • The DBMS 140 may store an individual's loan history, purchase history, or fields of interest, and may search this personalization area using non-text search words, for example bibliographic information given as a keyword.
  • A user may access the e-book data retrieval device 110 over the network 190 via a mobile communication device 152 such as a mobile phone, smartphone, or tablet PC, a notebook 154, a PC 156, or another device.
  • a user may define a search scope only for his loan, purchase, or field of interest (personalization area).
  • In this case, the e-book data retrieval apparatus 110 may first perform a search in the DBMS 140 and then run a text search in the search engine 130 restricted to the e-books obtained as a result.
  • The user may instead perform a search throughout the e-book library (non-personalized area) without limiting the search scope.
  • In this case, the e-book data retrieval apparatus 110 may perform a search in the search engine 130 and then check the resulting e-books against the DBMS 140 to extract highly relevant results.
  • the user may input a keyword through the e-book providing and search system 100 to perform a text search and obtain a result.
  • the search engine 130 may search the body of the e-book data stored therein based on the user's keyword input and output the result.
  • the search engine 130 has a problem in that bibliographic information of the e-book and information other than the text (for example, information related to the commerce of the e-book) are not stored.
  • the DBMS 140 may not store all the contents of the e-book, but may store information such as commerce related to the e-book.
  • The DBMS 140 may hold information that is not stored in the search engine 130, such as whether a particular e-book may be sold to individual customers, which libraries it has been supplied to, its sales volume, which individual customers have purchased it, and which library members have borrowed it.
  • In conventional e-book systems, the text search and the bibliographic search are not linked to each other and must be performed separately. This inconvenience can be eliminated by integrating the bibliographic search and the text search and providing them as one.
  • In an online store that sells e-books to individuals, the search engine 130 supports text search, and in conjunction with the DBMS 140 the scope of the search can be narrowed by category according to bibliographic information (e.g., literature, practical, language). In addition, the text search results derived from the search engine 130 can be filtered and re-sorted in order of sales based on the sales history.
  • A text search may be performed through the search engine 130. The search scope may be narrowed by category (for example, literature, practical, language), and the text search results may also be filtered by count.
  • The body text may be searched through the search engine 130. The scope of the search may be narrowed by category (for example, literature, practical, language), and the text search results may also be filtered by number of readings and reading time.
  • A text search may be performed through the search engine 130. The body-text search results of the search engine 130 may further be linked with the DBMS 140 to narrow the search by classification (for example, literature, practical use, language), and the text search results may also be filtered in order of the number of rentals and reading time.
  • the e-book retrieval apparatus 110 may include an e-book data construction unit 210 configured to receive an electronic document and configure e-book data.
  • the e-book data configuration unit 210 may receive an electronic document including, for example, a word file (doc), a PowerPoint document (ppt) or a pdf file.
  • The e-book data configuration unit 210 may generate, from the input electronic document, e-book data that includes an internal text area viewed by the user, bibliographic information describing the basic details of the e-book, a unique number for identifying the e-book data, and the like.
  • The e-book retrieval apparatus 110 may further include a server-side component 220 that receives the e-book data from the e-book data configuration unit 210 and records the corresponding data in the search engine 130 and the DBMS 140.
  • The server-side component 220 may include a data analysis unit 221 that analyzes the structure of the e-book data to distinguish the internal text area, the bibliographic information, and the unique number, and stores each in a separate file.
  • the data analyzer 221 may store an internal text area, bibliographic information, and a unique number based on a file format used by the search engine 130 or the DBMS 140. For example, when the search engine 130 or the DBMS 140 may receive an XML file, the data analyzer 221 may store an internal text area, bibliographic information, and a unique number as an XML file. In one embodiment, the data analyzer 221 may record the unique number together in the internal text area and the bibliographic information, without separating the unique number into a separate file.
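As a rough illustration of the XML case, the sketch below serializes an internal text area and bibliographic information as two XML strings that carry the same unique number. The element and attribute names (`ebook`, `page`, `group`, `line`, `bibliography`, `unique_number`) and both function names are hypothetical; the source does not specify a schema.

```python
import xml.etree.ElementTree as ET


def build_text_area_xml(unique_number, pages):
    """Serialize pages -> groups -> lines of text (nested lists) as XML."""
    root = ET.Element("ebook", {"unique_number": unique_number})
    for page_no, groups in enumerate(pages, start=1):
        page_el = ET.SubElement(root, "page", {"number": str(page_no)})
        for group_no, lines in enumerate(groups, start=1):
            group_el = ET.SubElement(page_el, "group", {"order": str(group_no)})
            for text in lines:
                ET.SubElement(group_el, "line").text = text
    return ET.tostring(root, encoding="unicode")


def build_bibliography_xml(unique_number, fields):
    """Serialize bibliographic fields (a dict of name -> value) as XML."""
    root = ET.Element("bibliography", {"unique_number": unique_number})
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")
```

Recording the unique number in both documents, rather than in a third file, corresponds to the last embodiment described above.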
  • The server-side component 220 may further include a data mounting unit 222 that registers the internal text area and the unique number of the e-book data stored by the data analyzer 221 in the search engine 130, and the bibliographic information and the unique number in the DBMS 140.
  • the search engine 130 and the DBMS 140 share a unique number so that mutual information exchange is possible.
  • The data mounting unit 222 may further record bibliographic information in the search engine 130. In one embodiment, the data mounting unit 222 may store the corresponding contents of the bibliographic information in each item of the predetermined schema table of the DBMS 140.
  • The e-book search apparatus 110 may further include an e-book search unit 240 that receives a keyword from a user and performs a search in association with the search engine 130 and the DBMS 140.
  • The e-book search unit 240 may receive from the user a search range, indicating either the personalized area or the non-personalized area, and a keyword.
  • the personalization area may include a history of each book purchased by each individual and books borrowed from the library.
  • the non-personalized area may include a bookstore where each individual can purchase an e-book and books held by each e-book library.
  • The keyword may include a word for text search or a word for non-text search.
  • When the search range is the personalized area, the e-book search unit 240 may perform a search in the DBMS 140 using the non-text search word, and then input the unique numbers of the e-books extracted as a result, together with the text search word, into the search engine 130 to extract the unique numbers of the matching e-books.
  • Based on an input indicating an additional instruction from the user, for example the selection of one of the extracted e-books, the e-book search unit 240 may input the unique number of the selected e-book and the word for text search into the search engine 130 to extract the search results for each page of that e-book.
  • When the search range is the non-personalized area, the e-book search unit 240 may perform a search in the search engine 130 using the text search word, and then input the unique numbers of the e-books extracted as a result, together with the non-text search word, into the DBMS 140 to extract the unique numbers of the e-books that satisfy both the text and the bibliographic conditions.
  • The e-book search unit 240 may divide the e-books extracted by the search engine 130 into sets of a predetermined size and perform the DBMS 140 search for each set in order. For example, if the predetermined size is 100 and the search engine 130 returns 952 e-books, the search result is divided into 10 sets, the tenth of which contains the unique numbers of 52 e-books. In this embodiment, the DBMS 140 search uses the non-text search words.
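The set-by-set DBMS check can be sketched as follows. This is a minimal illustration, not the patented implementation: `split_into_sets`, `filter_in_batches`, and the `dbms_filter` callable (a stand-in for a DBMS query that returns the subset of unique numbers satisfying the non-text conditions) are hypothetical names.

```python
def split_into_sets(unique_numbers, set_size=100):
    """Split a relevance-ordered list of unique numbers into consecutive sets."""
    return [unique_numbers[i:i + set_size]
            for i in range(0, len(unique_numbers), set_size)]


def filter_in_batches(unique_numbers, dbms_filter, set_size=100):
    """Yield the unique numbers kept by dbms_filter, one set at a time.

    Processing the sets in order preserves the search engine's ranking and
    lets the first results reach the user before later sets are queried.
    """
    for batch in split_into_sets(unique_numbers, set_size):
        yield from dbms_filter(batch)
```

With 952 results and a set size of 100, `split_into_sets` produces ten sets, the last holding 52 unique numbers, which matches the example in the text.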
  • the DBMS 140 may search for bibliographic information to provide a quick search to the user.
  • the search engine 130 may be configured to output the results in the order most relevant to the input text search word.
  • the results may be substantially output in descending order of relevance.
  • Based on an input indicating an additional instruction from the user, for example the selection of one of the extracted e-books, the e-book search unit 240 may input the unique number of the selected e-book and the word for text search into the search engine 130 to extract the search result for each page of that e-book.
  • the e-book data configuration unit 210 may include an internal text area configuration unit 301, a bibliographic information input unit 302, and a unique number determination unit 303.
  • the bibliographic information input unit 302 may receive basic information related to the corresponding e-book, such as the author, publisher, year of publication, and subject of the e-book.
  • the unique number determination unit 303 may generate a unique number for identifying the corresponding e-book data and record it in the e-book data. By using the unique number, the search engine 130 for searching the text of the e-book and the DBMS 140 for searching bibliographic information can be linked.
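One way to produce a unique number in the same 32-hex-digit GUID form as the examples given earlier is a random UUID. This is only an assumption for illustration: the source does not say how the unique number is generated, and `new_unique_number` is a hypothetical helper.

```python
import uuid


def new_unique_number():
    """Return a random 128-bit identifier as 32 uppercase hexadecimal digits."""
    return uuid.uuid4().hex.upper()


# The same value would then serve as the key in both stores (plain dicts here
# stand in for the search engine 130 and the DBMS 140).
unique_number = new_unique_number()
search_engine_records = {unique_number: "internal text area ..."}
dbms_records = {unique_number: {"author": "...", "publisher": "..."}}
```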
  • the internal text area configuring unit 301 may configure data included in a picture area representing the entire page. That is, an internal text area for storing the code value and the position on the screen of the characters existing in each page is configured.
  • the picture area may be a picture or may include a picture. Alternatively, it can contain a set of instructions that allow a computer to generate a picture.
  • The internal text area is logically organized in the structure book > page > group > line > object.
  • An e-book is a collection of pages.
  • A page is a collection of groups.
  • A group is a collection of lines.
  • A line is a collection of objects.
  • An object may be "text", "picture", or "table text".
  • "Text" covers ordinary characters.
  • "Picture" covers not only general pictures but also tables and formulas.
  • "Table text" covers hidden characters that are not displayed on the screen and that describe a "picture".
  • Each object includes a "text code value" and a "coordinate value on the screen".
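The book > page > group > line > object hierarchy maps naturally onto nested records. The sketch below is one possible in-memory representation; the class and field names are hypothetical, chosen only to mirror the terms used in the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EbookObject:
    kind: str      # "text", "picture", or "table_text"
    code: str      # character code value (or describing text for table text)
    x: float       # upper-left x coordinate on the screen
    y: float       # upper-left y coordinate on the screen
    width: float
    height: float


@dataclass
class Line:
    objects: List[EbookObject] = field(default_factory=list)


@dataclass
class Group:
    lines: List[Line] = field(default_factory=list)
    hidden: bool = False  # hidden groups are shown on screen but skipped by search


@dataclass
class Page:
    groups: List[Group] = field(default_factory=list)


@dataclass
class Book:
    unique_number: str
    pages: List[Page] = field(default_factory=list)


def iter_objects(book):
    """Yield every object in reading order: page, then group, then line."""
    for page in book.pages:
        for group in page.groups:
            for line in group.lines:
                yield from line.objects
```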
  • The internal text area configuring unit 301 may include an electronic document input unit 310 that receives an electronic document including, for example, a word file (doc), a PowerPoint document (ppt), or a pdf file.
  • the input electronic document generally includes page information, object code and coordinate values of the object.
  • the internal text area configuring unit 301 may further include a data extracting unit 320 extracting page information, an object code, and coordinate values of the object from the input electronic document.
  • The internal text area configuring unit 301 may further include a group setting unit 330 that sets groups based on the coordinate values of all objects existing in the same page. For example, the group setting unit 330 analyzes the coordinate values of the objects in the page to obtain their cluster distribution, partitions it, and assigns each object to a cluster. In this embodiment, the group setting unit 330 may set each resulting cluster as a group and determine the order of the groups.
  • Clustering may be performed as follows. First, the basic constants are:
  • NumberOfClusters: the number of clusters initially set.
  • MinimumNumberOfElement: the minimum number of elements needed to maintain a cluster.
  • MinimumDistance: the minimum distance between cluster center points at which clusters are kept separate rather than merged into one.
  • MaxNumIter1: the maximum number of iterations performed in the first half of this clustering method.
  • MaxNumIter2: the maximum number of iterations performed in the second half of this clustering method.
  • MaxNumMerge: the maximum number of clusters that can be merged in one iteration.
  • Clustering basically partitions the points (elements) located in a space.
  • Here the space is the page surface, and each element occupying a position is a geometric figure such as a character or a picture.
  • The system records the position of each character or picture and assumes that the entire inside of the rectangle surrounding the character or picture is filled with black dots. This saves considerable computation time, and the clustering results are in practice excellent.
  • clustering is performed as follows.
  • step [6] If the number of clusters is greater than (constant 1) * NumberOfClusters or the number of iterations is even, go to step [7], otherwise go to step [8].
  • (constant 1) is a value that can be adjusted appropriately to improve performance.
  • step [7] For all clusters, find the distance between the center points of each cluster, and if two clusters are found whose distance is less than the MinimumDistance, combine them into one and calculate and assign a new center point. Then repeat step [7] again. If the number of iterations reaches MaxNumIter2, go to step [8].
  • step [8] If the number of clusters is smaller than NumberOfClusters / (constant 2) or the number of iterations is odd, go to step [9]. Otherwise, go to step [10].
  • step [9] The standard deviation of the coordinates of all points in the space is called STD-All. For all clusters, find the standard deviation for only those points that belong to them, and then find clusters with this value larger than SplittingSize * STD-All. If no cluster is found, go to step [10]. If a corresponding cluster is found, the cluster is divided into two based on the center point of the points belonging to the cluster. Next, find the center points of each of these two divided clusters, find the distance between them, and if the value is greater than (constant 3) * MinimumDistance, replace the original unified cluster with these two divided clusters. If not, the original integrated cluster is kept as is.
  • step [10] If the number of iterations in this step exceeds MaxNumIter2, or if there have been no changes in clusters since the last time this step was completed, the operation is terminated. If not, proceed to step [2] by selecting the center points of the current clusters as initial values.
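The merge in step [7] can be sketched as below. This is a simplified illustration under stated assumptions: clusters are lists of (x, y) points, the center point is the arithmetic mean, and the `max_iterations` argument plays the role of the MaxNumIter2 cap. The function name and signature are hypothetical, and the other steps of the procedure are not reproduced.

```python
import math


def merge_close_clusters(clusters, minimum_distance, max_iterations=100):
    """Repeatedly merge pairs of clusters whose centers are closer than minimum_distance."""

    def center(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    clusters = [list(c) for c in clusters]  # work on a copy
    for _ in range(max_iterations):
        centers = [center(c) for c in clusters]
        pair = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if math.dist(centers[i], centers[j]) < minimum_distance:
                    pair = (i, j)
                    break
            if pair:
                break
        if pair is None:
            break  # no pair is close enough; merging is finished
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]  # new center is recomputed next pass
        del clusters[j]
    return clusters
```

For example, two clusters centered 2 units apart are combined when `minimum_distance` is 5, while a cluster far from both is left untouched.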
  • the internal text area configuring unit 301 may further include a line discriminating unit 340 for distinguishing objects belonging to the same line by analyzing coordinate values of each object existing in the group.
  • The line distinguishing unit 340 may distinguish objects belonging to the same line according to the writing direction of the e-book, for example "horizontal writing" or "vertical writing". In this embodiment, the line distinguishing unit 340 may automatically determine the writing direction based on the coordinate values of the objects in the group. In another embodiment, the line distinguishing unit 340 may determine the writing direction based on a user input.
  • The internal text area configuration unit 301 may further include an object order determiner 350 that determines the order of objects belonging to the same line.
  • The object order determiner 350 may determine the order of the objects in the line according to the writing direction of the e-book, for example "horizontal writing" or "vertical writing".
  • The object order determiner 350 may determine the order of objects differently depending on the writing direction and starting position of the e-book, for example whether "horizontal writing" starts from the "left" or the "right", or whether "vertical writing" starts from the "top" or the "bottom".
  • The object order determiner 350 may automatically determine the writing direction and the starting position based on the coordinate values of the objects in the group. In another embodiment, the object order determiner 350 may determine the writing direction and the start position based on the user's input.
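For horizontal writing, line discrimination and object ordering can be approximated from bounding boxes alone. The sketch below is an assumption about one workable heuristic, not the method claimed in the source: an object joins the current line when its vertical center falls inside that line's vertical extent, and objects within a line are then sorted by x. `Box` and `split_into_lines` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    x: float       # upper-left x
    y: float       # upper-left y
    width: float
    height: float


def split_into_lines(boxes, start="left"):
    """Group boxes into horizontal lines, then order each line from `start`."""
    lines = []
    for box in sorted(boxes, key=lambda b: b.y):
        center_y = box.y + box.height / 2
        if lines and lines[-1]["top"] <= center_y <= lines[-1]["bottom"]:
            line = lines[-1]
            line["boxes"].append(box)
            line["bottom"] = max(line["bottom"], box.y + box.height)
        else:
            lines.append({"top": box.y, "bottom": box.y + box.height, "boxes": [box]})
    reverse = (start == "right")
    return [sorted(line["boxes"], key=lambda b: b.x, reverse=reverse) for line in lines]
```

Vertical writing would follow the same pattern with the roles of x and y exchanged.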
  • the internal text area configuring unit 301 may further include an editing unit 360 to correct an error in group setting, line setting, or order of an object.
  • the editor 360 may receive an input for specifying a boundary of the group from the user. Therefore, when there is an error in the group automatically set by the group setting unit 330 it can be corrected.
  • the editor 360 may receive an input for specifying a line boundary and an object order from a user to correct an error that may occur in the line discriminator 340 or the object order determiner 350.
  • the editing unit 360 may receive a table text for describing the picture from the user.
  • The editing unit 360 may receive, as table text for an equation that is a picture object, text that reads the equation aloud. For example, if a formula is set as a picture object, the user may input, via the editing unit 360, "a squared plus b squared is c squared" as the table text of the corresponding picture object.
  • Likewise, when table text describing a painting has been entered, the search engine 130 can search the text with a keyword such as "Millet's The Angelus" and accurately extract the position of the corresponding picture.
  • the editor 360 may designate a "hidden group" that is displayed on the screen but ignored by the search engine when searching. For example, a page number of a book, a part indicating a title in every page, etc. is a part that appears on each page but does not need to be searched. The editor 360 may designate these parts as hidden groups to be ignored when searching in a search engine.
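The effect of hidden groups and table text on indexing can be sketched as follows. The dictionary layout and the `searchable_text` helper are hypothetical; the point is only that hidden groups are skipped, picture objects contribute nothing themselves, and their table text is included in what the search engine sees.

```python
def searchable_text(page):
    """Build the text a search engine would index for one page.

    page: list of groups, each {"hidden": bool, "lines": [[obj, ...], ...]}
    obj:  {"kind": "text" | "picture" | "table_text", "code": str}
    """
    parts = []
    for group in page:
        if group.get("hidden", False):
            continue  # shown on screen, but ignored when searching
        for line in group["lines"]:
            parts.append("".join(
                obj["code"] for obj in line
                if obj["kind"] in ("text", "table_text")))
    return "\n".join(parts)
```

A page number or running title marked as hidden therefore never produces a hit, while a formula image remains findable through its table text.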
  • FIG. 4 illustrates an example of groups constituting a page of e-book data according to an embodiment of the present invention. As shown in FIG. 4, page 400 may include five groups 410-450.
  • Groups 410-450 are sets of lines whose content connects naturally when read in order.
  • Page 400 has a multi-column layout, so some text sits on the same horizontal line but does not connect in content if read straight across.
  • For example, "the oppositely distorted state." in group 430 and "with respect to family and life" in group 440 are on the same line, but their content is not connected.
  • Therefore, the two groups 430 and 440 are separate groups.
  • Groups are similar to paragraphs in that the content within them flows naturally. Unlike FIG. 4, one group may constitute an entire page. In this case the page and the group coincide, which corresponds to a "single-column document". When a page is laid out in multiple columns, as in page 400, the page includes a plurality of groups.
  • pages may be formed in one or a plurality of groups.
  • groups may be formed in one or a plurality of lines.
  • lines may be formed of one or a plurality of objects.
  • the group 410 is an example in which one picture object forms one line and one group.
  • The groups 410-450 of the page 400 may have an order of 410 > 420 > 430 > 440 > 450.
  • the order of the groups may be based on the order of the content. In this embodiment, if the user reads in the order of the groups, the entire contents of the page can be read sequentially. In other words, reading the contents of a page corresponds to reading in the order of the group of the page and in the order of the lines in the group.
  • the group 450 may be designated as a "hidden group" by the editing unit 360 as a part of the title of the book which is repeated equally on every page. In this case, the text of the group 450 appears on the screen but is ignored when the text is searched by the search engine 130.
  • FIG. 5 shows another example of groups constituting a page of e-book data according to an embodiment of the present invention.
  • Page 500 may include three groups 510-530.
  • The groups 510-530 of the page 500 may have an order of 510 > 520 > 530.
  • the group 530 may be designated as a "hidden group" by the editing unit 360 as a part corresponding to the page number. In this case, the text of the group 530 appears on the screen but is ignored when the body search is performed in the search engine 130.
  • FIG. 6 is a flowchart of a method of forming an internal text area of electronic book data according to an embodiment of the present invention.
  • the method 600 includes a step 610 of receiving an electronic document including a word file (doc), a PowerPoint document (ppt) or a pdf file.
  • step 610 may be performed by the electronic document input unit 310.
  • the input electronic document generally includes page information, object code and coordinate values of the object.
  • the method 600 includes a step 620 of extracting data, such as page information, object code, and coordinate values of the object, from the input electronic document.
  • step 620 may be performed by the data extractor 320.
  • the method 600 includes setting 630 a group based on coordinate values of all objects present within the same page.
  • step 630 may be performed by the group setting unit 330.
  • the group setting step 630 may include clustering objects by clusters after obtaining and dividing a cluster distribution by analyzing coordinate values of the objects in the page.
  • the group setting step 630 may include setting and ordering each clustered cluster as a group.
  • the method 600 includes a step 640 of distinguishing objects belonging to the same line by analyzing coordinate values of each object present in the group.
  • step 640 may be performed by the line discriminator 340.
  • the line distinguishing step 640 may be performed to distinguish objects belonging to the same line according to the direction of the e-book, for example, “horizontal writing” or “vertical writing”.
  • the line distinguishing step 640 may proceed automatically based on the coordinate values of the objects in the group, or may proceed by determining the direction of travel based on a user input.
  • the method 600 includes a step 650 of determining the order of objects belonging to the same line.
  • step 650 may be performed by the object order determiner 350.
  • the object ordering step 650 may be performed to determine the order of the objects in the line according to the direction of the e-book, such as "horizontal writing” or “vertical writing.”
  • The object ordering step 650 may determine the order of the objects differently depending on the writing direction and starting position of the e-book, for example whether "horizontal writing" starts from the "left" or the "right", or whether "vertical writing" starts from the "top" or the "bottom".
  • The object order determination step 650 may be implemented to automatically determine the writing direction and the starting position based on the coordinate values of the objects in the group. In another embodiment, object ordering step 650 may be implemented to determine the writing direction and the starting position based on a user input.
  • The method 600 may further include an editing step 660 that corrects errors that may occur in the group setting step 630, the line distinguishing step 640, or the object ordering step 650.
  • the editing step 660 may include receiving input from a user specifying a boundary of the group. Therefore, when there is an error in the group set in the group setting step 630, it can be corrected.
  • editing step 660 may include receiving input from a user specifying a line boundary or an order of objects. In this case, an error that may occur in the line discrimination step 640 or the object order determination step 650 may be corrected.
  • the editing step 660 may include receiving, from a user, table text that describes a picture. For example, for a formula that is a picture object, text for reading the formula may be input as table text: if a formula is set as a picture object, a user may input "A squared plus B squared is C squared" as the table text of the corresponding picture object.
  • as another example, the table text of a picture object entered in the editing step 660 may be "Millet's Evening Bell. Jean-François Millet is a leading figure of the Barbizon school. In addition to inheriting realism, he painted nature and is called a naturalist. In particular, this picture, the Evening Bell on this page, is Millet's representative work."
  • in this case, the search engine 130 can search the text with the keyword "evening bell of Millet" and can accurately extract the position of the corresponding picture.
  • the editing step 660 may include specifying a "hidden group" that is displayed on the screen but ignored by the search engine when searching. For example, a page number of a book, a part indicating a title in every page, etc. is a part that appears on each page but does not need to be searched. By designating these parts as hidden groups in the editing step 660, they can be ignored when searching in a search engine.
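The effect of table text and hidden groups on searching can be sketched as follows. This is only an illustration under stated assumptions: the `Group` class, the `(kind, content)` tuple layout, and the `searchable_text` function are hypothetical, and the patent does not specify how the search engine's index is actually built.

```python
from dataclasses import dataclass


@dataclass
class Group:
    # Each object is a (kind, content) tuple; kind is "text" or "picture".
    # For a picture, content is a hypothetical picture identifier.
    objects: list
    hidden: bool = False  # hidden groups are displayed but not searched


def searchable_text(groups, table_text):
    """Return the text that would be indexed for one page.

    table_text maps a picture identifier to its user-entered description.
    """
    parts = []
    for group in groups:
        if group.hidden:
            # e.g. page numbers or running titles: shown on screen, not indexed
            continue
        for kind, content in group.objects:
            if kind == "text":
                parts.append(content)
            elif kind == "picture" and content in table_text:
                # A picture becomes searchable through its table text.
                parts.append(table_text[content])
    return " ".join(parts)
```

With this arrangement a keyword search over the returned string can hit a picture's description, while text in a hidden group never matches.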
  • each step shown in FIG. 6 is exemplary, and depending on the embodiment, the steps may be integrated or subdivided into detailed steps. It is also possible for some steps to be omitted or repeated.
  • the method 700 is a method for retrieving e-book data according to an embodiment of the present invention, shown as a flowchart.
  • the method 700 may be performed by the e-book search unit 240 of the e-book search device shown in FIG. 2, in conjunction with the search engine 130 and the DBMS 140.
  • the method 700 includes a step 710 of receiving a search range from a user.
  • the search range indicates whether the search targets a personalized area or a non-personalized area.
  • the personalized area may include the books purchased by each individual and the books borrowed from a library.
  • the non-personalized area may include a bookstore where each individual can purchase an e-book and the books held by each e-book library.
  • the method 700 includes a step 720 of receiving a keyword from a user.
  • the keyword may include a word for text search or a word for searching other than the text (for example, bibliographic information).
  • the method 700 includes a step 730 of determining whether the search range entered in step 710 represents a personalized area or a non-personalized area. If it is determined in step 730 that the search range is a personalized area, the DBMS 140 performs a search (step 742). In an embodiment, step 742 may include searching for bibliographic information in the DBMS 140 using words other than the text search word among the keywords input in step 720.
  • step 744 may include extracting the unique numbers of the matching e-books by inputting, to the search engine 130, the unique numbers of the e-books extracted in step 742 and the word for text search among the keywords received in step 720.
  • the method 700 may further include, based on an input indicating an additional instruction from the user, such as selecting one of the resulting e-books, extracting a search result for each page of the selected e-book by inputting the unique number of the selected e-book and the word for text search to the search engine 130.
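The personalized-area branch (steps 742 and 744) can be sketched as a two-stage filter: bibliographic search first, then text search restricted to the surviving unique numbers. In this hedged example the DBMS 140 and the search engine 130 are replaced by plain dictionaries, and the function name `search_personalized` and its parameters are hypothetical.

```python
def search_personalized(biblio_words, text_word, biblio, full_text):
    """Two-stage search over a personalized area.

    biblio: dict mapping unique number -> bibliographic string (DBMS stand-in).
    full_text: dict mapping unique number -> body text (search engine stand-in).
    """
    # Step 742: bibliographic search using the keywords other than the text word.
    candidate_ids = [
        book_id
        for book_id, info in biblio.items()
        if all(word.lower() in info.lower() for word in biblio_words)
    ]
    # Step 744: text search restricted to the unique numbers found above.
    return [
        book_id
        for book_id in candidate_ids
        if text_word.lower() in full_text[book_id].lower()
    ]
```

Running the bibliographic filter first keeps the text search small, which suits a personalized area where the set of purchased or borrowed books is limited.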
  • if it is determined in step 730 that the search range is a non-personalized area, the search engine 130 performs a search (step 752).
  • step 752 may include performing a text search by the search engine 130 using the word for text search among the keywords input in step 720.
  • the method 700 includes a step 756 of further searching in the DBMS 140 for the e-books extracted as a result.
  • the unique numbers of the e-books found in step 752 and the words for searching other than the text may be input to the DBMS 140 to extract the unique numbers of the e-books that satisfy both the text and the bibliographic information.
  • the method 700 may further include a step 754 of dividing the e-books extracted by the search engine 130 into sets of a predetermined size, after step 752 and before step 756.
  • step 756 may include performing a DBMS 140 search in order on each set divided in step 754.
  • the search results may be divided into 10 sets in step 754.
  • the tenth set contains the unique numbers of 52 e-books.
  • when the search results of the search engine 130 are divided into sets and the DBMS 140 search in step 756 is performed one set at a time, results can be provided to the user quickly.
  • the search engine 130 may be configured to output the results in descending order of relevance to the input word for text search.
  • when the search results of the search engine 130 are divided into sets and the bibliographic information is searched by the DBMS 140 on each set in order in step 756, the search results are output substantially in descending order of relevance.
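The non-personalized branch (steps 752 to 756) can be sketched as chunked filtering. The example assumes the search engine has already returned unique numbers ranked by relevance; the set size of 100 and the total of 952 results used in the usage note are illustrative numbers chosen only because they are consistent with 10 sets and a last set of 52, and the names `chunked` and `search_non_personalized` are hypothetical.

```python
def chunked(ids, size):
    """Step 754: split ranked unique numbers into sets of a fixed size."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def search_non_personalized(ranked_ids, matches_biblio, set_size=100):
    """Yield filtered results one set at a time, preserving rank order.

    ranked_ids: unique numbers from the text search (step 752), most
        relevant first.
    matches_biblio: callable standing in for the DBMS bibliographic check
        (step 756); returns True if a unique number satisfies it.
    """
    for id_set in chunked(ranked_ids, set_size):
        # Each set is filtered independently, so the first results can be
        # shown before later sets have been processed.
        yield [book_id for book_id in id_set if matches_biblio(book_id)]
```

For instance, `list(chunked(list(range(952)), 100))` produces 10 sets, the last holding 52 unique numbers. Because the sets are processed in the search engine's output order, the concatenated results stay in descending order of relevance.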
  • although not shown, the method may further include, based on an additional instruction from the user, for example the selection of one of the extracted e-books, extracting a search result for each page of the selected e-book by inputting the unique number of the selected e-book and the word for text search to the search engine 130.
  • steps 742 and 744 are performed in the personalized area according to the search range, and steps 752-756 are performed in the non-personalized area.
  • DBMS 140 can be searched by limiting the classification.
  • the search engine 130 may be searched by inputting a keyword for text search on the search result derived in step 756.
  • the arrangement of the components shown may vary depending on the environment or requirements on which the invention is implemented. For example, some components may be omitted or several components may be integrated and implemented as one. In addition, the arrangement order and connection of some components may be changed.
  • the arrangement of the steps shown may vary depending on the environment or requirements on which the invention is implemented. For example, some steps may be omitted or some steps may be combined and implemented as one. In addition, the arrangement order of some steps may be changed.
  • it should be understood that the invention may be implemented in hardware, software, firmware, middleware, or a combination thereof, and may be used as a system, subsystem, component, or sub-component thereof. If implemented in software, the elements of the invention are instructions or code segments for performing the necessary tasks.
  • the program or code segments may be stored in a machine-readable medium, such as a processor-readable medium or a computer program product, or may be transmitted over a transmission medium or communication link by a computer data signal embodied in a carrier wave or a signal modulated by a carrier.
  • machine-readable media or processor-readable media may include any medium that can store or transmit information in a form readable and executable by a machine (e.g., a processor, a computer, etc.).

Abstract

An embodiment of the present invention generates e-book data comprising groups formed by grouping the objects included in digital documents according to the objects and their coordinate values. The generated e-book data is loaded into a search engine and a database management system (DBMS). By connecting to the search engine and the DBMS, a user can search bibliographic information within a purchase history or the like, or search the body text of the e-book data.
PCT/KR2012/004567 2011-06-08 2012-06-08 E-book system, e-book data generation and search apparatus, and method therefor WO2012169841A2 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
KR10-2011-0055248 2011-06-08
KR20110055248 2011-06-08
KR20120045505 2012-04-30
KR10-2012-0045505 2012-04-30
KR10-2012-0061536 2012-06-08
KR1020120061536A KR101364178B1 (ko) 2011-06-08 2012-06-08 E-book system, e-book data generation and search apparatus, and method therefor

Publications (3)

Publication Number Publication Date
WO2012169841A2 true WO2012169841A2 (fr) 2012-12-13
WO2012169841A3 WO2012169841A3 (fr) 2013-03-07
WO2012169841A9 WO2012169841A9 (fr) 2013-05-02

Family

ID=47296633

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2012/004567 WO2012169841A2 (fr) 2011-06-08 2012-06-08 E-book system, e-book data generation and search apparatus, and method therefor

Country Status (1)

Country Link
WO (1) WO2012169841A2 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016108407A1 (fr) * 2015-01-02 2016-07-07 삼성전자 주식회사 Method and device for providing annotation
CN113656553A (zh) * 2021-08-19 2021-11-16 掌阅科技股份有限公司 Paper-electronic synchronous retrieval method, electronic device, and computer storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20040079490A (ko) * 2003-03-07 2004-09-16 김효한 Method for producing files for multimedia e-books
US20050187937A1 (en) * 2004-02-25 2005-08-25 Fuji Xerox Co., Ltd. Computer program product, device system, and method for providing document view
KR20060084032A (ko) * 2005-01-17 2006-07-21 오에스에스 주식회사 Electronic document management system and method of operating the same
US20070124295A1 (en) * 2005-11-29 2007-05-31 Forman Ira R Systems, methods, and media for searching documents based on text characteristics
KR20080048027A (ko) * 2005-08-09 2008-05-30 잘락 코포레이션 Method and apparatus for aggregating, extracting, and deploying content from electronic documents
US20110043869A1 (en) * 2007-12-21 2011-02-24 Nec Corporation Information processing system, its method and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12796209

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12796209

Country of ref document: EP

Kind code of ref document: A2