WO2010038923A1 - System and method of auto-complete with query type under guarantee of search results and storage media having program source thereof - Google Patents

System and method of auto-complete with query type under guarantee of search results and storage media having program source thereof

Info

Publication number
WO2010038923A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
document
index
query
autocomplete
Prior art date
Application number
PCT/KR2008/006551
Other languages
French (fr)
Inventor
Han Min Jung
Mi Kyoung Lee
Pyung Kim
Seung Woo Lee
Dong In Park
Won Kyung Sung
Original Assignee
Korea Institute Of Science & Technology Information
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020080105464A (KR101051422B1)
Application filed by Korea Institute Of Science & Technology Information
Publication of WO2010038923A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • the present invention relates, in general, to a query search system, and, more particularly, to a query type-based automatic completion system and method capable of guaranteeing the presence of search results, which can present lists and the types of queries for which the results of a search for an entered query are present and can cope with the addition or deletion of document information in real time, and to a storage medium in which a program source therefor is recorded.
  • This information may include text, sound, pictures, movies, multimedia, etc.
  • the content of information is recorded in the form of text.
  • DB Database
  • server DB-only computer
  • the above-described DB includes all information about text, sound, pictures, movies and multimedia, but recording is mainly performed on the basis of text information to record and manage a maximum amount of information in a limited storage space.
  • Text information may be classified into vocabularies recognized by human beings and vocabularies recognized by computers including programs.
  • a vocabulary item or term entered to search a DB server for information is called a query, an entity or a recommendation, and is referred to hereinafter as 'query' whenever possible.
  • an automatic completion method treats the state in which only part of a query has been entered as a completed state, and displays the queries retrieved for that partial input.
  • the automatic completion method is a scheme for displaying queries, which have been previously entered and searched, in a list, selecting one from the displayed query list, and promptly displaying the selected one, in the case of queries such as names, addresses, and titles which are repeatedly input in a web browser or other types of data search software.
  • AJAX Asynchronous JavaScript and Extensible Markup Language
  • the site of an individual business or specific application field presents an autocomplete list depending on the frequency of entry of queries even if the success of a search is not guaranteed due to a relatively small amount of content, thus decreasing the reliability of a search function.
  • FIG. 1 is a diagram showing the functional construction of a system for searching a typical DB system for information.
  • In order to search text information recorded in the DB server for required information, a query is entered into a computer terminal.
  • the computer terminal is provided with a search program, and is configured to analyze an entered query (entity) using the search program and to search the DB server for index information corresponding to the query.
  • the retrieved information is used as reference data required to obtain higher or various types of knowledge, or required by operators or managers to make decisions or determinations.
  • FIG. 2 is a diagram showing the state in which a query is entered to search for data and retrieved queries are displayed in an autocomplete form according to an embodiment.
  • Referring to FIG. 2, in detail, in the state in which "tfl" is entered as a query, a list of queries retrieved in an autocomplete form is shown. The retrieved list is divided into a portion based on right-hand truncation and a portion based on left-hand truncation, which are separately displayed.
  • the query is entered as a letter of the Korean alphabet, but characters of other languages, such as the English alphabet, may also be entered.
  • the queries which have been retrieved using the entered query and are displayed in an autocomplete form are each classified by type, for example as an institution type, a nation type, or a group type.
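The type-based grouping of an autocomplete list described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name `group_by_type` and the English sample entries are assumptions, and only the three type labels (institution, nation, group) come from the text.

```python
from collections import defaultdict

def group_by_type(suggestions):
    """Group (query, type) pairs by type, keeping input order within each type."""
    groups = defaultdict(list)
    for query, query_type in suggestions:
        groups[query_type].append(query)
    return dict(groups)

# Hypothetical sample data; the examples in the patent itself are Korean.
sample = [
    ("Korea University", "institution"),
    ("Korea", "nation"),
    ("Korea Institute Of Science & Technology Information", "institution"),
    ("Korean Physical Society", "group"),
]
grouped = group_by_type(sample)
```

Each key of `grouped` would correspond to one labelled section of the displayed list.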
  • FIG. 3 is a diagram showing the state in which a query is entered and a corresponding search fails according to an example of the prior art.
  • FIG. 3 is described in detail below.
  • a website 'bestbuyer' www.bb.co.kr
  • " ⁇ j-2ju)-" is entered as a query.
  • "oj-jrju ⁇ -" and "oj- ⁇ uf ⁇ ]Ij-” are retrieved, and are displayed in an autocomplete form.
  • a search for index information matching the query is displayed as a failure.
  • searches for the index information matching each of the two retrieved queries are likewise displayed as failures.
  • the reason for this failure is that information of a product corresponding to the query may be deleted due to, for example, low sales volume of the corresponding product, the exhaustion of stocks of the product, the expiration of the validity period of the product, etc.
  • UMLSKS SUGGEST An Auto-complete Feature for the UMLSKS Interface Using AJAX
  • the above-described prior art is a scheme for setting a flag when a search for a query is successful. By the scheme, it is determined whether to present an autocomplete list.
  • OntoFrame means a semantic web service framework constructed to provide technical research information analysis service on the basis of Extensible Markup Language (XML), Resource Description Framework (RDF), Web Ontology Language (OWL), SPARQL Protocol and RDF Query Language (SPARQL), which are semantic web standard technologies.
  • XML Extensible Markup Language
  • RDF Resource Description Framework
  • OWL Web Ontology Language
  • SPARQL SPARQL Protocol and RDF Query Language
  • a Uniform Resource Identifier (URI) server functions to collect and convert information, and performs a document classification function by extracting queries from the original text and allocating the queries to the original text.
  • URI Uniform Resource Identifier
  • the OntoFrame service of the semantic web service framework is intended to provide a query (entity)-centric integrated search function, which is similar to the vertical search function of Naver, which is a portal site.
  • this is intended to detect the type of a specific query and generate search results complying with the detected type. For example, when a user enters a person's name "Christian Becker” as a query, information about similar researchers and researchers in citation are presented. When a topic word “Semantic Web” is entered as the query, information about topic trends, See Also, researchers by topic, papers by topic, researcher network, etc. is presented.
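The idea of generating results that depend on the detected query type can be sketched as a simple lookup. The panel names are taken from the two examples in the text; the names `PANELS_BY_TYPE`, `KNOWN_TYPES` and `panels_for`, and the hard-coded type table, are assumptions made purely for illustration. A real service would detect the type from its own data.

```python
# Result panels per query type, taken from the two examples in the text.
PANELS_BY_TYPE = {
    "person": ["similar researchers", "researchers in citation"],
    "topic": ["topic trends", "See Also", "researchers by topic",
              "papers by topic", "researcher network"],
}

# Hypothetical type table standing in for real type detection.
KNOWN_TYPES = {"Christian Becker": "person", "Semantic Web": "topic"}

def panels_for(query):
    """Return the result panels for a query, or [] if its type is unknown."""
    query_type = KNOWN_TYPES.get(query)
    return PANELS_BY_TYPE.get(query_type, [])
```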
  • an object of the present invention is to provide a query type-based automatic completion system and method capable of guaranteeing the presence of search results, which search for an entered query and present retrieved queries in an autocomplete form only when the presence of search results is guaranteed, and a storage medium in which a program source therefor is recorded.
  • Another object of the present invention is to provide a query type-based automatic completion system and method capable of guaranteeing the presence of search results, which present retrieved queries in an autocomplete form only when the presence of the results of a search for an entered query is guaranteed, thus enabling a fast and accurate search, and a storage medium in which a program source therefor is recorded.
  • a further object of the present invention is to provide a query type-based automatic completion system and method capable of guaranteeing the presence of search results, in which the document frequency of each query based on the addition or deletion of document information is considered in real time, and type-based information is grouped, thus providing the results of a search as improved results, and a storage medium in which a program source therefor is recorded.
  • the present invention provides a query type-based automatic completion system capable of guaranteeing presence of search results, comprising a document indexing server for receiving registration of document information, extracting index term information and document frequency information from the registered document information, recording the index term information and the document frequency information, and generating autocomplete list information from the extracted index term information, an autocomplete Database (DB) for recording the autocomplete list information, generated by the document indexing server, in association with the document frequency information, an autocomplete server for searching the autocomplete DB, extracting autocomplete list information including the index term information from the autocomplete DB, converting the autocomplete list information into queries, providing the queries through a user interface, converting a query, which is selected and entered, into an index term, searching for document information including the index term, and providing the document information through the user interface, a document collection unit for registering collected document information in the document indexing server, and an index DB for recording the index term information provided by the document indexing server and providing the index term information to the autocomplete server.
  • DB autocomplete Database
  • the document collection unit collects one or more selected from among various types of content document information, including webpage document information, format document information, image document information, video document information, text document information, and multimedia document information.
  • the document indexing server comprises a document registration unit for registering the document information collected by the document collection unit, a document indexing unit for extracting index terms from the document information registered by the document registration unit and storing the index terms in the index DB, and a DB generation unit for searching the index terms stored in the index DB for index term information provided in an autocomplete list, recording retrieved index term information in the autocomplete DB, and updating and managing the document frequency information.
  • the document indexing unit is configured to extract additional information, including the index term information, through one or more selected from a scheme for extracting the index terms from the document information registered in the document registration unit and a scheme for extracting index term information designated by text processing.
  • the document indexing unit extracts the index terms, using any one method selected from indexing based on morpheme analysis and N-gram indexing, from the document information registered in the document registration unit, and stores the index terms in the index DB.
  • the document indexing unit records and stores additional information, including the extracted index terms, in the index DB in association with corresponding document information.
  • the document indexing server further comprises a document editing unit for revising or deleting the document information registered in the document registration unit.
  • the document indexing unit is configured to eliminate unnecessary index terms, included in a stopword dictionary, from the index terms extracted from the document information registered in the document registration unit.
  • the DB generation unit is configured to accumulatively calculate document frequencies of respective pieces of document information for the autocomplete list of the autocomplete DB, and record information about the document frequencies.
  • the DB generation unit is configured to exclude an index term having a document frequency of 0 from the autocomplete list for which the document frequencies are accumulatively calculated.
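The accumulation of document frequencies, and the exclusion of terms whose frequency has fallen to 0, can be sketched with an in-memory stand-in for the autocomplete DB. The class and method names below are hypothetical, and a real deployment would use a persistent store. The sketch assumes that each document contributes at most 1 to the frequency of a term.

```python
from collections import Counter

class AutocompleteDB:
    """Toy in-memory stand-in for the autocomplete DB (hypothetical)."""

    def __init__(self):
        self.doc_freq = Counter()   # index term -> number of documents containing it
        self.doc_terms = {}         # document id -> set of its index terms

    def add_document(self, doc_id, terms):
        self.delete_document(doc_id)          # re-registering replaces the old version
        self.doc_terms[doc_id] = set(terms)
        for term in self.doc_terms[doc_id]:
            self.doc_freq[term] += 1          # each document counts once per term

    def delete_document(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            self.doc_freq[term] -= 1

    def autocomplete_list(self):
        """Offer only terms with a document frequency of 1 or more."""
        return sorted(t for t, df in self.doc_freq.items() if df >= 1)

db = AutocompleteDB()
db.add_document(1, ["ontology", "search"])
db.add_document(2, ["search"])
db.delete_document(1)   # "ontology" drops to 0 and leaves the list
```

Because the counts are adjusted at the moment a document is added or deleted, the list returned by `autocomplete_list()` never contains a term for which no document remains.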
  • the autocomplete server comprises a query entry unit for receiving the query to be searched for through the user interface and converting the query into the index term, a DB searching unit for searching the autocomplete DB for the index term provided by the query entry unit, an index term determination unit for checking document frequency information of the index terms stored in the autocomplete DB, determining appropriate index terms to be autocomplete list information, and providing the document frequency information, a presentation unit for converting the autocomplete list information provided by the index term determination unit into queries and providing the queries, a selection unit for providing both the entered query and the queries of the autocomplete list, provided by the presentation unit, through the user interface, receiving selected query information together with an event signal, and converting the selected query information into the index term, and a service association unit for searching for the document information in response to the index term information received from the selection unit and a search event signal, and providing retrieved document information.
  • the query entry unit is configured such that the query is input as a unit letter designated by any one selected from a phoneme, a syll
  • the query entry unit calls the autocomplete DB using an Asynchronous JavaScript and XML (AJAX) method and searches the autocomplete DB for an index term whenever a query is entered.
  • AJAX Asynchronous JavaScript and XML
  • the query entry unit is configured to receive the query information through the User Interface (UI).
  • UI User Interface
  • the DB searching unit is configured to individually search for the index term using a right-hand truncation method and a left-hand truncation method, and generate results of the search in the autocomplete list.
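A minimal sketch of the two searches follows. It assumes the usual information-retrieval reading of the terms: right-hand truncation matches index terms that begin with the typed string, and left-hand truncation matches index terms that end with it. The function name and sample terms are hypothetical.

```python
def truncation_search(index_terms, typed):
    """Return (right_hand, left_hand) match lists for the typed string."""
    right_hand = [t for t in index_terms if t.startswith(typed)]
    # Terms already matched as prefixes are skipped here so that they are
    # not listed twice; this is a design choice of the sketch.
    left_hand = [t for t in index_terms
                 if t.endswith(typed) and not t.startswith(typed)]
    return right_hand, left_hand

terms = ["web", "web service", "semantic web", "ontology"]
right, left = truncation_search(terms, "web")
```

The two lists can then be shown as separate portions of the autocomplete display.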
  • the index term determination unit is configured to determine to include index terms, having a document frequency of 1 or more, in the autocomplete list and to provide the index terms.
  • the presentation unit is configured to adjust ranking of queries in the autocomplete list using one or more selected from among input statistical information of the queries, document frequency information of the queries, and alphabetic sequence information of the queries.
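The ranking step can be sketched as a multi-key sort. The three signals (input statistics, document frequency, alphabetic sequence) and the document frequency threshold of 1 come from the text; the priority order among the signals, the tuple layout and the sample numbers are assumptions made for this illustration.

```python
def rank_queries(candidates):
    """candidates: list of (query, input_count, doc_freq) tuples."""
    # Keep only queries that are guaranteed to return at least one document.
    eligible = [c for c in candidates if c[2] >= 1]
    # Sort by input count, then document frequency (both descending),
    # then alphabetically, ignoring case.
    ranked = sorted(eligible, key=lambda c: (-c[1], -c[2], c[0].lower()))
    return [c[0] for c in ranked]

candidates = [
    ("semantic web", 40, 12),
    ("semantic search", 40, 30),
    ("semantics", 5, 3),
    ("semaphore", 90, 0),      # frequently typed, but no documents: dropped
    ("Semantic Grid", 5, 3),
]
ranking = rank_queries(candidates)
```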
  • the service association unit is configured to search for document information matching the index term information through calling of an Application Programming Interface (API).
  • API Application Programming Interface
  • the present invention provides a query type-based automatic completion method capable of guaranteeing presence of search results, comprising a process of collecting and registering document information, extracting index term information and document frequency information from the registered document information, storing the extracted information in an index Database (DB), generating autocomplete list information for the index term information, storing the autocomplete list information in an autocomplete DB, and revising registered document information, a process of converting a query, entered from the autocomplete DB through a user interface, into an index term, and including index terms, which are searched for at a document frequency of 1 or more, in the autocomplete list information, and a process of converting the index terms of the autocomplete list into queries and presenting the queries through the user interface, converting a query, which is selected and entered from an outside through the user interface, into an index term, and outputting document information retrieved by searching for the index term.
  • DB index Database
  • the revision process comprises a step of collecting the document information using a document collection unit, a step of registering the collected document information using a document registration unit, a step of extracting index terms from the registered document information using a document indexing unit, and storing the index terms in an index DB, a step of extracting index terms to be provided in the autocomplete list from the index term information stored in the index DB using a DB generation unit, and storing the index terms in the autocomplete DB, and a step of revising or deleting the registered document information using a document editing unit.
  • indexing based on morpheme analysis and N-gram indexing.
  • the index terms stored in the index DB include additional information, unnecessary index terms are eliminated using a stopword dictionary, and updated document frequency information of respective pieces of document information is stored in the autocomplete DB.
  • the entered query is input as a unit letter designated by any one selected from a phoneme, a syllable, a word phrase and a word.
  • the entered query is converted into an index term, and is searched for by calling the autocomplete DB using an Asynchronous JavaScript and XML (AJAX) method.
  • the present invention provides a storage medium for storing a program source for a query type-based automatic completion method capable of guaranteeing presence of search results, comprising a process of collecting and registering document information, extracting index term information and document frequency information from the registered document information, storing the extracted information in an index Database (DB), generating autocomplete list information for the index term information, storing the autocomplete list information in an autocomplete DB, and revising registered document information, a process of converting a query, entered from the autocomplete DB through a user interface, into an index term, and including index terms, which are searched for at a document frequency of 1 or more, in the autocomplete list information, and a process of converting the index terms of the autocomplete list into queries and presenting the queries through the user interface, converting a query, which is selected and entered from an outside through the user interface, into an index term, and outputting document information retrieved by searching for the index term.
  • the query autocomplete list process is a process of adjusting ranking of queries in the autocomplete list using one or more selected from among input statistical information of the queries, document frequency information of the queries, and alphabetic sequence information of the queries.
  • the entered query process is a process of calling the autocomplete DB using an AJAX method and searching the autocomplete DB whenever the query is input as a unit letter designated by any one selected from among a phoneme, a syllable, a word phrase and a word.
  • the present invention provides a query type-based automatic completion system capable of guaranteeing presence of search results, comprising a server system for receiving registration of document information, extracting index terms and document frequency information from the registered document information to construct an autocomplete DB, converting a query, which is entered from an outside through a user interface, into an index term, providing an autocomplete list of index terms, including the input index term, from the autocomplete DB through the user interface, converting a query, which is selected and entered through the user interface, into an index term, and providing document information including the index term through the user interface, a public communication network connected to the server system and configured to transmit or receive the query and retrieved document information through a communication path selected from a wired communication path and a wireless communication path, and a terminal unit implemented as a computer connected to the public communication network, and configured to receive the query to be searched for through the user interface, transmit the query to the server system, display the autocomplete list of the query provided by the server system on the user interface, receive a
  • the server system comprises a document indexing server for receiving registration of the document information, extracting the index terms and the document frequency information, and constructing an autocomplete Database (DB), an autocomplete server for receiving the query from an outside through the user interface, converting the query into an index term, extracting information about an index term list, including the index term, converting the index term list information into queries, providing the queries through the user interface, converting a query, which is selected and entered, into an index term, and providing retrieved document information, the autocomplete DB for storing the autocomplete list information generated by the document indexing server in association with the document frequency information, a document collection unit for accessing the document indexing server and registering the collected document information, and an index DB for recording the index term information provided by the document indexing server, and providing the index term information through a search performed by the autocomplete server.
  • the public communication network comprises a wireless communication network for enabling the server system and the terminal unit to be connected to each other through a wireless communication path and transmitting data signals, and a wired communication network for enabling the server system and the terminal unit to be connected to each other through a wired communication path and transmitting data signals.
  • the present invention is advantageous for convenient use in that an entered query is searched for and retrieved queries are presented in an autocomplete form only when the presence of search results is guaranteed, thus improving the reliability of the results of the search.
  • the present invention is advantageous for industrial applications in that retrieved queries are presented in an autocomplete form only when the presence of search results is guaranteed, thus preventing the failure of the search and enabling a fast search.
  • the present invention is advantageous for convenient use in that the document frequency of each query based on the addition or deletion of document information is considered in real time, and type-based information is grouped, thus providing the results of a search as improved results.
  • FIG. 1 is a diagram showing the functional construction of a system for searching a typical DB system for information.
  • FIG. 2 is a diagram showing the state in which a query is entered to search for data and retrieved queries are displayed in an autocomplete form according to an embodiment.
  • FIG. 3 is a diagram showing the state in which a query is entered and a corresponding search fails according to an example of the prior art.
  • FIG. 4 is a diagram showing the functional construction of a query type-based automatic completion system capable of guaranteeing the presence of search results according to an embodiment of the present invention.
  • FIG. 5 is a diagram showing the detailed functional construction of the server system of the query type-based automatic completion system capable of guaranteeing the presence of search results according to an embodiment of the present invention.
  • FIG. 6 is a flowchart showing a query type-based automatic completion method capable of guaranteeing the presence of search results according to an embodiment of the present invention.
  • FIG. 7 is a diagram showing the state in which document frequencies are updated due to the addition or deletion of document information according to an embodiment of the present invention.
  • the query type-based automatic completion system capable of guaranteeing the presence of search results according to the present invention includes a server system 100, a public communication network 200 and a terminal unit 300.
  • the server system 100 receives registration of document information, extracts an index term, constructs an autocomplete Database (DB), receives a query, converts the query into an index term, searches the autocomplete DB for index terms which include the above index term, converts the index terms into queries, and provides the queries.
  • the server system 100 receives registration of the document information, extracts an index term and frequency information thereof from the registered document information, constructs an autocomplete DB using the extracted information, converts a query, which has been entered from the outside through a user interface, into an index term, provides an autocomplete list of index terms, including the index term, from the autocomplete DB through the user interface, converts a query, selected and entered through the user interface, into an index term, and provides document information, including the index term, through the user interface.
  • the public communication network 200 is connected both to the server system 100 and to the terminal unit 300 through a communication path selected from a wired communication path and a wireless communication path, and is configured to transmit or receive data signals.
  • the terminal unit 300 is implemented as a computer terminal for receiving a search target query through a User Interface (UI), providing the query to the server system 100 through the public communication network 200, displaying information provided by the server system 100 through the UI, selecting one query from the displayed query list, receiving the selected query through the UI together with a search event signal, and checking document information, which has been retrieved and provided.
  • the server system 100 includes a document collection unit 110, a document indexing server 120, an autocomplete DB 140, an autocomplete server 130, and an index DB 150.
  • the document collection unit 110 registers collected document information in the document indexing server 120.
  • the collected document information is content document information including webpage document information, format document information, image document information, video document information, text document information, and multimedia document information.
  • the document indexing server 120, which is configured to receive registration of document information, extract an index term, and construct a database for autocomplete list information, includes a document registration unit 121, a document indexing unit 123, a DB generation unit 124, and a document editing unit 122.
  • the document registration unit 121 records and registers a new information document provided by the document collection unit 110 in a separate document information storage DB (not shown).
  • Methods in which the document indexing unit 123 extracts additional information from registered document information include a method of extracting index terms from input document information and a method of extracting information designated by text processing. An operation of extracting additional information is performed using one or more selected from among the above methods. Further, the extracted additional information is recorded and stored in the index DB in association with document information.
  • the index term extraction method corresponding to the former is performed in such a way that additional information including an index term is extracted from the registered document information using one method selected from indexing based on morpheme analysis and N-gram indexing, and is stored in the index DB.
  • index term extraction methods include a method using indexing based on morpheme analysis and a method using N-gram indexing. An index term is extracted from registered document information using a selected method.
  • a 'morpheme' is the smallest unit of a word that carries meaning; it cannot be broken down further into smaller meaningful units.
  • the morpheme analysis indexing is also called a longest match strategy: when a word or phrase can be divided in several possible ways, the analysis that matches the largest number of characters is employed.
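The longest match strategy can be illustrated with a greedy dictionary-based segmenter. This is only a sketch of the matching rule and not a morphological analyzer; the function name, the toy dictionary and the fallback of emitting unknown single characters unchanged are all assumptions.

```python
def longest_match_segment(text, dictionary):
    """Greedy left-to-right segmentation that always takes the longest entry."""
    max_len = max(map(len, dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest possible piece first, then progressively shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in dictionary or size == 1:
                tokens.append(piece)   # unknown single characters pass through
                i += size
                break
    return tokens

dictionary = {"auto", "autocomplete", "complete", "list", "s"}
tokens = longest_match_segment("autocompletelists", dictionary)
```

Here both "auto" and "autocomplete" match at the start of the input, and the longer candidate is chosen.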
  • the N-gram indexing is a method using N adjacent syllables.
  • For example, a Korean term is broken into its N-gram syllable units, which are respectively used as queries. Of these, meaningless N-gram queries may be used to search for inappropriate document information. In order to prevent this case, weights are assigned to respective syllables.
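N-gram extraction over syllables, with weights attached to the resulting units, can be sketched as follows. The sketch assumes precomposed (NFC) Hangul, where one syllable is one Python character. The sample term, the weight table and both function names are hypothetical and are not the example used in the patent.

```python
def char_ngrams(term, n):
    """Return all runs of n adjacent characters (syllables) in term."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]

def weighted_ngrams(term, n, weights, default=1.0):
    """Pair each n-gram with a weight; low weights mark unhelpful n-grams."""
    return [(gram, weights.get(gram, default)) for gram in char_ngrams(term, n)]

# Hypothetical sample: a four-syllable compound of two two-syllable words.
bigrams = char_ngrams("정보검색", 2)
# The middle bigram spans the boundary between the two words, so this
# illustrative weight table gives it a low weight.
weighted = weighted_ngrams("정보검색", 2, {"보검": 0.1})
```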
  • the document indexing unit 123 eliminates unnecessary index terms, included in a stopword dictionary, from the index terms extracted from the registered document information.
  • the term 'stopword' means a word that is not used as an index term at the time of Internet search, for example, words having no meaning as an index term, such as an article, a preposition, a postposition, and a conjunction.
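Stopword elimination amounts to filtering the extracted index terms against a dictionary. The short English stopword set and the function name below are assumptions for illustration; the patent does not list the contents of its stopword dictionary.

```python
# Hypothetical stopword dictionary: a few English articles, prepositions
# and conjunctions.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "or"}

def remove_stopwords(index_terms, stopwords=STOPWORDS):
    """Drop index terms that appear in the stopword dictionary (case-insensitive)."""
    return [t for t in index_terms if t.lower() not in stopwords]

kept = remove_stopwords(["The", "retrieval", "of", "documents"])
```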
  • the DB generation unit 124 searches the index terms stored in the index DB 150 for respective type-based index terms to be provided in an autocomplete form, records retrieved index terms in the autocomplete DB 140, and newly calculates and manages information about the document frequency of each index term. Information about the document frequency occurring in each piece of document information is recorded, and a relevant index term having a document frequency of 0 is excluded from the autocomplete list.
  • the frequency of occurrence (hereinafter referred to as a 'document frequency') indicates whether a designated index term is included in one piece of document information. If a query occurs in one piece of document information, a value of '1' is given, whereas if a query does not occur, a value of '0' is given.
  • the autocomplete DB 140 records information about the autocomplete list for the index term generated by the document indexing server 120 together with the document frequency information thereof.
  • the autocomplete server 130 is configured to extract index term information, including the index term, by searching the autocomplete DB 140, to convert the index term information into a query, and to provide the query.
  • the autocomplete server 130 includes a query entry unit 131; a DB searching unit 132; an index term determination unit 133; a presentation unit 134; a selection unit 135; and a service association unit 136.
  • the query entry unit 131 is configured to receive a search query through the user interface and convert the query into an index term, and is operated such that it searches the autocomplete DB by calling the autocomplete DB using an Asynchronous JavaScript and XML (AJAX) application method whenever the query is entered as a unit letter designated by any one selected from a phoneme, a syllable, a word phrase, and a word through a UI.
  • AJAX Asynchronous JavaScript and XML
  • the AJAX application method is a method of requesting and receiving only required data from a web server and processing the data in a client.
  • the web server creates and provides a webpage in response to searched or requested content, and creates and provides a new webpage when new content is requested.
  • part of data that was processed by the web server is processed by a client or a terminal connected to the web server, so that the amount of data exchanged between the web server and the client is decreased, a bandwidth is decreased, and the amount of entire data to be processed by the web server is decreased, thus improving responsiveness and enabling interactive data exchange.
  • the AJAX method is disadvantageous in that an inapplicable browser is present, the functionality of a Hypertext Transfer Protocol (HTTP) client is limited, a security problem is present, and debugging is not facilitated due to the creation of a script.
  • HTTP Hypertext Transfer Protocol
  • the AJAX method is widely used because of the advantages thereof in that fast screen switching is possible in the state in which a webpage is almost fixed, and in that, since part of data processing is assigned to the client or terminal, the load of the server is decreased, data processing time is reduced, asynchronous data communication is possible, and both the bandwidth and communication time are reduced owing to a small amount of data.
  • the DB searching unit 132, which is configured to search the autocomplete DB for an index term that has been entered and converted by the query entry unit, individually searches the autocomplete DB using a right-hand truncation method and a left-hand truncation method, and creates the results of the search in a list.
  • the index term determination unit 133 checks the document frequency information of the index terms stored in the autocomplete DB 140 and determines to provide index terms, the document frequency information of which has a value of 1 or more, in a list, and thus provides the index terms as autocomplete list information.
  • the presentation unit 134 converts the autocomplete list information provided by the index term determination unit 133 into query information, and provides the query information through the user interface.
  • the presentation unit 134 adjusts the ranking or sequence of queries displayed in the list using one or more selected from among the input statistical information of entered queries, the document frequency information of the queries, and the alphabetic sequence information of the queries.
  • the selection unit 135 provides both the entered query and the autocomplete list, converted and provided by the presentation unit 134 into queries, through the user interface, receives selected query information, and converts the query information into an index term.
  • the service association unit 136 searches for the index term, which is received from and provided by the selection unit 135, in response to a search event signal, and provides document information retrieved in the index DB 150, wherein the search is performed by Application Programming Interface (API) calling.
  • API Application Programming Interface
  • the API is a specific method preset by a computer operating system or by some other application program by which processing can be requested from the operating system or the application program.
  • the API is the interface of the operating system or a program and differs from a graphic user interface or an imperative interface which directly interfaces with the user.
  • the API is the format of a language or message used when an application program communicates with a system program such as an operating system or a DB management system.
  • the API is implemented by calling a function which provides a connection to a specific subroutine so as to execute the subroutine in the program.
  • That is, a single API is composed of several program modules or routines which already exist or must be connected to execute a requested task by calling a function.
  • the server includes components for utilizing a computer, communicating with a network, executing computer operation processing, and performing various functions.
  • the respective components are operated by the processor, memory, input/output means, etc. of the server.
  • the server system 100 of the present invention having the above construction includes the document indexing server 120 for receiving registration of document information, indexing the document information, and constructing the autocomplete DB 140, the autocomplete provision server 130 for converting a query entered by the user into an index term, searching the autocomplete DB 140 for index terms, including the index term, and providing the index terms in a list converted into queries, the document collection unit 110 for collecting document information, the autocomplete DB 140 for recording together index terms and type information thereof, and the index DB 150 for recording information about the index terms.
  • the document indexing server 120 includes the document registration unit 121 for receiving registration of document information, the document indexing unit 123 for extracting index terms from the document information and indexing the index terms, the DB generation unit 124, and a document editing unit 122.
  • the document registration unit 121 receives registration of document information through the document collection unit 110 including a document register, a knowledge management system, a document collector, etc.
  • the registered document information includes all types of content, such as webpage document information, text document information, format document information, image document information, and video document information.
  • the document indexing unit 123 extracts index terms (queries) that are detected through an indexing method selected from indexing based on morpheme analysis and N-gram indexing, and stores the extracted index terms in the index DB 150.
  • a method in which the document indexing unit 123 extracts index terms is configured to perform an additional information operation such as by extracting a specific index term from the registered document information, or by extracting specific information through text processing.
  • the additional information, extracted from the document registration unit 121, is added to the index DB 150, and unnecessary index terms are eliminated in advance using a stopword dictionary or the like.
  • the DB generation unit 124 extracts type-based index term information, provided in an autocomplete form, from the index term information stored in the index DB 150, records the extracted type-based index term information in the autocomplete DB 140 together with the index term information, and newly calculates information about the document frequency of each index term.
  • the document frequency information is obtained by recording information about the number of times that each relevant index term appears in a document, and is configured such that, when the value of a document frequency is 0, information about a corresponding index term is excluded from targets to be presented in an autocomplete list.
  • the document editing unit 122 revises or deletes previously registered document information. As the document information is revised or deleted, the index terms in relevant document information and the document frequencies thereof must be changed, thus influencing the DB generation unit 124.
  • the present invention is configured to search the document information, which has been input to and registered in the system, for index terms, to extract the index terms, and to manage the extracted index terms by separately indicating the extracted index terms in various index terms, which have been previously recorded in an index term dictionary managed by the system.
  • a query and an index term are used as terms having the same meaning. That is, the term "query" is used through an interface with the user, and is displayed to allow the user to input or select the query. The query is entered into the system and is converted into an index term, which is converted into a query and is then presented or output.
  • the present invention having the above construction is advantageous in that, when new document information is added, the extraction of index terms is performed, and the document frequencies of respective index terms are automatically accumulatively calculated and changed in the index term dictionary, thus enabling the autocomplete list to be updated in real time.
  • the index DB in a search engine provided in the system is provided with an index term dictionary, a biographical dictionary, etc.
  • the biographical dictionary is composed of the names of persons directly received from the URI server through a web service. That is, the index DB adds information about authors (persons) of service target document information, such as thesis information, in real time, without holding a list of index terms acquired from corpus or the like.
  • FIG. 7 is a diagram showing the state in which document frequencies are updated due to the addition or deletion of document information.
  • index terms are type-based information formed in an autocomplete form, they are pieces of information extracted from one piece of document information, and thus the document frequencies of the index terms are set to '1' respectively.
  • the technical spirit of the present invention is to solve this problem and to provide an autocomplete list capable of guaranteeing the presence of search results in real time without causing a temporal difference by immediately adjusting the document frequency information of autocomplete target queries at the time of registration or editing of document information.
  • the conventional OntoFrame provides an entity-centric integrated search, and such an entity is a subset of queries.
  • The checking of the types of respective queries is performed by calling a search engine, the entered query is converted into an index term, and both an index term dictionary and a biographical dictionary provided in the search engine are referred to together for an index term.
  • type-based information indicating whether an entered query is of an area type, a person type, or another type is also searched for while an index term and a person name having a document frequency of 1 or more are searched for.
  • the autocomplete interface of the present invention may recognize the types of index terms on the basis of the results of the search and may display the types of index terms using icons, colors, tree classification, etc.
  • the autocomplete server 130 includes the query entry unit 131 for receiving a query from the user and converting the query into an index term, the DB searching unit 132 for searching the autocomplete DB 140, including index terms, for corresponding index terms, the index term determination unit 133 for checking document frequency information recorded and stored in the autocomplete DB 140 and determining whether to provide an autocomplete list, the presentation unit 134 for converting the autocomplete list into queries, and providing the queries to a search interface through the User Interface(UI), the selection unit 135 for providing the UI to allow a specific query to be selected from the autocomplete list which includes the presented queries, and converting a selected query into an index term, and the service association unit 136 for providing document information retrieved by searching for the selected index term in response to an event signal attributable to the manipulation of a search button or a keyboard through a search service.
  • the query entry unit 131 receives a query through a search box provided by the user interface and converts the query into an index term.
  • the index term calls the DB searching unit 132 using an AJAX method whenever one character based on a phoneme, a syllable, a word phrase and a word is entered.
  • the DB searching unit 132 searches the autocomplete DB 140 for the input index term, and thus determines whether index terms, including the index term, are present.
  • the search is performed using a right-hand truncation method of matching the front part of index terms, so that index terms beginning with the entered index term are retrieved, and using a left-hand truncation method of matching the rear part of index terms, so that index terms ending with the entered index term are retrieved.
  • the index term determination unit 133 determines to present index terms, having a document frequency of 1 or more among index terms which are determined to include the entered query through the DB searching unit 132 or which are retrieved through matching, in an autocomplete list.
  • the fact that the document frequency is 1 or more means that document information including a relevant index term (query) is present in the search system.
  • the presentation unit 134 converts the index terms obtained by the index term determination unit 133 into queries, and presents the queries in the autocomplete list.
  • the ranking or sequence of queries arranged and displayed in the autocomplete list is adjusted using one or more selected from among the input statistical information of queries entered by users, the document frequency information of the queries, and the alphabetic sequence information of the queries, according to a typical automatic completion method.
  • the ranking of queries in the autocomplete list is adjusted using the document frequency information or alphabetic sequence information of relevant queries included in the autocomplete DB 140.
  • the selection unit 135 receives the selected query and converts the query into an index term.
  • the selection from the presented query list is performed by designating a specific query using up/down buttons of a keyboard provided in the terminal unit 300, or a mouse, and by selecting one query from the autocomplete list.
  • the selected query (index term) information is transmitted to the service association unit 136 together with relevant event signal information.
  • the service association unit 136 processes a service of receiving the selected query, searching the index DB 150 for index information by calling an API in response to an event signal attributable to the manipulation of a keyboard such as a search button or an enter key, and then providing document information matching the query.
  • FIG. 6 is a flowchart showing a query type-based automatic completion method capable of guaranteeing the presence of search results according to an embodiment of the present invention.
  • the query type-based automatic completion method capable of guaranteeing the presence of search results according to an embodiment of the present invention includes a revision process; a determination process; and an output process.
  • the revision process is a process of collecting and registering document information, extracting index terms, storing the index terms in the index DB, generating index terms to be provided in an autocomplete form, storing the index terms in the autocomplete DB, and revising or deleting the registered document information.
  • the revision process includes a step S100 of collecting document information using the document collection unit, a step of recording and storing the collected document information in a separate document information DB (not shown) using the document registration unit, a step S110 of extracting additional information including the index terms from the registered document information using the document indexing unit and storing the additional information in the index DB, a step S120 of generating index terms to be provided in an autocomplete form from the information stored in the index DB using the DB generation unit, and storing the index terms in the autocomplete DB, and a step S130 of revising or deleting the registered document information using the document editing unit.
  • the methods of extracting index terms include indexing based on morpheme analysis and N-gram indexing. One selected from the indexing based on morpheme analysis and the N-gram indexing is used.
  • a text processing method may be used.
  • the determination process is performed to search the autocomplete DB 140 for an index term input through the user interface, and determine index terms having a document frequency of 1 or more to be search index terms.
  • the input index term calls the DB searching unit 132 using the AJAX method whenever the input index term is input as a unit letter designated by any one selected from among a phoneme, a syllable, a word phrase and a word at steps S140 to S160.
  • the output process is performed to convert the determined index terms into queries, present the queries in an autocomplete list, search for selected index term information, and output document information matching the index term information at steps S170 to S190.
  • the ranking (sequence) of the autocomplete list of the queries is adjusted by one or more selected from the input statistical information of queries entered by users, and the document frequency information and the alphabetic sequence information of the autocomplete DB.
  • the method of the present invention can be implemented in the form of computer-readable code in a computer-readable storage medium.
  • the computer-readable storage medium is a recording device in which data readable by a computer system is stored.
  • the storage medium may be, for example, Read-Only Memory (ROM), Random Access Memory (RAM), cache memory, a hard disc, an optical disc, a floppy disc, magnetic tape, etc.
  • the storage medium may be provided in carrier wave form, and may include, for example, the case provided through the Internet.
  • the computer-readable storage medium may be distributed to computer systems connected through a network and computer-readable code may be stored and executed in the computer systems in a distributed manner.
  • the present invention relates to a query search system, and is advantageous for industrial applications in that it presents retrieved queries in an autocomplete form only when the presence of search results is guaranteed, thus not only increasing the reliability of results of the search, but also preventing the failure of a search and enabling a fast search, and is advantageous for convenient use in that the document frequency of each query based on the addition or deletion of document information is considered in real time, and type-based information is grouped, thus providing the results of a search as improved results.
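
The document frequency bookkeeping and truncation search described in the definitions above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation of the invention: the class name `AutocompleteIndex`, its method names, and the in-memory dictionaries standing in for the index DB 150 and the autocomplete DB 140 are all hypothetical. Index terms are assumed to have been extracted already, and ranking uses document frequency first and alphabetical order second, which is only one of the orderings the text allows.

```python
from collections import defaultdict


class AutocompleteIndex:
    """Toy in-memory stand-in for the index DB and autocomplete DB (assumption)."""

    def __init__(self):
        self.doc_terms = {}               # doc_id -> set of index terms in that document
        self.doc_freq = defaultdict(int)  # index term -> number of documents containing it

    def add_document(self, doc_id, terms):
        """Register (or revise) a document; a term counts once per document."""
        self.remove_document(doc_id)      # a re-registration is treated as a revision
        self.doc_terms[doc_id] = set(terms)
        for term in self.doc_terms[doc_id]:
            self.doc_freq[term] += 1

    def remove_document(self, doc_id):
        """Delete a document and decrement the frequency of each of its terms."""
        for term in self.doc_terms.pop(doc_id, set()):
            self.doc_freq[term] -= 1

    def suggest(self, query):
        """Return (right_hand, left_hand) lists, keeping only terms with frequency >= 1."""
        def rank(term):
            # Higher document frequency first, then alphabetical order.
            return (-self.doc_freq[term], term)

        live = [t for t, df in self.doc_freq.items() if df >= 1]
        right = sorted((t for t in live if t.startswith(query)), key=rank)  # front part matches
        left = sorted((t for t in live if t.endswith(query)), key=rank)     # rear part matches
        return right, left


index = AutocompleteIndex()
index.add_document("d1", ["semantic web", "web service"])
index.add_document("d2", ["semantic search"])
print(index.suggest("web"))   # terms found while d1 is registered
index.remove_document("d1")
print(index.suggest("web"))   # terms from d1 are no longer suggested
```

Because the frequencies are adjusted inside `add_document` and `remove_document`, a term stops being suggested as soon as the last document containing it is deleted, which is the guarantee the text describes.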

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed herein is a query type-based automatic completion system and method capable of guaranteeing the presence of search results and a storage medium for storing a program source therefor. The query type-based automatic completion system includes a document indexing server for receiving registration of document information, extracting index term information and document frequency information, recording the index term information and the document frequency information, and generating autocomplete list information from the extracted index term information, an autocomplete DB for recording the autocomplete list information in association with the document frequency information, and an autocomplete server for searching the autocomplete DB, extracting autocomplete list information from the autocomplete DB, converting the autocomplete list information into queries, providing the queries, converting a query, which is selected and entered, into an index term, searching for document information including the index term, and providing the document information through the user interface.

Description

[DESCRIPTION]
[Invention Title]
SYSTEM AND METHOD OF AUTO-COMPLETE WITH QUERY TYPE UNDER GUARANTEE OF SEARCH RESULTS AND STORAGE MEDIA HAVING PROGRAM SOURCE THEREOF
[Technical Field]
<1> The present invention relates, in general, to a query search system, and, more particularly, to a query type-based automatic completion system and method capable of guaranteeing the presence of search results, which can present lists and the types of queries for which the results of a search for an entered query are present and can cope with the addition or deletion of document information in real time, and to a storage medium in which a program source therefor is recorded.
[Background Art]
<2> Modern society exists now in the so-called 'Information Age' in which a large amount of information ranging from long-existing information to the latest information can be easily accessed either for charge or for free, with the development of data telecommunications using computers, the Internet, communication networks or the like.
<3> This information may include text, sound, pictures, movies, multimedia, etc. Typically, the content of information is recorded in the form of text.
<4> Further, as the amount of information increases, a device for recording and managing information and technology for rapidly and accurately searching for required information as needed have been required. This technology is typically called a Database (DB), and a DB-only computer is called a server. Everyone can easily access the server from a remote place over a data communication network such as the Internet and can easily search for and utilize required information.
<5> The above-described DB includes all information about text, sound, pictures, movies and multimedia, but recording is mainly performed on the basis of text information to record and manage a maximum amount of information in a limited storage space.
<6> Text information may be classified into vocabularies recognized by human beings and vocabularies recognized by computers including programs.
<7> In order to rapidly search a large amount of information stored in the DB for desired content, terms implemented using standardized vocabularies by which information can be exchanged between human beings and computers and which can be recognized both by human beings and by computers are required. Such standardized terms are called a set of concepts or ontology, and technology for selectively searching a web such as the Internet, in which a large amount of information is shared, for desired information using ontology, is called semantic web technology.
<8> Generally, there is a need for an information search to acquire expert knowledge or to make proper decisions or determinations in management, and a technique for rapidly mining desired or required technical information from accumulated large-capacity technical information is another independent technical field.
<9> A vocabulary or a term input from a DB server for an information search is called a query, an entity or a recommendation, and is referred to hereinafter as 'query' whenever possible.
<10> An automatic completion method treats a partially entered query as if it were complete and displays the queries retrieved for it.
<11> The automatic completion method is a scheme for displaying queries, which have been previously entered and searched, in a list, selecting one from the displayed query list, and promptly displaying the selected one, in the case of queries such as names, addresses, and titles which are repeatedly input in a web browser or other types of data search software.
<12> An automatic completion method implemented using Asynchronous JavaScript and Extensible Markup Language (XML) (AJAX) technology, one of the Web 2.0 technologies, has been widely applied to various fields, such as various websites including Internet portal sites, digital libraries, enterprise Web 2.0, and specialized application field programs.
<13> Since such an automatic completion method has already been proven to effectively satisfy users' preferences, from the standpoint of improving user experience, it is expected that the automatic completion method will be more widely used in the future for search interfaces.
<14> However, since an automatic completion method that has been provided to date is a scheme for arranging queries using log information stored in response to a query entered by the user, or using its own dictionary, there is a problem in that, when a query to be searched for does not appear in the upper portion of a displayed list, the entire portion of the presented autocomplete list must be sequentially observed.
<15> Further, unlike portal sites including a large amount of content, the site of an individual business or specific application field presents an autocomplete list depending on the frequency of entry of queries even if the success of a search is not guaranteed due to a relatively small amount of content, thus decreasing the reliability of a search function.
<16> FIG. 1 is a diagram showing the functional construction of a system for searching a typical DB system for information.
<17> With reference to FIG. 1, the concept of a search in a DB server for required information using text information is described below. Various types of information composed of text are recorded and managed in the DB server in large quantities.
<18> In order to search text information recorded in the DB server for required information, a query is entered into a computer terminal.
<19> The computer terminal is provided with a search program, and is configured to analyze an entered query (entity) using the search program and to search the DB server for index information corresponding to the query.
<20> A plurality of pieces of index information, including the query, are retrieved and provided in a list. When one index is selected from the list, information corresponding to the selected index is ultimately retrieved and is output to the computer terminal.
<21> The retrieved information is used as reference data required to obtain higher or various types of knowledge, or required by operators or managers to make decisions or determinations.
<22> The amount of information recorded and managed in the DB server rapidly increases with the advance of knowledge and science, and thus there is a problem in that a lot of time is required to search for desired information by analyzing the desired information using an entered query.
<23> FIG. 2 is a diagram showing the state in which a query is entered to search for data and retrieved queries are displayed in an autocomplete form according to an embodiment.
<24> Referring to FIG. 2, in detail, a list of queries retrieved in an autocomplete form is shown for a Korean query [Korean text] that has been entered. The retrieved list is divided into a portion based on right-hand truncation and a portion based on left-hand truncation, which are separately displayed. In FIG. 2, the query is entered in the Korean alphabet, but characters of other languages, such as the English alphabet, may also be entered.
<25> The queries retrieved by searching for the entered query in an autocomplete form include queries displayed using a right-hand truncation method (Figures imgf000006_0001 and imgf000006_0002) and queries displayed using a left-hand truncation method (Figure imgf000006_0003).
<26> Of the queries that have been retrieved using the entered query and are displayed in an autocomplete form, some (Figure imgf000006_0004) are classified as an institution type, one (Figure imgf000006_0005) is classified as a nation type, and one (Figure imgf000006_0006) is classified as a group type.
<27> Currently, when a query is entered to perform a search in a plurality of portal sites of Korea, for example, Naver (www.naver.com) or the like, automatically completed queries, including the entered query, are retrieved and provided in a list.
<28> Since a query is completed while being continuously entered, and a desired query is selected from the list of automatically completed queries and is entered even in the state in which part of the query is entered, the time required for the entry of a query for a search is reduced, and the convenience of use is improved.
<29> A specific business or a specific application field which does not include a large amount of content, unlike portal sites, cannot guarantee the presence of search results (successful presentation of search results) for an entered query due to insufficiency in the amount of content, and therefore presents an autocomplete list based only on the frequency of entry of queries, thus deteriorating the reliability of a relevant search function.
<30> FIG. 3 is a diagram showing the state in which a query is entered and a corresponding search fails according to an example of the prior art.
<31> FIG. 3 is described in detail below. For example, a website 'bestbuyer' (www.bb.co.kr), which is a site for comparing the prices of products with each other, is accessed, and a Korean query [Korean text] is entered. For the entered query, two queries [Korean text] are retrieved and displayed in an autocomplete form.
<32> A search for index information matching either of the retrieved queries is displayed as a failure. The reason for this failure is that information on a product corresponding to the query may have been deleted due to, for example, low sales volume of the corresponding product, the exhaustion of stocks of the product, the expiration of the validity period of the product, etc.
<33> Whatever the reason, the failure of a search lowers the reliability of the search system in the eyes of the user.
<34> As an example of the prior art, a scheme for presenting only queries for which successful search results are obtained, as an auto-complete feature to be applied to a UMLS Knowledge Source Server (UMLSKS) interface, is disclosed in "UMLSKS SUGGEST: An Auto-complete Feature for the UMLSKS Interface Using AJAX" by A. Bangalore, A. Browne, and G. Divita, in Proceedings of AMIA, 2006.
<35> The above-described prior art is a scheme for setting a flag when a search for a query is successful. By the scheme, it is determined whether to present an autocomplete list.
<36> However, when content, including a query for which a search has failed, is added subsequently, it is not presented as an autocomplete list until the content is entered as a query and a search is attempted.
<37> Further, the prior art disclosed in "Auto Complete Method for Web Application Form Based on Term Hierarchy" by M. Takasi et al., in Proceedings of the Annual Conference on JSAI (in Japanese), 2006, is related to an automatic completion function. This prior art relates to technology for supporting format conversion so that a list of queries can be used compatibly in different application programs. Since it does not provide an autocomplete list that guarantees the presence of search results, its range of application differs greatly from that of the present invention, and the two are unrelated.
<38> Further, the prior art disclosed in "Location-Based Search Term Recommendation System for Mobile Terminals" by Kwangjo Lee, Jinwoo Song, Jeongseok Han, and Sungbong Yang, in Autumn Conference of the Korean Institute of Information Scientists and Engineers, 2007, is a technology that uses a remote recommendation server to overcome the limited query storage space of a mobile terminal and to take user location information into account. However, since this technology does not provide an autocomplete list that guarantees the presence of search results, its range of application differs greatly from that of the present invention, and the two are unrelated.
<39> Further, as the prior art, there is technology disclosed in "A Semantic Portal for Researchers Using OntoFrame" by W. Sung, H. Jung, P. Kim, I. Kang, S. Lee, M. Lee, D. Park, and S. Hahn, in Proceedings of the 6th International Semantic Web Conference, 2007.
<40> In the prior art, the term "OntoFrame" means a semantic web service framework constructed to provide technical research information analysis service on the basis of Extensible Markup Language (XML), Resource Description Framework (RDF), Web Ontology Language (OWL), SPARQL Protocol and RDF Query Language (SPARQL), which are semantic web standard technologies.
<41> In the prior art, existing DBs are collected with reference to a modeled ontology and are converted into RDF triple format, which an OntoReasoner, an inference engine, then utilizes as knowledge.
<42> Further, as the prior art, there is a technology disclosed in "Allocation of Themes and Fields to Technical and Scientific Documents using Thesaurus and Field Classification System" by HanMin Jung, InSoo Kang, and WonKyeong Sung, in Summer Conference of the Korean Society for Language and Information, 2006.
<43> The prior art is configured such that a Uniform Resource Identifier (URI) server functions to collect and convert information, and performs a document classification function by extracting queries from the original text and allocating the queries to the original text.
<44> The OntoFrame service of the semantic web service framework is intended to provide a query (entity)-centric integrated search function, which is similar to the vertical search function of Naver, which is a portal site.
<45> That is, this is intended to detect the type of a specific query and generate search results complying with the detected type. For example, when a user enters a person's name "Christian Becker" as a query, information about similar researchers and researchers in citation are presented. When a topic word "Semantic Web" is entered as the query, information about topic trends, See Also, researchers by topic, papers by topic, researcher network, etc. is presented.
<46> In particular, when the results of a search for a query are provided, the results of the extraction of the topic word performed by the URI server are used. Since the results of the extraction of the topic word are propagated to persons, institutions, etc., by an inference engine so as to enable the construction of topic transition, topic-based experts and topic-based theses, there is a problem in that it is not possible to implement an accurate semantic web service unless corresponding document information is added.
<47> Even in the conventional OntoFrame service, an automatic completion function has been provided in such a way that, of queries in a query dictionary used to extract queries, queries matching a user's query are presented in an autocomplete list.
<48> However, there may occur a problem in that queries not included in the extracted queries are presented in an autocomplete list, thus resulting in a phenomenon in which queried search results are not present, or unreliable search results are presented.
<49> Further, there is a problem in that, since the number of queries matching a character string entered as a query is large, an autocomplete load greatly increases.
<50> Therefore, there is a need to develop technology for solving the above problems by utilizing a method of controlling an autocomplete list using extracted queries.
<51> Further, there is a need to develop technology for enabling a query type-based automatic completion function by recognizing the type of a query at the stage previous to the presentation of an autocomplete list.
<52> In addition, there is a need to develop technology for providing only the successful results of a search for an entered query in an autocomplete form.
[Disclosure] [Technical Problem]
<53> Accordingly, the present invention has been made keeping in mind the above problems occurring in the prior art, and an object of the present invention is to provide a query type-based automatic completion system and method capable of guaranteeing the presence of search results, which search for an entered query and present retrieved queries in an autocomplete form only when the presence of search results is guaranteed, and a storage medium in which a program source therefor is recorded.
<54> Another object of the present invention is to provide a query type-based automatic completion system and method capable of guaranteeing the presence of search results, which present retrieved queries in an autocomplete form only when the presence of the results of a search for an entered query is guaranteed, thus enabling a fast and accurate search, and a storage medium in which a program source therefor is recorded.
<55> A further object of the present invention is to provide a query type-based automatic completion system and method capable of guaranteeing the presence of search results, in which the document frequency of each query based on the addition or deletion of document information is considered in real time, and type-based information is grouped, thus providing the results of a search as improved results, and a storage medium in which a program source therefor is recorded.
[Technical Solution]
<56> In order to accomplish the above objects, the present invention provides a query type-based automatic completion system capable of guaranteeing presence of search results, comprising a document indexing server for receiving registration of document information, extracting index term information and document frequency information from the registered document information, recording the index term information and the document frequency information, and generating autocomplete list information from the extracted index term information, an autocomplete Database (DB) for recording the autocomplete list information, generated by the document indexing server, in association with the document frequency information, an autocomplete server for searching the autocomplete DB, extracting autocomplete list information including the index term information from the autocomplete DB, converting the autocomplete list information into queries, providing the queries through a user interface, converting a query, which is selected and entered, into an index term, searching for document information including the index term, and providing the document information through the user interface, a document collection unit for registering collected document information in the document indexing server, and an index DB for recording the index term information provided by the document indexing server and providing the index term information to the autocomplete server.
<57> The document collection unit collects one or more selected from among various types of content document information, including webpage document information, format document information, image document information, video document information, text document information, and multimedia document information.
<58> The document indexing server comprises a document registration unit for registering the document information collected by the document collection unit, a document indexing unit for extracting index terms from the document information registered by the document registration unit and storing the index terms in the index DB, and a DB generation unit for searching the index terms stored in the index DB for index term information provided in an autocomplete list, recording retrieved index term information in the autocomplete DB, and updating and managing the document frequency information.
<59> The document indexing unit is configured to extract additional information, including the index term information, through one or more selected from a scheme for extracting the index terms from the document information registered in the document registration unit and a scheme for extracting index term information designated by text processing.
<60> Further, the document indexing unit extracts the index terms, using any one method selected from indexing based on morpheme analysis and N-gram indexing, from the document information registered in the document registration unit, and stores the index terms in the index DB.
<61> Further, the document indexing unit records and stores additional information, including the extracted index terms, in the index DB in association with corresponding document information.
<62> Further, the document indexing server further comprises a document editing unit for revising or deleting the document information registered in the document registration unit.
<63> Further, the document indexing unit is configured to eliminate unnecessary index terms, included in a stopword dictionary, from the index terms extracted from the document information registered in the document registration unit.
<64> The DB generation unit is configured to accumulatively calculate document frequencies of respective pieces of document information for the autocomplete list of the autocomplete DB, and record information about the document frequencies.
<65> Further, the DB generation unit is configured to exclude an index term having a document frequency of 0 from the autocomplete list for which the document frequencies are accumulatively calculated.
<66> The autocomplete server comprises a query entry unit for receiving the query to be searched for through the user interface and converting the query into the index term, a DB searching unit for searching the autocomplete DB for the index term provided by the query entry unit, an index term determination unit for checking document frequency information of the index terms stored in the autocomplete DB, determining appropriate index terms to be autocomplete list information, and providing the document frequency information, a presentation unit for converting the autocomplete list information provided by the index term determination unit into queries and providing the queries, a selection unit for providing both the entered query and the queries of the autocomplete list, provided by the presentation unit, through the user interface, receiving selected query information together with an event signal, and converting the selected query information into the index term, and a service association unit for searching for the document information in response to the index term information received from the selection unit and a search event signal, and providing retrieved document information.
<67> The query entry unit is configured such that the query is input as a unit letter designated by any one selected from a phoneme, a syllable, a word phrase and a word.
<68> Further, the query entry unit calls the autocomplete DB using an Asynchronous JavaScript and XML (AJAX) method and searches the autocomplete DB for an index term whenever a query is entered.
<69> Further, the query entry unit is configured to receive the query information through the User Interface (UI).
<70> The DB searching unit is configured to individually search for the index term using a right-hand truncation method and a left-hand truncation method, and generate results of the search in the autocomplete list.
<71> The index term determination unit is configured to determine to include index terms, having a document frequency of 1 or more, in the autocomplete list and to provide the index terms.
<72> The presentation unit is configured to adjust ranking of queries in the autocomplete list using one or more selected from among input statistical information of the queries, document frequency information of the queries, and alphabetic sequence information of the queries.
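The filtering and ranking described for the index term determination unit and the presentation unit can be illustrated with a minimal Python sketch. The specification does not fix how the three ranking signals are combined, so the ordering used here (input statistics first, then document frequency, then alphabetical order) is an assumption, as are the function name `build_autocomplete_list`, its parameters, and the `limit` value.

```python
def build_autocomplete_list(candidates, doc_freq, input_count, limit=10):
    """Keep only terms that occur in at least one document, then rank them.

    candidates  : index terms matched by the DB search (hypothetical input)
    doc_freq    : dict mapping index term -> document frequency
    input_count : dict mapping index term -> how often it was entered as a query
    """
    # Guarantee the presence of search results: drop terms with document frequency 0.
    present = [t for t in candidates if doc_freq.get(t, 0) >= 1]
    # Assumed ordering: more frequently entered first, then higher document
    # frequency, then alphabetical order as the final tie-breaker.
    present.sort(key=lambda t: (-input_count.get(t, 0), -doc_freq[t], t))
    return present[:limit]
```

A term such as a deleted product name, whose document frequency has dropped to 0, never reaches the list in this sketch regardless of how often users have typed it.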
<73> The service association unit is configured to search for document information matching the index term information through calling of an Application Programming Interface (API).
<74> In order to accomplish the above objects, the present invention provides a query type-based automatic completion method capable of guaranteeing presence of search results, comprising a process of collecting and registering document information, extracting index term information and document frequency information from the registered document information, storing the extracted information in an index Database (DB), generating autocomplete list information for the index term information, storing the autocomplete list information in an autocomplete DB, and revising registered document information, a process of converting a query, entered from the autocomplete DB through a user interface, into an index term, and including index terms, which are searched for at a document frequency of 1 or more, in the autocomplete list information, and a process of converting the index terms of the autocomplete list into queries and presenting the queries through the user interface, converting a query, which is selected and entered from an outside through the user interface, into an index term, and outputting document information retrieved by searching for the index term.
<75> The revision process comprises a step of collecting the document information using a document collection unit, a step of registering the collected document information using a document registration unit, a step of extracting index terms from the registered document information using a document indexing unit, and storing the index terms in an index DB, a step of extracting index terms to be provided in the autocomplete list from the index term information stored in the index DB using a DB generation unit, and storing the index terms in the autocomplete DB, and a step of revising or deleting the registered document information using a document editing unit.
<76> The extraction of the index terms is performed using any one method selected from indexing based on morpheme analysis and N-gram indexing.
<77> The index terms stored in the index DB include additional information, unnecessary index terms are eliminated using a stopword dictionary, and updated document frequency information of respective pieces of document information is stored in the autocomplete DB.
<78> The entered query is input as a unit letter designated by any one selected from a phoneme, a syllable, a word phrase and a word.
<79> The entered query is converted into an index term, and is searched for by calling the autocomplete DB using an Asynchronous JavaScript and XML (AJAX) method.
<80> In order to accomplish the above objects, the present invention provides a storage medium for storing a program source for a query type-based automatic completion method capable of guaranteeing presence of search results, comprising a process of collecting and registering document information, extracting index term information and document frequency information from the registered document information, storing the extracted information in an index Database (DB), generating autocomplete list information for the index term information, storing the autocomplete list information in an autocomplete DB, and revising registered document information, a process of converting a query, entered from the autocomplete DB through a user interface, into an index term, and including index terms, which are searched for at a document frequency of 1 or more, in the autocomplete list information, and a process of converting the index terms of the autocomplete list into queries and presenting the queries through the user interface, converting a query, which is selected and entered from an outside through the user interface, into an index term, and outputting document information retrieved by searching for the index term.
<81> The query autocomplete list process is a process of adjusting ranking of queries in the autocomplete list using one or more selected from among input statistical information of the queries, document frequency information of the queries, and alphabetic sequence information of the queries.
<82> The entered query process is a process of calling the autocomplete DB using an AJAX method and searching the autocomplete DB whenever the query is input as a unit letter designated by any one selected from among a phoneme, a syllable, a word phrase and a word.
<83> In order to accomplish the above objects, the present invention provides a query type-based automatic completion system capable of guaranteeing presence of search results, comprising a server system for receiving registration of document information, extracting index terms and document frequency information from the registered document information to construct an autocomplete DB, converting a query, which is entered from an outside through a user interface, into an index term, providing an autocomplete list of index terms, including the input index term, from the autocomplete DB through the user interface, converting a query, which is selected and entered through the user interface, into an index term, and providing document information including the index term through the user interface, a public communication network connected to the server system and configured to transmit or receive the query and retrieved document information through a communication path selected from a wired communication path and a wireless communication path, and a terminal unit implemented as a computer connected to the public communication network, and configured to receive the query to be searched for through the user interface, transmit the query to the server system, display the autocomplete list of the query provided by the server system on the user interface, receive a single selected query together with an event signal, provide the query and the event signal to the server system, and display the document information retrieved and provided by the server system.
<84> The server system comprises a document indexing server for receiving registration of the document information, extracting the index terms and the document frequency information, and constructing an autocomplete Database (DB), an autocomplete server for receiving the query from an outside through the user interface, converting the query into an index term, extracting information about an index term list, including the index term, converting the index term list information into queries, providing the queries through the user interface, converting a query, which is selected and entered, into an index term, and providing retrieved document information, the autocomplete DB for storing the autocomplete list information generated by the document indexing server in association with the document frequency information, a document collection unit for accessing the document indexing server and registering the collected document information, and an index DB for recording the index term information provided by the document indexing server, and providing the index term information through a search performed by the autocomplete server.
<85> Further, the public communication network comprises a wireless communication network for enabling the server system and the terminal unit to be connected to each other through a wireless communication path and transmitting data signals, and a wired communication network for enabling the server system and the terminal unit to be connected to each other through a wired communication path and transmitting data signals.
[Advantageous Effects]
<87> Accordingly, the present invention is advantageous for convenient use in that an entered query is searched for and retrieved queries are presented in an autocomplete form only when the presence of search results is guaranteed, thus improving the reliability of the results of the search.
<88> Further, the present invention is advantageous for industrial applications in that retrieved queries are presented in an autocomplete form only when the presence of search results is guaranteed, thus preventing the failure of the search and enabling a fast search.
<89> In addition, the present invention is advantageous for convenient use in that the document frequency of each query based on the addition or deletion of document information is considered in real time, and type-based information is grouped, thus providing the results of a search as improved results.
[Description of Drawings]
<90> FIG. 1 is a diagram showing the functional construction of a system for searching a typical DB system for information;
<91> FIG. 2 is a diagram showing the state in which a query is entered to search for data and retrieved queries are displayed in an autocomplete form according to an embodiment;
<92> FIG. 3 is a diagram showing the state in which a query is entered and a corresponding search fails according to an example of the prior art;
<93> FIG. 4 is a diagram showing the functional construction of a query type-based automatic completion system capable of guaranteeing the presence of search results according to an embodiment of the present invention;
<94> FIG. 5 is a diagram showing the detailed functional construction of the server system of the query type-based automatic completion system capable of guaranteeing the presence of search results according to an embodiment of the present invention;
<95> FIG. 6 is a flowchart showing a query type-based automatic completion method capable of guaranteeing the presence of search results according to an embodiment of the present invention; and
<96> FIG. 7 is a diagram showing the state in which document frequencies are updated due to the addition or deletion of document information according to an embodiment of the present invention.
[Best Mode]
<97> The terms and words used in the present specification and claims should not be interpreted as being limited to their typical meaning based on the dictionary definitions thereof, but should be interpreted to have the meaning and concept relevant to the technical spirit of the present invention, on the basis of the principle by which the inventor can suitably define the implications of terms in the way which best describes the invention.
<98> Hereinafter, embodiments of the present invention will be described in detail with reference to the attached drawings.
<99> Embodiments
<100> The drawings are attached to describe the present invention, wherein FIG. 4 is a diagram showing the functional construction of a query type-based automatic completion system capable of guaranteeing the presence of search results according to an embodiment of the present invention, FIG. 5 is a diagram showing the detailed functional construction of the server system of the query type-based automatic completion system capable of guaranteeing the presence of search results according to an embodiment of the present invention, FIG. 6 is a flowchart showing a query type-based automatic completion method capable of guaranteeing the presence of search results according to an embodiment of the present invention, and FIG. 7 is a diagram showing the state in which document frequencies are updated due to the addition or deletion of document information according to an embodiment of the present invention.
<101> Referring to FIG. 4, the query type-based automatic completion system capable of guaranteeing the presence of search results according to the present invention includes a server system 100, a public communication network 200 and a terminal unit 300.
<102> The server system 100 receives registration of document information, extracts an index term, constructs an autocomplete Database (DB), receives a query, converts the query into an index term, searches the autocomplete DB for index terms which include the above index term, converts the index terms into queries, and provides the queries.
<103> In detail, the server system 100 receives registration of the document information, extracts an index term and frequency information thereof from the registered document information, constructs an autocomplete DB using the extracted information, converts a query, which has been entered from the outside through a user interface, into an index term, provides an autocomplete list of index terms, including the index term, from the autocomplete DB through the user interface, converts a query, selected and entered through the user interface, into an index term, and provides document information, including the index term, through the user interface.
<104> The public communication network 200 is connected both to the server system 100 and to the terminal unit 300 through a communication path selected from a wired communication path and a wireless communication path, and is configured to transmit or receive data signals.
<105> The terminal unit 300 is implemented as a computer terminal for receiving a search target query through a User Interface (UI), providing the query to the server system 100 through the public communication network 200, displaying information provided by the server system 100 through the UI, selecting one query from the displayed query list, receiving the selected query through the UI together with a search event signal, and checking document information, which has been retrieved and provided.
<106> Referring to FIG. 5, the server system 100 includes a document collection unit 110, a document indexing server 120, an autocomplete DB 140, an autocomplete server 130, and an index DB 150.
<107> The document collection unit 110 registers collected document information in the document indexing server 120. The collected document information is content document information including webpage document information, format document information, image document information, video document information, text document information, and multimedia document information.
<108> The document indexing server 120, which is configured to receive registration of document information, extract an index term, and construct a database for autocomplete list information, includes a document registration unit 121, a document indexing unit 123, a DB generation unit 124, and a document editing unit 122. The document registration unit 121 records and registers a new information document provided by the document collection unit 110 in a separate document information storage DB (not shown).
<109> Methods in which the document indexing unit 123 extracts additional information from registered document information include a method of extracting index terms from input document information and a method of extracting information designated by text processing. An operation of extracting additional information is performed using one or more selected from among the above methods. Further, the extracted additional information is recorded and stored in the index DB in association with document information.
<110> Of the methods in which the document indexing unit 123 extracts additional information, including index term information, from the document information registered by the document registration unit 121, the index term extraction method corresponding to the former is performed in such a way that additional information including an index term is extracted from the registered document information using one method selected from indexing based on morpheme analysis and N-gram indexing, and is stored in the index DB.
<111> That is, index term extraction methods include a method using indexing based on morpheme analysis and a method using N-gram indexing. An index term is extracted from registered document information using a selected method.
<112> The term 'morpheme' means the smallest word, or part of a word, that has a meaning, that is, a unit that cannot be broken down further into smaller meaningful units.
<113> The morpheme analysis indexing is also called a longest match strategy, in which, when there is a plurality of possible ways of dividing words or phrases, the analysis that covers the largest number of characters is employed.
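As a rough illustration of the longest match strategy, the following Python sketch segments a character string greedily from left to right, always preferring the longest entry found in a lexicon. The specification does not disclose the actual morpheme analyzer, so the function name `longest_match_segment`, the toy English lexicon, and the single-character fallback for unknown text are all assumptions made for this sketch.

```python
def longest_match_segment(text, lexicon):
    """Greedy left-to-right segmentation that prefers the longest lexicon entry."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon entry matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Assumed fallback: an unknown character becomes its own token.
            if candidate in lexicon or size == 1:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical lexicon; a real system would use a morpheme dictionary.
lexicon = {"auto", "autocomplete", "complete", "list"}
print(longest_match_segment("autocompletelist", lexicon))
```

With this lexicon the string is divided into "autocomplete" and "list" rather than "auto", "complete" and "list", because the longer match is tried first.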
<114> The N-gram indexing is a method using N adjacent syllables. For example, in the case of 'SH"^^1. 'Φ^-1, 1^T, 1^', 1^M1 and ' 71#' are syllables, which are respectively used as queries. Some of these N-gram queries are meaningless and may retrieve inappropriate document information. In order to prevent this, weights are assigned to the respective syllables.
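The extraction of N adjacent syllables can be sketched in Python as follows. This is only an illustration: the function name `char_ngrams` and the sample word are assumptions, spaces are dropped as a simplification, and the weighting of syllables mentioned above is not shown because the specification does not describe how the weights are computed.

```python
def char_ngrams(text, n=2):
    """Return every run of n adjacent characters.

    For precomposed Hangul text each character is one syllable, so the
    result corresponds to runs of n adjacent syllables.
    """
    compact = text.replace(" ", "")  # assumed simplification: ignore spaces
    return [compact[i:i + n] for i in range(len(compact) - n + 1)]


# Hypothetical four-syllable word; n=2 yields three overlapping bigrams.
print(char_ngrams("정보검색", 2))
```

Every bigram becomes a candidate index term, which is why some of them carry no meaning on their own and need to be down-weighted or filtered.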
<115> Further, the document indexing unit 123 eliminates unnecessary index terms, included in a stopword dictionary, from the index terms extracted from the registered document information.
<116> The term 'stopword' means a word that is not used as an index term at the time of Internet search, for example, words having no meaning as an index term, such as an article, a preposition, a postposition, and a conjunction.
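Stopword elimination can be sketched in Python as a simple filter over the extracted index terms. The contents of `STOPWORDS` and the function name `remove_stopwords` are hypothetical; the specification does not list the entries of its stopword dictionary.

```python
# Hypothetical stopword dictionary: articles, prepositions and conjunctions.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "to"}


def remove_stopwords(index_terms, stopwords=STOPWORDS):
    """Drop extracted terms that appear in the stopword dictionary."""
    return [term for term in index_terms if term.lower() not in stopwords]


print(remove_stopwords(["The", "presence", "of", "search", "results"]))
```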
<117> The DB generation unit 124 searches the index terms stored in the index DB 150 for respective type-based index terms to be provided in an autocomplete form, records the retrieved index terms in the autocomplete DB 140, and newly calculates and manages information about the document frequency of each index term. Information about the document frequency occurring in each piece of document information is recorded, and an index term having a document frequency of 0 is excluded from the autocomplete list.
<118> The frequency of occurrence (hereinafter referred to as the 'document frequency') indicates whether a designated index term is included in one piece of document information. If the term occurs in a piece of document information, a value of '1' is given for that document, whereas if it does not occur, a value of '0' is given.
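The accumulation of document frequencies as documents are added or deleted can be illustrated with the following minimal Python sketch. It is an in-memory stand-in, not the autocomplete DB of the specification: the class name `AutocompleteDB`, its method names, and the choice to remove a term's entry once its frequency reaches 0 are assumptions.

```python
from collections import Counter


class AutocompleteDB:
    """In-memory sketch of per-term document frequencies (hypothetical API)."""

    def __init__(self):
        self.doc_terms = {}        # document id -> set of its index terms
        self.doc_freq = Counter()  # index term -> number of documents containing it

    def add_document(self, doc_id, index_terms):
        # Re-registering a document replaces its previous contribution.
        if doc_id in self.doc_terms:
            self.delete_document(doc_id)
        terms = set(index_terms)   # each document counts once per term (0 or 1)
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.doc_freq[term] += 1

    def delete_document(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            self.doc_freq[term] -= 1
            if self.doc_freq[term] == 0:
                # A frequency of 0 excludes the term from the autocomplete list.
                del self.doc_freq[term]

    def autocomplete_terms(self):
        return sorted(self.doc_freq)
```

Because the counts are adjusted at the moment a document is added or deleted, a term disappears from the list as soon as its last document is removed and reappears as soon as a new document containing it is registered, without waiting for a user to attempt the search.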
<119> The autocomplete DB 140 records information about the autocomplete list for the index term generated by the document indexing server 120 together with the document frequency information thereof.
<120> The autocomplete server 130 is configured to extract index term information, including the index term, by searching the autocomplete DB 140, to convert the index term information into a query, and to provide the query. The autocomplete server 130 includes a query entry unit 131, a DB searching unit 132, an index term determination unit 133, a presentation unit 134, a selection unit 135, and a service association unit 136. The query entry unit 131 is configured to receive a search query through the user interface and convert the query into an index term, and calls and searches the autocomplete DB using an Asynchronous JavaScript and XML (AJAX) application method whenever the query is entered through a UI as a unit letter designated by any one selected from a phoneme, a syllable, a word phrase, and a word.
<121> The AJAX application method is a method of requesting and receiving only required data from a web server and processing the data in a client.
<122> Generally, the web server creates and provides a webpage in response to searched or requested content, and creates and provides a new webpage when new content is requested.
<123> In this case, there are many cases where content of an initial webpage is similar to that of the new webpage. That is, in the state in which a Hyper Text Markup Language (HTML) code is duplicated, the content of the same HTML code is transmitted again, so that a lot of bandwidth is wasted, thus resulting in the loss of time and expenses and making it difficult to realize a real-time interactive service with a user.
<124> In the case of such an AJAX method, part of data that was processed by the web server is processed by a client or a terminal connected to the web server, so that the amount of data exchanged between the web server and the client is decreased, a bandwidth is decreased, and the amount of entire data to be processed by the web server is decreased, thus improving responsiveness and enabling interactive data exchange.
<125> The AJAX method has disadvantages in that some browsers do not support it, the functionality of a Hypertext Transfer Protocol (HTTP) client is limited, security problems exist, and debugging is difficult because scripts are created. However, the AJAX method is widely used because of its advantages: fast screen switching is possible while the webpage remains almost fixed, and, since part of the data processing is assigned to the client or terminal, the load on the server is decreased, data processing time is reduced, asynchronous data communication is possible, and both bandwidth and communication time are reduced owing to the small amount of data.
<126> The DB searching unit 132, which is configured to search the autocomplete DB for an index term that has been entered and converted by the query entry unit, individually searches the autocomplete DB using a right-hand truncation method and a left-hand truncation method, and creates the results of the search in a list.
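The two truncation searches can be sketched as follows. This is only an illustration under stated assumptions: the autocomplete DB is modeled as a plain list of index terms, matching is made case-insensitive (the specification does not say whether case is considered), and the function names are hypothetical.

```python
def right_truncation(terms, query):
    """Right-hand truncation: the query matches the front part of a term."""
    q = query.casefold()
    return [t for t in terms if t.casefold().startswith(q)]


def left_truncation(terms, query):
    """Left-hand truncation: the query matches the rear part of a term."""
    q = query.casefold()
    return [t for t in terms if t.casefold().endswith(q)]


def search_autocomplete(terms, query):
    """Run both searches separately and merge the hits into one list."""
    hits = right_truncation(terms, query)
    # Append rear-part matches that were not already found as front-part matches.
    hits += [t for t in left_truncation(terms, query) if t not in hits]
    return hits


# Hypothetical index terms for illustration only.
terms = ["Semantic Web", "Web Service", "Ontology"]
print(search_autocomplete(terms, "web"))
```

Listing front-part matches before rear-part matches is a design choice of this sketch; the actual ordering is determined by the presentation unit 134.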
<127>
<128> The index term determination unit 133 checks the document frequency information of the index terms stored in the autocomplete DB 140 and determines to provide index terms, the document frequency information of which has a value of 1 or more, in a list, and thus provides the index terms as autocomplete list information.
<129> The presentation unit 134 converts the autocomplete list information provided by the index term determination unit 133 into query information, and provides the query information through the user interface. The presentation unit 134 adjusts the ranking or sequence of queries displayed in the list using one or more selected from among the input statistical information of entered queries, the document frequency information of the queries, and the alphabetic sequence information of the queries.
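One possible way of combining the three ranking signals named above is sketched below. The specification only states that one or more of the signals are used; the priority order chosen here (input statistics first, then document frequency, then alphabetic sequence) and the name `rank_queries` are assumptions made for illustration.

```python
def rank_queries(candidates, input_counts=None):
    """Order autocomplete candidates.

    `candidates` maps each query to its document frequency.
    `input_counts` optionally maps each query to how often users entered it.
    """
    input_counts = input_counts or {}
    return sorted(
        candidates,
        # Higher input count first, then higher document frequency,
        # then alphabetic sequence as the final tie-breaker.
        key=lambda q: (-input_counts.get(q, 0), -candidates[q], q.casefold()),
    )


# Hypothetical frequencies and input statistics for illustration only.
candidates = {"Semantic Web": 5, "Semantic Annotation": 5, "Sem Borst": 1}
print(rank_queries(candidates))
print(rank_queries(candidates, {"Sem Borst": 3}))
```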
<130> The selection unit 135 provides, through the user interface, both the entered query and the autocomplete list that the presentation unit 134 has converted into queries, receives selected query information, and converts the query information into an index term. The service association unit 136 searches for the index term received from the selection unit 135 in response to a search event signal, and provides the document information retrieved from the index DB 150, wherein the search is performed by Application Programming Interface (API) calling.
<131> The API is a specific method preset by a computer operating system or by some other application program by which processing can be requested from the operating system or the application program.
<132> The API is the interface of the operating system or a program and differs from a graphic user interface or an imperative interface which directly interfaces with the user.
<133> The API is the format of a language or message used when an application program communicates with a system program such as an operating system or a DB management system. The API is implemented by calling a function which provides a connection to a specific subroutine so as to execute the subroutine in the program.
<134> That is, a single API is composed of several program modules or routines which already exist or must be connected to execute a requested task by calling a function.
<135> Generally, the server includes components for utilizing a computer, communicating with a network, executing computer operation processing, and performing various functions. The respective components are operated by the processor, memory, input/output means, etc. of the server.
<136> The server system 100 of the present invention having the above construction includes the document indexing server 120 for receiving registration of document information, indexing the document information, and constructing the autocomplete DB 140; the autocomplete server 130 for converting a query entered by the user into an index term, searching the autocomplete DB 140 for index terms that include the index term, and providing the retrieved index terms in a list converted into queries; the document collection unit 110 for collecting document information; the autocomplete DB 140 for recording index terms together with type information thereof; and the index DB 150 for recording information about the index terms.
<137> The document indexing server 120 includes the document registration unit 121 for receiving registration of document information, the document indexing unit 123 for extracting index terms from the document information and indexing the index terms, the DB generation unit 124, and a document editing unit 122.
<138> The document registration unit 121 receives registration of document information through the document collection unit 110 including a document register, a knowledge management system, a document collector, etc. The registered document information includes all types of content, such as webpage document information, text document information, format document information, image document information, and video document information.
<139> The document indexing unit 123 extracts index terms (queries) that are detected through an indexing method selected from indexing based on morpheme analysis and N-gram indexing, and stores the extracted index terms in the index DB 150.
<140> The document indexing unit 123 extracts index terms by performing an additional-information operation, such as extracting a specific index term from the registered document information or extracting specific information through text processing.
<141> In this case, the additional information, extracted from the document information registered by the document registration unit 121, is added to the index DB 150, and unnecessary index terms are eliminated in advance using a stopword dictionary or the like.
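The combination of N-gram indexing and stopword elimination can be sketched as follows. This is a simplified illustration, not the indexing method of the embodiment: it uses word-level N-grams on whitespace-separated text instead of morpheme analysis, and the stopword set and the name `word_ngrams` are hypothetical.

```python
# Hypothetical stopword dictionary (articles, prepositions, conjunctions).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "for", "to"}


def word_ngrams(text, n_max=2):
    """Extract word N-grams (N = 1..n_max) as candidate index terms."""
    tokens = [tok.strip(".,;:()").lower() for tok in text.split()]
    tokens = [tok for tok in tokens if tok]
    terms = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Eliminate candidates that begin or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            term = " ".join(gram)
            if term not in terms:
                terms.append(term)
    return terms


print(word_ngrams("Annotation of the Semantic Web"))
```

Candidates consisting only of stopwords, or starting or ending with one, are dropped before they reach the index DB, which mirrors the elimination "in advance" described above.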
<142> The DB generation unit 124 extracts type-based index term information, provided in an autocomplete form, from the index term information stored in the index DB 150, records the extracted type-based index term information in the autocomplete DB 140 together with the index term information, and newly calculates information about the document frequency of each index term.
<143> The document frequency information is obtained by recording information about the number of times that each relevant index term appears in a document, and is configured such that, when the value of a document frequency is 0, information about a corresponding index term is excluded from targets to be presented in an autocomplete list.
<144> The document editing unit 122 revises or deletes previously registered document information. As the document information is revised or deleted, the index terms in relevant document information and the document frequencies thereof must be changed, thus influencing the DB generation unit 124.
<145> The present invention is configured to search the document information, which has been input to and registered in the system, for index terms, to extract the index terms, and to manage the extracted index terms by separately indicating the extracted index terms in various index terms, which have been previously recorded in an index term dictionary managed by the system.
<146> Although there is a slight difference between them, the terms 'query' and 'index term' are used with the same meaning. That is, the term "query" is used at the interface with the user, where the query is displayed so that the user can input or select it. A query entered into the system is converted into an index term, and an index term is converted back into a query before being presented or output.
<147> Further, when, for example, specific thesis information is input to the system, the five highest index terms are detected among the thesis information, and are extracted as representative index terms.
<148> For the five highest index terms which have been retrieved, extracted and allocated, document frequency fields are provided in the index term dictionary of the system, and the document frequency of a selected index term is increased.
<149> The present invention having the above construction is advantageous in that, when new document information is added, the extraction of index terms is performed, and the document frequencies of respective index terms are automatically accumulatively calculated and changed in the index term dictionary, thus enabling the autocomplete list to be updated in real time.
<150> For example, when specific document information is deleted, the document frequencies of relevant index terms in corresponding document frequency fields are decreased by 1. Through this method, it is possible to rapidly cope in real time with the case where specific document information is added or deleted, and this operation will be described in detail later with reference to FIG. 7.
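The incremental update of document frequencies on addition and deletion can be sketched as follows. The class `DocumentFrequencyTable`, its method names, and the document identifiers are hypothetical; the sketch also assumes that the system remembers which index terms were extracted from each document, so that deletion knows which frequencies to decrease.

```python
class DocumentFrequencyTable:
    """Keeps document frequencies in step with registered documents."""

    def __init__(self):
        self.frequency = {}     # index term -> document frequency
        self.terms_by_doc = {}  # document id -> set of its index terms

    def register(self, doc_id, terms):
        """Add a document and increase the frequency of each of its terms by 1."""
        if doc_id in self.terms_by_doc:
            self.delete(doc_id)  # treat re-registration as a revision
        self.terms_by_doc[doc_id] = set(terms)
        for term in self.terms_by_doc[doc_id]:
            self.frequency[term] = self.frequency.get(term, 0) + 1

    def delete(self, doc_id):
        """Remove a document and decrease the frequency of each of its terms by 1."""
        for term in self.terms_by_doc.pop(doc_id, set()):
            self.frequency[term] -= 1

    def presentable(self):
        """Terms that may appear in the autocomplete list (frequency >= 1)."""
        return sorted(t for t, df in self.frequency.items() if df >= 1)


# Hypothetical documents for illustration only.
table = DocumentFrequencyTable()
table.register("doc1", ["OWL", "Semantic Annotation", "Ontology"])
table.register("doc2", ["OWL", "Semantic Annotation", "Reasoning"])
table.delete("doc1")
print(table.frequency)
print(table.presentable())
```

Because every registration or deletion touches only the terms of the affected document, the set of presentable terms is correct immediately after each change, without rebuilding the whole table.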
<151> The index DB in a search engine provided in the system is provided with an index term dictionary, a biographical dictionary, etc. The biographical dictionary is composed of the names of persons directly received from the URI server through a web service. That is, the index DB adds information about authors (persons) of service target document information, such as thesis information, in real time, without holding a list of index terms acquired from corpus or the like.
<152> However, similar to the index term dictionary, since document frequency information obtained in real time is maintained, a method of providing a list in an autocomplete form and coping with the addition or deletion of document information is the same as that of the index term dictionary.
<153> FIG. 7 is a diagram showing the state in which document frequencies are updated due to the addition or deletion of document information.
<154> Referring to FIG. 7, when first document information is registered, five index terms are extracted and are additionally recorded both in the index DB 150 and in the autocomplete DB 140. When the index terms are type-based information provided in an autocomplete form, they have been extracted from a single piece of document information, and thus the document frequency of each index term is set to '1'.
<155> In FIG. 7, when second document information is registered, five index terms are extracted. Since "OWL" and "Semantic Annotation" are index terms previously extracted from the first document information, the document frequencies thereof are 2. Since the remaining three index terms are extracted for the first time, the document frequencies thereof are 1.
<156> When the first document information is deleted in FIG. 7, the document frequencies of the five index terms from the first document information are decreased by 1.
<157> In the prior art, a method of presenting only index terms, for which the results of a search are generated, in an autocomplete form, is introduced. Here, a simple method is used in which a flag indicating the success or failure of a search for a given index term is attached to the index term when the user's query has succeeded.
<158> In the prior art, there is a problem in that, even if document information including an index term causing the failure of a search is added subsequently, the index term is not presented in an autocomplete list until the user personally enters a relevant query and attempts a search.
<159> The technical spirit of the present invention is to solve this problem and to provide an autocomplete list capable of guaranteeing the presence of search results in real time without causing a temporal difference by immediately adjusting the document frequency information of autocomplete target queries at the time of registration or editing of document information.
<160> The conventional OntoFrame provides an entity-centric integrated search, and such an entity is a subset of queries.
<161> When the entered query matches a specific type of query, such as a person or a topic, a query page is constructed when the entered query is searched for; otherwise, a typical search result page is constructed.
<162> The types of respective queries are checked by calling a search engine: the entered query is converted into an index term, and both the index term dictionary and the biographical dictionary provided in the search engine are consulted for the index term. At this time, type-based information indicating whether the entered query is of an area type, a person type, or another type is also searched for while index terms and person names having a document frequency of 1 or more are searched for.
<163> The following is a search engine program according to an embodiment of the present invention.
<164> API: SearchResultList getAutoComplete(String SearchTerm)
<165> Example of calling: getAutoComplete("sem")
<166> Examples of results:
<167> [Sem Borst, Person]
<168> [Semantic Annotation, Topic]
<169> [Semantic Web, Topic]
<170> [Semih Ergintav, Person]
<171> [Semyon M. Meerkov, Person]
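The behavior of the call shown above can be approximated with the following sketch. It is a hypothetical Python rendering, not the actual search engine program: the two dictionaries stand in for the index term dictionary and the biographical dictionary, the document frequency values and the entry "Semiotics" are invented for illustration, and only front-part (right-hand truncation) matching is performed.

```python
# Hypothetical stand-ins for the dictionaries held by the search engine.
# Each maps a term to its document frequency.
INDEX_TERM_DICTIONARY = {"Semantic Annotation": 2, "Semantic Web": 3, "Semiotics": 0}
BIOGRAPHICAL_DICTIONARY = {"Sem Borst": 1, "Semih Ergintav": 1, "Semyon M. Meerkov": 2}


def get_auto_complete(search_term):
    """Return (term, type) pairs whose term starts with `search_term`."""
    prefix = search_term.casefold()
    results = []
    sources = ((INDEX_TERM_DICTIONARY, "Topic"), (BIOGRAPHICAL_DICTIONARY, "Person"))
    for dictionary, type_name in sources:
        for term, df in dictionary.items():
            # Only terms backed by at least one document are returned.
            if df >= 1 and term.casefold().startswith(prefix):
                results.append((term, type_name))
    return sorted(results, key=lambda r: r[0].casefold())


for term, type_name in get_auto_complete("sem"):
    print(f"[{term}, {type_name}]")
```

With this sample data the call returns the same five typed entries as the example results above, while "Semiotics" is withheld because its document frequency is 0.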
<172> The autocomplete interface of the present invention may recognize the types of index terms on the basis of the results of the search and may display the types of index terms using icons, colors, tree classification, etc.
<173> The autocomplete server 130 includes the query entry unit 131 for receiving a query from the user and converting the query into an index term, the DB searching unit 132 for searching the autocomplete DB 140, including index terms, for corresponding index terms, the index term determination unit 133 for checking document frequency information recorded and stored in the autocomplete DB 140 and determining whether to provide an autocomplete list, the presentation unit 134 for converting the autocomplete list into queries, and providing the queries to a search interface through the User Interface(UI), the selection unit 135 for providing the UI to allow a specific query to be selected from the autocomplete list which includes the presented queries, and converting a selected query into an index term, and the service association unit 136 for providing document information retrieved by searching for the selected index term in response to an event signal attributable to the manipulation of a search button or a keyboard through a search service.
<174> The query entry unit 131 receives a query through a search box provided by the user interface and converts the query into an index term. The DB searching unit 132 is called with the index term using an AJAX method whenever one unit letter, based on a phoneme, a syllable, a word phrase, or a word, is entered.
<175> The DB searching unit 132 searches the autocomplete DB 140 for the input index term, and thus determines whether index terms that include the input index term are present.
<176> The search is performed using a right-hand truncation method of matching the front part of index terms and a left-hand truncation method of matching the rear part of index terms. [Figure imgf000032_0001: example index term and retrieved term for right-hand truncation. Figure imgf000032_0002: example index term and retrieved term for left-hand truncation.]
<177> The index term determination unit 133 determines to present index terms, having a document frequency of 1 or more among index terms which are determined to include the entered query through the DB searching unit 132 or which are retrieved through matching, in an autocomplete list.
<178> The fact that the document frequency is 1 or more means that document information including a relevant index term (query) is present in the search system.
<179> The presentation unit 134 converts the index terms obtained by the index term determination unit 133 into queries, and presents the queries in the autocomplete list. The ranking or sequence of queries arranged and displayed in the autocomplete list is adjusted using one or more selected from among the input statistical information of queries entered by users, the document frequency information of the queries, and the alphabetic sequence information of the queries, according to a typical automatic completion method. Alternatively, the ranking of queries in the autocomplete list is adjusted using the document frequency information or alphabetic sequence information of relevant queries included in the autocomplete DB 140.
<180> The ranking (sequence) may also be adjusted or determined using a newly developed method.
<181> When a specific query is designated and selected from the autocomplete list, presented as queries, on the terminal unit 300 through the UI, the selection unit 135 receives the selected query and converts the query into an index term. The selection from the presented query list is performed by designating a specific query using up/down buttons of a keyboard provided in the terminal unit 300, or a mouse, and by selecting one query from the autocomplete list.
<182> The selected query (index term) information is transmitted to the service association unit 136 together with relevant event signal information.
<183> The service association unit 136 receives the selected query, searches the index DB 150 for index information by calling an API in response to an event signal attributable to the manipulation of a search button or of a keyboard key such as an enter key, and then provides document information matching the query.
<184> FIG. 6 is a flowchart showing a query type-based automatic completion method capable of guaranteeing the presence of search results according to an embodiment of the present invention.
<185> Referring to FIG. 6, the query type-based automatic completion method capable of guaranteeing the presence of search results according to an embodiment of the present invention includes a revision process; a determination process; and an output process.
<186> The revision process is a process of collecting and registering document information, extracting index terms, storing the index terms in the index DB, generating index terms to be provided in an autocomplete form, storing the index terms in the autocomplete DB, and revising or deleting the registered document information. The revision process includes a step S100 of collecting document information using the document collection unit, a step of recording and storing the collected document information in a separate document information DB (not shown) using the document registration unit, a step S110 of extracting additional information including the index terms from the registered document information using the document indexing unit and storing the additional information in the index DB, a step S120 of generating index terms to be provided in an autocomplete form from the information stored in the index DB using the DB generation unit, and storing the index terms in the autocomplete DB, and a step S130 of revising or deleting the registered document information using the document editing unit.
<187> The methods of extracting index terms include indexing based on morpheme analysis and N-gram indexing. One selected from the indexing based on morpheme analysis and the N-gram indexing is used.
<188> When the index term extraction method is not used, a text processing method may be used.
<189> The determination process is performed to search the autocomplete DB 140 for an index term input through the user interface, and determine index terms having a document frequency of 1 or more to be search index terms. The input index term calls the DB searching unit 132 using the AJAX method whenever the input index term is input as a unit letter designated by any one selected from among a phoneme, a syllable, a word phrase and a word at steps S140 to S160.
<190> The output process is performed to convert the determined index terms into queries, present the queries in an autocomplete list, search for selected index term information, and output document information matching the index term information at steps S170 to S190. The ranking (sequence) of the autocomplete list of the queries is adjusted by one or more selected from the input statistical information of queries entered by users, and the document frequency information and the alphabetic sequence information of the autocomplete DB.
<191> The method of the present invention having the above construction is carried out entirely as described above for the system.
<192> The method of the present invention can be implemented in the form of computer-readable code in a computer-readable storage medium. The computer-readable storage medium is a recording device in which data readable by a computer system is stored. The storage medium may be, for example, Read-Only Memory (ROM), Random Access Memory (RAM), cache memory, a hard disc, an optical disc, a floppy disc, magnetic tape, etc. Further, the storage medium may be provided in carrier wave form, for example, as transmission through the Internet. Further, the computer-readable storage medium may be distributed to computer systems connected through a network, and computer-readable code may be stored and executed in the computer systems in a distributed manner.
<193> Although the preferred embodiments of the present invention have been disclosed in detail, those skilled in the art will appreciate that various modifications and changes are possible, without departing from the scope and spirit of the invention, and belong to the scope of the accompanying claims.
<I94>
[Industrial Applicability]
<195> The present invention relates to a query search system, and is advantageous for industrial applications in that it presents retrieved queries in an autocomplete form only when the presence of search results is guaranteed, thus not only increasing the reliability of results of the search, but also preventing the failure of a search and enabling a fast search, and is advantageous for convenient use in that the document frequency of each query based on the addition or deletion of document information is considered in real time, and type-based information is grouped, thus providing the results of a search as improved results.

Claims

[CLAIMS] [Claim 1]
<I97> A query type-based automatic completion system capable of guaranteeing presence of search results, comprising:
<I98> a document indexing server for receiving registration of document information, extracting index term information and document frequency information from the registered document information, recording the index term information and the document frequency information, and generating autocomplete list information from the extracted index term information;
<199> an autocomplete Database (DB) for recording the autocomplete list information, generated by the document indexing server, in association with the document frequency information; and
<200> an autocomplete server for searching the autocomplete DB, extracting autocomplete list information including the index term information from the autocomplete DB, converting the autocomplete list information into queries, providing the queries through a user interface, converting a query, which is selected and entered, into an index term, searching for document information including the index term, and providing the document information through the user interface. [Claim 2]
<20i> The query type-based automatic completion system according to claim 1, further comprising:
<202> a document collection unit for registering collected document information in the document indexing server; and
<203> an index DB for recording the index term information provided by the document indexing server and providing the index term information to the autocomplete server. [Claim 3]
<204> The query type-based automatic completion system according to claim 2, wherein the document collection unit collects one or more selected from among various types of content document information, including webpage document information, format document information, image document information, video document information, text document information, and multimedia document information.
[Claim 4]
<205> The query type-based automatic completion system according to claim 2, wherein the document indexing server comprises:
<206> a document registration unit for registering the document information collected by the document collection unit;
<207> a document indexing unit for extracting index terms from the document information registered by the document registration unit and storing the index terms in the index DB; and
<208> a DB generation unit for searching the index terms stored in the index DB for index term information provided in an autocomplete list, recording retrieved index term information in the autocomplete DB, and updating and managing the document frequency information.
[Claim 5]
<209> The query type-based automatic completion system according to claim 4, wherein the document indexing unit is configured to extract additional information, including the index term information, through one or more selected from a scheme for extracting the index terms from the document information registered in the document registration unit and a scheme for extracting index term information designated by text processing.
[Claim 6]
<210> The query type-based automatic completion system according to claim 4, wherein the document indexing unit extracts the index terms, using any one method selected from indexing based on morpheme analysis and N-gram indexing, from the document information registered in the document registration unit, and storing the index terms in the index DB.
[Claim 7]
<211> The query type-based automatic completion system according to claim 6, wherein the document indexing unit records and stores additional information, including the extracted index terms, in the index DB in association with corresponding document information.
[Claim 8]
<212> The query type-based automatic completion system according to claim 4, wherein the document indexing server further comprises a document editing unit for revising or deleting the document information registered in the document registration unit.
[Claim 9]
<213> The query type-based automatic completion system according to claim 4, wherein the document indexing unit is configured to eliminate unnecessary index terms, included in a stopword dictionary, from the index terms extracted from the document information registered in the document registration unit.
[Claim 10]
<214> The query type-based automatic completion system according to claim 4, wherein the DB generation unit is configured to accumulatively calculate document frequencies of respective pieces of document information for the autocomplete list of the autocomplete DB, and record information about the document frequencies.
[Claim 11]
<215> The query type-based automatic completion system according to claim 10, wherein the DB generation unit is configured to exclude an index term having a document frequency of 0 from the autocomplete list for which the document frequencies are accumulatively calculated.
[Claim 12]
<216> The query type-based automatic completion system according to claim 1, wherein the autocomplete server comprises:
<217> a query entry unit for receiving the query to be searched for through the user interface and converting the query into the index term;
<218> a DB searching unit for searching the autocomplete DB for the index term provided by the query entry unit;
<219> an index term determination unit for checking document frequency information of the index terms stored in the autocomplete DB, determining appropriate index terms to be autocomplete list information, and providing the document frequency information;
<220> a presentation unit for converting the autocomplete list information provided by the index term determination unit into queries and providing the queries;
<22i> a selection unit for providing both the entered query and the queries of the autocomplete list, provided by the presentation unit, through the user interface, receiving selected query information together with an event signal, and converting the selected query information into the index term; and
<222> a service association unit for searching for the document information in response to the index term information received from the selection unit and a search event signal, and providing retrieved document information. [Claim 13]
<223> The query type-based automatic completion system according to claim 12, wherein the query entry unit is configured such that the query is input as a unit letter designated by any one selected from a phoneme, a syllable, a word phrase and a word. [Claim 14]
<224> The query type-based automatic completion system according to claim 13, wherein the query entry unit calls the autocomplete DB using an Asynchronous JavaScript and XML (AJAX) method and searches the autocomplete DB for an index term whenever a query is entered. [Claim 15]
<225> The query type-based automatic completion system according to claim 12, wherein the query entry unit is configured to receive the query information through the User Interface (UI). [Claim 16]
<226> The query type-based automatic completion system according to claim 12, wherein the DB searching unit is configured to individually search for the index term using a right-hand truncation method and a left-hand truncation method, and generate results of the search in the autocomplete list. [Claim 17]
<227> The query type-based automatic completion system according to claim 12, wherein the index term determination unit is configured to determine to include index terms, having a document frequency of 1 or more, in the autocomplete list and to provide the index terms. [Claim 18]
<228> The query type-based automatic completion system according to claim 12, wherein the presentation unit is configured to adjust ranking of queries in the autocomplete list using one or more selected from among input statistical information of the queries, document frequency information of the queries, and alphabetic sequence information of the queries. [Claim 19]
<229> The query type-based automatic completion system according to claim 12, wherein the service association unit is configured to search for document information matching the index term information through calling of an Application Programming Interface (API). [Claim 20]
<230> A query type-based automatic completion method capable of guaranteeing presence of search results, comprising the steps of:
<23i> (a) collecting and registering document information, extracting index term information and document frequency information from the registered document information, storing the extracted information in an index Database (DB), generating autocomplete list information for the index term information, storing the autocomplete list information in an autocomplete DB, and revising registered document information;
<232> (b) converting a query, entered from the autocomplete DB through a user interface, into an index term, and including index terms, which are searched for at a document frequency of 1 or more, in the autocomplete list information; and
<233> (c) converting the index terms of the autocomplete list into queries and presenting the queries through the user interface, converting a query, which is selected and entered from an outside through the user interface, into an index term, and outputting document information retrieved by searching for the index term. [Claim 21]
<234> The query type-based automatic completion method according to claim 20, wherein step (a) comprises the steps of:
<235> collecting the document information using a document collection unit;
<236> registering the collected document information using a document registration unit;
<237> extracting index terms from the registered document information using a document indexing unit, and storing the index terms in an index DB;
<238> extracting index terms to be provided in the autocomplete list from the index term information stored in the index DB using a DB generation unit, and storing the index terms in the autocomplete DB; and
<239> revising or deleting the registered document information using a document editing unit. [Claim 22]
<240> The query type-based automatic completion method according to claim 21, wherein the extraction of the index terms is performed using any one method selected from indexing based on morpheme analysis and N-gram indexing. [Claim 23]
<241> The query type-based automatic completion method according to claim 21, wherein step (a) is performed such that the index terms stored in the index DB include additional information, and unnecessary index terms are eliminated using a stopword dictionary, and such that updated document frequency information of respective pieces of document information is stored in the autocomplete DB. [Claim 24]
<242> The query type-based automatic completion method according to claim 20, wherein the query entered at step (b) is input as a unit letter designated by any one selected from a phoneme, a syllable, a word phrase and a word. [Claim 25]
<243> The query type-based automatic completion method according to claim 24, wherein the entered query is converted into an index term, and is searched for by calling the autocomplete DB using an Asynchronous JavaScript and XML (AJAX) method. [Claim 26]
<244> A storage medium for storing a program source for a query type-based automatic completion method capable of guaranteeing presence of search results, comprising:
<245> a process of collecting and registering document information, extracting index term information and document frequency information from the registered document information, storing the extracted information in an index Database (DB), generating autocomplete list information for the index term information, storing the autocomplete list information in an autocomplete DB, and revising registered document information;
<246> a process of converting a query, entered from the autocomplete DB through a user interface, into an index term, and including index terms, which are searched for at a document frequency of 1 or more, in the autocomplete list information; and
<247> a process of converting the index terms of the autocomplete list into queries and presenting the queries through the user interface, converting a query, which is selected and entered from an outside through the user interface, into an index term, and outputting document information retrieved by searching for the index term. [Claim 27]
<248> The storage medium according to claim 26, wherein the query autocomplete list process is a process of adjusting ranking of queries in the autocomplete list using one or more selected from among input statistical information of the queries, document frequency information of the queries, and alphabetic sequence information of the queries. [Claim 28]
<249> The storage medium according to claim 26, wherein the entered query process is a process of calling the autocomplete DB using an AJAX method and searching the autocomplete DB whenever the query is input as a unit letter designated by any one selected from among a phoneme, a syllable, a word phrase and a word. [Claim 29]
<250> A query type-based automatic completion system capable of guaranteeing presence of search results, comprising:
<251> a server system for receiving registration of document information, extracting index terms and document frequency information from the registered document information to construct an autocomplete DB, converting a query, which is entered from an outside through a user interface, into an index term, providing an autocomplete list of index terms, including the input index term, from the autocomplete DB through the user interface, converting a query, which is selected and entered through the user interface, into an index term, and providing document information including the index term through the user interface;
<252> a public communication network connected to the server system and configured to transmit or receive the query and retrieved document information through a communication path selected from a wired communication path and a wireless communication path; and
<253> a terminal unit implemented as a computer connected to the public communication network, and configured to receive the query to be searched for through the user interface, transmit the query to the server system, display the autocomplete list of the query provided by the server system on the user interface, receive a single selected query together with an event signal, provide the query and the event signal to the server system, and display the document information retrieved and provided by the server system. [Claim 30]
<254> The query type-based automatic completion system according to claim 29, wherein the server system comprises:
<255> a document indexing server for receiving registration of the document information, extracting the index terms and the document frequency information, and constructing an autocomplete Database (DB);
<256> an autocomplete server for receiving the query from an outside through the user interface, converting the query into an index term, extracting information about an index term list, including the index term, converting the index term list information into queries, providing the queries through the user interface, converting a query, which is selected and entered, into an index term, and providing retrieved document information;
<257> the autocomplete DB for storing the autocomplete list information generated by the document indexing server in association with the document frequency information;
<258> a document collection unit for accessing the document indexing server and registering the collected document information; and
<259> an index DB for recording the index term information provided by the document indexing server, and providing the index term information through a search performed by the autocomplete server. [Claim 31]
<260> The query type-based automatic completion system according to claim 29, wherein the public communication network comprises:
<261> a wireless communication network for enabling the server system and the terminal unit to be connected to each other through a wireless communication path and transmitting data signals; and
<262> a wired communication network for enabling the server system and the terminal unit to be connected to each other through a wired communication path and transmitting data signals.
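
The mechanism recited in the claims above (index terms stored with their document frequency, lookup by right-hand and left-hand truncation, and an autocomplete list restricted to terms with a document frequency of 1 or more) can be illustrated with a minimal sketch. This is not the applicant's implementation. The function names, the in-memory dictionary standing in for the index DB and autocomplete DB, the whitespace tokenizer (used here in place of morpheme analysis or N-gram indexing), and the contents of the stopword list are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stopword list; the claims mention a stopword dictionary
# but do not specify its contents.
STOPWORDS = {"the", "a", "of", "and"}


def build_index(documents):
    """Map each index term to the set of ids of documents containing it.

    Whitespace tokenization is an assumption standing in for the
    morpheme-analysis or N-gram indexing named in the claims.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            if term not in STOPWORDS:
                index[term].add(doc_id)
    return index


def autocomplete(index, query, limit=10):
    """Return (term, document_frequency) pairs for terms matching the query
    by right-hand truncation (prefix) or left-hand truncation (suffix)."""
    q = query.lower()
    matches = []
    for term, doc_ids in index.items():
        df = len(doc_ids)
        # Only terms found in at least one document are offered, so every
        # suggestion is guaranteed to have search results.
        if df >= 1 and (term.startswith(q) or term.endswith(q)):
            matches.append((term, df))
    # Rank by document frequency (descending), then alphabetically.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches[:limit]


def remove_document(index, doc_id):
    """Delete a document and drop terms whose document frequency reaches 0."""
    for term in list(index):
        index[term].discard(doc_id)
        if not index[term]:
            del index[term]


def search(index, term):
    """Return the sorted ids of documents containing the selected term."""
    return sorted(index.get(term.lower(), set()))
```

With three sample documents, "semantic search engine", "search index" and "research of semantic web", the query "search" matches "search" by right-hand truncation and "research" by left-hand truncation. Once the third document is deleted, "research" no longer has any matching document and drops out of the list, which is the guarantee the claims describe.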
PCT/KR2008/006551 2008-10-01 2008-11-07 System and method of auto-complete with query type under guarantee of search results and storage media having program source thereof WO2010038923A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2008-0096831 2008-10-01
KR20080096831 2008-10-01
KR10-2008-0105464 2008-10-27
KR1020080105464A KR101051422B1 (en) 2008-10-01 2008-10-27 Record media recording the automatic completion system, method and program for each query type with guaranteed search results

Publications (1)

Publication Number Publication Date
WO2010038923A1 (en) 2010-04-08

Family

ID=42073670

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2008/006551 WO2010038923A1 (en) 2008-10-01 2008-11-07 System and method of auto-complete with query type under guarantee of search results and storage media having program source thereof

Country Status (1)

Country Link
WO (1) WO2010038923A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070039771A (en) * 2005-10-10 2007-04-13 엔에이치엔(주) Method and system for recommending query based search index
KR20070098252A (en) * 2006-03-31 2007-10-05 엔에이치엔(주) System and method for providing automatically completed recommended word by correcting and displaying the word
KR20070101974A (en) * 2006-04-13 2007-10-18 엘지전자 주식회사 Portable terminal and method for processing message in portable terminal
KR20070111275A (en) * 2006-05-17 2007-11-21 엔에이치엔(주) System and method for providing search result according to automatically completed an initial sound and the automatically completed an initial sound

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10127314B2 (en) 2012-03-21 2018-11-13 Apple Inc. Systems and methods for optimizing search engine performance
CN104468113A (en) * 2013-09-16 2015-03-25 安讯士有限公司 Distribution of user credentials
CN104468113B (en) * 2013-09-16 2019-09-27 安讯士有限公司 Device and method for distributed users voucher
WO2018156351A1 (en) * 2017-02-24 2018-08-30 Microsoft Technology Licensing, Llc Corpus specific generative query completion assistant
US11573989B2 (en) 2017-02-24 2023-02-07 Microsoft Technology Licensing, Llc Corpus specific generative query completion assistant

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 08877182; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 08877182; Country of ref document: EP; Kind code of ref document: A1)