WO2018186599A1

WO2018186599A1 - Automatic extraction and structurization, by subject, of sub-topic of query

Info

Publication number: WO2018186599A1
Application number: PCT/KR2018/002834
Authority: WO
Inventors: 민혜진; 김진홍; 박찬훈; 김광현
Original assignee: 네이버 주식회사
Priority date: 2017-04-06
Filing date: 2018-03-09
Publication date: 2018-10-11

Abstract

A technique for automatically extracting and structuring a sub-topic suitable for a query is disclosed. A topic structuring method can comprise the steps of: extracting, by subject, a sub-topic related to a subject; generating a topic tree for the sub-topic by using hierarchical information of the subject; and hierarchically providing the sub-topic as a search word related to a query according to a topic tree of a subject to which the query belongs, when the query for a search is provided.

Description

Automatic Extraction and Structuring of Query Subtopics by Subject

The description below relates to a technique for automatically extracting and structuring subtopics suitable for queries.

When providing search results for a query input by a user, a search system offers various functions, in addition to the documents matching the search condition, that help the user search further. Representative functions include related search terms, related tags, and search term autocompletion. These functions identify search terms or tags that frequently appear with the query, based on the co-occurrence of word pairs.

For example, Korean Patent Application Publication No. 10-2012-0096806 (published August 31, 2012) discloses a search term recommendation system and a search term recommendation method that select a search term based on location information of a user terminal and provide the search term to a user.

Meanwhile, for queries with shopping intent, a shopping search system may provide a function that helps product search by using hierarchical information such as the brand, color, and price of a product.

There is a need to provide query topics with a clear subject and to structure those topics for efficient information retrieval.

Provided is a method that, when a specific subject is given, extracts only the subtopics suitable for that subject and uses hierarchical information automatically constructed for the subject to structure and present the subtopics at the degree of segmentation desired by the user.

A computer-implemented topic structuring method comprises: extracting, for each subject, subtopics associated with the subject; generating a topic tree for the subtopics using hierarchical information of the subject; and, when a query for a search is given, providing the subtopics hierarchically as related search words for the query according to the topic tree of the subject to which the query belongs.

According to an aspect, the extracting may include extracting the subtopic by analyzing words related to the core object that determines the subject.

According to another aspect, the method may further include filtering the subtopic according to at least one of a document appearance frequency and a retrieval frequency.

According to another aspect, the method may further include clustering the subtopics according to a synonym or substring relationship and selecting a representative of each cluster.

According to another aspect, the generating may include generating the topic tree by labeling the subtopic with each class name of the hierarchical information.

According to another aspect, the generating may include: extracting similar words from word embedding data for the subtopics; clustering the similar words according to a synonym or substring relationship; and labeling the clustered words by mapping them to respective classes in a linguistic taxonomy.

According to another aspect, the method may further include rebalancing the topic tree by reducing at least one of breadth and depth of the topic tree.

According to another aspect, the providing may include filtering the subtopics according to at least one condition among a subject score indicating a correlation between the query and the subtopic, the number of documents corresponding to the subtopic, and whether the subtopic is of the correct-answer type for the query.

A computer-implemented search result providing method comprises: when a query for a search is given, providing a search result corresponding to the query; providing subtopics associated with the subject to which the query belongs, in a hierarchical form with a plurality of depths, as related search words for the query according to hierarchical information of that subject; and, when at least one search word is selected from the subtopics, providing a search result corresponding to a query that includes the selected search word.

A computer program is recorded on a computer-readable recording medium in combination with a computer system to execute a topic structuring method, the topic structuring method comprising: extracting, for each subject, subtopics associated with the subject; generating a topic tree for the subtopics using hierarchical information of the subject; and, when a query for a search is given, providing the subtopics hierarchically as related search words for the query according to the topic tree of the subject to which the query belongs.

A computer-implemented topic structuring system comprises at least one processor configured to execute computer-readable instructions, the at least one processor comprising: an extractor configured to extract, for each subject, subtopics associated with the subject; a generator configured to generate a topic tree for the subtopics using hierarchical information of the subject; and a provider configured to provide, when a query for a search is given, the subtopics hierarchically as related search words for the query according to the topic tree of the subject to which the query belongs.

According to embodiments of the present invention, when a specific subject is given, only the subtopics suitable for the subject are extracted, hierarchical information is automatically constructed for the subject, and the subtopics are then structured and presented at the degree of segmentation desired by the user. This helps the user efficiently identify detailed attributes that are suitable for the subject and relevant to the query, and helps the user actually perform further searches.

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.

FIG. 2 is a block diagram illustrating an internal configuration of an electronic device and a server according to an embodiment of the present invention.

FIG. 3 illustrates an example of a process of layering a patterned query according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating an example of components that may be included in a processor of a server according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating an example of a method that a server may perform according to an embodiment of the present invention.

FIG. 6 shows an example of a process of filtering and grouping subtopic candidates for the queries 'Guam' and 'potato' according to an embodiment of the present invention.

FIG. 7 is a flowchart illustrating an example of a process of constructing hierarchical information according to an embodiment of the present invention.

FIG. 8 shows an example of hierarchical information constructed by using clustering and linguistic taxonomy for a travel subject.

FIG. 9 illustrates an example of a process of converting a topic network constructed according to an embodiment of the present invention into a tree having a depth of 2 (2-depth tree).

FIG. 10 is a flowchart illustrating an example of a tree rebalancing process according to an embodiment of the present invention.

FIGS. 11 and 12 are diagrams illustrating examples of a tree rebalancing process according to an embodiment of the present invention.

FIGS. 13 and 14 illustrate examples of a search result screen in which a 2-depth topic structure is reflected according to an embodiment of the present invention.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Embodiments of the present invention relate to techniques for automatically extracting and structuring subtopics suitable for queries.

Embodiments, including those specifically disclosed herein, provide query topics with a clear subject and structure those topics for efficient information retrieval, thereby achieving significant advantages in terms of accuracy, efficiency, scalability, cost savings, and the like.

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention. The network environment of FIG. 1 includes a plurality of electronic devices 110, 120, 130, and 140, a plurality of servers 150 and 160, and a network 170. FIG. 1 is an example for describing the present invention, and the number of electronic devices or servers is not limited to that shown in FIG. 1.

The plurality of electronic devices 110, 120, 130, and 140 may be fixed terminals or mobile terminals implemented as computer devices. Examples of the plurality of electronic devices 110, 120, 130, and 140 include smartphones, mobile phones, tablet PCs, navigation systems, computers, notebook computers, digital broadcasting terminals, personal digital assistants (PDAs), and portable multimedia players (PMPs). For example, the first electronic device 110 may communicate with the other electronic devices 120, 130, and 140 and/or the servers 150 and 160 through the network 170 using a wireless or wired communication scheme.

The communication method is not limited, and may include not only communication methods using a communication network that the network 170 may include (for example, a mobile communication network, the wired Internet, the wireless Internet, or a broadcasting network), but also short-range wireless communication between devices. For example, the network 170 may include any one or more of networks such as a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), and the Internet. The network 170 may also include any one or more network topologies, including bus networks, star networks, ring networks, mesh networks, star-bus networks, and tree or hierarchical networks, but is not limited thereto.

Each of the servers 150 and 160 may be implemented as a computer device or a plurality of computer devices that communicate with the plurality of electronic devices 110, 120, 130, and 140 through the network 170 to provide commands, code, files, content, services, and the like.

For example, the server 160 may provide a file for installing an application to the first electronic device 110 connected through the network 170. In this case, the first electronic device 110 may install the application using the file provided from the server 160. In addition, the first electronic device 110 may access the server 150 under the control of an operating system (OS) or at least one program (for example, a browser or the installed application) included in the first electronic device 110, and receive services or content provided by the server 150. For example, when the first electronic device 110 transmits a service request message to the server 150 through the network 170 under the control of the application, the server 150 may transmit code corresponding to the service request message to the first electronic device 110, and the first electronic device 110 may provide content to the user by configuring and displaying a screen according to the code under the control of the application.

FIG. 2 is a block diagram illustrating an internal configuration of an electronic device and a server according to an embodiment of the present invention. FIG. 2 illustrates the internal configuration of the first electronic device 110 as an example of an electronic device and of the server 150 as an example of a server. The other electronic devices 120, 130, and 140 and the server 160 may have the same or similar internal configurations.

The first electronic device 110 and the server 150 may include memories 211 and 221, processors 212 and 222, communication modules 213 and 223, and input/output interfaces 214 and 224. The memories 211 and 221 are computer-readable recording media, and may include random access memory (RAM), read only memory (ROM), and permanent mass storage devices such as disk drives. In addition, the memories 211 and 221 may store an operating system or at least one program code (for example, code for an application installed and run on the first electronic device 110). These software components may be loaded from a computer-readable recording medium separate from the memories 211 and 221. Such a separate computer-readable recording medium may include a floppy drive, a disk, a tape, a DVD/CD-ROM drive, a memory card, and the like. In other embodiments, software components may be loaded into the memories 211 and 221 through the communication modules 213 and 223 rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memories 211 and 221 based on a program (for example, the above-described application) installed by files provided through the network 170 by developers or by a file distribution system (for example, the server 160 described above) that distributes installation files of applications.

The processors 212 and 222 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processors 212 and 222 by the memories 211 and 221 or the communication modules 213 and 223. For example, the processors 212 and 222 may be configured to execute instructions received according to program code stored in a recording device such as the memories 211 and 221.

The communication modules 213 and 223 may provide a function for the first electronic device 110 and the server 150 to communicate with each other through the network 170, and a function for communicating with another electronic device (e.g., the second electronic device 120) or another server (e.g., the server 160). For example, a request (e.g., a search request) generated by the processor 212 of the first electronic device 110 according to program code stored in a recording device such as the memory 211 may be delivered to the server 150 through the network 170 under the control of the communication module 213. Conversely, control signals, commands, content, files, and the like provided under the control of the processor 222 of the server 150 may be received by the first electronic device 110 through the communication module 213 of the first electronic device 110 via the communication module 223 and the network 170. For example, a control signal or command of the server 150 received through the communication module 213 may be transmitted to the processor 212 or the memory 211, and content or files may be stored in a storage medium that the first electronic device 110 may further include.

The input/output interface 214 may be a means for interfacing with the input/output device 215. For example, the input device may include a device such as a keyboard or mouse, and the output device may include a device such as a display for displaying a communication session of an application. As another example, the input/output interface 214 may be a means for interfacing with a device in which input and output functions are integrated into one, such as a touch screen. More specifically, in processing a command of a computer program loaded in the memory 211, the processor 212 of the first electronic device 110 may display, on the display through the input/output interface 214, a service screen or content configured using data provided by the server 150 or the second electronic device 120. Similarly, the input/output interface 224 may output information configured using data provided by the server 150 when the processor 222 of the server 150 processes a command of a computer program loaded in the memory 221.

In other embodiments, the first electronic device 110 and the server 150 may include more components than those shown in FIG. 2. However, most prior-art components need not be explicitly illustrated. For example, the first electronic device 110 may be implemented to include at least a part of the above-described input/output device 215, or may further include other components such as a transceiver, a Global Positioning System (GPS) module, a camera, various sensors, a database, and the like. More specifically, when the first electronic device 110 is a smartphone, it may be implemented to further include various components that a smartphone generally includes, such as an acceleration sensor, a gyro sensor, a camera, various physical buttons, buttons using a touch panel, input/output ports, and a vibrator for vibration.

Hereinafter, a specific embodiment of a topic structuring method and a topic structuring system for automatically extracting and structuring subtopics of a topic-specific query will be described.

Topic structuring (grouping and hierarchization) is required to provide as many search results as possible for a single query and to enable efficient information retrieval.

When a search system provides a search result of a query input by a user, the search system provides various functions to help the user further search in addition to the documents matching the search condition. Typical examples include related search terms, related tags, and automatic completion of search terms. These features are useful in the following ways.

First, they recommend a query when the user does not know the name of the query related to the information he or she wants to find.

Second, the user can indirectly grasp the detailed attributes / subtopics related to the query.

However, the relationship of "association" has the following limitations.

First, the relationship of "association" is ambiguous, so the specific relationship with the query (e.g., parent/child concept, synonym, or sibling concept) cannot be known. Therefore, as the number of provided search terms or tags increases, it becomes difficult to arrange them structurally, and the number that can be shown to the user is limited from a usability standpoint.

Second, if the query is ambiguous (has multiple meanings), the related search terms or tags are presented without being organized by meaning, so they are not very helpful for further searching.

Meanwhile, for queries with shopping intent, there is hierarchical information provided to help a user search for a desired product. Information such as a product's brand, color, and price is highly systematic and helps the user search quickly and efficiently. However, this information is manually entered by each seller, which limits its scalability, and it applies only to shopping queries.

The present invention proposes an automatic subtopic extraction and structuring technique that helps the user search further, enables the user to efficiently grasp the detailed attributes/subtopics related to a query, and can solve the above-mentioned limitations.

The key contents of the topic structuring system according to the present invention are as follows.

(1) The topic structuring system patterns the main queries of each subject into the form "main object + subtopic". Here, the main object refers to the core object that determines a subject, and the subtopic includes at least one of a sub object and an attribute. The sub object refers to an object that makes the subject more specific, and the attribute refers to a word representing an attribute of the subject, such as a suffix or prefix.

(2) The topic structuring system hierarchizes the patterned queries according to the semantic relationships of the sub objects and attributes. FIG. 3 illustrates an example of a process of layering a patterned query according to an embodiment of the present invention. As shown in FIG. 3, the main object MainObj and the subtopics SubObj and Suffix may be layered based on a query of a specific subject, that is, the main object MainObj.

(3) The topic structuring system can provide the hierarchized queries and subtopics to the user together with the search results (documents). By structuring and presenting the subtopics at the degree of segmentation desired by the user, the topic structuring system can help the user efficiently identify detailed subtopics that are suitable for the subject and related to the query, and can assist actual additional searches.
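The "main object + subtopic" patterning in item (1) can be illustrated with a minimal sketch. It assumes a hypothetical lookup table `MAIN_OBJECTS` that maps a main object to a subject code (WTRIP and FOOD are the category codes mentioned for FIG. 6), and it uses a plain dictionary lookup as a stand-in for the natural language understanding that the description refers to. The function name and return shape are illustrative only.

```python
# Hypothetical table: main object -> subject (category) code.
MAIN_OBJECTS = {"guam": "WTRIP", "potato": "FOOD"}

def pattern_query(query, main_objects=MAIN_OBJECTS):
    """Split a query into main object + subtopic; return None if no main object is found."""
    tokens = query.lower().split()
    for i, token in enumerate(tokens):
        if token in main_objects:
            # Everything except the main object is treated as the subtopic.
            subtopic = " ".join(tokens[:i] + tokens[i + 1:])
            return {
                "subject": main_objects[token],
                "main_object": token,
                "subtopic": subtopic,
            }
    return None

patterned = pattern_query("Guam hotel recommendation")
unmatched = pattern_query("cheap flights")
```

Here `patterned` holds the subject `WTRIP`, the main object `guam`, and the subtopic `hotel recommendation`, while a query without a known main object yields `None`.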

FIG. 4 is a diagram illustrating an example of components that may be included in a processor of a server according to an embodiment of the present invention, and FIG. 5 is a flowchart illustrating an example of a method that may be performed by a server according to an embodiment of the present invention.

As shown in FIG. 4, the processor 222 of the server 150 may include the extractor 410, the refiner 420, the generator 430, the adjuster 440, and the provider 450 as components. The processor 222 and the components of the processor 222 may control the server 150 to perform steps S510 to S550 included in the method of FIG. 5. In this case, the processor 222 and the components of the processor 222 may be implemented to execute instructions according to code of an operating system included in the memory 221 and code of at least one program. In addition, the components of the processor 222 may be representations of different functions performed by the processor 222 according to control commands provided by the operating system or the at least one program. For example, the extractor 410 may be used as a functional expression for the processor 222 extracting the main object and the subtopics according to the above-described control command.

The components of the processor 222 will be described first as follows.

The MainObj + Suffix extraction module of the extractor 410 extracts main objects and attributes by subject. Natural language understanding technology can be used to extract main objects and attributes.

The SubObj extraction module of the extractor 410 extracts a sub object when a main object + (attribute) exists for each subject. To extract the sub object, various statistical information (e.g., clicks, likes, comments, authors, etc.) or dictionary information may be used.

The ranker (Suffix Ranker, SubObj Ranker) module of the refiner 420 determines the ranking of sub-objects and / or attributes in order of importance with respect to the main object. In this case, various information such as the number of clicks, the number of likes, the number of comments, and the number of authors may be utilized to determine the importance.

The post-processor (Post-Processor Ranker) module of the refiner 420 receives the ranked sub-objects and/or attributes, clusters them according to synonym or substring relationships, and selects a representative value for each cluster.

The TopicGraphToTree module of the generator 430 collects the clustered sub-objects and/or attributes, determines the strength of their relationships from how often they appear together in queries or documents, creates a network (graph) structure, and then converts it into a tree (search/cluster-based tree). The search/cluster-based tree is described in detail below.

The TreeConstructor module of the generator 430 integrates a dictionary-based tree and a search / cluster-based tree structure to form a final topic tree (eg, a 2-depth tree structure).

The Topic Reranker module of the adjuster 440 further filters the topic tree according to filtering conditions (e.g., the number of documents, subject suitability, correct-answer type, etc.).

The New Object Assigner module of the adjuster 440 extracts new items related to the main object and assigns them to the existing tree structure. If there are many new items that do not fit in the tree structure, the process is restarted from the beginning to reconstruct the tree.

The Document Finding API module of the provider 450 constructs a query based on the final topic tree to extract suitable documents. A filtering function may also be included.

The Auto-Tagger module of the provider 450 constructs topic tags based on the final topic tree and tags suitable documents with them.

Steps S510 to S550 included in the method of FIG. 5 may be performed through the processor 222 including the above components.

In operation S510 of FIG. 5, the extractor 410 may extract, for each subject, a main object, which is the core object that determines the subject, and subtopics that make the subject more specific. In this case, the extractor 410 may extract sub-object and/or attribute candidates by analyzing words that frequently appear with the main object in documents, or by analyzing words that are frequently searched together with the main object in the search system.
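As a rough illustration of the candidate extraction in operation S510, the following sketch counts the words that co-occur with a main object in a query log. The query log, the whitespace tokenization, and the function name `extract_subtopic_candidates` are all hypothetical; an actual system would use real search logs and documents together with proper language analysis.

```python
from collections import Counter

def extract_subtopic_candidates(main_object, queries):
    """Count the words that co-occur with `main_object` in logged queries."""
    counts = Counter()
    for query in queries:
        tokens = query.lower().split()
        if main_object not in tokens:
            continue
        # Each other distinct word in the query is a subtopic candidate.
        for token in set(tokens):
            if token != main_object:
                counts[token] += 1
    return counts

# Hypothetical query log (illustrative data only).
query_log = [
    "guam hotel",
    "guam weather",
    "guam hotel recommendation",
    "potato recipe",
    "guam rental car",
    "guam weather december",
]
candidates = extract_subtopic_candidates("guam", query_log)
```

The resulting counts (for example, how many logged queries pair "guam" with "hotel") serve as the search frequencies used by the later filtering and ranking steps.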

In operation S520, the refiner 420 may filter the subtopics according to their frequency of appearance in documents or their search frequency, and then group them based on the relationships between words. In the candidate filtering process, at least some of the sub-object and/or attribute candidates may be filtered out. As a specific example, the refiner 420 may filter the sub-objects and/or attributes according to at least one of the frequency of appearance in documents and the frequency of searches by users. The frequency may also be computed over data limited to a specific period, and the filtering method may vary according to the characteristics of the subject. For example, if the subject is highly time-sensitive, filtering may use only data from a recent period (for example, the one week before the present). In addition, the refiner 420 may group the sub-object and/or attribute candidates selected through the candidate filtering process in consideration of substring relationships and the like, and may select a representative for each group. The representative may be selected in various ways; in one embodiment, the candidate with the highest search frequency is selected. In other words, the refiner 420 can rank the subtopics in order of importance (e.g., frequency of appearance in documents, search frequency, etc.), cluster the ranked subtopics according to synonym or substring relationships, and select a representative of each cluster. Extraction and refinement of the subtopics thus consists of candidate selection followed by grouping and representative selection.

FIG. 6 illustrates an example of a process of filtering, grouping, and selecting a representative topic for candidates for the queries 'Guam' and 'potato' according to an embodiment of the present invention. In FIG. 6, WTRIP and FOOD are category classification codes indicating the subject of the query, and the number next to each word indicates the frequency with which the word was searched in association with the query.
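The filtering, grouping, and representative selection of operation S520 can be sketched as follows. This is only an illustration under stated assumptions: the frequencies are invented (they are not the values of FIG. 6), the threshold `min_freq` is hypothetical, grouping uses the substring relation only (synonym handling is omitted), and the representative is the member with the highest search frequency, as in the embodiment described above.

```python
def refine_subtopics(freq, min_freq=2):
    """Filter candidates by frequency, group them by substring relation,
    and pick the most frequent member of each group as its representative."""
    kept = {word: f for word, f in freq.items() if f >= min_freq}
    groups = []
    # Visit shorter words first so that longer phrases join the group
    # of the word they contain.
    for word in sorted(kept, key=lambda w: (len(w), w)):
        for group in groups:
            if any(member in word or word in member for member in group):
                group.append(word)
                break
        else:
            groups.append([word])
    return {max(group, key=lambda w: kept[w]): sorted(group) for group in groups}

# Hypothetical search frequencies for the main object "Guam".
frequencies = {
    "hotel": 120,
    "hotels": 40,
    "hotel recommendation": 75,
    "weather": 90,
    "weather forecast": 30,
    "visa": 1,
}
representatives = refine_subtopics(frequencies)
```

With this data, "visa" is dropped by the frequency filter, and the remaining candidates fall into two groups represented by "hotel" and "weather".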

Referring back to FIG. 5, in operation S530, the generator 430 may generate a topic tree for the grouped subtopics using hierarchical information on the corresponding subject. The generator 430 may generate the topic tree by using the hierarchical information to label each grouped subtopic with the matching class name of the hierarchy. Depending on the subject, hierarchical information may or may not already exist. For example, dictionary information constructed from a database containing various kinds of content is one kind of information that can be usefully employed as hierarchical information. In this case, the generator 430 may generate the topic tree based on existing hierarchical information such as the dictionary information. For example, cooking or recipe subjects have rich hierarchical information based on a cooking encyclopedia. On the other hand, travel or shopping subjects do not have hierarchical information, and various subtopics may be created or changed over time. If hierarchical information does not exist, the generator 430 may construct it using a word embedding-based clustering technique and a taxonomy, and use it to generate the topic tree. The present invention therefore has the advantage that topics can be layered automatically even when no hierarchical information exists.

FIG. 7 is a flowchart illustrating an example of a process of constructing hierarchical information using a word embedding-based clustering technique and linguistic taxonomy according to an embodiment of the present invention. Referring to FIG. 7, the generator 430 extracts similar words from word embedding data for a subtopic (S701), clusters the extracted words according to synonym or substring relationships (S702), and then labels the clustered words based on a linguistic taxonomy (S703). FIG. 8 shows an example of hierarchical information constructed by using clustering and linguistic taxonomy for a travel subject. In the word embedding-based clustering process (S702), word embeddings are trained on subject-specific documents (e.g., blog posts), the word vectors of the subtopics to be clustered are obtained from the trained data, and clustering is performed on those word vectors. Various clustering methods may be used, such as hierarchical clustering, the K-means algorithm, and density-based clustering. In the linguistic taxonomy application process (S703), the clustered result may be labeled by mapping it to the classes of the linguistic taxonomy. Because a linguistic taxonomy is general-purpose, it contains many classes that are unnecessary compared with hierarchical information specialized for a subject. These unnecessary classes need to be deleted, which is described later in the rebalancing process of the adjuster 440.
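The clustering and labeling steps (S702 and S703) can be sketched in a simplified form. The two-dimensional vectors and the `taxonomy` mapping below are invented for illustration; real word vectors would come from embeddings trained on subject-specific documents. A greedy threshold on cosine similarity is used here as a simple stand-in for the hierarchical, K-means, or density-based clustering named in the description, and the threshold value is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def cluster_by_similarity(vectors, threshold=0.9):
    """Greedy clustering: a word joins the first cluster containing a similar word."""
    clusters = []
    for word, vec in vectors.items():
        for cluster in clusters:
            if any(cosine(vec, vectors[other]) >= threshold for other in cluster):
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters

def label_clusters(clusters, taxonomy):
    """Label each cluster with the taxonomy class held by most of its members."""
    labeled = {}
    for cluster in clusters:
        classes = [taxonomy.get(word, "unknown") for word in cluster]
        label = max(sorted(set(classes)), key=classes.count)
        labeled.setdefault(label, []).extend(cluster)
    return labeled

# Hypothetical word vectors and taxonomy classes for a travel subject.
word_vectors = {
    "hotel": [0.9, 0.1],
    "resort": [0.85, 0.2],
    "snorkeling": [0.1, 0.95],
    "diving": [0.2, 0.9],
}
taxonomy = {
    "hotel": "accommodation",
    "resort": "accommodation",
    "snorkeling": "activity",
    "diving": "activity",
}
clusters = cluster_by_similarity(word_vectors)
hierarchy = label_clusters(clusters, taxonomy)
```

The labeled result plays the role of the automatically constructed hierarchical information: each taxonomy class becomes an intermediate node and its cluster members become the subtopics under it.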

In the topic layering step, the clustered subtopics are gathered, the strength of their relationships is determined from how often they appear together in queries or documents, a network (graph) structure is created and converted into a topic tree (cluster-based tree), and the dictionary-based topic tree and the cluster-based topic tree are integrated to form the final tree structure.

Referring back to FIG. 5, in operation S540, the adjuster 440 may perform rebalancing on the topic tree constructed in the topic layering step of the generator 430 according to the purpose of the user or the system. In addition, the adjuster 440 may perform pruning on the topic tree in consideration of subject suitability, search intent, the amount of search results, and the like.

Table 1 shows the definition of the topic network according to an embodiment of the present invention.

The generation unit 430 generates a topic tree using information constructed by using a search frequency and clustering. First, a topic network G is constructed by representing each word as a node and the relationship between the words as edges. The node V and the trunk line E in the topic network G according to an embodiment of the present invention may be defined as shown in Table 1. At this time, the generation unit 430 changes the topic network to the topic tree in consideration of the search frequency. 9 illustrates an example of a process of converting a topic network constructed according to an embodiment of the present invention into a tree having a depth of 2 (2-depth tree). Various algorithms may be used to convert the network into a tree, and for example, a minimum spanning tree construction algorithm in a weight graph may be applied. The controller 440 may then be based on linguistic taxonomy. You can combine one tree with a tree based on search frequency / clustering to perform rebalancing based on user or system purpose. 10 is a flowchart illustrating an example of a tree rebalancing process according to an embodiment of the present invention. Referring to FIG. 10, the coordinator 440 may insert a cluster corresponding to a leaf node in a clustering-based tree into a corresponding class of a dictionary-based tree (S1001). The breadth and depth of the topic tree are different for each query, and the depth and width of the topic tree are generally large, so it is necessary to reduce them (S1002 ~ S1003). Reducing the width and depth is accomplished by combining a tree based on linguistic taxonomy and a tree based on search frequency / clustering. The width value and the depth value of the topic tree may be set differently according to the requirements of the system, and in the embodiment of the present invention, it is assumed that the depth is 2 (2 depth treeization). 
In addition, the adjustment unit 440 may prune the topic tree in consideration of topic suitability, search intent, and the amount of search results (S1004). FIG. 11 illustrates some methods for reducing the width during the tree rebalancing process. The width of the topic tree may be reduced by bottom-up node movement and/or top-down node movement. FIG. 12 illustrates some methods for reducing the depth during the tree rebalancing process. The depth of the topic tree may be reduced by replacing some nodes with their child nodes.
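The depth reduction described above (replacing a node with its child nodes) can be sketched as follows. This is only one possible policy, chosen for illustration: any node that sits at the deepest allowed level and still has children is replaced by those children. The tree representation as `(label, children)` tuples, the function names, and the sample labels are assumptions, and `max_depth` is assumed to be at least 1.

```python
def reduce_depth(node, max_depth, level=0):
    """Return a copy of `node` whose subtree reaches at most `max_depth`.

    A node is a (label, children) tuple, where children is a list of nodes.
    """
    label, children = node
    new_children = []
    for child in children:
        new_children.extend(_fit(child, max_depth, level + 1))
    return (label, new_children)


def _fit(node, max_depth, level):
    """Return the nodes that take the place of `node` at depth `level`."""
    label, children = node
    if level >= max_depth and children:
        # The node would need children below the limit, so it is replaced
        # by its children (recursively, until only leaves remain).
        replacement = []
        for child in children:
            replacement.extend(_fit(child, max_depth, level))
        return replacement
    return [reduce_depth(node, max_depth, level)]


# Hypothetical 3-depth tree for the subject "Guam".
tree = ("Guam", [
    ("restaurant", [
        ("burger", [("handmade burger", []), ("shrimp burger", [])]),
        ("steak", []),
    ]),
    ("hotel", []),
])
two_depth_tree = reduce_depth(tree, max_depth=2)
```

Here the intermediate node "burger" is removed and its two children become direct children of "restaurant", which yields a 2-depth tree.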

Referring back to FIG. 5, in operation S550, the provider 450 may provide subtopics along with the search result corresponding to the query by using the topic tree of the subject to which the query belongs. In this case, the provider 450 may filter the subtopics according to various conditions and provide them, together with the search result, as related search words for the query. In one example, the provider 450 may filter the subtopics according to how well they fit the subject of the query. When a query for a search is given, the subject to which the query belongs is identified, and if the query belongs to several subjects, topics that do not fit the given subject may be filtered out. To this end, the subject score of 'query + subtopic', which is a score indicating the correlation between the query and the subtopic, may be used. A text categorization algorithm (e.g., a support vector machine (SVM), k-nearest neighbor (kNN), or a convolutional neural network (CNN)) may be used to compute the subject score. As another example, the provider 450 may filter the subtopics using the number of documents corresponding to each subtopic. If the number of documents included in the search result is less than a certain number, the subtopic is less useful and may be excluded. As another example, the provider 450 may filter the subtopics based on whether a subtopic is a correctness topic. A subtopic for which providing correctness information is more appropriate than providing multiple documents as a search result (for example, 'Guam weather', where correctness information is required) may be included as a related search term for the query.
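The three filtering conditions above can be sketched as a single pass over candidate subtopics. The class `Subtopic`, the thresholds `min_score` and `min_docs`, and the sample values are hypothetical; the subject score is assumed to come from some text categorization model that is not shown. Letting a correctness topic bypass the other two filters is also an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Subtopic:
    term: str
    subject_score: float          # assumed classifier score for "query + subtopic"
    doc_count: int                # number of documents matching the subtopic
    is_correctness_topic: bool = False


def filter_subtopics(subtopics, min_score=0.5, min_docs=10):
    """Keep subtopics that fit the subject and have enough documents."""
    kept = []
    for s in subtopics:
        if s.is_correctness_topic:
            kept.append(s)        # assumption: included regardless of the other filters
        elif s.subject_score >= min_score and s.doc_count >= min_docs:
            kept.append(s)
    return kept


# Hypothetical candidates for the query "Guam".
candidates = [
    Subtopic("restaurant", 0.9, 1200),
    Subtopic("weather", 0.4, 3, is_correctness_topic=True),
    Subtopic("stock price", 0.1, 500),   # does not fit the subject
    Subtopic("night market", 0.8, 2),    # too few documents
]
kept_terms = [s.term for s in filter_subtopics(candidates)]
```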

The provider 450 may hierarchically expose detailed subtopics (sub-objects and/or attributes) related to the query as related search words for the query input by the user. The topic tree for each topic may be updated periodically, and the update cycle of the topic tree may be determined per topic in consideration of the characteristics of that topic.

FIGS. 13 and 14 illustrate examples of a search result screen reflecting a 2-depth topic structure according to an exemplary embodiment of the present invention. For example, in the process of providing a search result corresponding to a query input by the user, depth1 queries and depth2 queries may be provided as related search terms of the input query according to the hierarchical information of the subject corresponding to the input query. Referring to FIG. 13, when the user inputs the query 'Guam' into the search box 1301, depth1 queries 1310 and depth2 queries 1320 may be provided, together with the search result corresponding to the input query, as related search terms of the input query 'Guam' according to the hierarchical information of the corresponding subject 'Guam'. Each query provided as a related search term is selectable by the user, and the query selected by the user is automatically added to the search box 1301. As shown in FIG. 13, when the user selects 'restaurant' from the depth1 queries 1310 provided as related search terms of the initial query 'Guam', 'restaurant' is additionally input to the search box 1301, and a depth1 search result 1302 may be exposed using the query 'Guam restaurant'. Next, when the user selects 'handmade burger' from the depth2 queries 1320, as shown in FIG. 14, 'handmade burger' is additionally input to the search box 1301, and a depth2 search result 1402 may be exposed using the query 'Guam restaurant handmade burger'.
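The interaction in FIGS. 13 and 14 can be sketched as a lookup in a 2-depth topic tree plus a step that appends the selected term to the search box. The dictionary `TOPIC_TREE`, its contents, and the function names are hypothetical. Which depth2 terms are shown before any depth1 term is selected is not stated in the text, so showing all of them is an assumption.

```python
# Hypothetical 2-depth topic tree: subject -> depth1 term -> depth2 terms.
TOPIC_TREE = {
    "Guam": {
        "restaurant": ["handmade burger", "steak"],
        "hotel": ["resort", "pool villa"],
    }
}


def related_terms(subject, selected_depth1=None):
    """Return (depth1 terms, depth2 terms) to show for a subject."""
    branches = TOPIC_TREE.get(subject, {})
    depth1 = list(branches)
    if selected_depth1 is None:
        # Assumption: with no depth1 selection, show every depth2 term.
        depth2 = [term for terms in branches.values() for term in terms]
    else:
        depth2 = list(branches.get(selected_depth1, []))
    return depth1, depth2


def add_to_search_box(current_query, selected_term):
    """Append the selected related term to the current query."""
    return f"{current_query} {selected_term}".strip()


depth1_terms, _ = related_terms("Guam")
depth1_query = add_to_search_box("Guam", "restaurant")
_, depth2_terms = related_terms("Guam", "restaurant")
depth2_query = add_to_search_box(depth1_query, "handmade burger")
```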

As described above, according to embodiments of the present invention, search results (documents) may be provided together with hierarchical topics, which supports efficient additional searches and provides a variety of search results from a single query. Furthermore, the hierarchical topic structure may also be used for search ranking. In other words, documents containing sub-objects and attributes are likely to be relatively high-quality documents, and this can be used to boost such documents in the search ranking.
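One way to picture the ranking use mentioned above is a multiplicative boost for documents that mention sub-objects or attributes from the topic tree. The function name `boost_scores`, the boost factor, the substring matching, and the sample documents are all assumptions made for illustration; the text does not specify how the boost is computed.

```python
def boost_scores(docs, topic_terms, boost=0.1):
    """Re-rank documents, boosting those that mention topic-tree terms.

    docs: list of (doc_id, base_score, text) tuples.
    topic_terms: sub-objects/attributes taken from the topic tree.
    Returns (doc_id, boosted_score) pairs sorted by descending score.
    """
    ranked = []
    for doc_id, base_score, text in docs:
        lowered = text.lower()
        hits = sum(1 for term in topic_terms if term.lower() in lowered)
        ranked.append((doc_id, base_score * (1 + boost * hits)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents retrieved for the query "Guam".
docs = [
    ("a", 1.0, "Guam travel diary"),
    ("b", 0.9, "Guam restaurant guide: handmade burger and steak"),
]
ranking = boost_scores(docs, ["restaurant", "handmade burger", "steak"])
```

In this example document "b" starts with a lower base score but overtakes "a" because it mentions three topic-tree terms.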

The apparatus described above may be implemented as a hardware component, a software component, and/or a combination of hardware components and software components. For example, the devices and components described in the embodiments may be implemented using one or more general purpose or special purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may execute an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of the software. For convenience of explanation, a single processing device may be described as being used, but one of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors or one processor and one controller. In addition, other processing configurations are possible, such as parallel processors.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems so that it is stored or executed in a distributed manner. Software and data may be stored on one or more computer readable recording media.

The method according to the embodiment may be embodied in the form of program instructions that can be executed by various computer means and recorded in a computer readable medium. In this case, the medium may continuously store a program executable by the computer, or may temporarily store it for execution or download. In addition, the medium may be any of a variety of recording means or storage means in the form of a single piece of hardware or a combination of several pieces of hardware; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Examples of other media include recording media or storage media managed by an app store that distributes applications, or by a site, a server, or the like that supplies or distributes various software.

Although the embodiments have been described with reference to limited embodiments and drawings, those skilled in the art may make various modifications and variations from the above description. For example, an appropriate result can be achieved even if the described techniques are performed in a different order than the described method, and/or components of the described systems, structures, devices, circuits, etc. are combined in a different form than the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims are within the scope of the claims that follow.

Claims

A computer-implemented topic structuring method, comprising:

extracting, for each subject, subtopics related to the subject;

generating a topic tree for the subtopics using hierarchical information of the subject; and

when a query for a search is given, hierarchically providing the subtopics as related search words for the query according to the topic tree of the subject to which the query belongs.
The method of claim 1,

wherein the extracting comprises

extracting the subtopics by analyzing words associated with a key object that determines the subject.
The method of claim 1, further comprising:

filtering the subtopics according to at least one of a document appearance frequency and a retrieval frequency.
The method of claim 1, further comprising:

clustering the subtopics according to a synonym or substring relationship and selecting a representative of each cluster.
The method of claim 1,

wherein the generating comprises

generating the topic tree by labeling the subtopics with each class name of the hierarchical information.
The method of claim 1,

wherein the generating comprises:

extracting similar words from word embedding data for the subtopics;

clustering the similar words according to a synonym or substring relationship; and

labeling the clustered words by mapping them to each class in a linguistic taxonomy.
The method of claim 1, further comprising:

rebalancing the topic tree by reducing at least one of a breadth and a depth of the topic tree.
The method of claim 1,

wherein the providing comprises

filtering the subtopics according to at least one of a subject score indicating a correlation between the query and the subtopic, a number of documents corresponding to the subtopic, and whether a correctness topic is provided for the query.
A computer-implemented search result providing method, comprising:

when a query for a search is given, providing a search result corresponding to the query;

providing subtopics related to the subject to which the query belongs, in a hierarchical form having a plurality of depths, as related search words for the query according to hierarchical information of the subject; and

when at least one search word among the subtopics is selected, providing a search result corresponding to a query that includes the selected search word.
A computer program recorded on a computer readable recording medium to execute a topic structuring method in combination with a computer system,

the topic structuring method comprising:

extracting, for each subject, subtopics related to the subject;

generating a topic tree for the subtopics using hierarchical information of the subject; and

when a query for a search is given, hierarchically providing the subtopics as related search words for the query according to the topic tree of the subject to which the query belongs.
A computer-implemented topic structuring system, comprising:

at least one processor implemented to execute computer-readable instructions,

wherein the at least one processor comprises:

an extraction unit configured to extract, for each subject, subtopics related to the subject;

a generation unit configured to generate a topic tree for the subtopics using hierarchical information of the subject; and

a provider configured to, when a query for a search is given, provide the subtopics in a hierarchical form as related search words for the query according to the topic tree of the subject to which the query belongs.