WO2019087593A1 - Document retrieval device and method - Google Patents
- Publication number
- WO2019087593A1 (PCT/JP2018/034358)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- document
- classification
- documents
- search
- unit
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
Definitions
- the present invention relates to a document search apparatus and method, and more particularly to a document search technology using a full text search method.
- Patent Document 1 sets a plurality of sentence types for identifying the contents of sentences, such as "Opinion" and "Recommendation", from the original text database storing original text document data.
- the excerpt sentence data is formed, for example, with conjunctions removed, and Patent Document 1 discloses a technique for extracting the excerpt sentence data corresponding to a specified sentence type and displaying it as a list.
- Patent Document 1 discloses a technique of setting in advance a priority between patterns of specific sentences belonging to sentence types, and adjusting excerpt sentence data to be displayed based on the priorities.
- the present invention has been made to solve the above-described problem, and is intended to provide a document search apparatus capable of preferentially displaying an existing document that includes information on a past phenomenon similar to a phenomenon that has occurred.
- a document search device includes a document database in which a plurality of documents are stored, and a classification result database in which first information identifying each of the plurality of documents, second information identifying a sentence included in each of the plurality of documents, and third information indicating a classification class representing an attribute of the sentence are stored in association with each other, and a document regarding a certain phenomenon is searched.
- a display order determination unit determines the order in which the extracted documents are output and displayed when the extraction unit extracts a plurality of documents; the classification class includes at least a first classification class representing the phenomenon, and the display order determination unit refers to the classification result database and decides to preferentially output and display, among the plurality of extracted documents, a document including a sentence associated with the first classification class.
- a search condition input step in which a search condition for searching for a document concerning a certain phenomenon is input; an extraction step of performing a full text search on a plurality of documents stored in the document database based on the search condition and extracting documents matching the search condition; and a step of determining the order in which the extracted documents are output and displayed when a plurality of documents are extracted in the extraction step
- a display order determination step comprising: first information identifying each of the plurality of documents; and second information identifying a sentence included in each of the plurality of documents.
- the classification class is characterized in that it comprises at least the first classification class.
- a document including a sentence associated with the classification class representing the phenomenon is preferentially output and displayed, so documents containing information on past phenomena similar to the phenomenon that is occurring can be displayed preferentially.
- FIG. 1 is a functional block diagram of a document search apparatus according to the first embodiment of the present invention.
- FIG. 2 is a block diagram showing a configuration example of hardware for realizing the document search device according to the first embodiment of the present invention.
- FIG. 3 is a flowchart illustrating search processing according to the first embodiment of the present invention.
- FIG. 4 is a view showing an example of a display unit according to the first embodiment of the present invention.
- FIG. 5 is a functional block diagram of a document search device according to a second embodiment of the present invention.
- FIG. 6 is a flowchart illustrating classification processing according to the second embodiment of the present invention.
- FIG. 7 is a functional block diagram of a document search device according to a third embodiment of the present invention.
- FIG. 8 is a flowchart for explaining classification model construction processing according to the third embodiment of the present invention.
- the document search device 1 searches for an existing document related to a “phenomenon” that has occurred, such as a defect that has occurred at a manufacturing site, for example.
- Documents containing information on phenomena that occurred in the past that are similar to phenomena currently occurring are preferentially displayed as search results. Then, the search result output and displayed with priority is referred to by the user and utilized for emergency response to the failure.
- the document search device 1 has a document DB 41 in which a plurality of original text documents are stored, and a classification result DB 42 in which information identifying the original text document to be searched (first information), information identifying sentences included in the original text documents (second information), and information indicating the classification class representing the attribute of each sentence (third information) are associated with each other.
- the document search device 1 performs full-text search on a plurality of original text documents stored in the document DB 41 based on the search condition input by the user, and extracts a plurality of documents matching the search condition.
- the classification class includes at least a classification class (first classification class) representing a phenomenon that has occurred.
- among the plurality of documents extracted by the full-text search, the document search device 1 decides to preferentially output and display, as a search result, a document that includes a sentence associated in the classification result DB 42 with a classification class representing a phenomenon.
- the document search device 1 includes an input / output unit 2, a search unit 3, and a storage unit 4.
- the input / output unit 2 includes a search condition input unit 21 and a display unit 22.
- the input / output unit 2 receives an input from a user who uses the document search device 1, and outputs and displays a search result.
- the input / output unit 2 takes the form of, for example, a web browser, but a dedicated application may be used. Also, the input / output unit 2 may be separated over a network from the other functional units of the document search device 1, or may run on the same computer.
- the search condition input unit 21 receives, from the user, an input of a search condition for searching an existing document related to a phenomenon that has occurred, such as a defect at a manufacturing site.
- the search condition input unit 21 receives, for example, a word string or a query sentence representing a phenomenon occurring at present.
- the display unit 22 displays the search result by the search unit 3 described later. Specifically, the display unit 22 highlights sentences belonging to a classification class such as a sentence representing a phenomenon that has occurred, and displays a plurality of documents extracted by the extraction unit 31 described later as a search result.
- the display unit 22 displays the sentence to be displayed in an emphasized manner so that the classification classes to which the sentence belongs can be distinguished from each other. For example, when a plurality of classification classes are adopted, it is assumed that the document of the search result includes a plurality of sentences belonging to different classification classes. In such a case, the display unit 22 highlights sentences belonging to different classification classes included in the same document, for example, in different colors in the original document. The details of the classification class will be described later.
- the search unit 3 includes an extraction unit 31 and a display order determination unit 32. Search conditions such as a query from the user input to the search condition input unit 21 are input to the search unit 3.
- the search unit 3 searches a document matching the search condition with respect to the original text document registered in the document DB 41 described later, and determines the order of outputting and displaying the search result document.
- the extraction unit 31 executes a full text search on a plurality of original text documents registered in the document DB 41 based on the search condition input through the search condition input unit 21, and extracts documents matching the search condition. More specifically, the extraction unit 31 performs the full-text search with reference to the index DB 411, in which indexes of the plurality of original text documents included in the document DB 41 are registered, and extracts documents that match the search condition.
- the display order determination unit 32 determines the order in which the plurality of documents are output and displayed. More specifically, the display order determination unit 32 refers to a classification result DB 42 described later and decides to preferentially output and display, among the plurality of documents extracted by the extraction unit 31, documents containing sentences associated with the classification class "phenomenon", which represents the phenomenon that has occurred.
- the storage unit 4 includes a document DB 41 and a classification result DB 42.
- the document DB 41 includes an original text DB 410 and an index DB 411.
- the document DB 41 stores information on a plurality of original text documents (a plurality of documents) to be searched.
- in the original text DB 410, a plurality of original text documents prepared in advance, or link information to the original text documents, are registered.
- the plurality of original text documents registered in the original text DB 410 are used when the display unit 22 displays the search results. More specifically, based on the information on the search result documents from the search unit 3, the display unit 22 reads out the corresponding original text documents from the original text DB 410, and processes and highlights the display content.
- in the index DB 411, indexes corresponding to the plurality of original text documents registered in the original text DB 410 are registered.
- the index DB 411 is provided to speed up the search process when the extraction unit 31 executes the full text search.
- the inverted index has, for example, a table data structure in which character strings, the positions of those character strings in a document, identification information of the document, and the like are registered in association with one another.
- the generation of the index is performed prior to the search processing by the search unit 3 and is also performed when the original text document is registered in the original text DB 410. Further, as a method of extracting a character string at the time of index generation, for example, morphological analysis is used.
- the index may be generated by a device installed outside the document search device 1 or may be generated by, for example, the control unit 102 in the document search device 1.
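The index structure described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `build_inverted_index` and `search_all_terms` are hypothetical, and whitespace splitting stands in for the morphological analysis the text mentions.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to {doc_id: [positions]}.

    `documents` maps a document ID to its text. Whitespace splitting is an
    assumption that stands in for morphological analysis.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search_all_terms(index, terms):
    """Return the sorted IDs of documents that contain every query term."""
    postings = [set(index.get(term.lower(), {})) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


docs = {
    "doc1": "error occurrence in the press device",
    "doc2": "device maintenance schedule",
    "doc3": "error log for the device after occurrence of overheating",
}
index = build_inverted_index(docs)
hits = search_all_terms(index, ["device", "error", "occurrence"])
```

Storing positions alongside document IDs keeps the option of phrase matching or of locating a hit inside the original text, which the table description (character string, position, document ID) suggests.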
- when the original text DB 410 includes original text documents in a language such as Japanese, in which words are not separated by spaces, the text of the original text is divided into words by morphological analysis.
- it is desirable to normalize character strings, for example unifying mixed full-width and half-width characters and mixed upper-case and lower-case characters, and to delete special symbols.
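A normalization step of this kind might look like the sketch below. The function name `normalize` is hypothetical, and the choice of which characters count as "special symbols" (here, anything that is neither a word character nor whitespace) is an assumption, since the text does not define them.

```python
import re
import unicodedata


def normalize(text):
    """Normalize a string before indexing or querying (illustrative only)."""
    # NFKC folds full-width letters and digits and half-width katakana
    # into one canonical form.
    text = unicodedata.normalize("NFKC", text)
    # Unify upper-case and lower-case characters.
    text = text.lower()
    # Assumption: treat every non-word, non-space character as a special symbol.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace left behind by the removal.
    return " ".join(text.split())


cleaned = normalize("ＥＲＲＯＲ　Code-１２！")
```

Applying the same function to both the indexed text and the query keeps the two sides comparable.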
- the classification result DB 42 stores, in association with each other, information identifying each of the plurality of original text documents registered in the original text DB 410, information identifying a sentence included in each of the original text documents, and information indicating a classification class representing the attribute of the sentence.
- a classification class is a set of sentences defined by the attributes of sentences, such as the meaning and content of sentences.
- three classification classes are adopted: the classification class "phenomenon" (first classification class) representing a phenomenon that has occurred, the classification class "cause" (second classification class) indicating the cause of the phenomenon, and the classification class "action" (third classification class) indicating the countermeasure for the phenomenon.
- an example of a sentence belonging to the classification class "phenomenon" is "... Occurrence of error".
- an example of a sentence belonging to the classification class "cause" is "... considered as a factor".
- a sentence belonging to the classification class "action” for example, "... Be
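The association held in the classification result DB 42 can be pictured as one record per classified sentence. The sketch below is only an illustration: the class name `ClassificationRecord`, its field names, and the helper `documents_with_class` are hypothetical, and an in-memory list stands in for the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationRecord:
    doc_id: str       # first information: identifies the document
    sentence_id: int  # second information: identifies the sentence
    label: str        # third information: classification class


# Example records (made-up IDs, for illustration only).
records = [
    ClassificationRecord("doc1", 0, "phenomenon"),
    ClassificationRecord("doc1", 1, "cause"),
    ClassificationRecord("doc2", 0, "action"),
]


def documents_with_class(records, label):
    """Return the IDs of documents with at least one sentence in `label`."""
    return {record.doc_id for record in records if record.label == label}


phenomenon_docs = documents_with_class(records, "phenomenon")
```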
- the document search device 1 can be realized by a computer including a control unit 102, a communication control device 103, a storage device 104, an input device 105, and a display device 106 connected via a bus 101, and by a program that controls these hardware resources.
- the control unit 102 includes a CPU 102a and a main storage unit 102b. Programs for the CPU 102a to perform various controls and operations are stored in advance in the main storage unit 102b.
- the control unit 102 implements the functions of the document search apparatus 1 such as the extraction unit 31 and the display order determination unit 32 illustrated in FIG. 1.
- the communication control device 103 is an input / output interface for connecting the document search device 1 and various devices.
- the communication control apparatus 103 may have a function as a control apparatus for connecting the document search apparatus 1 and various external electronic devices via a network.
- the classification result of the document to be searched which is executed by an apparatus installed outside may be received via the communication control apparatus 103 and stored in the classification result DB 42.
- the storage device 104 includes a readable and writable storage medium, and a drive device for reading and writing various information such as programs and data from and to the storage medium.
- a semiconductor memory such as a flash memory or a hard disk can be used as a storage medium.
- the storage device 104 can hold the document DB 41, the classification result DB 42, and a program storage unit 104a, and can also include other storage devices (not shown), for example, a storage device for backing up the programs and data stored in the storage device 104.
- the program storage unit 104a stores various programs for executing processing necessary for document search such as search processing in the present embodiment.
- the input device 105 is realized by a keyboard, a mouse, a touch panel, and the like, and receives input and operation from the user.
- the input device 105 receives the input of the search condition from the user.
- the input device 105 functions as the search condition input unit 21 described with reference to FIG.
- as the display device 106, a liquid crystal display or the like is used. On the display device 106, an input result by the input device 105 is displayed, and information on a document of the search result is displayed.
- the display device 106 functions as the display unit 22 described in FIG.
- the search condition input unit 21 receives an input of a search condition by the user (step S1).
- the user's input accepted by the search condition input unit 21 is displayed in the area 220 of the display unit 22 as shown in the display example of FIG. 4.
- character strings of “ ⁇ device”, “error”, and “occurrence” are accepted as search conditions.
- the extraction unit 31 executes a full text search to extract documents matching the search condition from the document DB 41 (step S2).
- the extraction unit 31 executes a full-text search with reference to the index DB 411.
- the extraction unit 31 extracts a plurality of documents that contain the search terms “ ⁇ device”, “error”, and “occurrence”, using the inverted index registered in the index DB 411.
- the extraction unit 31 also calculates the degree of similarity of each of the plurality of extracted documents with the search condition. In the calculation of the degree of similarity, the extraction unit 31 may use a known method generally used in full-text search.
- the document extracted by the extraction unit 31 is temporarily stored in association with the degree of similarity.
- the document extracted by the extraction unit 31 may include a document having a content different from the content intended by the user even if the document matches the search condition.
- the display order determination unit 32 determines the order in which the plurality of documents extracted by the extraction unit 31 are output and displayed (step S3). More specifically, based on an index value that indicates the degree of relationship between each extracted document and the classification class "phenomenon", the display order determination unit 32 determines the order in which documents including a sentence belonging to the classification class representing the phenomenon that has occurred (the first classification class) are displayed.
- among the plurality of documents extracted by the extraction unit 31 as matching the search condition, the display order determination unit 32 decides to preferentially output and display documents that include a sentence classified into the classification class "phenomenon".
- the display order determination unit 32 calculates a display order index value obtained by multiplying the similarity calculated for each document extracted by the extraction unit 31 by a predetermined coefficient.
- the predetermined coefficient is set such that a search result classified into the classification class "phenomenon" receives a higher display order index value than a search result classified into another classification class.
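The display order index value (similarity multiplied by a coefficient) can be sketched as below. The function name `rank_documents`, the coefficient value 2.0, and the use of 1.0 for documents without a "phenomenon" sentence are all assumptions made for illustration; the text does not give concrete values.

```python
def rank_documents(similarities, phenomenon_docs, coefficient=2.0):
    """Order documents by display order index value, highest first.

    `similarities` maps doc_id to the similarity from the full-text search.
    `phenomenon_docs` is the set of doc_ids that contain a sentence in the
    classification class "phenomenon".
    """
    scored = []
    for doc_id, similarity in similarities.items():
        # Assumed coefficients: boost "phenomenon" documents, leave others as-is.
        factor = coefficient if doc_id in phenomenon_docs else 1.0
        scored.append((doc_id, similarity * factor))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


ranked = rank_documents(
    {"doc1": 0.5, "doc2": 0.75, "doc3": 0.25},
    phenomenon_docs={"doc1"},
)
```

A fixed multiplier does not by itself guarantee that every "phenomenon" document outranks every other document; if that strict ordering is wanted, the coefficient would need to exceed the largest ratio between similarity values, or the sort could use class membership as the primary key.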
- the display unit 22 processes the display content in the search result document in which the display order determination unit 32 determines the order of output and display (step S4). For example, the display unit 22 highlights sentences belonging to the classification classes “phenomenon”, “cause”, and “action” included in each of a plurality of documents displayed as a search result, and displays the sentences as a search result.
- the display unit 22 applies processing, such as adding HTML tags, to parts of the original text document corresponding to the search result document so that those parts can be distinguished on the display. Specifically, in the area 221 where the original text document corresponding to the search result document is displayed, the display unit 22 processes the regions 222a, 222b, and 222c, which display the sentences classified into the classification classes “phenomenon”, “cause”, and “action”.
- the display unit 22 may surround the regions 222a, 222b, and 222c with tags (for example, div tags) that group the regions 222 as HTML block elements, or may apply a style sheet such as Cascading Style Sheets (CSS).
- the display unit 22 displays the document of the search result in which the display content is processed (step S5). Specifically, the display unit 22 lists and displays the corresponding original text documents from the top of the display screen in accordance with the output display order of the search result documents determined in step S3. As shown in the display example of FIG. 4, the document “No. 1” displayed at the top of the display screen is a document for which the highest display order index value has been calculated.
- the display unit 22 can distinguish the sentences belonging to the classification class "phenomenon", “cause”, and “action” in each document.
- the character colors of the areas 222a, 222b, and 222c and the highlight display colors may be changed from each other.
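The highlighting described above might be produced as in the following sketch. It wraps each classified sentence in an inline `span` with a per-class background color, where the text mentions `div` tags as one example of grouping. The function name `highlight`, the CSS class names, and the colors are hypothetical.

```python
from html import escape

# Assumed colors, one per classification class.
CLASS_COLORS = {
    "phenomenon": "#ffd6d6",
    "cause": "#d6e4ff",
    "action": "#d6ffd9",
}


def highlight(sentences):
    """Render (text, label) pairs as HTML; label may be None for plain text."""
    parts = []
    for text, label in sentences:
        safe = escape(text)  # escape <, >, & and quotes from the original text
        if label in CLASS_COLORS:
            color = CLASS_COLORS[label]
            parts.append(
                f'<span class="{label}" style="background-color: {color}">{safe}</span>'
            )
        else:
            parts.append(safe)
    return " ".join(parts)


html_output = highlight([
    ("Error <E1> occurred.", "phenomenon"),
    ("See the manual.", None),
])
```

Emitting a class name as well as an inline style lets a separate style sheet override the colors without changing the generated markup.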
- the document search device 1 preferentially displays, among a plurality of documents extracted by full-text search, a document including a sentence belonging to a classification class representing a phenomenon. Therefore, the document search device 1 can preferentially display an existing document including information of a phenomenon that has occurred in the past, which is similar to a phenomenon that is currently occurring. As a result, the user can perform quicker emergency response to a problem or the like that has occurred at a manufacturing site or the like.
- when displaying a search result document, the document search device 1 highlights the sentences in the document that belong to a classification class. Therefore, when checking the search results on the display screen, the user can more easily confirm whether a search result document is an existing document containing information similar to the phenomenon that is currently occurring.
- the document search device 1 uses three classification classes, "phenomenon", "cause", and "action"; therefore, it can output and display not only existing documents concerning the phenomenon currently occurring but also documents containing information more useful to the user, such as for investigating the cause of the phenomenon and for recovery.
- since the document search device 1 has the classification result DB 42, in which sentence-level classification class information is stored in advance for the documents to be searched, the calculation load in the document search device 1 can be reduced and the document search device 1 can have a simpler configuration.
- the document search device 1a further includes a classification execution unit 5 that classifies the sentences included in each of the plurality of documents into one of a plurality of classification classes representing the attributes of the sentences and stores the result in the classification result DB 42.
- the document search device 1a further includes a classification model storage unit 43.
- the document search device 1a classifies each of a plurality of original texts to be searched in sentence units and stores the classification result in the classification result DB 42.
- the document search device 1a performs a search based on the search condition input by the user thereafter.
- the classification execution unit 5 classifies a plurality of original text documents registered in the original text DB 410 into classification classes in sentence units. More specifically, the classification execution unit 5 inputs the original text document registered in the original text DB 410 to be classified into the classification model stored in advance in the classification model storage unit 43. Then, the classification execution unit 5 classifies the sentences contained in each document into classification classes “phenomenon”, “cause”, and “action”, which are set in advance, and outputs classification results.
- when performing classification, the classification execution unit 5 can set a threshold value for each class and decide, class by class, whether or not to classify a sentence into one of the classification classes “phenomenon”, “cause”, and “action”. In this case, the classification execution unit 5 may output, as a classification result, a sentence not classified into any classification class.
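A per-class threshold of this kind could be applied as in the sketch below. It assumes the classifier produces one score per class; the function name `classify_with_threshold`, the score scale, and the threshold values are hypothetical.

```python
def classify_with_threshold(scores, thresholds):
    """Pick the best-scoring class whose score meets that class's threshold.

    `scores` maps class name to classifier score; `thresholds` maps class
    name to its minimum accepted score. Returns None when no class qualifies,
    which corresponds to a sentence not classified into any class.
    """
    candidates = [
        (score, label)
        for label, score in scores.items()
        if score >= thresholds.get(label, 0.0)
    ]
    if not candidates:
        return None
    return max(candidates)[1]


# Assumed thresholds, one per class.
thresholds = {"phenomenon": 0.6, "cause": 0.6, "action": 0.6}
confident = classify_with_threshold(
    {"phenomenon": 0.8, "cause": 0.1, "action": 0.1}, thresholds
)
uncertain = classify_with_threshold(
    {"phenomenon": 0.4, "cause": 0.3, "action": 0.3}, thresholds
)
```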
- the classification result output by the classification execution unit 5 is stored in the classification result DB 42.
- the classification result information stored in the classification result DB 42 is data in which information for identifying a document in the original text, information for identifying a sentence, and information indicating the classification class into which the sentence is classified are associated.
- the classification result stored in the classification result DB 42 may be a pair of a classification class and one sentence (a sentence included in the original document), or may include the classification class and the position of the sentence in the original document (for example, the start position and the number of characters).
- the classification model storage unit 43 stores, for example, a classification model which is learned and constructed in advance by a device installed outside.
- the classification model is a model constructed by learning a classifier selected from known algorithms used in natural language processing, and the details will be described later.
- the classification execution unit 5 reads out the original text document registered in the original text DB 410 of the document DB 41 and inputs it to the classification model stored in the classification model storage unit 43 (step S20).
- the classification execution unit 5 classifies sentences for each of a plurality of original text documents (step S21). More specifically, the classification execution unit 5 classifies each of the sentences included in the original text document into any one of predetermined classification classes “phenomenon”, “cause”, and “action”.
- after performing class classification for each of a plurality of original text documents, the classification execution unit 5 associates information for identifying the document, information for identifying the sentence, and information on the classification class into which the sentence is classified, and stores them in the classification result DB 42 (step S22).
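Steps S20 to S22 can be sketched end to end as follows. This is an illustration under stated assumptions: `keyword_classifier` is a hypothetical stand-in for the learned classification model (the text uses a trained classifier, not keywords), `split_sentences` uses a simple punctuation rule, and a list of tuples stands in for the classification result DB 42.

```python
import re

# Hypothetical keyword table used only by the stand-in classifier below.
KEYWORDS = {
    "phenomenon": ("error", "occurred"),
    "cause": ("factor", "because"),
    "action": ("replace", "restart"),
}


def split_sentences(text):
    """Split text into sentences at '.', '!', '?' or the Japanese full stop."""
    return [s.strip() for s in re.findall(r"[^.!?。]+[.!?。]?", text) if s.strip()]


def keyword_classifier(sentence):
    """Stand-in for the classification model: return a class name or None."""
    lowered = sentence.lower()
    for label, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return label
    return None


def classify_documents(documents, classifier):
    """Return (doc_id, sentence_id, label) tuples for classified sentences."""
    results = []
    for doc_id, text in documents.items():                   # step S20: read document
        for sentence_id, sentence in enumerate(split_sentences(text)):
            label = classifier(sentence)                     # step S21: classify sentence
            if label is not None:
                results.append((doc_id, sentence_id, label))  # step S22: store result
    return results


results = classify_documents(
    {"doc1": "Error occurred in the pump. Bearing wear is considered a factor. Replace the bearing."},
    keyword_classifier,
)
```

Here the sentence is identified by its index within the document; the text also allows identifying it by the sentence itself or by its position in the original document.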
- the information identifying each sentence may be the original sentence or the position of each sentence in the original document.
- the extraction unit 31 executes a full text search with reference to the index DB 411 based on the search condition input to the search condition input unit 21, as in the first embodiment, and extracts a plurality of documents that match the search condition.
- the display order determination unit 32 determines the order in which the plurality of extracted documents are output and displayed.
- the display order determination unit 32 determines the order in which the document is output and displayed using the classification result DB 42 in which the classification result by the classification execution unit 5 is stored.
- the display unit 22 processes the display content. For example, the display unit 22 highlights and displays, in the corresponding original text document, the sentences classified into the classification classes “phenomenon”, “cause” and “action” included in the document of the search result. Furthermore, the display unit 22 emphasizes and displays the sentences of each classification class so as to be distinguishable from each other.
- the classification execution unit 5 classifies the original text documents using the classification model stored in advance in the classification model storage unit 43.
- the document search device 1a can perform class classification for the original text document. Therefore, it becomes possible to cope with the update of the original document of the search target in the document search device 1a.
- the case has been described where the classification execution unit 5 classifies the sentences of the original text documents registered in the document DB 41 (original text DB 410) into classification classes, using the classification model stored in advance in the classification model storage unit 43.
- the document search device 1 b further includes a learning unit 6. The learning unit 6 learns a predetermined classifier, and constructs a classification model used when the classification execution unit 5 executes the classification process.
- the learning unit 6 includes a teacher data setting unit 61 and a classification model learning unit 62.
- the classifier used by the learning unit 6 is selected from known algorithms used for document classification in natural language processing, for example, a support vector machine (hereinafter referred to as "SVM"), or a network combining "word2vec", which is a two-layer neural network, with a convolutional neural network.
- a classifier using supervised learning is adopted, but in constructing a classification model, a classifier using unsupervised learning may be adopted.
- the teacher data setting unit 61 sets teacher data including a sentence and the classification class to which the sentence should belong. More specifically, the teacher data setting unit 61 prepares labeled teacher data, such as sentences labeled as representing the classification class "phenomenon", sentences labeled as representing the classification class "cause", and sentences labeled as representing the classification class "action".
- the classification model learning unit 62 inputs the teacher data set by the teacher data setting unit 61 into the classifier and trains a classifier such as an SVM to construct a classification model. More specifically, the classification model learning unit 62 first converts the sentences of the text data into vector representations. For example, the classification model learning unit 62 may use a sentence vector in which each appearing word is weighted using an algorithm such as the tf-idf method.
- the classification model learning unit 62 trains a classifier such as an SVM on these sentence vectors to construct a classification model.
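The vectorization step can be sketched as below. This covers only the tf-idf sentence vectors, not the SVM training itself. The function name `tfidf_vectors` is hypothetical, whitespace tokenization is an assumption, and the weighting shown (term frequency times log(N / document frequency)) is one common tf-idf variant among several.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return (vocabulary, vectors) with one tf-idf vector per sentence."""
    tokenized = [sentence.lower().split() for sentence in sentences]
    total = len(tokenized)
    # Document frequency: the number of sentences in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vocabulary = sorted(doc_freq)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([
            (counts[term] / len(tokens)) * math.log(total / doc_freq[term])
            if tokens else 0.0
            for term in vocabulary
        ])
    return vocabulary, vectors


vocabulary, vectors = tfidf_vectors(["error occurred", "error fixed"])
```

A term that appears in every sentence gets weight zero under this variant, so the vectors emphasize the words that distinguish one sentence from another; these vectors would then be passed, with their class labels, to the chosen classifier.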
- the classification model constructed by the classification model learning unit 62 is stored in the classification model storage unit 43.
- classification model construction processing executed by the learning unit 6 will be described using the flowchart of FIG.
- the classification model construction process is performed prior to the classification process performed by the classification execution unit 5.
- the teacher data set by the teacher data setting unit 61 is input to a classifier such as an SVM (step S30).
- the classification model learning unit 62 learns the classifier based on the input teacher data, and constructs a classification model (step S31).
- the classification model constructed by the classification model learning unit 62 is stored in the classification model storage unit 43.
- Classification processing by the classification execution unit 5 is then executed as in the second embodiment, and the documents are classified using the classification model constructed by the learning unit 6. Search processing by the search unit 3 is also executed, and the output display order of the plurality of extracted documents is determined. The display unit 22 then processes the display content of the original text document corresponding to each search result document, highlighting the sentences belonging to each classification class so that the classes can be distinguished from one another.
- The learning unit 6 trains a predetermined classifier to construct a classification model.
- The document search device 1b can locally update the classification model, redefine the classification classes, and so on as needed.
- The classification classes are not limited to these three; the classification class "phenomenon" may be used alone, or further classification classes may be added and used in combination.
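The sentence vectorization mentioned for the classification model learning unit 62 can be illustrated with a minimal sketch. It assumes sentences have already been split into tokens and uses one common tf-idf variant (relative term frequency times the logarithm of inverse sentence frequency); the function name `tfidf_vectors` and the sample tokens are hypothetical, and the subsequent SVM training step is not shown.

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_sentences):
    """Return one {term: weight} dict per sentence, using tf * idf weighting."""
    n = len(tokenized_sentences)
    # Sentence frequency: number of sentences in which each term appears.
    df = Counter(term for sent in tokenized_sentences for term in set(sent))
    vectors = []
    for sent in tokenized_sentences:
        tf = Counter(sent)
        vectors.append({
            term: (count / len(sent)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

# Hypothetical, already tokenized training sentences.
sentences = [
    ["error", "occurred", "in", "device"],
    ["cause", "is", "a", "worn", "part"],
    ["replaced", "the", "part", "and", "recovered", "device"],
]
vectors = tfidf_vectors(sentences)
```

In this sketch a term that appears in fewer sentences, such as "error", receives a larger weight than one shared across sentences, such as "device". The resulting sparse vectors are the kind of input a classifier such as an SVM could be trained on.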
Abstract
This document retrieval device (1) includes: a document DB (41) in which a plurality of original text documents are stored; and a classification result DB (42) in which information which identifies an original text document to be retrieved, information which identifies text included in the original text document, and information which indicates a classification class that represents the attribute of the text are associated with each other. The document retrieval device (1) performs a full text retrieval on the plurality of original text documents stored in the document DB (41) on the basis of a retrieval condition input by a user, and extracts a plurality of documents which match the retrieval condition. The document retrieval device (1) performs a decision for preferentially outputting and displaying, as a retrieval result, a document which is stored in the classification result DB (42) and includes text associated with a classification class that represents a phenomenon, among the plurality of documents extracted through the full text retrieval.
Description
The present invention relates to a document search device and method, and more particularly to a document search technology using a full-text search method.
Text data on events and phenomena that occur in various settings, together with their causes and countermeasures, has long been accumulated, for example in records of inquiries to call centers and records of responses to defects at manufacturing sites. To make use of such text data, full-text search methods are used to find and refer to past cases similar to a phenomenon that is currently occurring. In call centers and at manufacturing sites in particular, problems and defects sometimes need to be dealt with urgently.
For example, Patent Document 1 defines a plurality of sentence types for identifying the content of sentences, such as "opinion" and "recommendation", and creates, from an original text database storing original document data, sentence-level excerpt data classified into these sentence types. It discloses a technique in which the excerpt data is formed, for example, with conjunctions removed, and the excerpt data corresponding to a specified sentence type is extracted and displayed as a list. Patent Document 1 also discloses a technique in which priorities are set in advance among specific sentence patterns belonging to the sentence types, and the excerpt data to be displayed is adjusted based on these priorities.
However, in the technique described in Patent Document 1, the excerpt data of a search result is displayed based on priorities that are individually set in advance among specific sentence patterns belonging to the sentence types.
Consequently, when a defect occurs at a manufacturing site or the like and a user searches for existing documents containing information on a past phenomenon similar to the one currently occurring, the information the user needs is sometimes not displayed preferentially. In such cases, it takes the user time to check the search results, which can make an urgent response to the defect difficult.
The present invention has been made to solve the above problem, and an object thereof is to provide a document search device capable of preferentially displaying existing documents that contain information on a past phenomenon similar to a phenomenon that is currently occurring.
In order to solve the problem described above, a document search device according to the present invention includes: a document database in which a plurality of documents are stored; a classification result database in which first information identifying each of the plurality of documents, second information identifying a sentence included in each of the plurality of documents, and third information indicating a classification class representing an attribute of the sentence are stored in association with each other; a search condition input unit to which a search condition for searching for documents relating to a certain phenomenon is input; an extraction unit that performs a full-text search on the plurality of documents stored in the document database based on the search condition and extracts documents matching the search condition; and a display order determination unit that, when a plurality of documents are extracted by the extraction unit, determines the order in which the plurality of extracted documents are output and displayed. The classification class includes at least a first classification class representing the phenomenon, and the display order determination unit refers to the classification result database and decides to preferentially output and display, among the plurality of extracted documents, a document including a sentence associated with the first classification class.
A document search method according to the present invention includes: a search condition input step in which a search condition for searching for documents relating to a certain phenomenon is input; an extraction step of performing a full-text search on a plurality of documents stored in a document database based on the search condition and extracting documents matching the search condition; and a display order determination step of determining, when a plurality of documents are extracted in the extraction step, the order in which the plurality of extracted documents are output and displayed. In the display order determination step, a classification result database, in which first information identifying each of the plurality of documents, second information identifying a sentence included in each of the plurality of documents, and third information indicating a classification class representing an attribute of the sentence are stored in association with each other, is referred to, and a decision is made to preferentially output and display, among the plurality of extracted documents, a document including a sentence associated with a first classification class representing the phenomenon. The classification class includes at least the first classification class.
According to the present invention, among the documents extracted by the full-text search, documents including a sentence associated with the classification class representing a phenomenon are preferentially output and displayed, so documents containing information on a past phenomenon similar to the phenomenon currently occurring can be displayed preferentially.
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to FIGS. 1 to 8. Components common to the drawings are denoted by the same reference numerals. In the following, "document" and "sentence" refer to text data. A "sentence" is the text data of a character string delimited by a Japanese full stop (kuten) or a period, and a "document" is a text data file containing text composed of a plurality of "sentences".
[First Embodiment]
As shown in FIG. 1, the document search device 1 according to the present embodiment searches for existing documents relating to a "phenomenon" that has occurred, such as a defect at a manufacturing site. Documents containing information on a past phenomenon similar to the phenomenon currently occurring are preferentially output and displayed as search results. The preferentially displayed search results are then referred to by the user and used for an urgent response to the defect.
The document search device 1 has a document DB 41 in which a plurality of original text documents are stored, and a classification result DB 42 in which information identifying an original text document to be searched (first information), information identifying a sentence included in the original text document (second information), and information indicating a classification class representing an attribute of the sentence (third information) are associated with each other. Based on a search condition input by the user, the document search device 1 performs a full-text search on the plurality of original text documents stored in the document DB 41 and extracts a plurality of documents matching the search condition. The classification class includes at least a classification class representing a phenomenon that has occurred (first classification class). Among the plurality of documents extracted by the full-text search (the plurality of extracted documents), the document search device 1 decides to preferentially output and display, as search results, documents that include a sentence associated in the classification result DB 42 with the classification class representing a phenomenon.
[Functional blocks of the document search device]
As shown in FIG. 1, the document search device 1 according to the first embodiment includes an input/output unit 2, a search unit 3, and a storage unit 4.
The input/output unit 2 includes a search condition input unit 21 and a display unit 22. The input/output unit 2 accepts input from a user of the document search device 1, and outputs and displays search results. The input/output unit 2 takes the form of, for example, a web browser, but a dedicated application may be used instead. The input/output unit 2 may be separated over a network from the other functional units of the document search device 1, or may reside on the same computer.
The search condition input unit 21 accepts from the user a search condition for searching for existing documents relating to a phenomenon that has occurred, such as a defect at a manufacturing site. For example, a word string or a query sentence describing the phenomenon currently occurring is input to the search condition input unit 21.
The display unit 22 displays the results of the search by the search unit 3, described later. Specifically, the display unit 22 displays the plurality of documents extracted by the extraction unit 31, described later, as search results, highlighting sentences that belong to a classification class, such as sentences describing a phenomenon that has occurred.
The display unit 22 also displays the highlighted sentences so that the classification classes to which they belong can be distinguished from one another. For example, suppose that a plurality of classification classes are used and a search result document includes a plurality of sentences belonging to different classification classes. In that case, the display unit 22 highlights the sentences belonging to different classification classes within the same document, for example, in different colors in the original text document. The classification classes are described in detail later.
The search unit 3 includes an extraction unit 31 and a display order determination unit 32. A search condition, such as a query from the user, input to the search condition input unit 21 is passed to the search unit 3. The search unit 3 searches the original text documents registered in the document DB 41, described later, for documents matching the search condition, and determines the order in which the search result documents are output and displayed.
The extraction unit 31 performs a full-text search on the plurality of original text documents registered in the document DB 41 based on the search condition input via the search condition input unit 21, and extracts documents matching the search condition. More specifically, the extraction unit 31 performs the full-text search by referring to the index DB 411, which is included in the document DB 41 and in which indexes of the plurality of original text documents are registered.
When a plurality of documents are extracted by the extraction unit 31, the display order determination unit 32 determines the order in which they are output and displayed. More specifically, the display order determination unit 32 refers to the classification result DB 42, described later, and decides to preferentially output and display, among the plurality of documents extracted by the extraction unit 31, documents that include a sentence associated with the classification class "phenomenon", which represents a phenomenon that has occurred.
The storage unit 4 includes a document DB 41 and a classification result DB 42. The document DB 41 includes an original text DB 410 and an index DB 411.
The document DB 41 stores information on the plurality of original text documents (the plurality of documents) to be searched.
A plurality of original text documents prepared in advance, or link information to those documents, are registered in the original text DB 410. The original text documents registered in the original text DB 410 are used when the display unit 22 displays search results. More specifically, based on the information on the search result documents from the search unit 3, the display unit 22 reads the corresponding original text documents from the original text DB 410, and processes and highlights the display content.
Indexes corresponding to the plurality of original text documents registered in the original text DB 410 are registered in the index DB 411. The index DB 411 is provided to speed up the search processing when the extraction unit 31 performs a full-text search.
More specifically, an index such as an inverted index generated from the original text documents is registered in the index DB 411. The inverted index has, for example, a data structure obtained by transposing the rows and columns of a table in which character string information, position information of the character strings within documents, document identification information, and the like are registered in association with one another.
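The term-to-document index described above can be sketched as follows. This is a minimal illustration, assuming documents are already tokenized; the function name `build_inverted_index`, the document IDs, and the posting format (document ID, token position) are hypothetical choices, not the layout of the index DB 411.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each term to a list of (document ID, position) postings."""
    index = defaultdict(list)
    for doc_id, tokens in documents.items():
        for position, term in enumerate(tokens):
            index[term].append((doc_id, position))
    return dict(index)

# Hypothetical tokenized documents keyed by document ID.
documents = {
    "doc1": ["device", "error", "occurred"],
    "doc2": ["device", "restarted"],
}
index = build_inverted_index(documents)
```

Looking up a term such as "device" then yields its postings directly, without scanning every document, which is the speed-up the index is meant to provide.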
The index is generated prior to the search processing by the search unit 3, at the time an original text document is registered in the original text DB 410. Morphological analysis, for example, is used to extract character strings when generating the index. The index may be generated by a device installed outside the document search device 1, or by a component inside the document search device 1 such as the control unit 102.
For example, when the original text DB 410 contains original text documents in a language that is not written with spaces between words, such as Japanese, the sentences of the original text documents are segmented by morphological analysis. Before or after the morphological analysis, it is desirable to perform so-called normalization: unifying the notation of character strings that mix full-width and half-width characters or uppercase and lowercase letters, and deleting special symbols.
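The normalization step described above might look like the following sketch. It assumes Unicode NFKC normalization is an acceptable way to unify full-width and half-width forms, and it treats every character that is neither a word character nor whitespace as a "special symbol"; both choices, and the function name `normalize_text`, are assumptions for illustration. Morphological analysis itself is not shown.

```python
import re
import unicodedata

def normalize_text(text):
    """Unify full-width/half-width and upper/lower case, then drop symbols."""
    # NFKC folds compatibility forms, e.g. full-width Latin letters to ASCII.
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    # Keep word characters and whitespace; remove everything else.
    return re.sub(r"[^\w\s]", "", text)

normalized = normalize_text("ＡＢＣ１２３")
```

Applying the same normalization to both the indexed text and the search condition keeps notational variants from causing missed matches.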
The classification result DB 42 is a database in which information identifying each of the plurality of original text documents registered in the original text DB 410, information identifying a sentence included in each of those documents, and information indicating a classification class representing an attribute of the sentence are stored in association with each other. More specifically, the stored information may associate a classification class with a single sentence (a sentence included in an original text document), or may associate a classification class with a position in the original text document (for example, a start position and a number of characters).
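One possible record layout for the associations described above is sketched below. The field names, the class labels as plain strings, and the helper `docs_with_class` are hypothetical; the source only states that the document, the sentence (or its position), and the classification class are stored in association with each other.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassificationRecord:
    doc_id: str       # first information: identifies the document
    sentence_id: int  # second information: identifies the sentence
    label: str        # third information: classification class
    start: int        # start position of the sentence in the document
    length: int       # number of characters in the sentence

def docs_with_class(records, label):
    """Return the IDs of documents containing at least one sentence of the class."""
    return {r.doc_id for r in records if r.label == label}

# Hypothetical classification results.
records = [
    ClassificationRecord("doc1", 0, "phenomenon", 0, 18),
    ClassificationRecord("doc1", 1, "cause", 18, 25),
    ClassificationRecord("doc2", 0, "action", 0, 30),
]
phenomenon_docs = docs_with_class(records, "phenomenon")
```

Storing the start position and length alongside the class lets a display component locate the sentence in the original text without re-running the classifier.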
In the present embodiment, a classification class is a set of sentences defined by an attribute of the sentences, such as the meaning or content they express. In addition to the classification class "phenomenon", the present embodiment uses a classification class "cause" (second classification class), representing the cause of a phenomenon that has occurred, and a classification class "action" (third classification class), representing the action taken in response to the phenomenon, for a total of three classification classes. An example of a sentence belonging to the classification class "phenomenon" is "... an error occurred." An example of a sentence belonging to the classification class "cause" is "... is considered to be the cause.", and an example of a sentence belonging to the classification class "action" is "... was performed to restore operation."
[Hardware configuration of the document search device]
As shown in FIG. 2, the document search device 1 can be realized by a computer including a control unit 102, a communication control device 103, a storage device 104, an input device 105, and a display device 106 connected via a bus 101, and by a program that controls these hardware resources.
The control unit 102 includes a CPU 102a and a main storage unit 102b. Programs with which the CPU 102a performs various kinds of control and computation are stored in advance in the main storage unit 102b. The control unit 102 implements the functions of the document search device 1, such as the extraction unit 31 and the display order determination unit 32 shown in FIG. 1.
The communication control device 103 is an input/output interface for connecting the document search device 1 to various devices. The communication control device 103 may also function as a control device that connects the document search device 1 to various external electronic devices over a network. For example, the results of classifying the documents to be searched, produced by an externally installed device, may be received via the communication control device 103 and stored in the classification result DB 42.
The storage device 104 consists of a readable and writable storage medium and a drive device for reading and writing various information, such as programs and data, from and to the storage medium. A semiconductor memory such as flash memory, or a hard disk, can be used as the storage medium. The storage device 104 can contain the document DB 41, the classification result DB 42, a program storage unit 104a, and other storage devices (not shown), for example, a storage device for backing up the programs and data stored in the storage device 104.
The program storage unit 104a stores various programs for executing the processing required for document search, such as the search processing of the present embodiment.
The input device 105 is realized by a keyboard, a mouse, a touch panel, or the like, and accepts input and operations from the user, including the input of search conditions. The input device 105 functions as the search condition input unit 21 described with reference to FIG. 1.
The display device 106 is, for example, a liquid crystal display. The display device 106 displays the results of input via the input device 105 and information on the search result documents. The display device 106 functions as the display unit 22 described with reference to FIG. 1.
[Operation of the document search device]
The operation of the document search device 1 configured as described above will be described with reference to FIGS. 3 and 4. The following describes a case in which a defect (for example, a "△△ device error") occurs at a manufacturing site and, based on a search condition input by the user, the document search device 1 searches for existing documents on past "△△ device errors" similar to the phenomenon currently occurring.
First, the search condition input unit 21 accepts a search condition input by the user (step S1). The input accepted by the search condition input unit 21 is displayed in area 220 of the display unit 22, as in the display example of FIG. 4. In the present embodiment, the character strings "△△ device", "error", and "occurrence", for example, are accepted as the search condition.
Then, as shown in FIG. 4, when the user presses the "Search" button displayed on the display unit 22, a signal is input to the search unit 3. The extraction unit 31 first performs a full-text search and extracts documents matching the search condition from the document DB 41 (step S2).
More specifically, the extraction unit 31 performs the full-text search by referring to the index DB 411. Using the inverted index registered in the index DB 411, the extraction unit 31 extracts a plurality of documents containing the search terms "△△ device", "error", and "occurrence".
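The index lookup described above can be sketched as a set intersection over postings lists. The source does not say whether the search terms are combined with AND or OR; this sketch assumes AND (every term must appear), and the function name `search_all_terms` and the sample index are hypothetical.

```python
def search_all_terms(index, terms):
    """Return IDs of documents containing every query term."""
    doc_sets = [{doc_id for doc_id, _ in index.get(term, [])} for term in terms]
    if not doc_sets:
        return set()
    return set.intersection(*doc_sets)

# Hypothetical index: term -> list of (document ID, position) postings.
index = {
    "device": [("doc1", 0), ("doc2", 0), ("doc3", 2)],
    "error": [("doc1", 1), ("doc3", 0)],
    "occurred": [("doc1", 2), ("doc3", 1)],
}
hits = search_all_terms(index, ["device", "error", "occurred"])
```

A term absent from the index yields an empty set, so the intersection is empty and no document is returned for that query.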
The extraction unit 31 also calculates, for each of the extracted documents, its similarity to the search condition. Any known method commonly used in full-text search may be used to calculate the similarity. The documents extracted by the extraction unit 31 are temporarily stored in association with their similarities. Note that the documents extracted by the extraction unit 31, although they match the search condition, may include documents whose content differs from what the user intended.
Next, the display order determination unit 32 determines the order in which the plurality of documents extracted by the extraction unit 31 are output and displayed (step S3). More specifically, based on an index value representing the degree of relationship between each extracted document and the classification class "phenomenon", the display order determination unit 32 determines the order in which, among the plurality of extracted documents, documents including a sentence belonging to the classification class representing a phenomenon that has occurred (first classification class) are output and displayed.
Specifically, the display order determination unit 32 refers to the classification result DB 42 and decides to preferentially output and display, among the plurality of documents extracted by the extraction unit 31, documents that match the search condition and include a sentence classified into the classification class "phenomenon".
例えば、表示順決定部32は、抽出部31が抽出した文書ごとに算出された類似度に、所定の係数を乗じた表示順序指標値を計算する。所定の係数は、分類クラス「現象」に分類された検索結果が、他の分類クラスに分類された検索結果の表示順指標値の値よりもより高い表示順指標値が算出されるように設定する。抽出部31により抽出された文書i(i=1,2,・・・,n)の表示順序指標値は、次の式(1)により算出される。
For example, the display order determination unit 32 calculates a display order index value obtained by multiplying the similarity calculated for each document extracted by the extraction unit 31 by a predetermined coefficient. The predetermined coefficient is set such that a display order index value higher than the value of the display order index value of the search result classified into another classification class is calculated as the search result classified into the classification class "phenomenon" Do. The display order index value of the document i (i = 1, 2,..., N) extracted by the extraction unit 31 is calculated by the following equation (1).
In equation (1) above, for example, when the value of the coefficient is 0, a document is returned to the input/output unit 2 only when a sentence classified into the classification class "phenomenon" matches the search condition.
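The ranking described above can be illustrated with a minimal Python sketch. Equation (1) itself is not reproduced in this text, so the form used here is an assumption: documents in which a "phenomenon" sentence matched the search condition keep their full similarity, and all other documents have their similarity multiplied by the coefficient. The names `Hit`, `matches_phenomenon`, and `display_order` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    similarity: float         # full-text search similarity for this document
    matches_phenomenon: bool  # True if a "phenomenon" sentence matched the query


def display_order(hits, coefficient=0.5):
    """Rank hits by an ASSUMED form of the display order index value.

    Assumption (not from the source): weight 1.0 for "phenomenon" matches,
    weight `coefficient` for everything else.
    """
    scored = []
    for h in hits:
        weight = 1.0 if h.matches_phenomenon else coefficient
        index_value = h.similarity * weight
        if index_value > 0:  # with coefficient 0, other hits drop out
            scored.append((index_value, h.doc_id))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

Under this assumed form, a coefficient of 0 reproduces the behavior stated in the text: only documents whose "phenomenon" sentence matches the search condition are returned. A coefficient between 0 and 1 keeps the other documents but ranks them lower for the same similarity.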
Next, the display unit 22 processes the display content of the search result documents whose output display order has been determined by the display order determination unit 32 (step S4). For example, the display unit 22 highlights the sentences belonging to the classification classes "phenomenon", "cause", and "action" contained in each of the documents displayed as search results.
More specifically, as shown in FIG. 4, the display unit 22 applies markup, such as HTML tags, to part of the original text document corresponding to each search result document so that the part can be visually distinguished. Specifically, in the area 221 where the original text document corresponding to the search result document is displayed, the display unit 22 processes the areas 222a, 222b, and 222c in which the sentences classified into the classification classes "phenomenon", "cause", and "action" are displayed.
For example, the display unit 22 may enclose the areas 222a, 222b, and 222c in tags that group them as HTML block elements (for example, div tags), or may apply a style sheet such as Cascading Style Sheets (CSS).
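As an illustration of this kind of markup, the following Python sketch wraps classified sentences in div tags carrying one CSS class per classification class. The CSS class names, the `highlight` function, and the representation of a sentence as a (start position, number of characters, classification class) tuple are assumptions made for this example and are not specified in the source.

```python
import html

# Hypothetical CSS class names, one per classification class.
CSS_CLASS = {
    "phenomenon": "cls-phenomenon",
    "cause": "cls-cause",
    "action": "cls-action",
}


def highlight(text, spans):
    """Wrap each classified sentence of `text` in a div tag.

    spans: non-overlapping (start, length, classification_class) tuples.
    Text outside the spans is HTML-escaped and left unwrapped.
    """
    out, pos = [], 0
    for start, length, cls in sorted(spans):
        out.append(html.escape(text[pos:start]))
        sentence = html.escape(text[start:start + length])
        out.append(f'<div class="{CSS_CLASS[cls]}">{sentence}</div>')
        pos = start + length
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

A style sheet could then assign a different background or character color to each of the three class names so that the sentences can be told apart on screen.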
Thereafter, the display unit 22 displays the search result documents whose display content has been processed (step S5). Specifically, the display unit 22 lists the corresponding original text documents from the top of the display screen in the output display order determined in step S3. As shown in the display example of FIG. 4, the document "No. 1" displayed at the top of the display screen is the document for which the highest display order index value was calculated.
When displaying the search result documents whose display content has been processed, the display unit 22 may, for example, use different character colors or highlight colors for the areas 222a, 222b, and 222c so that the sentences belonging to the classification classes "phenomenon", "cause", and "action" can be distinguished from one another in each document.
As described above, according to the first embodiment, the document search device 1 preferentially displays, among the plurality of documents extracted by full-text search, documents containing a sentence belonging to the classification class representing a phenomenon. The document search device 1 can therefore preferentially display existing documents containing information on a past phenomenon similar to the phenomenon currently occurring. As a result, the user can respond more quickly to a problem that has occurred at, for example, a manufacturing site.
Further, when displaying the search result documents, the document search device 1 highlights the sentences in each document that belong to the classification classes. When checking the search results on the display screen, the user can therefore more easily confirm whether a search result document is in fact an existing document containing information similar to the phenomenon currently occurring.
In addition, because the document search device 1 uses the three classification classes "phenomenon", "cause", and "action", it can output and display not only existing documents on the phenomenon currently occurring but also documents containing information more useful to the user, such as information for investigating the cause of the phenomenon and for recovery.
Further, since the document search device 1 has the classification result DB 42, in which classification class information is stored in advance in sentence units for the documents to be searched, the computational load on the document search device 1 can be reduced and the document search device 1 can have a simpler configuration.
[Second Embodiment]
Next, a second embodiment of the present invention will be described. In the following description, the same components as those in the first embodiment described above are denoted by the same reference numerals, and their description is omitted.
In the first embodiment, the case was described in which the classification result DB 42 stores in advance information on the classification class of each sentence contained in the documents to be searched. In the second embodiment, by contrast, the document search device 1a further includes a classification execution unit 5 that classifies each sentence contained in each of the plurality of documents into one of a plurality of classification classes representing sentence attributes and stores the result in the classification result DB 42. The document search device 1a also includes a classification model storage unit 43.
As shown in FIG. 5, the document search device 1a classifies each of the plurality of original text documents to be searched in sentence units and stores the classification results in the classification result DB 42. The document search device 1a then performs a search based on the search condition subsequently input by the user.
The classification execution unit 5 classifies the plurality of original text documents registered in the original text DB 410 into classification classes in sentence units. More specifically, the classification execution unit 5 inputs the original text documents registered in the original text DB 410, which are the classification targets, into the classification model stored in advance in the classification model storage unit 43. The classification execution unit 5 then classifies each sentence contained in each document into one of the preset classification classes "phenomenon", "cause", and "action" and outputs the classification result.
For example, when performing classification, the classification execution unit 5 may set a threshold and determine, sentence by sentence, whether each sentence can be classified into one of the classification classes "phenomenon", "cause", and "action". In this case, the classification execution unit 5 may output, as a classification result, a sentence that is not classified into any classification class. The classification results output by the classification execution unit 5 are stored in the classification result DB 42.
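One way to picture the threshold-based decision is the following Python sketch. It assumes that the classification model yields one score per classification class for each sentence and that a sentence is left unclassified when its best score is below the threshold. The score range, the default threshold of 0.6, and the function name `assign_class` are illustrative assumptions, not values taken from the source.

```python
CLASSES = ("phenomenon", "cause", "action")


def assign_class(scores, threshold=0.6):
    """Return the best-scoring class, or None if it is below the threshold.

    scores: mapping from class name to a model score (assumed to lie in [0, 1]).
    """
    best = max(CLASSES, key=lambda c: scores.get(c, 0.0))
    return best if scores.get(best, 0.0) >= threshold else None
```

Returning `None` corresponds to the case mentioned in the text in which a sentence is not classified into any classification class.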
The classification result information stored in the classification result DB 42 is data in which information identifying an original text document, information identifying a sentence, and information indicating the classification class into which the sentence was classified are associated with one another. The classification result stored in the classification result DB 42 may consist of the classification class and the sentence itself (a sentence contained in the original text document), or may be information containing the classification class and the position of the sentence in the original text document (for example, the start position and the number of characters).
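The record layout described above could be modeled as in the following Python sketch, which supports both ways of identifying a sentence: by its text, or by its start position and number of characters. The class name `ClassificationRecord`, its field names, and the helper `docs_with_class` are hypothetical and serve only to make the association between the three pieces of information concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassificationRecord:
    doc_id: str                           # identifies the original text document
    classification_class: str             # e.g. "phenomenon", "cause", "action"
    sentence_text: Optional[str] = None   # sentence identified by its own text, or
    start: Optional[int] = None           # by its start position and
    length: Optional[int] = None          # its number of characters

    def sentence(self, original_text: str) -> str:
        """Recover the sentence from whichever identification is stored."""
        if self.sentence_text is not None:
            return self.sentence_text
        if self.start is None or self.length is None:
            raise ValueError("record holds neither sentence text nor a position")
        return original_text[self.start:self.start + self.length]


def docs_with_class(records, classification_class):
    """Return the ids of documents that contain a sentence of the given class."""
    return {r.doc_id for r in records if r.classification_class == classification_class}
```

Storing only the position keeps the records small, while storing the sentence text avoids a lookup in the original text document when the sentence is displayed.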
The classification model storage unit 43 stores, for example, a classification model that has been trained and constructed in advance by an externally installed device. Here, a classification model is a model constructed by training a classifier selected from known algorithms used in natural language processing; details are described later.
Next, the classification process of the document search device 1a according to the present embodiment will be described with reference to the flowchart of FIG. 6.
First, prior to the search processing by the search unit 3, the classification execution unit 5 reads out the original text documents registered in the original text DB 410 of the document DB 41 and inputs them into the classification model stored in the classification model storage unit 43 (step S20).
Next, the classification execution unit 5 performs class classification, sentence by sentence, for each of the plurality of original text documents (step S21). More specifically, the classification execution unit 5 classifies each sentence contained in an original text document into one of the predetermined classification classes "phenomenon", "cause", and "action".
After performing class classification for each of the plurality of original text documents, the classification execution unit 5 stores, in the classification result DB 42, information identifying the document, information identifying the sentence, and information on the classification class into which the sentence was classified, in association with one another (step S22). The information identifying each sentence may be the sentence itself or the position of the sentence in the original text document.
When the classification process by the classification execution unit 5 is completed, the extraction unit 31, as in the first embodiment, executes a full-text search with reference to the index DB 411 based on the search condition input to the search condition input unit 21 and extracts a plurality of documents matching the search condition.
Then, the display order determination unit 32 determines the order in which the plurality of extracted documents are output and displayed. The display order determination unit 32 determines this order using the classification result DB 42, in which the classification results from the classification execution unit 5 are stored.
When the order in which the search result documents are output and displayed has been determined, the display unit 22 processes the display content. For example, the display unit 22 highlights, in the corresponding original text document, the sentences in each search result document that are classified into the classification classes "phenomenon", "cause", and "action". Furthermore, the display unit 22 highlights the sentences so that those of each classification class can be distinguished from one another.
As described above, in the document search device 1a according to the second embodiment, the classification execution unit 5 performs class classification of the original text documents using the classification model stored in advance in the classification model storage unit 43. Thus, when a new original text document is registered in the original text DB 410, the document search device 1a can perform class classification for that document. The document search device 1a can therefore handle updates to the original text documents to be searched.
[Third Embodiment]
Next, a third embodiment of the present invention will be described. In the following description, the same components as those in the first and second embodiments described above are denoted by the same reference numerals, and their description is omitted.
In the second embodiment, the case was described in which, prior to the search processing by the search unit 3, the classification execution unit 5 uses the classification model stored in advance in the classification model storage unit 43 to classify the original text documents registered in the document DB 41 (original text DB 410) into the classification classes in sentence units. In the third embodiment, by contrast, the document search device 1b further includes a learning unit 6. The learning unit 6 trains a predetermined classifier to construct the classification model that the classification execution unit 5 uses when executing the classification process.
As shown in FIG. 7, the learning unit 6 includes a teacher data setting unit 61 and a classification model learning unit 62. The classifier used by the learning unit 6 may be selected from known algorithms used for document classification in natural language processing, for example, a support vector machine (hereinafter "SVM") or a network combining "word2vec", a two-layer neural network, with a convolutional neural network. The present embodiment adopts a classifier that uses supervised learning, but a classifier that uses unsupervised learning may be adopted in constructing the classification model.
The teacher data setting unit 61 sets teacher data containing sentences and the classification classes to which those sentences should belong. More specifically, the teacher data setting unit 61 prepares labeled teacher data, such as sentences representing the classification class "phenomenon", sentences representing the classification class "cause", and sentences representing the classification class "action".
The classification model learning unit 62 inputs the teacher data set by the teacher data setting unit 61 into a classifier, for example an SVM, and trains the classifier to construct a classification model. More specifically, the classification model learning unit 62 first converts the sentences of the text data into vector representations. For example, the classification model learning unit 62 may use sentence vectors in which each occurring word is weighted using an algorithm such as the tf-idf method.
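The conversion of sentences into tf-idf weighted vectors can be sketched in plain Python as follows. This is only an illustration under simplifying assumptions that do not come from the source: tokens are obtained by whitespace splitting (Japanese text would instead need a morphological tokenizer), term frequency is the relative frequency within the sentence, and idf is log(N / df) with no smoothing. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return (vocabulary, vectors) with one tf-idf vector per sentence.

    Assumptions: whitespace tokenization, tf = count / sentence length,
    idf = log(N / df) where N is the number of sentences.
    """
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(tokenized)
    # Document frequency: number of sentences in which each word occurs.
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            [tf[w] / len(toks) * math.log(n / df[w]) if toks else 0.0 for w in vocab]
        )
    return vocab, vectors
```

The resulting vectors, paired with the labels from the teacher data, are the kind of input a classifier such as an SVM would be trained on. Words that occur in every sentence receive a weight of 0 under this idf definition, so they do not influence the classification.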
The classification model learning unit 62 classifies these sentence vectors with a classifier such as an SVM to construct the classification model. The classification model constructed by the classification model learning unit 62 is stored in the classification model storage unit 43.
Next, the classification model construction process executed by the learning unit 6 will be described with reference to the flowchart of FIG. 8. In the present embodiment, the classification model construction process is performed prior to the classification process executed by the classification execution unit 5.
As shown in FIG. 8, the teacher data set by the teacher data setting unit 61 is input to a classifier such as an SVM (step S30). Next, the classification model learning unit 62 trains the classifier based on the input teacher data and constructs a classification model (step S31). The classification model constructed by the classification model learning unit 62 is stored in the classification model storage unit 43.
After the classification model has been constructed, the classification process by the classification execution unit 5 is executed as in the second embodiment, and the documents are classified using the classification model constructed by the learning unit 6. The search processing by the search unit 3 is then executed, and the output display order of the plurality of extracted documents is determined. The display unit 22 then processes the display content of the original text documents corresponding to the search result documents and highlights the sentences belonging to each classification class so that they can be distinguished from one another.
As described above, in the document search device 1b according to the third embodiment, the learning unit 6 trains a predetermined classifier to construct the classification model. The document search device 1b can therefore update the classification model, reconfigure the classification classes, and so on locally as needed.
The embodiments of the document search device and the document search method of the present invention have been described above, but the present invention is not limited to the described embodiments, and various modifications conceivable by those skilled in the art can be made within the scope of the invention described in the claims.
For example, the embodiments described above dealt with the case in which three classification classes are set in advance: the "phenomenon" that occurred, the "cause" of the phenomenon, and the "action" taken in response to the phenomenon. However, the classification classes are not limited to these three; the classification class "phenomenon" may be used alone, or further classification classes may be added and used in combination.
DESCRIPTION OF SYMBOLS 1: document search device, 3: search unit, 4: storage unit, 21: search condition input unit, 31: extraction unit, 32: display order determination unit, 41: document DB, 42: classification result DB, 102: control unit, 102a: CPU, 102b: main storage unit, 103: communication control device, 104: storage device, 104a: program storage unit, 105: input device.
Claims (9)
1. A document search device comprising:
a document database in which a plurality of documents are stored;
a classification result database in which first information identifying each of the plurality of documents, second information identifying a sentence contained in each of the plurality of documents, and third information indicating a classification class representing an attribute of the sentence are stored in association with one another;
a search condition input unit to which a search condition for searching for a document related to a certain phenomenon is input;
an extraction unit that executes a full-text search on the plurality of documents stored in the document database based on the search condition and extracts documents matching the search condition; and
a display order determination unit that, when a plurality of documents are extracted by the extraction unit, determines the order in which the plurality of extracted documents are output and displayed,
wherein the classification class includes at least a first classification class representing the phenomenon, and
the display order determination unit refers to the classification result database and decides to preferentially output and display, among the plurality of extracted documents, a document containing a sentence associated with the first classification class.
2. The document search device according to claim 1, further comprising a classification execution unit that classifies each sentence contained in each of the plurality of documents into one of a plurality of classification classes representing sentence attributes and stores the result in the classification result database.
3. The document search device according to claim 2, further comprising a learning unit that trains a classifier to construct a classification model defining the classification classes,
wherein the classification execution unit classifies documents using the classification model constructed by the learning unit.
4. The document search device according to claim 3, wherein the learning unit trains the classifier based on teacher data containing a sentence and the classification class to which the sentence should belong, to construct the classification model.
5. The document search device according to any one of claims 1 to 4, wherein the classification class further includes a second classification class representing the cause of the phenomenon and a third classification class representing an action taken in response to the phenomenon.
6. The document search device according to any one of claims 1 to 5, further comprising a display unit that displays the plurality of extracted documents with sentences belonging to the classification class highlighted.
7. The document search device according to claim 6, wherein the display unit displays the highlighted sentences such that the classification classes to which the sentences belong can be distinguished from one another.
8. The document search device according to any one of claims 1 to 7, wherein the display order determination unit determines the order in which documents containing a sentence belonging to the first classification class, among the plurality of extracted documents, are output and displayed, based on an index value representing the degree of the relationship between each of the plurality of documents and the phenomenon.
9. A document search method comprising:
a search condition input step in which a search condition for searching for a document related to a certain phenomenon is input;
an extraction step of executing a full-text search on a plurality of documents stored in a document database based on the search condition and extracting documents matching the search condition; and
a display order determination step of determining, when a plurality of documents are extracted in the extraction step, the order in which the plurality of extracted documents are output and displayed,
wherein the display order determination step refers to a classification result database in which first information identifying each of the plurality of documents, second information identifying a sentence contained in each of the plurality of documents, and third information indicating a classification class representing an attribute of the sentence are stored in association with one another, and decides to preferentially output and display, among the plurality of extracted documents, a document containing a sentence associated with a first classification class representing the phenomenon, and
the classification class includes at least the first classification class.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2017213502A JP7053219B2 (en) | 2017-11-06 | 2017-11-06 | Document retrieval device and method |
JP2017-213502 | 2017-11-06 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019087593A1 true WO2019087593A1 (en) | 2019-05-09 |
Family
ID=66331610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2018/034358 WO2019087593A1 (en) | 2017-11-06 | 2018-09-18 | Document retrieval device and method |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP7053219B2 (en) |
WO (1) | WO2019087593A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2021022070A (en) * | 2019-07-25 | 2021-02-18 | 東京電力ホールディングス株式会社 | Method for processing information, information processor, and program |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08255172A (en) * | 1995-03-16 | 1996-10-01 | Toshiba Corp | Document retrieval system |
JP2012208775A (en) * | 2011-03-30 | 2012-10-25 | Casio Comput Co Ltd | Retrieval method, retrieval device and computer program |
JP2012208774A (en) * | 2011-03-30 | 2012-10-25 | Casio Comput Co Ltd | Retrieval method, retrieval apparatus and computer program |
2017
- 2017-11-06 JP JP2017213502A patent/JP7053219B2/en active Active
2018
- 2018-09-18 WO PCT/JP2018/034358 patent/WO2019087593A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JP7053219B2 (en) | 2022-04-12 |
JP2019086934A (en) | 2019-06-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 18874018; Country of ref document: EP; Kind code of ref document: A1 |