JP5439028B2 - Information search apparatus, information search method, and program - Google Patents

Information search apparatus, information search method, and program


Publication number
JP5439028B2
Authority
JP
Japan
Prior art keywords
match
sentence
information
search
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2009116025A
Other languages
Japanese (ja)
Other versions
JP2010266970A (en
Inventor
達彦 岡田
健典 亘
敬司 溝渕
貞治 高井
隆光 石岡
世紀 井上
Original Assignee
株式会社エヌ・ティ・ティ・データ
合同会社シンタックス
株式会社Nttデータ・スマートソーシング
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社エヌ・ティ・ティ・データ, 合同会社シンタックス, 株式会社Nttデータ・スマートソーシング
Priority to JP2009116025A
Publication of JP2010266970A
Application granted
Publication of JP5439028B2
Application status is Active
Anticipated expiration

Description

The present invention relates to an information search apparatus, an information search method, and a program that perform a search according to the result of analyzing input text data.

For example, a user's opinion sent by electronic mail, or an electronic document within a company, is collected, converted into text data, and stored in a database or the like. Then, when an operator responds to an inquiry or complaint from a user, there is a search method in which a suitable way of responding is retrieved from the database using a sentence input by the operator as a search key and is transmitted to the terminal used by the operator.
In such a response scene, the search results must be narrowed down to those suited to the search key so that the customer is not kept waiting. However, in such a scene, the search key is a sentence that carries the meaning of the sentence and the user's intention, such as an inquiry or complaint from the user. For this reason, it is necessary to perform a search that emphasizes the meaning of the sentence and the user's intention, rather than a general search that uses words, keywords, and the like as search keys.
For example, when creating the dictionary data to be searched by a search device, the syntactic structure and meaning of each sentence to be searched are analyzed, the parts of speech and the relationships between the words of the analyzed sentence are extracted, and tree-structured dictionary data is created with the extracted information as matching conditions. Then, the search text input as a search key is analyzed, an analysis result that satisfies the matching conditions is searched for in the dictionary data, and the obtained search result is displayed on the user terminal (see, for example, Patent Document 1).

JP 2003-58537 A

However, a search using dictionary data as in Patent Document 1 has a problem that it is necessary to create dictionary data according to the purpose and conditions of matching.
For example, consider creating dictionary data under matching conditions suitable for “inquiry about usage” from the two sentences “mobile phone is difficult to connect” and “cell phone cannot be connected”. Both sentences imply an inquiry about how to use a mobile phone. For this reason, the two sentences are associated with each other under these matching conditions. Therefore, when creating dictionary data for “inquiry about usage”, it is necessary to create the dictionary data under matching conditions in which the two sentences are associated with each other as search targets.

On the other hand, when creating dictionary data under matching conditions suitable for “responding to opinions about companies”, the sentence “cell phone is difficult to connect” has the meaning of a request for improvement from the user, whereas the sentence “cell phone is not connected” has the meaning of a complaint from the user. For this reason, when searching for improvement requests from users, it is preferable that only the former, and not the latter, be obtained as a search result. Therefore, in such a case, it is necessary to create dictionary data under matching conditions suitable for “responding to opinions about companies”.

In other words, as described above, when the purpose of the search differs, dictionary data must be created under matching conditions suited to each purpose, and the labor required to create dictionary data therefore increases.
In addition, since a huge amount of dictionary data must be stored in accordance with matching conditions, there is a problem that efficient use of the storage area cannot be achieved.

The present invention has been made in view of such circumstances, and its object is to provide an information search apparatus, an information search method, and a program capable of searching under different matching conditions using a single set of dictionary data.

In order to solve the above-described problems, an information search apparatus according to the present invention includes: an input unit to which a search key sentence composed of a plurality of words is input; an analysis unit that analyzes the search key sentence and obtains an analysis result regarding the words constituting the search key sentence; a match dictionary storage unit that stores, as match dictionary information relating to a sentence composed of at least one clause, rule information representing information about the clauses included in the sentence, the match dictionary being configured in a tree structure in which a clause composed of at least one of the words is a subtree node; a match profile storage unit that stores match profile information that is associated with a matching condition for collating the relationship between the match dictionary information stored in the match dictionary storage unit and the search key sentence, and that has an evaluation criterion for evaluating the degree of matching with the search key sentence for a word satisfying the matching condition; and a search processing unit that, based on the match profile information, collates the search key sentence with the match dictionary information according to the associated matching condition and, for a sentence that satisfies the matching condition as a result of the collation, calculates a score representing the degree of matching between the search key sentence and the match dictionary information according to the evaluation criterion associated with the match profile information.

Further, in the information search apparatus, the evaluation criterion indicates whether or not a score corresponding to the degree of matching is given to a word satisfying the matching condition, and the search processing unit obtains the score by calculating, for each sentence satisfying the matching condition, the scores given according to the evaluation criterion to the words satisfying the matching condition.

In the information search apparatus, the match profile storage unit stores a plurality of pieces of match profile information, each associated with at least one of a plurality of matching conditions whose characteristics differ according to the purpose of the search.

In the information search apparatus, the match profile storage unit stores match profile information associated with at least one of word element matching, attribute matching, and dependency matching as the matching condition.

Further, in the information search apparatus, a search target sentence composed of a plurality of words is input to the input unit, the analysis unit analyzes the search target sentence and obtains an analysis result regarding the words constituting the search target sentence, and the apparatus further includes a dictionary creation unit that, based on the analysis result, creates the match dictionary information and stores it in the match dictionary storage unit. The match dictionary information relates to a sentence composed of at least one of the clauses and is dictionary information configured in a tree structure with subtree nodes, in which a clause composed of at least one of the words is associated with rule information including character information on the character string of the word and attribute information indicating the attribute of the word.
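The tree-structured match dictionary described above can be pictured with a minimal sketch. This is only an illustration under stated assumptions: the class names `Word`, `Clause`, and `Sentence`, the helper `iter_words`, the attribute label "negation" as stored here, and the sample entry are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    surface: str                          # character information (the word's string)
    attributes: frozenset = frozenset()   # attribute information, e.g. {"negation"}


@dataclass
class Clause:
    words: list                                    # at least one Word
    children: list = field(default_factory=list)   # dependent clauses (subtree nodes)


@dataclass
class Sentence:
    sentence_id: str
    root: Clause


def iter_words(clause):
    """Yield every word in the subtree rooted at `clause`, depth first."""
    yield from clause.words
    for child in clause.children:
        yield from iter_words(child)


# Hypothetical dictionary entry for "mobile phone cannot be connected".
entry = Sentence(
    "S1",
    Clause(
        [Word("connect", frozenset({"negation"}))],
        children=[Clause([Word("mobile phone")])],
    ),
)
```

Keeping the character string and the attributes on each word is what would let a single dictionary serve several matching conditions: one condition can compare only `surface`, another can also compare `attributes`.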

In order to solve the above-described problem, in the information search method of the present invention, the input unit accepts input of a search key sentence composed of a plurality of words; the analysis unit analyzes the search key sentence and obtains an analysis result regarding the words constituting the search key sentence; and the search processing unit reads match profile information from a match profile storage unit that stores match profile information that is associated with a matching condition for collating the relationship between match dictionary information and the search key sentence and that has an evaluation criterion for evaluating the degree of matching with the search key sentence for a word satisfying the matching condition. Using the match dictionary information of a match dictionary storage unit that stores, as match dictionary information relating to a sentence composed of at least one clause, rule information representing information about the clauses included in the sentence, in a dictionary configured in a tree structure in which a clause composed of at least one of the words is a subtree node, the search processing unit collates the search key sentence with the match dictionary information according to the associated matching condition based on the match profile information, and, for a sentence that satisfies the matching condition as a result of the collation, calculates a score representing the degree of matching between the search key sentence and the match dictionary information according to the evaluation criterion associated with the match profile information.

According to another aspect of the present invention, there is provided a program for causing a computer to function as: input means for inputting a search key sentence composed of a plurality of words; analysis means for analyzing the search key sentence and obtaining an analysis result regarding the words constituting the search key sentence; and search processing means. The search processing means reads match profile information from a match profile storage unit that stores match profile information that is associated with a matching condition for collating the relationship between match dictionary information and the search key sentence and that has an evaluation criterion for evaluating the degree of matching with the search key sentence for a word satisfying the matching condition. Using the match dictionary information of a match dictionary storage unit that stores, as match dictionary information relating to a sentence composed of at least one clause, rule information representing information about the clauses included in the sentence, in a dictionary configured in a tree structure in which a clause composed of at least one word is a subtree node, the search processing means collates the search key sentence with the match dictionary information according to the associated matching condition based on the match profile information, and, for a sentence that satisfies the matching condition as a result of the collation, calculates a score representing the degree of matching between the search key sentence and the match dictionary information according to the evaluation criterion associated with the match profile information.

Further, in the program, a search target sentence composed of a plurality of words is input to the input means, the analysis means analyzes the search target sentence and obtains an analysis result regarding the words constituting the search target sentence, and the program further causes the computer to function as dictionary creation means that, based on the analysis result, creates the match dictionary information and stores it in the match dictionary storage unit. The match dictionary information relates to a sentence composed of at least one of the clauses and is dictionary information configured in a tree structure with subtree nodes, in which a clause composed of at least one of the words is associated with rule information including character information on the character string of the word and attribute information indicating the attribute of the word.

According to this invention, a search based on different matching conditions can be realized using a single set of dictionary data.

FIG. 1 is a block diagram showing an example of the information search system according to the present embodiment.
FIG. 2 is a block diagram showing an example of the client terminal device according to the present embodiment.
FIG. 3 is a block diagram showing an example of the WEB server according to the present embodiment.
FIG. 4 is a block diagram showing an example of the Japanese language analysis server according to the present embodiment.
FIG. 5 is a schematic diagram showing an example of the match profile stored in the match profile storage unit according to the present embodiment.
FIG. 6 is a schematic diagram showing an example of the match dictionary data stored in the match dictionary storage unit according to the present embodiment.
FIG. 7 is a block diagram showing an example of the Japanese language analysis server according to the present embodiment.
FIG. 8 is a schematic diagram showing an example of the structure tree produced by the syntax analysis unit according to the present embodiment.
FIG. 9 is a schematic diagram for explaining word element matching.
FIG. 10 is a schematic diagram for explaining dependency matching.
FIG. 11 is a schematic diagram for explaining attribute matching.
FIG. 12 is a flowchart showing an example of the method of creating match dictionary data in the information search system according to the present embodiment.
FIG. 13 is a flowchart showing an example of the search method in the information search system according to the present embodiment.
FIG. 14 is a flowchart explaining in detail an example of the matching process and the scoring process in the information search system according to the present embodiment.
FIG. 15 is a schematic diagram explaining an example of the search result data according to the present embodiment.
FIG. 16 is a reference diagram for explaining the search result according to the present embodiment.
FIG. 17 is a flowchart showing an example of the search start process in the information search system according to the present embodiment.
FIG. 18 is a flowchart showing an example of the method of displaying search results in the information search system according to the present embodiment.
FIG. 19 is a schematic diagram showing an example of an image representing the search result displayed on the display unit of the client terminal device according to the present embodiment.
FIG. 20 is a schematic diagram illustrating an example of a screen displayed after performing a narrow search from the search result illustrated in FIG. 19.
FIG. 21 is a reference diagram showing an example of a search key sentence.
FIG. 22 is a reference diagram showing an example of a matched single sentence.
FIG. 23 is a reference diagram for explaining an example of setting a match profile.
FIG. 24 is a schematic diagram showing an example of the screen on which the search result obtained based on the match profile A is displayed.
FIG. 25 is a schematic diagram showing an example of the screen on which the search result obtained based on the match profile B is displayed.
FIG. 26 is a schematic diagram showing an example of the screen on which the search result obtained based on the match profile C is displayed.
FIG. 27 is a reference diagram for explaining another example of setting a match profile.
FIG. 28 is a schematic diagram showing an example of the search result obtained in a specific match mode.
FIG. 29 is a schematic diagram showing another example of the search result obtained in a specific match mode.
FIG. 30 is a schematic diagram showing another example of the search result obtained in a specific match mode.
FIG. 31 is a schematic diagram showing another example of the search result obtained in a specific match mode.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing an example of an information search system according to this embodiment.
As shown in FIG. 1, the information search system 1 includes a client terminal device 100, a WEB server 300, a Japanese language analysis server 500, and a database file server 700.

The client terminal device 100 is, for example, an information processing device such as a personal computer. It has an input unit to which a search key sentence is input, and it transmits the search key sentence entered by the user via the input unit to the WEB server 300 over a network. Details will be described later with reference to FIG.

The WEB server 300 relays communication between the client terminal device 100 and the Japanese language analysis server 500. For example, it transmits a search key sentence received from the client terminal device 100 to the Japanese language analysis server 500, and transmits the search result received from the Japanese language analysis server 500 to the client terminal device 100. Details will be described later with reference to FIG.

  The Japanese analysis server 500 includes a search processing unit 501, a match profile storage unit 502, a match dictionary storage unit 503, a dictionary creation unit 504, a document analysis unit 505, and a memory area 506. When receiving the search key sentence from the WEB server 300, the Japanese analysis server 500 performs a search based on the search key sentence, and transmits the search result to the client terminal device 100 via the WEB server 300. Details will be described later with reference to FIGS.

The database file server 700 is a storage unit that stores the information to be searched by the Japanese analysis server 500 (hereinafter referred to as search target information). It includes a data source 701, which is a storage device that stores, as text data, for example, information on responses to inquiries, information on repair manuals, and information related to the company and the products and services it handles, such as opinions and complaints acquired from users and others by electronic mail.
The database file server 700 can be, for example, a storage device used as a data warehouse in a company.

Next, the client terminal device 100 will be described in detail with reference to FIG. FIG. 2 is a block diagram illustrating an example of the client terminal device 100 according to the present embodiment.
As illustrated in FIG. 2, the client terminal device 100 includes a browser (display control unit) 101, a display unit 102, an input unit 103, and a communication unit 104.
The display unit 102 is a liquid crystal display device, for example, and displays display data such as an operation screen and a search result screen.
The input unit 103 is an input interface including a keyboard and a mouse, for example, and accepts an operation instruction and a search key sentence input from a user.
When the type of search service is specified by the user via the input unit 103, the communication unit 104 transmits a request control signal requesting execution of a search by the specified search service to the Japanese analysis server 500 via the WEB server 300. In addition, the communication unit 104 transmits the search key sentence input by the user via the input unit 103 to the WEB server 300 via the network.

The browser 101 is, for example, a part that receives a program (for example, JavaScript (registered trademark)) for causing the display unit 102 to display a web page received from the WEB server 300, and executes the program. The browser 101 operates according to this program, generates the display data to be displayed by the display unit 102, and outputs the display data to the display unit 102.
The browser 101 includes a storage unit 111, a data processing unit 112, and a display processing unit 113. Each configuration will be described below.

The storage unit 111 stores the programs and predetermined set values processed by the data processing unit 112 and the display processing unit 113. In addition, the storage unit 111 stores the search results obtained by the Japanese language analysis server 500 (for example, match information including a matched sentence, a matched single sentence, a matched word, the matching condition used for the matching, and match position information) and the search rule information used when performing a refinement search (for example, a program or set value for retrieving the refinement targets from the search result using a matched word specified via the input unit 103 as a search key).

The data processing unit 112 operates as a program on the browser 101 received from the WEB server 300, converts the data received from the WEB server 300 into display data for display on the screen of the display unit 102, and controls the display processing unit 113 so that the display data is displayed on the display unit 102. In addition, the data processing unit 112 creates result display data in which tag information representing emphasis is added to the words that satisfy the matching condition, based on the search result stored in the storage unit 111.
The display processing unit 113 is controlled by the data processing unit 112 and causes the display unit 102 to display the display data converted by the data processing unit 112.
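As a rough illustration of adding tag information representing emphasis to matched words, the following sketch wraps each matched word in a tag. It is written in Python for brevity, whereas the program on the browser 101 is described as running in the browser; the function name `emphasize`, the choice of the `<em>` tag, and the representation of a sentence as a list of words are assumptions made only for this example.

```python
import html


def emphasize(words, matched, tag="em"):
    """Return display text in which every matched word is wrapped in `tag`.

    `words` is the sentence as a list of words and `matched` is the set of
    words that satisfied the matching condition (both hypothetical inputs).
    """
    parts = []
    for word in words:
        text = html.escape(word)  # keep the generated markup well-formed
        parts.append(f"<{tag}>{text}</{tag}>" if word in matched else text)
    return " ".join(parts)


line = emphasize(["mobile", "phone", "cannot", "connect"], {"connect"})
```

Here `line` holds the sentence with only the matched word tagged, so the screen can show at a glance why the sentence was retrieved.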

Next, the WEB server 300 will be described in detail with reference to FIG. FIG. 3 is a block diagram showing an example of the WEB server 300 according to the present embodiment.
As illustrated in FIG. 3, the WEB server 300 includes a communication unit 301, a request processing unit 302, a data conversion unit 303, and a storage unit 304.
The communication unit 301 communicates with the client terminal device 100 and the Japanese language analysis server 500 via a network, for example.
Based on the request control signal received from the client terminal device 100 via the communication unit 301, the request processing unit 302 controls the data conversion unit 303 so as to create the web page data of the display data to be displayed by the display unit 102 of the client terminal device 100. Further, on receiving a request control signal from the client terminal device 100, the request processing unit 302 creates a code file to be executed by the client terminal device 100 and setting data related to the display of the display data, and transmits them to the client terminal device 100.

The data conversion unit 303 is controlled by the request processing unit 302 and creates web page data to be transmitted to the client terminal device 100 based on the search result received from the Japanese analysis server 500.
The storage unit 304 temporarily stores setting data used by the request processing unit 302 and the data conversion unit 303 and search results obtained by the search of the Japanese analysis server 500.

The WEB server 300 may be, for example, a WEB server that is connected to an in-house LAN (Local Area Network) and provides information suitable for the handling of user inquiries by call center operators, or it may be a WEB server that provides data for sharing. As described above, there may be a plurality of WEB servers 300 depending on the purpose of the search of the information search system 1.

Next, the Japanese language analysis server 500 will be described in detail with reference to FIG. FIG. 4 is a block diagram showing an example of the Japanese language analysis server 500 according to the present embodiment.
As shown in FIG. 4, the search processing unit 501 has a function of executing programs that provide different search services α and β to the client terminal device 100. When a request control signal requesting execution of a search by the search service α designated by the user is received from the WEB server 300, the search processing unit 501 starts the program of the search service α and reads the match profile associated with the search service α from the match profile storage unit 502. In addition, the search processing unit 501 reads the match dictionary data stored in the match dictionary storage unit 503. Further, the search processing unit 501 expands the read match profile and match dictionary data in the memory area 506 to create a dictionary object.

When the search processing unit 501 receives a search key sentence from the client terminal device 100 via the WEB server 300, it outputs the received search key sentence to the document analysis unit 505 and generates, in the memory area 506, a search result object (in an empty state) into which the search result will be written. As a result, a memory area for recording the search result is secured.
Further, in accordance with the match mode predetermined in the match profile read from the match profile storage unit 502, the search processing unit 501 collates the analyzed search key sentence with the match dictionary data expanded in the dictionary object in the memory area 506 (hereinafter referred to as matching), and searches for sentences and the like that satisfy the match mode (hereinafter referred to as matching processing).

In addition, in accordance with the score mode predetermined in the match profile, the search processing unit 501 calculates a score that evaluates the degree of matching with the search key sentence for each sentence obtained by matching (hereinafter referred to as a matched sentence) (hereinafter referred to as scoring process). Although details will be described later, the search processing unit 501, for example, calculates the score of each word included in a single sentence obtained by matching (hereinafter referred to as a matched single sentence), and sums the scores of the words to calculate the score of the matched sentence and the score of the matched single sentence.
Further, the search processing unit 501 associates a sentence ID or the like of a sentence obtained by matching with a score of the sentence or the like and stores it as a search result in a search result object in the memory area 506. Also, the search processing unit 501 transmits the search result of the search result object to the client terminal device 100 via the WEB server 300.
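The scoring and result-recording steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary layout of the search result object, the sample scores, and the descending sort are assumptions introduced for the example.

```python
def score_sentence(word_scores):
    """Score of a matched sentence as the sum of the scores of its matched words."""
    return sum(word_scores.values())


def build_search_result(matches):
    """Associate each matched sentence ID with its score.

    `matches` maps a sentence ID to {matched word: word score} (hypothetical
    layout). Sorting by score is an assumption made for display convenience.
    """
    result = [
        {"sentence_id": sid, "score": score_sentence(ws), "matched_words": sorted(ws)}
        for sid, ws in matches.items()
    ]
    result.sort(key=lambda r: r["score"], reverse=True)
    return result


search_result = build_search_result({
    "S2": {"phone": 1.0},
    "S1": {"phone": 1.0, "connect": 2.0},
})
```

In this sketch the sentence "S1", which matched two words, ends up ahead of "S2", which matched only one.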

The memory area 506 is a storage area for temporarily storing information. For example, the dictionary object and the search result object generated by the search processing unit 501 are held in it.

The match profile storage unit 502 stores, for example, match profiles A, B, ... corresponding to the respective search services α, β, .... Here, a match profile includes match mode information that defines a predetermined match mode and score mode information that defines the score calculation method applied to the results extracted in that match mode. For example, in the match mode definition of match profile A, the search service α and match profile A are associated in advance. Therefore, when a search service is designated by the user, the match profile predetermined for that search service is determined, and with it the match mode and score mode predetermined in that match profile.
The present invention is not limited to this. The client terminal device 100 may transmit, together with the request control signal, information indicating a combination of search service type, match profile type, and match mode type to the Japanese analysis server 500 via the WEB server 300, so that the combination is determined by the user.
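The two ways of choosing a match profile described above (a profile fixed in advance per search service, or a combination supplied by the user) might be sketched like this. The table contents, the service labels "alpha" and "beta", and the function name `resolve_profile` are hypothetical.

```python
# Assumed default association: search service -> match profile.
DEFAULT_PROFILE_FOR_SERVICE = {"alpha": "A", "beta": "B"}


def resolve_profile(service, requested_profile=None):
    """Return the match profile name to use for a search request.

    If the request carries an explicit profile, it wins; otherwise the profile
    associated in advance with the search service is used.
    """
    if requested_profile is not None:
        return requested_profile
    return DEFAULT_PROFILE_FOR_SERVICE[service]
```

For example, `resolve_profile("alpha")` falls back to the pre-associated profile, while `resolve_profile("alpha", "C")` honors the combination sent with the request control signal.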

Here, the match profile will be described in detail with reference to FIG. FIG. 5 is a schematic diagram illustrating an example of a match profile stored in the match profile storage unit 502 according to the present embodiment.
As shown in FIG. 5, the match profile A includes a match mode definition PA1 as match mode information and, as score mode information, a relative appearance frequency flag PA2, a relative appearance frequency importance coefficient PA3, a sentence appearance position PA4, a search key appearance position PA5, a predicate attribute match coefficient PA6, a dependency match coefficient PA7, a part of speech category PA8, a conjunction evaluation PA9, and a synonym match coefficient PA10.

The match mode definition PA1 is information relating to a combination of match modes determined in advance according to the search services α, β,. Here, a match mode represents a technique for performing matching between the search key sentence and the match dictionary data. For example, as described later with reference to FIGS., there are several kinds of matching, and the match mode is defined as a combination of them.

The score mode information is information that is referred to when calculating the score of a matched sentence or a single sentence, including information on a score condition and a weighting method for a result obtained according to the match mode.
The score mode information includes the relative appearance frequency flag PA2, the relative appearance frequency importance coefficient PA3, the sentence appearance position PA4, the search key appearance position PA5, the predicate attribute match coefficient PA6, the dependency match coefficient PA7, the part of speech category PA8, the conjunction evaluation PA9, and the synonym match coefficient PA10, and it includes score modes that can be used in any match mode. Here, the match mode information and the score mode information can be combined arbitrarily, regardless of the match mode predetermined in the match mode definition PA1.

Further, the score mode information is information indicating whether or not to calculate a score to be given to a matched word, and when information indicating that the score is to be calculated is set, the coefficient or score given in each case is defined as a set value. In other words, for a sentence or single sentence matched according to the match mode information, the search processing unit 501 can use the score mode information to calculate a score for evaluating the degree of matching between that sentence or single sentence and the search key sentence. Here, the score indicates the degree of matching between the matched sentence or single sentence and the search key sentence. For example, it is a score for evaluating how similar the meanings of the matched sentence and the search key sentence are, because their sentence structures or dependency relations match or because their predicate attributes match.
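One way to picture a match profile is as a small record that holds the items PA1 to PA10. The sketch below is an assumption-laden illustration: the field names, the field types (in particular for the part of speech category PA8 and the conjunction evaluation PA9, whose formats are not given here), and all values are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MatchProfile:
    match_modes: tuple                    # PA1: combination of match modes
    use_relative_frequency: bool          # PA2: tf x idf weighting on/off
    relative_frequency_coeff: float       # PA3
    weight_by_sentence_position: bool     # PA4
    weight_by_search_key_position: bool   # PA5
    predicate_attribute_coeff: float      # PA6
    dependency_coeff: float               # PA7
    part_of_speech_categories: tuple      # PA8 (assumed representation)
    evaluate_conjunctions: bool           # PA9 (assumed representation)
    synonym_coeff: float                  # PA10


# Hypothetical profile A; every value is made up for illustration.
profile_a = MatchProfile(
    match_modes=("word_element", "dependency"),
    use_relative_frequency=True,
    relative_frequency_coeff=1.0,
    weight_by_sentence_position=True,
    weight_by_search_key_position=False,
    predicate_attribute_coeff=1.5,
    dependency_coeff=2.0,
    part_of_speech_categories=("noun", "verb"),
    evaluate_conjunctions=False,
    synonym_coeff=0.8,
)

# The score mode settings can be reused with a different match mode combination.
profile_b = replace(profile_a, match_modes=("word_element", "attribute"))
```

Deriving `profile_b` from `profile_a` reflects the point that match mode information and score mode information can be combined freely while the dictionary itself stays the same.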

The relative appearance frequency flag PA2 indicates whether to use weighting based on the relative appearance frequency (tf × idf). When the flag is on, this weighting is performed, and when the flag is off, it is not performed.
Here, the relative appearance frequency (tf × idf) is a relative value that is generally used as a key for keyword (important word) extraction, and it is obtained by multiplying the two coefficients tf and idf described below.
Note that tf (term frequency) is the relative frequency of occurrence of a specific word in a sentence, and idf (inverse document frequency) is the reciprocal of the number of sentences containing the specific word. In other words, a common word that appears in almost every sentence has a low relative appearance frequency. Therefore, a sentence is characterized by the words it contains that have a high relative appearance frequency (tf × idf).
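The definition above can be made concrete with a short sketch. It follows the text literally, taking idf as the reciprocal of the number of sentences containing the word (many other systems use a logarithmic form instead). The function name, the representation of sentences as word lists, and the sample corpus are assumptions for illustration.

```python
import math


def relative_appearance_frequency(word, sentence, corpus):
    """tf x idf for `word` in `sentence`, following the definition in the text.

    tf  = occurrences of the word in the sentence / number of words in the sentence
    idf = 1 / number of sentences in the corpus that contain the word
    """
    if not sentence:
        return 0.0
    tf = sentence.count(word) / len(sentence)
    containing = sum(1 for s in corpus if word in s)
    idf = 1.0 / containing if containing else 0.0
    return tf * idf


# Hypothetical corpus of two tokenized sentences.
corpus = [["phone", "cannot", "connect"], ["phone", "is", "slow"]]
common = relative_appearance_frequency("phone", corpus[0], corpus)
rare = relative_appearance_frequency("connect", corpus[0], corpus)
```

With this corpus, "phone" appears in both sentences and "connect" in only one, so `rare` is larger than `common`, which matches the statement that common words receive a lower relative appearance frequency.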

  The relative appearance frequency importance coefficient PA3 is a weighting coefficient for the score of the tf value, and is applied while the relative appearance frequency flag PA2 is on.

  The sentence appearance position PA4 is information indicating whether or not weighting is performed according to the position at which a word appears in a sentence or a single sentence. For example, when the sentence appearance position PA4 is set to weight according to the appearance position, a gradient coefficient is applied starting from the words closest to the head of the sentence stored in the match dictionary storage unit 503. Here, the coefficient is set so that a word appearing near the beginning of the sentence is weighted more heavily and a word appearing closer to the end is weighted more lightly. This coefficient can be set arbitrarily.

  The search key appearance position PA5 is information indicating whether or not weighting is performed according to the position at which a word appears in the search key sentence. For example, when the search key appearance position PA5 is set to weight according to the appearance position, a gradient coefficient is applied to each character string, such as a matched word, according to where it appears in the search key sentence, starting from the words closest to the head. Here, the coefficient is set so that a word appearing near the beginning of the search key sentence is weighted more heavily and a word appearing closer to the end is weighted more lightly. This coefficient can be set arbitrarily.
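One possible form of such a gradient coefficient is sketched below. The text only states that words near the head are weighted more heavily and that the coefficient can be set arbitrarily, so the linear shape and the `head` and `tail` values are assumptions chosen for illustration.

```python
def position_coefficients(n_words, head=1.0, tail=0.5):
    """Linearly decreasing coefficients from `head` (first word) to `tail` (last word).

    The linear gradient and the default end values are illustrative assumptions.
    """
    if n_words <= 0:
        return []
    if n_words == 1:
        return [head]
    step = (head - tail) / (n_words - 1)
    return [head - i * step for i in range(n_words)]

coeffs = position_coefficients(5)
```

The same helper could serve both the sentence appearance position PA4 and the search key appearance position PA5, applied to the stored sentence and to the search key sentence respectively.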

  The predicate attribute match coefficient PA6 is information representing the score coefficient of a node when the predicate attributes match. For example, it specifies how much weight is added to a matched word or a related word when the predicate attribute of the matched word also matches that of the corresponding word in the search key sentence. If there is no attribute, as with a simple noun phrase, no weight is added. When an attribute such as negation is assigned and the attributes match, the score count to be applied as a multiplier can be set arbitrarily.

  The dependency match coefficient PA7 is information representing a score coefficient applied when matching is achieved in units of dependency relations. For example, it defines whether or not weighting is performed on pairs of matched words that are in a dependency relation with each other (hereinafter referred to as “related pairs”). The score count can be set arbitrarily as a multiplier.

  The part-of-speech category PA8 is information representing weighting for each part-of-speech category. For example, it indicates whether or not to give each part of speech a gradient coefficient according to an order of priority such as user word > proper noun > general noun > adjective / adjectival verb > verb. Note that the gradient coefficient can be set arbitrarily for each part of speech.
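A priority order of this kind can be turned into per-category coefficients as in the following sketch. The category labels, the starting value `top`, and the decrement `step` are hypothetical; the text leaves the actual values to the administrator.

```python
# Priority order taken from the example in the text, highest priority first.
POS_PRIORITY = ["user word", "proper noun", "general noun", "adjective", "verb"]

def pos_coefficients(priority, top=1.0, step=0.1):
    """Assign a decreasing coefficient to each category in priority order.

    `top` and `step` are illustrative assumptions, not values from the source.
    """
    return {pos: top - i * step for i, pos in enumerate(priority)}

coeffs = pos_coefficients(POS_PRIORITY)
```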

  The conjunction evaluation PA9 is information indicating whether or not the entire matched sentence is emphasized (or de-emphasized) when a specific conjunction is included in the sentence matched with the search key sentence. For example, when the conjunction evaluation PA9 is set to emphasize the whole sentence in the presence of a specific conjunction, it defines either a coefficient by which every word of the matched sentence containing the specific conjunction is multiplied, or a coefficient by which only the matched words in that sentence are multiplied.

  The synonym match coefficient PA10 is information indicating the factor by which the score of a character string that has undergone character string replacement by the synonym processing unit of the document analysis unit 505, described later, is multiplied. The synonym match coefficient PA10 is used to lower the score ranking of matches made through synonyms or near-synonyms relative to matches in which the words themselves match.

The match dictionary storage unit 503 stores match dictionary data. The match dictionary data includes, for example, a symbol map MD1 indicating the correspondence between word information and the symbol IDs that replace it, sentence information MD2 indicating information related to each sentence, and single sentence information MD3 on the single sentences included in each sentence.
Here, the match dictionary data will be described in detail with reference to FIG. FIG. 6 is a schematic diagram showing an example of match dictionary data stored in the match dictionary storage unit 503 according to the present embodiment.

  The symbol map MD1 is information that associates text data of word information identified by a symbol ID with a symbol ID for identifying the word information. As a result, the match dictionary storage unit 503 can store the word information in the sentence information MD2 and the single sentence information MD3 by replacing them with the symbol IDs associated with the symbol map MD1 without storing them as text data.

  The sentence information MD2 is registration information necessary for registering an analyzed sentence structure tree (details will be described later) in the match dictionary storage unit 503. The sentence information MD2 includes a sentence ID 21, sentence text data 22 identified by the sentence ID 21, sentence additional information 23 including information such as the date and time when the sentence was stored in the database file server 700 as search target information and its storage location in the data source 701, and a term map 24. Here, the term map 24 is information indicating the number of appearances of the word information included in the sentence, in which the number of appearances of each piece of word information is associated with its symbol ID.

On the other hand, the single sentence information MD3 includes, for each clause included in the single sentence, a rule (rule information) 32 representing the information of a subtree node in a structure tree (see FIG. 8 for details), and is assigned a single sentence ID 31 that identifies each single sentence.
The rule 32 included in the single sentence information MD3 includes, for example, word information 321, predicate attribute 322, parent rule ID 323, weight value 324, conjunction type 325, category 326, child node presence flag 327, and the like.

The word information 321 includes, for each word, a symbol ID and position information (start position and end position) indicating the position of the word information in the single sentence. The word information 321 consists of word information 1, word information 2, ..., word information n, corresponding to the number of words included in the single sentence.
The predicate attribute 322 includes, for example, a word ID, an attribute of a phrase such as a verb or adjective, and an attribute symbol ID representing the meaning of the phrase (negation, negative tendency, desire, affirmation, etc.).

The parent rule ID 323 is information representing a clause of a parent subtree node that has a parent-child relationship.
The weight value 324 is, for example, a coefficient that gives a weight according to the subject or predicate in the sentence. Further, the weight value 324 is a coefficient or the like that defines a reference score when the rule (node) matches in the scoring described later. Normally, a single arbitrary value is set for the entire dictionary, but weights corresponding to the subject and predicate in the sentence can be given when the dictionary is created.
The conjunction type 325 is information representing the conjunction when the clause (phrase) corresponding to the rule 32 is a conjunction such as “So, that is, ...”.
The category 326 is information representing the type of part of speech, such as verb, noun, adverb, or conjunction.
The child node presence flag 327 is information indicating the presence or absence of a clause of a child subtree node in a parent-child relationship; when the flag is on, it indicates that the subtree node is a parent subtree node.
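The layout of the match dictionary data described above can be pictured with the following data classes. This is only a sketch under stated assumptions: the field names, the `rule_id` field (implied by the parent rule ID 323 but not named in the text), and the reduction of the predicate attribute 322 to a single string are all simplifications made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Rule:  # rule 32: one subtree node (clause) of the structure tree
    rule_id: int                               # assumed identifier, target of parent_rule_id
    word_info: List[Tuple[int, int, int]]      # 321: (symbol ID, start, end) per word
    predicate_attribute: Optional[str] = None  # 322, simplified to one label
    parent_rule_id: Optional[int] = None       # 323
    weight: float = 1.0                        # 324
    conjunction_type: Optional[str] = None     # 325
    category: str = "noun"                     # 326
    has_child: bool = False                    # 327

@dataclass
class SingleSentenceInfo:  # MD3
    single_sentence_id: int                    # 31
    rules: List[Rule] = field(default_factory=list)

@dataclass
class SentenceInfo:  # MD2
    sentence_id: int                           # 21
    text: str                                  # 22
    additional_info: Dict[str, str] = field(default_factory=dict)  # 23
    term_map: Dict[int, int] = field(default_factory=dict)         # 24: symbol ID -> count

# MD1: symbol ID -> word text (hypothetical sample values)
symbol_map = {1: "soccer", 2: "watching", 3: "go"}
child = Rule(rule_id=1, word_info=[(1, 0, 6), (2, 7, 15)], parent_rule_id=2)
parent = Rule(rule_id=2, word_info=[(3, 16, 18)], category="verb", has_child=True)
single = SingleSentenceInfo(single_sentence_id=1, rules=[child, parent])
sentence = SentenceInfo(sentence_id=1, text="soccer watching go",
                        term_map={1: 1, 2: 1, 3: 1})
```

Storing symbol IDs instead of text in the rules keeps the sentence and single sentence records compact, which is the purpose the text gives for the symbol map MD1.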

Next, the search processing unit 501, the dictionary creation unit 504, and the document analysis unit 505 will be described in detail with reference to FIG. FIG. 7 is a block diagram showing an example of the Japanese language analysis server 500 according to the present embodiment.
The dictionary creation unit 504 reads the text to be searched (search target information) from the database file server 700, divides the text contained in the search target information at each period into, for example, units of single sentences, and outputs them to the document analysis unit 505. For example, if the text data of sentence A included in the search target information is “I could not receive it when I sent an image in my PC. And I received it when I sent an image in my phone.”, it is divided at the period “.” into two single sentences, and the single sentence A1 “I could not receive when I sent the image in the PC.” and the single sentence A2 “Also, I could receive it when I sent the image in my phone.” are output to the document analysis unit 505.
When the dictionary creation unit 504 receives from the document analysis unit 505 the result analyzed by the document analysis unit 505, the dictionary creation unit 504 stores the result in the match dictionary storage unit 503.
Note that the dictionary creation unit 504 only needs to divide the search target information read from the database file server 700 into units of an appropriate length. For example, the dictionary creation unit 504 may treat text delimited by a period, a bullet, a space, or a line feed as one sentence and separate it as a single sentence.
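The division into single sentences can be sketched as follows. This minimal example handles only periods (ASCII and Japanese) and line feeds; splitting at bullets or spaces, which the text also allows, is left out. The function name is an assumption.

```python
import re

def split_single_sentences(text):
    """Split text into single sentences at periods and line feeds.

    Each returned single sentence keeps its terminating period, if any.
    """
    parts = re.findall(r"[^.。\n]+[.。]?", text)
    return [p.strip() for p in parts if p.strip()]

sentence_a = ("I could not receive it when I sent an image in my PC. "
              "And I received it when I sent an image in my phone.")
single_sentences = split_single_sentences(sentence_a)
```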

When the match dictionary data is created by the dictionary creation unit 504, the document analysis unit 505 receives, for example, search target information divided into single sentence units from the dictionary creation unit 504, performs document analysis, and outputs the analysis result to the dictionary creation unit 504.
When the document analysis unit 505 receives a request control signal for executing a search by a specific search service from the client terminal device 100, the document analysis unit 505 performs document analysis on the search key sentence received from the search processing unit 501, The analysis result is output to the search processing unit 501. Hereinafter, the document analysis unit 505 will be described in detail.

The document analysis unit 505 includes a dictionary unit 550 and an analysis unit 551.
The dictionary unit 550 includes a system dictionary 5501, a user dictionary 5502, and a synonym dictionary 5503. On the other hand, the analysis unit 551 includes a morphological analysis unit 5511, a syntax analysis unit 5512, and a synonym processing unit 5513.

The system dictionary 5501 is, for example, dictionary data in which a minimum unit word having meaning as a morpheme is associated with the meaning, part of speech, attribute information, and the like of the word.
The user dictionary 5502 is dictionary data added to the system dictionary 5501 by, for example, an administrator who uses the Japanese language analysis server 500.
The synonym dictionary 5503 is dictionary data in which a word and its synonyms are associated with each other in order to unify a plurality of synonyms and near-synonyms into one word. For example, the word information of the match dictionary data in the match dictionary storage unit 503 is associated with its synonyms.

The morpheme analysis unit 5511 receives, for example, the search target information divided into single sentences by the dictionary creation unit 504, and decomposes the text to be searched into a plurality of morphemes (word information). For example, when the sentence A is input, the morpheme analysis unit 5511 decomposes the single sentence A1 included in the sentence A into a plurality of morphemes (“PC”, “inside”, “in”, “present”, “image”, “in”, “send”, “I did”, “Place”, “Receive”, “I couldn't”).
In this way, the morpheme analysis unit 5511 decomposes the search target information into morphemes when match dictionary data is created. However, it is not limited to this: when a search is performed by inputting a search key sentence, it receives from the search processing unit 501 the search key sentence divided into single sentences and decomposes it into morphemes.

  Further, the morpheme analysis unit 5511 looks up the part of speech of each decomposed morpheme with reference to the system dictionary 5501 and the user dictionary 5502 and, based on the obtained part-of-speech information, creates phrases (clauses) according to the dependency relationships in the sentence and the meanings of the morphemes. For example, based on the morphemes decomposed from the single sentence A1, the phrase A101 “in the PC”, the phrase A102 “is”, the phrase A103 “image”, the phrase A104 “sent”, and the phrase A105 “Could not be received” are created. Here, a phrase is a unit of a character string including at least one word. A single sentence is a unit of a character string forming one sentence that includes at least one phrase, and is delimited by, for example, a period. Furthermore, a sentence is a unit that can include a plurality of single sentences. The expression “sentence or the like” covers both single sentences and sentences.

Further, the morpheme analysis unit 5511 refers to the dictionary data stored in the system dictionary 5501 and the user dictionary 5502 to look up the part-of-speech category (for example, verb, noun, adverb, conjunction, etc.), the conjunction type (for example, “therefore”, “so”, etc.), and the attribute representing the meaning of a phrase such as a verb or adjective (for example, negation, negative tendency, desire, affirmation, etc.), and gives this information to the morphemes and phrases.
For example, the morpheme analysis unit 5511 analyzes the phrase A105 “cannot be received”, obtains as an analysis result that the part of speech is “noun (sa-irregular connection)” and that the meaning of the phrase is “negation”, and gives this analysis result to the phrase A105.

  Based on the information analyzed by the morphological analysis unit 5511, the syntax analysis unit 5512 evaluates the part of speech, meaning, and attribute information of the phrases constituting the sentence, as well as their positions and order in the sentence, analyzes the relationships between the phrases, and outputs the analysis result to the synonym processing unit 5513.

Furthermore, when the search target information is analyzed, the syntax analysis unit 5512 assigns a sentence ID for identifying each sentence and generates the registration information necessary for registering (storing) the sentence in the match dictionary storage unit 503. Further, the syntax analysis unit 5512 creates a structural tree as shown in FIG. 8 based on the relationships between phrases using the analysis results of the word information and phrases, generates information representing the rule for each subtree node, and outputs it to the synonym processing unit 5513.
The rule is information associated with each subtree node constituting the structural tree shown in FIG. 8 and, as shown in FIG. 6, includes word information 321, a predicate attribute 322, a parent rule ID 323, a weight value 324, a conjunction type 325, a category 326, a child node presence flag 327, and the like.

Here, a structural tree created by the syntax analysis unit 5512 will be described with reference to FIG. FIG. 8 is a schematic diagram illustrating an example of a structural tree created by the syntax analysis unit 5512.
As shown in FIG. 8, the rule corresponding to the subtree node is created for each phrase segmented by the morphological analysis unit 5511. In addition, the structure tree constituted by the subtree nodes is created by the relationship based on the context of the sentence.

  The synonym processing unit 5513 refers to the synonym dictionary 5503 and checks whether any of the decomposed morphemes and clauses has a synonym to be unified. If there is a corresponding synonym, the synonym processing unit 5513 replaces the morpheme or clause with the synonym obtained from the synonym dictionary 5503.
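The unification step can be sketched as a dictionary lookup. The sample dictionary entries and the function name are hypothetical. The sketch also records which morphemes were replaced, on the assumption that a later scoring step needs this to apply the synonym match coefficient PA10 only to replaced character strings.

```python
# Hypothetical synonym dictionary: variant -> unified representative word.
SYNONYMS = {"PC": "computer", "personal computer": "computer", "cell phone": "phone"}

def unify_synonyms(morphemes, synonym_dict):
    """Return (unified morphemes, flags marking which morphemes were replaced)."""
    unified, replaced = [], []
    for morpheme in morphemes:
        representative = synonym_dict.get(morpheme, morpheme)
        unified.append(representative)
        replaced.append(representative != morpheme)
    return unified, replaced

unified, replaced = unify_synonyms(["PC", "image", "cell phone"], SYNONYMS)
```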

Here, when the dictionary creation unit 504 reads the search target information from the database file server 700 and outputs the search target information divided into single sentences to the document analysis unit 505, the document analysis unit 505 performs the document analysis as described above and outputs the analysis result to the dictionary creation unit 504.
The dictionary creation unit 504 receives the analysis result from the document analysis unit 505 and obtains from it, as information on the rules of the clauses constituting each single sentence, the information necessary for registration in the match dictionary storage unit 503, for example, the word information, the symbol ID and predicate attribute of each piece of word information, the parent rule ID and child node presence flag representing the connections between rules (subtree nodes), the weight value, the conjunction type, the category, and the like.

In addition, the dictionary creation unit 504 checks the symbol map MD1 read from the match dictionary storage unit 503 and replaces the word information with the symbol IDs used uniformly throughout the match dictionary storage unit 503. Further, the dictionary creation unit 504 creates match dictionary registration data consisting of sentence information MD2, which includes a sentence ID, sentence text, sentence additional information, a term map, and the like, and single sentence information MD3, which includes a single sentence ID 31 and rules 32. Further, the dictionary creation unit 504 adds the match dictionary registration data to the match dictionary data in the match dictionary storage unit 503.
Note that, when the analysis result received from the document analysis unit 505 contains word information for which no corresponding symbol ID exists in the symbol map MD1, the dictionary creation unit 504 assigns a new symbol ID to that word information and adds the correspondence between the word information and the new symbol ID to the symbol map MD1.
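The handling of the symbol map can be illustrated as follows. This is a sketch only: the class and method names are assumptions, and symbol IDs are simply numbered sequentially, which the text does not specify. The helper `build_term_map` shows how a term map (appearance count per symbol ID) could be derived at the same time.

```python
from collections import Counter

class SymbolMap:
    """Two-way mapping between word text and symbol IDs (sketch of MD1)."""

    def __init__(self):
        self._id_by_word = {}
        self._word_by_id = {}

    def symbol_id(self, word):
        """Return the symbol ID of `word`, assigning a new one if it is unknown."""
        if word not in self._id_by_word:
            new_id = len(self._id_by_word) + 1  # sequential IDs are an assumption
            self._id_by_word[word] = new_id
            self._word_by_id[new_id] = word
        return self._id_by_word[word]

    def word(self, symbol_id):
        return self._word_by_id[symbol_id]

def build_term_map(symbols, words):
    """Count appearances per symbol ID for one sentence (sketch of term map 24)."""
    return dict(Counter(symbols.symbol_id(w) for w in words))

symbols = SymbolMap()
term_map = build_term_map(symbols, ["image", "send", "image"])
```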

  Next, the match mode executed by the search processing unit 501 will be described in detail with reference to FIGS. FIG. 9 is a schematic diagram for explaining word element matching. FIG. 10 is a schematic diagram for explaining dependency matching. FIG. 11 is a schematic diagram for explaining attribute matching.

  As shown in FIGS. 9A to 9C, there are three types of word element matching: the product set (intersection) type, the full set type, and the subset type. In word element matching, the search processing unit 501 compares the rule 32 of the match dictionary data stored in the match dictionary storage unit 503 with the character string (for example, a phrase) in the search key sentence that corresponds to the rule 32. Note that the search processing unit 501 executes whichever of the product set type, the full set type, and the subset type is specified for word element matching in the match mode definition PA1 of the match profile.

  Here, the product set type is a match mode in which, when at least a part of the character string of a rule (the clause corresponding to a subtree node) of the match dictionary data in the match dictionary storage unit 503 matches at least a part of a character string (a clause corresponding to a subtree node) included in the search key sentence, the matching character string is obtained as a word that satisfies the condition of the match mode. If no word matches even in part, the result is that a single sentence satisfying the match mode condition cannot be obtained.

For example, as shown in FIG. 9A, suppose that the phrase “watching soccer” corresponding to a rule of the match dictionary data in the match dictionary storage unit 503 is matched against the phrases “soccer boy”, “watching tour”, and “battle” included in the search key sentence. The phrase “soccer boy” included in the search key sentence matches a part of the phrase “watching soccer” corresponding to the rule in the word “soccer”. In this case, the phrase “watching soccer” corresponding to the rule satisfies the condition of the match mode.
In addition, the phrase “watching tour” included in the search key sentence shares the word “watching” with a part of the phrase “watching soccer” corresponding to the rule, and so also satisfies the condition of the match mode.
As described above, when the condition of the match mode is satisfied, the phrase “watching soccer” corresponding to the rule is obtained as a result of the matching process. In addition, the search processing unit 501 detects each word that satisfies the condition of the match mode as a matched word.

  On the other hand, the phrase “battle” included in the search key sentence has no part that matches, in units of words, any word included in the phrase “watching soccer” corresponding to the rule, and therefore does not satisfy the match mode condition. For this reason, as a result of the matching process, a single sentence that satisfies the condition of the match mode cannot be obtained. Note that the word “battle” is a character contained in the word “watching” in the phrase “watching soccer”, but when compared word by word, “battle” and “watching” are different character strings (words), and it is therefore determined that they do not match.

Next, the full set type will be described with reference to FIG.
The full set type is a match mode in which, when the whole of the character string of a rule (the clause corresponding to a subtree node) of the match dictionary data in the match dictionary storage unit 503 matches the whole of a character string (a clause corresponding to a subtree node) included in the search key sentence, the matched character string is obtained as a word (matched word) that satisfies the match mode condition. In this case, unlike the product set type described above, even if a phrase matches in part (in as little as one word), the result is that a single sentence satisfying the match mode condition is not obtained unless all the character strings in the phrase match.
For example, as shown in FIG. 9B, suppose that the phrase “watching soccer” corresponding to a rule of the match dictionary data in the match dictionary storage unit 503 is compared with the phrases “watching soccer” and “watching” included in the search key sentence. In this case, the phrase “watching soccer” is identical in the rule and in the search key sentence and satisfies the match mode condition, so the phrase “watching soccer” (matched phrase) is obtained.
On the other hand, the phrase “watching” in the search key sentence matches a part of the phrase “watching soccer” corresponding to the rule, but not all of the character strings match. As a result of the matching process, a single sentence satisfying the match mode condition cannot be obtained.

Next, the subset type will be described with reference to FIG.
The subset type is a match mode in which, when the entire character string of a rule (the clause corresponding to a subtree node) of the match dictionary data in the match dictionary storage unit 503 is contained as a part of a character string (a clause corresponding to a subtree node) included in the search key sentence, the matched character string is obtained as a word that satisfies the condition of the match mode. In this case, unlike the product set type described above, the match mode condition is satisfied only if the whole of the clause corresponding to the rule is included, and the matching word or clause is obtained as a matched word or a matched clause. On the other hand, if the clause included in the search key sentence does not include the whole of the clause corresponding to the rule, the result is that word information satisfying the match mode condition cannot be obtained.

For example, as shown in FIG. 9C, suppose that the phrase “watching soccer” corresponding to a rule of the match dictionary data in the match dictionary storage unit 503 is matched against the phrases “watching soccer tour” and “watching tour” included in the search key sentence. The phrase “watching soccer” corresponding to the rule is a part of the phrase “watching soccer tour” included in the search key sentence, and all the words constituting the rule phrase match words included in that phrase of the search key sentence, so the condition of the match mode is satisfied. Thus, when the condition of the match mode is satisfied, the phrase “watching soccer” (matched phrase) corresponding to the rule is obtained as a result of the matching process.
On the other hand, the phrase “watching tour” in the search key sentence matches the word “watching” in the phrase “watching soccer” corresponding to the rule, but the whole of the clause corresponding to the rule is not contained in it. The match mode condition is therefore not satisfied, and as a result of the matching process, a single sentence that satisfies the match mode condition is not obtained.
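The three word element matching types can be compared side by side with a small sketch. Clauses are modelled here as lists of words, and the comparison is done on word sets, so word order and contiguity are ignored; that simplification, the mode labels, and the function name are assumptions made for illustration.

```python
def match_words(rule_clause, key_clause, mode):
    """Return the matched words, or [] if the match mode condition is not met.

    mode "product": at least one word in common
    mode "full":    the two clauses consist of exactly the same words
    mode "subset":  every word of the rule clause appears in the key clause
    """
    rule, key = set(rule_clause), set(key_clause)
    common = rule & key
    if mode == "product":
        satisfied = bool(common)
    elif mode == "full":
        satisfied = rule == key
    elif mode == "subset":
        satisfied = rule <= key
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(common) if satisfied else []

rule = ["soccer", "watching"]  # the rule clause "watching soccer"
product_hit = match_words(rule, ["soccer", "boy"], "product")
full_miss = match_words(rule, ["watching"], "full")
subset_hit = match_words(rule, ["soccer", "watching", "tour"], "subset")
```

Comparing whole words rather than characters is what keeps “battle” from matching “watching” in the product set example.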

Next, an example of dependency matching will be described with reference to FIGS.
As shown in FIGS. 10A and 10B, there are two types of dependency matching: a node parent-child relationship type, which extracts character strings that are in a dependency relationship, and a node single type, which does not evaluate dependencies. In dependency matching, the search processing unit 501 compares the dependency relationship of the character strings of the rules 32 (the clauses corresponding to subtree nodes) of the match dictionary data stored in the match dictionary storage unit 503 with the dependency relationship of the character strings (the clauses corresponding to subtree nodes) included in the search key sentence. The search processing unit 501 executes whichever of the node parent-child relationship type and the node single type is specified for dependency matching in the match mode definition PA1 of the match profile.

  Here, the node parent-child relationship type places a condition on the relationship between words matched by the product set type of word element matching. It is a match mode in which, when the parent-child relationship of the character strings of the rules (the clauses corresponding to subtree nodes) of the match dictionary data in the match dictionary storage unit 503 matches the parent-child relationship of the matched words (the clauses corresponding to subtree nodes) obtained by product set type word element matching among the character strings included in the search key sentence, the matching character strings are obtained as words that satisfy the match mode condition. If there is no character string having a matching parent-child relationship, a single sentence that satisfies the match mode condition cannot be obtained.

For example, as shown in FIG. 10A, in the parent-child relationship of the match dictionary data in the match dictionary storage unit 503, the phrase corresponding to the parent rule is “go” and the phrase corresponding to the child rule is “watching soccer”. The following two patterns coincide with this parent-child relationship. In other words, the phrase corresponding to the parent rule is “go” and the phrase corresponding to the child rule is “soccer”, and the phrase corresponding to the parent rule is “go” and corresponds to the child rule. It is a pattern in which the phrase is “watching”.
Therefore, a search key sentence that includes, as its character strings, the phrase “go” corresponding to the parent rule and the phrase “soccer” corresponding to the child rule satisfies the match mode condition. As described above, when the condition of the match mode is satisfied, the parent-child relationship of the phrases “watching soccer”-“go” corresponding to the rules is obtained as a result of the matching process.
On the other hand, as a character string included in the search key sentence, the phrase corresponding to the child rule is “watching soccer” and there is no phrase corresponding to the parent rule, or the phrase corresponding to the child rule is “go”. If there is no clause corresponding to the parent rule, the match mode condition is not satisfied.

Next, the node single type will be described with reference to FIG.
The node single type is a match mode in which, when at least one of the parent node and the child node matches between the parent-child relationship of the character strings of the rules (the clauses corresponding to subtree nodes) of the match dictionary data in the match dictionary storage unit 503 and the parent-child relationship of the character strings (the clauses corresponding to subtree nodes) included in the search key sentence, the matching character string is obtained as a word that satisfies the condition of the match mode. That is, the dependency is not actually evaluated. Note that the comparison of the character strings in a node is performed according to the type specified for word element matching. If neither node matches, the result is that a single sentence satisfying the match mode condition is not obtained.

For example, as shown in FIG. 10B, when the parent-child relationship of the match dictionary data in the match dictionary storage unit 503 is a phrase “watching soccer” corresponding to the parent node, and a phrase corresponding to the child node “going”, The search key sentence including “soccer” as the phrase corresponding to the parent node satisfies the condition of the match mode, and as a result of the matching process, the parent-child relationship of the phrases “watching soccer”-“go” corresponding to the rule is obtained.
A search key sentence including “go” as a clause corresponding to the child node satisfies the matching condition. On the other hand, the search key sentence including “Yes” as the clause corresponding to the child node does not satisfy the match mode condition because there is no matching character string in either the parent node or the child node.
For example, the match mode definition PA1 of the match profile determines in advance whether or not to perform dependency matching and, if so, which of the above-described types is used.
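The difference between the two dependency matching types can be sketched as follows. A dependency is modelled as a (child clause, parent clause) pair of word lists, and node comparison uses the product set rule (at least one common word). The pair layout, the mode labels, and the position-by-position comparison of child with child and parent with parent are assumptions made for this illustration.

```python
def clause_matches(rule_clause, key_clause):
    """Product set comparison: the clauses share at least one word."""
    return bool(set(rule_clause) & set(key_clause))

def dependency_match(rule_pair, key_pairs, mode):
    """Check one rule dependency against the dependencies of a search key sentence.

    rule_pair: (child clause, parent clause), each a list of words
    key_pairs: list of (child clause, parent clause); parent may be None
    mode "parent_child": both child and parent must match
    mode "single":       a match on either node is enough
    """
    rule_child, rule_parent = rule_pair
    for key_child, key_parent in key_pairs:
        child_ok = clause_matches(rule_child, key_child)
        parent_ok = key_parent is not None and clause_matches(rule_parent, key_parent)
        if mode == "parent_child" and child_ok and parent_ok:
            return True
        if mode == "single" and (child_ok or parent_ok):
            return True
    return False

# Rule dependency from the example: child "watching soccer" -> parent "go".
rule_pair = (["soccer", "watching"], ["go"])
with_parent = [(["soccer"], ["go"])]
without_parent = [(["soccer", "watching"], None)]
```

With these inputs, the key sentence that has both “soccer” and “go” passes either mode, while the one lacking a parent clause passes only the node single type.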

Next, attribute matching will be described. As attribute matching, there are a sentence attribute matching type and a word matching type that does not substantially evaluate attribute matching, similar to dependency matching.
Here, the sentence attribute match type is shown in FIG.
In attribute matching, the search processing unit 501 compares the attribute of the character string of a rule (the clause corresponding to a subtree node) of the match dictionary data stored in the match dictionary storage unit 503 with the attribute of the corresponding character string (the clause corresponding to a subtree node) in the search key sentence.
Here, the sentence attribute match type is a match mode in which, when at least a part of the clause of a rule of the match dictionary data in the match dictionary storage unit 503 matches at least a part (word) of a clause included in the search key sentence, and the attributes of the matching parts also match, the matching character string is obtained as a word that satisfies the match mode condition. Even if the character strings match, if the attributes are different, a single sentence that satisfies the match mode condition cannot be obtained.

For example, suppose that in rule 32 of the match dictionary data in the match dictionary storage unit 503 the predicate attribute of the clause “watching soccer” corresponding to the rule is “denial”. If the predicate attribute of the clause “watching soccer” included in the search key sentence is also “denial”, the match mode condition is satisfied. That is, “do not watch soccer” is decomposed into “soccer (noun)” + “watching (noun)” + “do not (auxiliary verb)”, and the attribute of “do not (auxiliary verb)” is “denial”. The predicate attribute of “watching soccer” is therefore “denial”, which satisfies the match mode condition.
On the other hand, if the predicate attribute of the clause “watching soccer” included in the search key sentence is “possible”, the character string of the clause “watching soccer” matches but the attributes differ, so the match mode condition is not satisfied. That is, “can watch soccer” is decomposed into “soccer (noun)” + “watching (noun)” + “can (auxiliary verb)”, and the attribute of “can (auxiliary verb)” is “possible”. The predicate attribute of “watching soccer” is therefore “possible”, which does not satisfy the match mode condition.
The “attribute” here means the semantic information of an auxiliary verb, such as denial, doubt, or possibility. For example, “cannot be used” is decomposed into “use (verb)” + “not (auxiliary verb)”, and the attribute of “not (auxiliary verb)” is “denial”.
In attribute matching, the match mode condition is also judged to be satisfied when both sides have “no attribute”, as is the case with nouns.
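The attribute check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Clause` type, the `attribute_match` function, and the use of substring containment to stand in for "at least a part of the clause matches" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    """Hypothetical clause record: surface text plus predicate attribute."""
    text: str
    attribute: Optional[str]  # e.g. "denial", "possible"; None means "no attribute"


def attribute_match(rule: Clause, key: Clause) -> bool:
    """Sentence attribute type: strings must overlap AND attributes must be equal.

    Substring containment is an assumed simplification of partial matching.
    Two clauses with no attribute (None) count as matching attributes.
    """
    strings_match = rule.text in key.text or key.text in rule.text
    return strings_match and rule.attribute == key.attribute


rule = Clause("watching soccer", "denial")
print(attribute_match(rule, Clause("watching soccer", "denial")))    # True
print(attribute_match(rule, Clause("watching soccer", "possible")))  # False
```

Comparing `None` with `None` yields equality, which mirrors the rule that clauses with no attribute, such as nouns, still satisfy the condition.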

Next, an example of a method for creating match dictionary data in the information search system 1 according to the present embodiment will be described with reference to FIG. FIG. 12 is a flowchart showing an example of a method for creating match dictionary data in the information search system 1 according to the present embodiment.
As shown in FIG. 12, when creation of match dictionary data is instructed, for example from an operation unit (not shown) of the Japanese analysis server 500, the dictionary creation unit 504 reads a sentence to be searched from the data source 701 of the database file server 700, divides it at each period, and outputs it to the document analysis unit 505 in units of single sentences. For example, if the text data of sentence A to be searched is “I couldn't receive it when I sent an image in my PC. And I couldn't receive it when I sent an image in my phone.”, it is divided at each “.” into two single sentences, and the single sentence A1 “I couldn't receive it when I sent an image in my PC.” is output to the document analysis unit 505 (step ST1).
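Step ST1 can be pictured with the short sketch below. It only assumes what the text states, namely that a sentence is cut at each period; the function name and the "sentence ID, hyphen, running number" form of the single sentence ID are hypothetical (the ID shape is borrowed from the “001-1” example that appears later in the description, and in the described system the ID is assigned by the syntax analysis unit).

```python
from typing import List, Tuple


def split_into_single_sentences(sentence_id: str, text: str,
                                delimiter: str = ".") -> List[Tuple[str, str]]:
    """Split a sentence at each delimiter and label the pieces.

    Returns (single_sentence_id, single_sentence_text) pairs. The ID format
    is an assumption made for illustration.
    """
    parts = [p.strip() for p in text.split(delimiter) if p.strip()]
    return [(f"{sentence_id}-{i}", part + delimiter)
            for i, part in enumerate(parts, start=1)]


sentence_a = ("I couldn't receive it when I sent an image in my PC. "
              "And I couldn't receive it when I sent an image in my phone.")
for single_id, single_text in split_into_single_sentences("001", sentence_a):
    print(single_id, single_text)
```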

Upon receiving the single sentence A1, the morpheme analysis unit 5511 of the document analysis unit 505 separates the single sentence A1 into morphemes (for example, in units of words), thereby decomposing the sentence to be searched into a plurality of morphemes. For example, when the single sentence A1 is input, the morpheme analysis unit 5511 decomposes it into a plurality of words such as “PC”, “inside”, “in”, “a”, “image”, “send”, “send”, “send”, “place”, “receive”, and “disabled”.
The morpheme analysis unit 5511 analyzes the part of speech, attributes, meaning, and the like of the decomposed morpheme with reference to the system dictionary 5501 and the user dictionary 5502 to obtain an analysis result (step ST2).

Next, the syntax analysis unit 5512 creates a clause corresponding to a subtree node of the structure tree by combining one or more morphemes. Here, the character string corresponding to a subtree node is one of the constituent elements of a sentence, and an example is described in which its unit is the clause, that is, the smallest unit obtained when a sentence is divided without becoming unnatural as actual language. However, the present invention is not limited to this.

Based on the analysis result from the morpheme analysis unit 5511, the syntax analysis unit 5512 evaluates the part of speech and meaning of the words constituting the sentence, their attribute information, their position and arrangement in the sentence, and the like. It analyzes the relationships between clauses and obtains, as analysis results, the relationships between clauses, the appearance positions of words, and the sentence components (subject, predicate, etc.) in the sentence. Further, the syntax analysis unit 5512 assigns to each single sentence a single sentence ID that identifies it.
Next, the syntax analysis unit 5512 creates a structure tree with the clause as a subtree node based on the analysis result, and outputs the analysis result to the synonym processing unit 5513 (step ST3).

The synonym processing unit 5513 refers to the synonym dictionary 5503 and checks whether any of the decomposed words has a synonym or equivalent term that should be unified. If there is one, the synonym processing unit 5513 replaces the word with the synonym retrieved from the synonym dictionary 5503 (step ST4). Then, the synonym processing unit 5513 outputs the analysis result by the analysis unit 551 to the dictionary creation unit 504.

Upon receiving the analysis result, the dictionary creation unit 504 obtains from it, as a rule for each clause, the information necessary for registration in the match dictionary storage unit 503: for example, the text data of the word information, the predicate attribute of each piece of word information, a parent rule ID representing the connection between rules (subtree nodes), a child node presence flag, a weight value, a conjunction type, and a category. The dictionary creation unit 504 then groups the rules of the clauses by the single sentence they constitute, together with the single sentence ID, to create registration data that can be registered as the single sentence information MD3 of the match dictionary data.

In addition, the dictionary creation unit 504 reads the symbol map MD1 from the match dictionary storage unit 503 and determines whether the sentence to be searched contains word information that is already used in common in the match dictionary storage unit 503. If a word is identical to such word information, the word is replaced with the corresponding symbol ID. If the symbol map MD1 contains no identical word information, the dictionary creation unit 504 assigns a new symbol ID to the word information.
Then, the dictionary creation unit 504 creates match dictionary registration data based on the sentence information (the information necessary for registering a sentence, including the sentence ID, the sentence text, the sentence additional information, the term map, and the like) and the single sentence information (step ST5).
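The symbol map lookup in this step amounts to interning words as IDs. The sketch below is only an illustration under stated assumptions: the symbol map is modeled as a plain dictionary from word to integer ID, and new IDs are allocated as "largest existing ID plus one", a numbering scheme the description does not specify.

```python
from typing import Dict, List


def to_symbol_ids(words: List[str], symbol_map: Dict[str, int]) -> List[int]:
    """Replace each word with its symbol ID, registering unseen words.

    symbol_map is updated in place. The allocation rule (max ID + 1) is an
    assumption made for this example.
    """
    ids = []
    for word in words:
        if word not in symbol_map:
            symbol_map[word] = max(symbol_map.values(), default=0) + 1
        ids.append(symbol_map[word])
    return ids


symbol_map = {"image": 1, "send": 2}
print(to_symbol_ids(["image", "receive", "send", "receive"], symbol_map))
print(symbol_map)
```

Because the same map is consulted again at search time, identical words in a search key sentence and in a registered sentence resolve to the same ID and can be compared cheaply.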

  Next, the dictionary creation unit 504 writes the created match dictionary registration data in the match dictionary storage unit 503, and registers the analysis result as match dictionary data (step ST6).

Next, an example of a search method based on match dictionary data in the information search system 1 according to the present embodiment will be described with reference to FIG. FIG. 13 is a flowchart showing an example of a search method in the information search system 1 according to the present embodiment.
As illustrated in FIG. 13, for example, when the user designates the search service α from the input unit 103 of the client terminal device 100, the client terminal device 100 transmits a request control signal for executing a search using the designated search service α to the Japanese analysis server 500 via the WEB server 300.

When the Japanese analysis server 500 receives this search request control signal, the search processing unit 501 starts the program of the search service α and reads the match profile associated with the search service α from the match profile storage unit 502. Here, since the search service α and the match profile A are associated in advance in the match mode definition of the match profile A, the search processing unit 501 reads the match profile A when the program of the search service α is started. Further, the search processing unit 501 reads the match dictionary data stored in the match dictionary storage unit 503 (step ST10).

  Then, for example, the search processing unit 501 expands the read match profile A and match dictionary data in the memory area 506 to create a dictionary object (step ST11). Note that the search processing unit 501 may read the term map 24 attached to each sentence from the match profile A, calculate the appearance frequency information of the word information, and temporarily store it in the memory area 506. As described above, by obtaining the appearance frequency information of each word information in advance when expanding the memory for creating the dictionary object, the processing load for calculating the appearance frequency information of the word information is reduced during the matching process.
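The precomputation mentioned here can be illustrated with a small sketch. The shape of a term map is not given in this passage, so the example assumes that each sentence's term map is simply a list of symbol IDs occurring in that sentence; the function name is likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List


def appearance_frequencies(term_maps: Dict[str, List[int]]) -> Counter:
    """Count how often each symbol ID occurs across all registered sentences.

    term_maps maps a sentence ID to the symbol IDs in that sentence
    (an assumed representation of the term map).
    """
    frequencies: Counter = Counter()
    for symbol_ids in term_maps.values():
        frequencies.update(symbol_ids)
    return frequencies


term_maps = {"001": [1, 2, 3], "002": [2, 3, 3]}
print(appearance_frequencies(term_maps))
```

Computing the counts once, when the dictionary object is built, means the matching process only has to look a value up rather than rescan every sentence for each query.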

Here, when the user inputs a search key sentence from the input unit 103 of the client terminal device 100, the client terminal device 100 transmits the search key sentence to the Japanese analysis server 500 via the WEB server 300 (step ST12).
When receiving the search key sentence (step ST13), the Japanese analysis server 500 performs a search based on the search key sentence as shown below.

First, in order to return a search result for the search service α to the client terminal device 100, the search processing unit 501 generates an empty search result object in the memory area 506, thereby securing a storage area for recording the result (step ST14).
Then, the search processing unit 501 analyzes the search key sentence according to the program of the search service α. That is, the morpheme analysis unit 5511 decomposes each single sentence divided by the search processing unit 501 into morphemes and looks up their parts of speech, attributes, and the like in the system dictionary 5501 and the user dictionary 5502. Based on the information indicating the part of speech, the attributes, and the like, the morpheme analysis unit 5511 then creates clauses according to the meaning of the morphemes and their relationship within the sentence.

Next, based on the analysis result from the morpheme analysis unit 5511, the syntax analysis unit 5512 evaluates the part of speech and meaning of the clauses constituting the sentence, their attribute information, their position and arrangement in the sentence, and the like, analyzes the relationships between the clauses in the sentence, and outputs the analysis result to the synonym processing unit 5513.
The synonym processing unit 5513 refers to the synonym dictionary 5503 and checks whether any decomposed word or clause has a synonym or equivalent term that should be unified. If there is one, the word or clause is replaced with the synonym or the like retrieved from the synonym dictionary 5503. Then, the synonym processing unit 5513 outputs the analysis result by the analysis unit 551 to the search processing unit 501 (step ST15).

Then, the search processing unit 501 matches the analyzed search key sentence against the match dictionary data expanded in the dictionary object in the memory area 506 according to the match mode predetermined in the match profile A, determines the type of matching, and searches for sentences and the like that satisfy the conditions of the match mode definition (matching process) (step ST16). Details will be described later.
Further, in step ST16, the search processing unit 501 calculates, for each sentence obtained by matching, a score that evaluates its degree of matching with the search key sentence according to the score mode predetermined in the match profile A (scoring process).

Then, the search processing unit 501 writes the sentence satisfying the match mode obtained by the search in the matching process and the score obtained by the scoring process in the search result object in the memory area 506 (step ST17).
Then, the search processing unit 501 transmits the contents of the search result object to the client terminal device 100 via the WEB server 300 (step ST18).

Next, an example of matching processing and scoring processing in the information search system 1 according to the present embodiment will be described in detail with reference to FIG. FIG. 14 is a flowchart illustrating in detail an example of matching processing and scoring processing in the information search system 1 according to the present embodiment. The process shown in FIG. 14 is a detailed description of the process corresponding to step ST16 in FIG.
As shown in FIG. 14, the search processing unit 501 refers to the symbol map MD1 of the match dictionary data and replaces the word information of the search key sentence analyzed by the document analysis unit 505 in step ST15 with symbol IDs (step ST20).
Then, the search processing unit 501 performs the matching process according to the match mode determined in advance in the match profile A. In the present embodiment, the search processing unit 501 determines, for each of word element matching, dependency matching, and attribute matching, which type is matched, and extracts the single sentences that satisfy the conditions defined by the match mode definition PA1 (step ST21). The search processing unit 501 can thereby obtain, from the match dictionary data, sentences that match the search key sentence.

For each result obtained by the matching process, the search processing unit 501 calculates scores using the match type information determined in each matching mode and the score mode defined by the match profile A, and temporarily stores the sum of these values in the memory area 506 (step ST22). Here, the score is calculated irrespective of the match mode definition actually used to extract the result. Matching based on the match mode definition is the extraction of the search results themselves, whereas the score calculation is an evaluation process intended to make it easier to find, among the extracted results, those best suited to the purpose of the search. It is therefore useful to evaluate the extracted results again in terms of word elements, dependency, and attributes regardless of the conditions of the match mode definition.
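One way to read this scoring step is as a sum of per-mode contributions. The sketch below is an interpretation rather than the method of the patent: the mode names, the match type names, and the coefficient table are all hypothetical, and the only property taken from the text is that every mode contributes to the total regardless of which mode extracted the result.

```python
from typing import Dict


def total_score(match_types: Dict[str, str],
                coefficients: Dict[str, Dict[str, float]]) -> float:
    """Sum one coefficient per matching mode, selected by the matched type.

    match_types maps a mode name to the type determined for that mode.
    coefficients stands in for the score mode of a match profile; unknown
    types contribute nothing.
    """
    return sum(coefficients.get(mode, {}).get(match_type, 0.0)
               for mode, match_type in match_types.items())


# Hypothetical score mode: values chosen only for illustration.
score_mode = {
    "word_element": {"exact": 1.0, "partial": 0.5},
    "dependency": {"node_single": 0.5, "none": 0.0},
    "attribute": {"sentence_attribute": 0.25},
}
determined = {"word_element": "exact", "dependency": "node_single",
              "attribute": "sentence_attribute"}
print(total_score(determined, score_mode))  # 1.75
```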

For example, suppose that dependency matching is performed with the node single type as shown in FIG. 10B, that the clauses corresponding to the rule are “(watch soccer game) − (go)”, and that the clauses included in the search key sentence are “(soccer) − (go)” and “(soccer) − (s)”. Both are determined to match as matching results, but in accordance with the dependency match coefficient PA7 in the match profile, the score is calculated by applying the coefficient to “(soccer) − (go)”, which matches the search key sentence.
That is, when a sentence or single sentence satisfying such dependency matching conditions is obtained, the search processing unit 501 compares the matched sentence or single sentence with the search key sentence and, if the relationship indicated by the score mode information is negative, can calculate the score by applying a coefficient.

Then, from each matched sentence and its score, the search processing unit 501 generates match information for the matched sentence (for example, the type of match mode used in the matching process, the position at which the matched word or clause appears in the sentence (hereinafter referred to as match position information), and the score) (step ST23).
Next, for example, the search processing unit 501 rearranges the matched sentences in descending order of the score calculated in step ST22 (step ST24). Then, the search processing unit 501 associates the search key sentence, the matched sentence, the matched single sentence, and the match information with one another and writes them in the search result object in the memory area 506 (step ST25).
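Steps ST23 to ST25 can be sketched as building a list of match records, sorting it by score, and attaching it to the search key sentence. The record layout and field names below are assumptions made for illustration; only the descending sort and the association of key sentence, matched text, and match information come from the description. The second score in the sample data is made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MatchInfo:
    """Hypothetical match record for one matched single sentence."""
    sentence_id: str
    single_sentence_id: str
    text: str
    match_mode: str
    match_position: str
    score: float


def build_search_result(key_sentence: str, matches: List[MatchInfo]) -> Dict:
    """Order matches by descending score and associate them with the key sentence."""
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    return {"search_key_sentence": key_sentence, "matches": ranked}


matches = [
    MatchInfo("002", "002-3", "Suddenly the Internet could not be made",
              "dependency matching", "key1:7, res6:12", 3.1),  # illustrative values
    MatchInfo("001", "001-1", "Internet is not connected",
              "dependency matching", "key1:7, res1:7", 8.9),
]
result = build_search_result("Internet not connected", matches)
print([m.single_sentence_id for m in result["matches"]])  # ['001-1', '002-3']
```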

Then, the search processing unit 501 transmits the contents of the search result object to the WEB server 300 (step ST26). The WEB server 300 temporarily stores the received search result object in the storage unit 304, creates search result display data (a web page) that can be displayed by the display unit 102 of the client terminal device 100, and transmits it to the client terminal device 100. The client terminal device 100 displays the search result on the display unit 102 based on this display data.
Note that the match position information is information indicating a character position at which a word satisfying the matching condition (matched word) appears in a matched single sentence or sentence including the matched word.

Here, an example of search result data, which is the content of the search result object, will be described with reference to FIG. FIG. 15 is a schematic diagram illustrating an example of search result data.
As shown in FIG. 15, the search result data associates the following with one another: the “search key sentence” input by the user at the client terminal device 100; a “matched sentence” obtained by the matching process of the Japanese analysis server 500 as matching the search key sentence; a “matched single sentence”, which is a single sentence contained in the matched sentence that matches the search key sentence; and “match information” generated for each clause contained in the matched single sentence.

Next, an example of search result data transmitted to the client terminal device 100 as the contents of the search result object will be described in more detail with reference to FIG.
As shown in FIG. 16A, the following describes a case in which the search key sentence is “Internet not connected” and the Japanese analysis server 500 obtains, for example, the single sentence ID “001-1” with the text “Internet is not connected” as search result 1 and the single sentence ID “002-3” with the text “Suddenly the Internet could not be made” as search result 2.

  FIG. 16B is a diagram for explaining a character position representing match position information of a sentence obtained from a search key sentence or a search result. As shown in FIG. 16B, for example, the search key sentence is given a number indicating the character position “1, 2, 3,..., 12” one by one from the beginning of the sentence. The number representing the character position can represent the character position of the matched word appearing in the matched single sentence or sentence.

FIG. 16C shows an example of the search result. As shown in FIG. 16C, search result 1 has the sentence ID “001”, the single sentence ID “001-1”, the match mode “dependency matching”, the score “8.9”, and the match position information “key1:7, res1:7” and “key9:12, res9:12”. Here, a match position indicates a word matched in the match mode: the match position information “key1:7, res1:7” corresponds to “Internet” in the search key sentence, and the match position information “key9:12, res9:12” corresponds to “not connected” in the search key sentence. In other words, in dependency matching, the agreement between the parent-child relationship of “Internet” and “not connected” in the search key sentence and the corresponding parent-child relationship in the sentence of search result 1 is given a score of “8.9”.

Here, the match position information “key1:7, res1:7” indicates the position of the matched word in each sentence. “key1:7” indicates that the first to seventh characters counted from the head of the search key sentence correspond to the matched word. Likewise, “res1:7” indicates that the first to seventh characters counted from the head of the matched single sentence (or matched sentence) correspond to the matched word. These numbers are character counts from the beginning of the sentence and thus indicate the position of characters in the sentence.
As described above, the search result includes information in which a sentence including a word satisfying the matching condition is associated with match position information of a matched word included in the sentence.
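Reading match position information back into character ranges can be sketched as below. The parser and the helper names are hypothetical, and the textual form is assumed to be exactly the “key1:7, res1:7” notation shown above, with positions that are 1-based and inclusive. The Japanese string in the usage lines is only an assumed stand-in whose first seven characters spell “Internet”; it is not quoted from the figures.

```python
import re
from typing import Dict, Tuple

# Matches "key1:7" or "res1: 7" (optional spaces are tolerated).
_SPAN = re.compile(r"(key|res)\s*(\d+)\s*:\s*(\d+)")


def parse_match_position(info: str) -> Dict[str, Tuple[int, int]]:
    """Turn 'key1:7, res1:7' into {'key': (1, 7), 'res': (1, 7)}."""
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in _SPAN.finditer(info)}


def extract(text: str, span: Tuple[int, int]) -> str:
    """Return the characters at 1-based inclusive positions start..end."""
    start, end = span
    return text[start - 1:end]


spans = parse_match_position("key1:7, res1:7")
key_sentence = "インターネットつながらない"  # assumed stand-in string
print(extract(key_sentence, spans["key"]))  # インターネット
```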

Next, an example of a search start process in the information search system 1 according to the present embodiment will be described using FIG. FIG. 17 is a flowchart showing an example of a search start process in the information search system 1 according to the present embodiment.
As shown in FIG. 17, for example, when the input unit 103 of the client terminal device 100 receives from the user a request to use the search service provided by the Japanese analysis server 500, the client terminal device 100 transmits a request control signal to the WEB server 300 requesting transmission of the search display data for searching by the Japanese analysis server 500.

When the WEB server 300 receives the request control signal from the client terminal device 100 via the communication unit 301, the request processing unit 302 controls the data conversion unit 303, based on the request control signal, so as to create the web page data to be displayed on the display unit 102 of the client terminal device 100. Next, the data conversion unit 303 reads the necessary setting data from the storage unit 304 and creates search display data for displaying a text box into which the user inputs a search key sentence. The communication unit 301 then transmits this search display data to the client terminal device 100 (step ST30).
For example, the data conversion unit 303 creates display data composed of HTML text or the like for causing the client terminal device 100 to display the search display data. Then, the request processing unit 302 transmits to the client terminal device 100, via the communication unit 301, the display data, rule information in which the rules for displaying the search result are described (for example, a CSS file), and program code (for example, JavaScript) that runs on the browser 101 and is used to display the search result on the display unit 102 of the client terminal device 100.

When the client terminal device 100 receives the display data and the program from the WEB server 300, the client terminal device 100 starts the program. Then, in accordance with this program, the data processing unit 112 generates, from the search display data received from the WEB server 300, the display data to be displayed by the display unit 102, and controls the display processing unit 113. The display processing unit 113 causes the display unit 102 to display the display data generated by the data processing unit 112.
When a specific search service is specified by the user via the input unit 103 of the client terminal device 100, the client terminal device 100 generates a request control signal for executing a search by the specified search service.
Further, when a search key sentence is input by the user, the input unit 103 receives this (step ST31).

Next, the client terminal device 100 transmits the type of search service designated by the user and the input search key sentence to the WEB server 300 via the communication unit 104 as a search request message.
When receiving the search request message from the client terminal device 100, the WEB server 300 extracts a search key sentence from the search request message and requests the Japanese analysis server 500 to perform a search using, for example, the search service α (step ST32).

Next, an example of a search result display method in the information search system 1 according to the present embodiment will be described with reference to FIG. FIG. 18 is a flowchart showing an example of a search result display method in the information search system 1 according to the present embodiment.
As shown in FIG. 18, when the WEB server 300 receives the search result from the Japanese analysis server 500, the WEB server 300 temporarily stores it in the storage unit 304. Then, the data conversion unit 303 reads from the storage unit 304 the rule information for display on the client terminal device 100. Next, based on the rule information, the data conversion unit 303 creates web page display data for displaying the search result on the display unit 102 of the client terminal device 100 and transmits it to the client terminal device 100 via the communication unit 301 as a search result message (step ST40).
For example, for each search key sentence, the data conversion unit 303 attaches predetermined tags to the matched sentences, the matched single sentences, the applied match modes, the match position information, the scores, and the like so as to link this information together, creates data (an XML file) that can be handled by the data processing unit 112 on the client terminal device 100 side, and transmits it as the search result.
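A tagged search result of this kind might be serialized as in the sketch below, which uses Python's standard `xml.etree.ElementTree` module. The element and attribute names (`searchResult`, `key`, `match`, `text`, `position`) are invented for the example; the description only says that predetermined tags link the pieces of information in an XML file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def to_xml(key_sentence: str, results: List[Dict[str, str]]) -> str:
    """Serialize one search key sentence and its matches as an XML string.

    Tag and attribute names are hypothetical.
    """
    root = ET.Element("searchResult")
    ET.SubElement(root, "key").text = key_sentence
    for r in results:
        item = ET.SubElement(root, "match", {
            "singleSentenceId": r["single_sentence_id"],
            "mode": r["mode"],
            "score": r["score"],
        })
        ET.SubElement(item, "text").text = r["text"]
        ET.SubElement(item, "position").text = r["position"]
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml("Internet not connected", [{
    "single_sentence_id": "001-1", "mode": "dependency matching",
    "score": "8.9", "text": "Internet is not connected",
    "position": "key1:7, res1:7",
}])
print(xml_text)
```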

When receiving the search result, the data processing unit 112 of the client terminal device 100 temporarily stores the search result (XML file) in the storage unit 111. Based on the rule information stored in the storage unit 111, the data processing unit 112 inserts, for each word corresponding to a match position included in the message of the XML file, a display tag corresponding to the match mode applied to that word (step ST41).
For example, in the case of a search result as shown in FIG. 16C, the data processing unit 112 refers to the rule information (CSS file) in which the rules for displaying the search result are described and creates a display tag indicating that the match mode applied to “Internet”, which corresponds to “key1:7, res1:7” in the match position information, is “dependency matching”. For example, based on the rule information, the data processing unit 112 creates highlight setting information for emphasizing specific words as information for displaying the search result and attaches it as a tag to the word corresponding to the match position information. The highlight setting information includes, for example, setting information for underlining matched words, or setting information for displaying words matched by the word element matching match mode in red and words matched by the attribute matching match mode in blue, so that the user can visually distinguish the match modes.
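Inserting display tags at match positions can be sketched as follows. The function, the mode keys, and the CSS class names are hypothetical; the description only states that the actual colors come from rule information such as a CSS file (red for word element matching and blue for attribute matching in its example). Spans are assumed to be 1-based, inclusive, and non-overlapping.

```python
import html
from typing import List, Tuple

# Hypothetical mapping from match mode to a CSS class defined in the rule file.
CSS_CLASS = {
    "word_element": "match-word",
    "dependency": "match-dependency",
    "attribute": "match-attribute",
}


def highlight(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Wrap each (start, end, mode) span of text in a <span> for its match mode."""
    out, cursor = [], 0
    for start, end, mode in sorted(spans):
        out.append(html.escape(text[cursor:start - 1]))   # untouched text before the span
        out.append(f'<span class="{CSS_CLASS[mode]}">'
                   f'{html.escape(text[start - 1:end])}</span>')
        cursor = end
    out.append(html.escape(text[cursor:]))                # remaining tail
    return "".join(out)


print(highlight("abcdefg", [(5, 7, "attribute"), (1, 3, "word_element")]))
```

Keeping the color choice in the stylesheet rather than in the generated markup means the same tagged result can be restyled without touching the matching code.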

In addition, when a plurality of match modes are determined to have matched, the highlight setting information may specify that words or the like matched by the match mode given display priority are highlighted in accordance with a priority order set in advance by the user, or that words are highlighted preferentially in descending order of score.
Furthermore, when the priority order of the match modes has been determined in advance by the user as described above and there are a plurality of matched single sentences or sentences, the matched single sentences may be displayed as the search result in accordance with that priority order.
Moreover, the search results may be displayed preferentially in order starting from the matched word nearest the head of the search key sentence.

Based on such highlight setting information, the search result is displayed on the display unit 102 of the client terminal device 100. When the user then selects a highlighted matched word or the like from the matched single sentences or the like displayed as the search result, the input unit 103 accepts the selection. Then, the data processing unit 112 identifies the position of the matched word or the like selected by the user from the match position information of the search result temporarily stored in the storage unit 111, and further performs a narrowing search according to the matched word or the like selected by the user (step ST42).

Here, the refinement search will be described in detail with reference to FIGS. FIG. 19 is a schematic diagram illustrating an example of an image representing a search result displayed on the display unit 102 of the client terminal device 100 based on the search result display data.
As shown in FIG. 19, the display unit 102 of the client terminal device 100 displays a screen 102A based on the search result display data. A text box 102B for displaying the search key sentence is displayed on the left side of the screen 102A, and a suggestion screen 102C for displaying the search result is displayed on the right side.
In the text box 102B, a search key sentence “I have registered a credit card payment method, but I receive a bill from the charge support window” is displayed.

The suggestion screen 102C displays the search key sentence 102C1 and a matched single sentence 102C2 that is a search result obtained by the search by the Japanese analysis server 500 based on the search key sentence.
For example, on the suggestion screen 102C, the matched words of the search key sentence 102C1 are highlighted in a color corresponding to each match mode. When a plurality of single sentences are displayed as the matched single sentences 102C2, they are displayed in descending order of score as the normal search result. In addition, the matched words included in a matched single sentence 102C2 (for example, “I registered the payment method with a credit card, but I receive an invoice from the charge support window”) are displayed in the same color as the corresponding highlighted words in the search key sentence 102C1, so that words found by the same match mode are highlighted in the same color.
Note that a matched word highlighted in the search key sentence 102C1 on the suggestion screen 102C can be selected by the user through a selection instruction received by the input unit 103 of the client terminal device 100.
When a search result is obtained, the WEB server 300 creates search result display data as shown in FIG. 19 and transmits it to the client terminal device 100.

Next, a method for performing a narrowing search from the screen shown in FIG. 19 will be described with reference to FIG. FIG. 20 is a schematic diagram illustrating an example of a screen displayed after performing a narrowing search from the screen illustrated in FIG. Note that the client terminal device 100 stores in the storage unit 111 the search result received from the Japanese analysis server 500 (for example, the matched sentence, the matched single sentence, the match information including the matched word, the matching condition used for the matching, and the match position information, the score, and the like) and the search rule information used when performing a narrowing search (for example, a program for retrieving the narrowing target from the search result using a matched word specified via the input unit 103 as a search key, set values, and the like). As shown in FIG. 20, for example, when the user selects “credit card”, a matched word highlighted in the search key sentence 102C1 on the suggestion screen 102C (for example, by designating it as a narrowing search target with an operation such as a double click while the mouse pointer indicates “credit card” on the screen), the data processing unit 112 of the client terminal device 100 receives the selection instruction from the user via the input unit 103 and executes a narrowing search by “credit card”.

The data processing unit 112 refers to the search result and the search rule information stored in the storage unit 111 and detects the match position information of “credit card” in the search key sentence. Based on this match position information, the data processing unit 112 then searches for matched single sentences or sentences that include a matched word associated, in the match position information, with “credit card” in the search key sentence.
For example, in the case shown in FIG. 19, the match position of “credit card” in the search key sentence is “1:8”, so the data processing unit 112 searches for matched sentences whose match position information in the search result is “key1:8, res1:8” and that include a word matched in the same match mode.
Further, the data processing unit 112 creates display data (screen 102A-1; see the upper part of FIG. 20) that displays the matched single sentences or sentences obtained by this search at the top of the search results in the search result display data. The display processing unit 113 then causes the display unit 102 to display this display data, in which the matched sentences obtained by the narrowing search appear at the top.
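Because the narrowing search runs on the client against the stored search result, it can be sketched as a reordering of records by match position. The record layout below is an assumption, and the sketch compares only the key-side span of each match position, which is a simplification of the comparison described above.

```python
from typing import Dict, List, Tuple


def narrow_by_key_span(results: List[Dict],
                       selected_span: Tuple[int, int]) -> List[Dict]:
    """Move results that matched the selected key-sentence span to the top.

    Each result is assumed to carry 'positions', a list of dicts with a 'key'
    entry holding a (start, end) tuple. Relative order is otherwise preserved.
    """
    hits, rest = [], []
    for result in results:
        matched = any(p["key"] == selected_span for p in result["positions"])
        (hits if matched else rest).append(result)
    return hits + rest


stored = [
    {"id": "A", "positions": [{"key": (33, 35)}]},
    {"id": "B", "positions": [{"key": (1, 8)}, {"key": (33, 35)}]},
    {"id": "C", "positions": [{"key": (1, 8)}]},
]
print([r["id"] for r in narrow_by_key_span(stored, (1, 8))])  # ['B', 'C', 'A']
```

No round trip to the server is needed for this step, since the match position information was already delivered with the original search result.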

Then, as shown in the search result screen 102A-1 in the upper part of FIG. 20, the display unit 102 displays “Change to credit card payment” (score: 2.2) at the top of the matched single sentences 102C2, followed by “I have registered a payment method with a credit card, but I receive an invoice from the charge support window” (score: 1.4).
Accordingly, the client terminal device 100 can execute a re-search based on the matched word using the match position information of the search result, and can display the matched single sentences in descending order of their degree of matching with the matched word.
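The narrowing search described above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the embodiment: the class names `MatchPosition` and `MatchedSentence`, the function `refine`, and the position values in the sample data are hypothetical, and positions are assumed to be character ranges written as (start, end) in the same way as “1:8”.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MatchPosition:
    """One entry of match position information (hypothetical layout)."""
    key_span: tuple   # position of the word in the search key sentence, e.g. (1, 8)
    res_span: tuple   # position of the matched word in the matched sentence
    mode: str         # match mode in which the word matched, e.g. "word"


@dataclass
class MatchedSentence:
    text: str
    score: float
    positions: list = field(default_factory=list)


def refine(results, key_span):
    """Move sentences that matched the selected key word to the top.

    No morphological or syntax analysis is performed: only the stored
    match position information is inspected.
    """
    hits, rest = [], []
    for sentence in results:
        matched = any(p.key_span == key_span for p in sentence.positions)
        (hits if matched else rest).append(sentence)
    hits.sort(key=lambda s: s.score, reverse=True)
    return hits + rest


# Stored search result (position values are made up for illustration).
stored = [
    MatchedSentence("I received the invoice despite the cancellation", 2.5,
                    [MatchPosition((33, 35), (16, 22), "word")]),
    MatchedSentence("Change to credit card payment", 2.2,
                    [MatchPosition((1, 8), (11, 21), "word")]),
    MatchedSentence("I have registered a payment method with a credit card", 1.4,
                    [MatchPosition((1, 8), (43, 53), "word")]),
]

# The user selects the word at key position 1:8 ("credit card").
refined = refine(stored, (1, 8))
print([s.text for s in refined])
```

Because the selection is resolved purely by comparing stored positions, the same function can be reused for any highlighted word in the search key sentence.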
Here, an example has been described in which, when a word that matches the search key sentence is specified as the narrowing search target, the data processing unit 112 selects the word matched by word element matching and performs a re-search, based on its match position information, on the search result stored in the storage unit 111. However, the present invention is not limited to this.

For example, when the user selects the matched word “invoice” highlighted in the search key sentence 102C1 on the suggestion screen 102C, the data processing unit 112 receives the selection instruction from the user via the input unit 103 and executes a narrowing search by “invoice”.
The data processing unit 112 refers to the search result and search rule information stored in the storage unit 111 and detects the match position information of “invoice” in the search key sentence. Further, based on the match position information of “invoice”, the data processing unit 112 searches for matched single sentences or sentences containing a matched word that the match position information associates with “invoice” in the search key sentence.
For example, in the case shown in FIG. 19, the match position of “invoice” in the search key sentence is “33:35”. The data processing unit 112 therefore searches for matched sentences whose match position information in the search result is “key33:35, res33:35”, that is, sentences containing a word that matches in the same match mode.

Then, the data processing unit 112 creates display data (see the screen 102A-2 shown in the lower part of FIG. 20) in which the matched sentences obtained by this search are placed at the top of the search results. The display processing unit 113 causes the display unit 102 to display this display data, with the matched sentences obtained by the narrowing search at the top.
The display unit 102 displays this display data. As shown in the search result screen 102A-2 in the lower part of FIG. 20, “The invoice arrives from the charge center. I would like to confirm the breakdown of the invoice details.” (score: 4.0) is displayed at the top of the matched single sentences 102C2, followed by “I received the invoice despite the cancellation” (score: 2.5), and so on.

  As described above, the search result is stored in the storage unit 111 of the client terminal device 100, and the data processing unit 112 can perform a re-search based on a matched word by using the match position information. As a result, when a narrowing search is instructed (requested) by the user, the client terminal device 100 can obtain the position of the matched word in the sentence from the match position information, without performing analysis such as morphological analysis or syntax analysis on the search result. Moreover, since the match position information is created for each match mode, a tag for highlighting that differs according to the match mode can be attached to the matched word. Therefore, the client terminal device 100 can present the result of the narrowing search by displaying the re-searched result on the display unit 102.

As described above, the search result including the match positions and match patterns is stored in the storage unit 111 of the client terminal device 100. When a word serving as the key for a narrowing search is designated via the input unit 103, the data processing unit 112 can easily obtain the narrowed search result from only the position information and match pattern of that word. For this reason, the client terminal device 100 can reconstruct the display data for the re-search result of the narrowing search.
In contrast, if the match position information were not stored in the storage unit 111 as part of the search result, the client terminal device 100 could not detect where in the matched sentence or matched single sentence the word specified for the narrowing search is located unless it analyzed the sentence. Further, if the match position information of the words matched in each match mode were not stored in the client terminal device 100 as part of the search result, the matched words could not be highlighted with the same color or the same font for each match mode. The information search system 1 according to the present embodiment solves these problems by adopting the configuration described above.

Further, the client terminal device 100 according to the present embodiment can use the match position information to determine the position of a matched word in a matched single sentence or sentence, and can therefore extract the matched word from the sentence. Re-searching can thus be performed without document analysis such as morphological analysis or syntax analysis. Also, by knowing the position of the matched word, the client terminal device 100 can create and display display data that highlights the matched word.
In contrast, if there were no match position information, the client terminal device 100 could not locate the word to be highlighted in the sentence unless it performed document analysis on the search result, and so could not highlight the word. Nor could it highlight words in different colors according to the matching condition.
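As a rough illustration of highlighting from match position information alone, the following sketch wraps each matched range in a tag chosen by match mode. It assumes that positions are 1-based, inclusive character positions (consistent with notation such as “1:8”), and that display data is HTML; the function name `highlight`, the mode names, and the CSS class names are hypothetical.

```python
import html

# Hypothetical mapping from match mode to a CSS class used for highlighting.
MODE_CLASS = {
    "word": "hl-word",
    "attribute": "hl-attr",
    "dependency": "hl-dep",
}


def highlight(sentence, spans):
    """Wrap matched ranges in <span> tags without analysing the sentence.

    spans: iterable of (start, end, mode) with 1-based inclusive positions;
    the ranges are assumed not to overlap.
    """
    parts = []
    cursor = 0  # 0-based index of the first character not yet emitted
    for start, end, mode in sorted(spans):
        begin = start - 1  # convert to a 0-based slice start
        parts.append(html.escape(sentence[cursor:begin]))
        parts.append('<span class="{}">{}</span>'.format(
            MODE_CLASS[mode], html.escape(sentence[begin:end])))
        cursor = end
    parts.append(html.escape(sentence[cursor:]))
    return "".join(parts)


marked = highlight("ETC card cannot be used",
                   [(1, 8, "word"), (10, 23, "attribute")])
print(marked)
```

Keeping the mode in each span is what allows a different color or font per match mode; a single style would need only the start and end positions.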
In the above-described processing, the data processing unit 112 on the client terminal device 100 side performs the narrowing search and redisplay. Alternatively, the information on the word selected by the user may be transmitted from the client terminal device 100 to the WEB server 300, and the narrowing process may be performed on the WEB server 300 side. In this case, the search result and search rule information are transmitted from the Japanese language analysis server 500 to the WEB server 300 and stored in the storage unit 304.

Next, an example of the matching process and the scoring process will be described in detail.
Here, an example will be described in which the search service α is designated by the user, and a match profile A determined in advance as a match profile for the search service α is expanded in the dictionary object in the memory area 506.

FIG. 21 is a diagram illustrating an example of a search key sentence.
As shown in FIG. 21, the search key sentence “ETC card cannot be used at a store” is input to the Japanese language analysis server 500. When collation is performed by the Japanese language analysis server 500, matched single sentences as shown in FIG. 22 are obtained. FIG. 22 shows a plurality of matched single sentences. For example, matched single sentences such as “ETC card cannot be used”, “Credit card cannot be used”, “I want to use ETC card”, “I want to use credit card”, “ETC card is easy to use”, and “I lost the ETC card” are obtained by the search.

  FIG. 23 is a schematic diagram for explaining the settings of match profiles A to C stored in the match profile storage unit 502. As shown in FIG. 23, the settings of the match profiles A to C consist of match mode information, which contains the match mode definition PA1 specifying the combination of match modes, and score mode information, which contains the items from the relative appearance frequency flag PA2 through the synonym match coefficient PA10. Note that the combinations shown in (1) to (4) can be used as the match mode, as will be described later with reference to FIG. 27. Here, an example will be described in which (2) word element matching + attribute matching is predetermined as the match mode information of the match profile.

  In the match profile A, it is predetermined as match mode information (match mode definition PA1) that (2) word element matching and attribute matching are performed, and that no weighting is applied to the result of the matching process (for example, coefficient 1.0). In the match profile B, it is predetermined as match mode information (match mode definition PA1) that word element matching and attribute matching are performed, and that the result of the matching process is weighted for dependency matches based on the dependency match coefficient PA7 (for example, coefficient 2.0). In the match profile C, it is predetermined as match mode information (match mode definition PA1) that word element matching and attribute matching are performed, and that, based on the predicate attribute match coefficient PA6, a single sentence matched in attribute matching is weighted by the predicate attribute match coefficient (for example, coefficient 2.0).

Next, search results obtained based on the match profile A in the examples shown in FIGS. 21 to 23 will be described with reference to FIG. 24. FIG. 24 is a schematic diagram showing an example of a screen in which the search results obtained based on the match profile A are displayed on the suggestion screen 102C.
Here, since the match mode definition PA1 of the match profile A is word element + attribute, the results (single sentences) that satisfy these conditions are extracted in FIG. 24.
In the scoring process shown in FIG. 24, no particular weighting is set as the score mode information (for example, coefficient 0.0 or 1.0). When each node (rule) matches, a reference point is assigned, and the weighting is calculated based on the score mode information. In the present embodiment, the weight value of each rule (node) stored in the match dictionary is used as the reference point, but a single predetermined value may instead be set for the entire apparatus. In this example, the reference point is 1.2 points.
For example, “ETC card cannot be used” matches on the words “ETC”, “card”, and “use”. “ETC” and “card” are nouns with no attribute, so each receives 1.2 points. “Use” matches with the “use (negation)” attribute, so it receives 1.2 points × 1.0 (predicate attribute match coefficient) = 1.2 points. The subtotal is 3.6 points. In addition, there are two dependency pairs, “ETC” and “use”, and “card” and “use”. Because the dependency match coefficient is not applied (1.0), 1.2 points × 1.0 × 2 = 2.4 points are added, for a total of 6.0 points.
In this way, even when dependency matching is not performed as a match mode, defining the dependency match coefficient in the score mode of the match profile allows the obtained results to be weighted flexibly when the score is calculated.

On the other hand, “I want to use an ETC card” matches only on the words “ETC” and “card”, which have no attribute. In this case, the score is 1.2 points + 1.2 points = 2.4 points. Since neither the predicate attribute match coefficient nor the dependency match coefficient applies, no additional points are given.
In addition, “Credit card cannot be used” matches on the word “use” with the attribute “use (negation)”. In this case, the score is 1.2 points × 1.0 = 1.2 points. Since the dependency match coefficient does not apply, no additional points are given. Therefore, as shown in FIG. 24, the search results arranged in order of score are “ETC card cannot be used” at the top (score: 6.0 points), followed by “I want to use an ETC card” (score: 2.4 points).
In this embodiment, “credit card” and “ETC card” share the “card” part. However, unless “credit” is registered on its own in the system dictionary 5501 used for morphological analysis, the two cannot both be obtained as matched phrases, because word matching is performed on whole elements.

Next, search results obtained based on the match profile B in the example shown in FIGS. 21 to 23 will be described with reference to FIG. 25. FIG. 25 is a schematic diagram illustrating an example of a screen in which search results obtained based on the match profile B are displayed on the suggestion screen 102C.
In the scoring process shown in FIG. 25, the score mode information specifies that the dependency match coefficient is applied (the score is multiplied by 2.0). For example, “ETC card cannot be used” matches on the words and attributes of “ETC”, “card”, and “use”, and receives 3.6 points as in profile A. In addition, there are two dependency pairs, “ETC” and “use”, and “card” and “use”. Because the dependency match coefficient (2.0) is applied, 1.2 points × 2.0 × 2 = 4.8 points are added, for a total of 8.4 points. Since the score processing for the other matched single sentences is the same as described for FIG. 24, the detailed description is omitted.

Next, search results obtained based on the match profile C in the example illustrated in FIGS. 21 to 23 will be described with reference to FIG. 26. FIG. 26 is a schematic diagram illustrating an example of a screen in which search results obtained based on the match profile C are displayed on the suggestion screen 102C.
In the scoring process shown in FIG. 26, the score mode information specifies that the predicate attribute match coefficient is applied (the score is multiplied by 2.0). For example, for “ETC card cannot be used”, “ETC” and “card” match as words with no attribute, giving 1.2 points + 1.2 points = 2.4 points. “Use” matches as a word with the “use (negation)” attribute, so the predicate attribute match coefficient (2.0) is applied, giving 1.2 points × 2.0 = 2.4 points.
As in profile A, there are two dependency pairs, “ETC” and “use”, and “card” and “use”. Because the dependency match coefficient is not applied (1.0), 1.2 points × 1.0 × 2 = 2.4 points are added, for a total of 7.2 points. On the other hand, “Credit card cannot be used” matches only on the word “use” with the “use (negation)” attribute.
The predicate attribute match coefficient (2.0) is applied, giving 1.2 points × 2.0 = 2.4 points.

In this way, different search results can be obtained using match profiles A to C in which different score mode information is set.
This eliminates the need to create separate match dictionary data for each match mode, or for each combination of match modes, that is to be used.
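The score arithmetic walked through for profiles A and C can be reproduced with a short sketch. It is a simplified model under stated assumptions, not the scoring process of the embodiment: only the predicate attribute match coefficient and the dependency match coefficient are modeled, every matched node receives the same reference point of 1.2, and the names `ScoreMode` and `score` are hypothetical.

```python
from dataclasses import dataclass

REFERENCE_POINT = 1.2  # reference point per matched node in the worked example


@dataclass(frozen=True)
class ScoreMode:
    """Subset of the score mode information (coefficient 1.0 = no weighting)."""
    predicate_attribute_coeff: float = 1.0
    dependency_coeff: float = 1.0


def score(plain_words, attribute_words, dependency_pairs, mode):
    """Score one matched single sentence.

    plain_words:      matched words that carry no attribute (e.g. nouns)
    attribute_words:  matched words whose predicate attribute also matched
    dependency_pairs: dependency pairs shared with the search key sentence
    """
    total = plain_words * REFERENCE_POINT
    total += attribute_words * REFERENCE_POINT * mode.predicate_attribute_coeff
    total += dependency_pairs * REFERENCE_POINT * mode.dependency_coeff
    return round(total, 1)  # round away floating-point noise


profile_a = ScoreMode()                               # no weighting
profile_c = ScoreMode(predicate_attribute_coeff=2.0)  # weight attribute matches

# "ETC card cannot be used": 2 plain words, 1 attribute word, 2 dependency pairs.
etc_a = score(2, 1, 2, profile_a)
etc_c = score(2, 1, 2, profile_c)
# "Credit card cannot be used": only "use (negation)" matches.
credit_a = score(0, 1, 0, profile_a)
credit_c = score(0, 1, 0, profile_c)
# "I want to use an ETC card": only the two plain words match.
want_a = score(2, 0, 0, profile_a)

print(etc_a, etc_c, credit_a, credit_c, want_a)
```

Since the coefficients live in the profile rather than in the dictionary, changing the ranking behavior only requires constructing a different `ScoreMode`; the matched counts stay the same.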

Next, the characteristics of the search result according to the match mode will be described with reference to FIGS. 27 to 31.
FIG. 27 shows a search result, and shows the combinations of each matched single sentence and the match modes in which that matched single sentence was obtained.
For example, “ETC card cannot be used” was obtained as a matched single sentence in each of (1) word element matching, (2) the combination of word element matching and attribute matching, (3) dependency matching, and (4) the combination of dependency matching and attribute matching. The search key sentence is “ETC card cannot be used at the store”.

FIG. 28 is a schematic diagram showing search results obtained by (1) word element matching in the match mode definition PA1 of the match profile. As shown in FIG. 28, sentences having opposite meanings, such as “I cannot use an ETC card” and “I want to use an ETC card”, are both obtained as matched single sentences. In addition, sentences that match only on the word “ETC card”, such as “I lost my ETC card”, and sentences that match only on the word “use”, such as “I cannot use a credit card”, are also obtained by the search as matched single sentences.
As described above, this match profile allows a wide-ranging search that includes similar sentences, so a plurality of matched single sentences are obtained. If a large number of matched single sentences are obtained, the user's convenience may suffer. However, by weighting the matched words, the Japanese language analysis server 500 can present the matched single sentences as search results evaluated by score. Therefore, even when a plurality of similar sentences are retrieved over a wide range, the search results can be prioritized and displayed in order of priority based on the score.

  FIG. 29 is a schematic diagram showing search results obtained by (2) the combination of word element matching and attribute matching in the match mode definition PA1 of the match profile. As shown in FIG. 29, sentences having opposite meanings, such as “I cannot use an ETC card” and “I want to use an ETC card”, are both obtained as matched single sentences. However, since “use” matches here with the “use (negation)” attribute, the difference between the two scores can be made larger than in the example shown in FIG. 28. In this way, by assigning scores to attribute matches, matched single sentences whose meaning is close to that of the search key sentence can be displayed at the top.

  FIG. 30 is a schematic diagram showing search results obtained by (3) dependency matching in the match mode definition PA1 of the match profile. As shown in FIG. 30, the matched single sentences, such as “ETC card cannot be used”, “I want to use an ETC card”, and “ETC card is easy to use”, have different meanings, but all are single sentences related to “using an ETC card”. When this match mode is used, a search can be performed that emphasizes the dependency relationship regardless of the attribute.

  FIG. 31 is a schematic diagram showing search results obtained by (4) the combination of dependency matching and attribute matching in the match mode definition PA1 of the match profile. As shown in FIG. 31, a single sentence whose meaning is very close to the search key sentence, such as “ETC card cannot be used”, is obtained as the matched single sentence. This is because, given the parent-child relationship of nodes used for dependency matching, search results are obtained only for sentences in which both the parent node and the child node of the dependency relationship match and the attribute also matches for the word corresponding to the parent node or child node.
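The different result sets of the four match modes can be illustrated with a small sketch. It is a toy model with hand-written analysis results, not the matching process of the embodiment: the classes `Word` and `Sentence`, the function `matches`, the mode names, and the attribute labels (“negation”, “desire”, “ease”) are all hypothetical, and “credit card” is treated as a single element, as in the scoring example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    surface: str
    attribute: Optional[str] = None  # e.g. "negation" for "use (negation)"


@dataclass(frozen=True)
class Sentence:
    text: str
    words: tuple          # tuple of Word
    dependencies: tuple   # tuple of (child surface, parent surface)


def attr_of(sentence, surface):
    """Attribute of the word with the given surface form, or None."""
    for word in sentence.words:
        if word.surface == surface:
            return word.attribute
    return None


def matches(key, cand, mode):
    if mode == "word":  # (1) word element matching
        key_surfaces = {w.surface for w in key.words}
        return any(w.surface in key_surfaces for w in cand.words)
    if mode == "word+attribute":  # (2) word and its attribute must both agree
        return any(w in key.words for w in cand.words)
    shared = set(key.dependencies) & set(cand.dependencies)
    if mode == "dependency":  # (3) a parent-child pair is shared
        return bool(shared)
    if mode == "dependency+attribute":  # (4) shared pair with agreeing attributes
        return any(attr_of(key, c) == attr_of(cand, c)
                   and attr_of(key, p) == attr_of(cand, p)
                   for c, p in shared)
    raise ValueError("unknown match mode: " + mode)


def search(key, candidates, mode):
    return [c.text for c in candidates if matches(key, c, mode)]


USE_DEPS = (("ETC", "use"), ("card", "use"))
KEY = Sentence("ETC card cannot be used at the store",
               (Word("ETC"), Word("card"), Word("use", "negation"), Word("store")),
               USE_DEPS)
CANDIDATES = [
    Sentence("ETC card cannot be used",
             (Word("ETC"), Word("card"), Word("use", "negation")), USE_DEPS),
    Sentence("I want to use an ETC card",
             (Word("ETC"), Word("card"), Word("use", "desire")), USE_DEPS),
    Sentence("ETC card is easy to use",
             (Word("ETC"), Word("card"), Word("use", "ease")), USE_DEPS),
    Sentence("I lost the ETC card",
             (Word("ETC"), Word("card"), Word("lose")),
             (("ETC", "lose"), ("card", "lose"))),
    Sentence("Credit card cannot be used",
             (Word("credit card"), Word("use", "negation")),
             (("credit card", "use"),)),
]

for mode in ("word", "word+attribute", "dependency", "dependency+attribute"):
    print(mode, search(KEY, CANDIDATES, mode))
```

In this toy data, modes (1) and (2) return the same sentences and differ only in how they would be scored, mode (3) keeps the sentences about using an ETC card, and mode (4) keeps only the sentence whose predicate attribute also agrees.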

  Therefore, for example, when an operator in a call center needs to find and give an accurate answer in a short time, combining multiple match modes allows the search results to be narrowed down to a smaller number by using the dependency relationships and the attributes contained in words. This makes it possible to search according to differences such as what kind of request or complaint is being made about a product, or what kind of inquiry states that something is difficult to understand.

As described above, each match mode has different characteristics. By selecting the match mode or combination of match modes according to the purpose of the search, a search suited to the user can be realized without the trouble of creating separate dictionary data.
Compared with the prior art, by switching and using a plurality of profiles for one dictionary data, the data amount of the dictionary data as a whole can be greatly suppressed.
As described above, the Japanese language analysis server 500 according to the present embodiment has, for each match profile, a match mode definition PA1 for registering the match modes to be used, and the items from the relative appearance frequency flag PA2 through the synonym match coefficient PA10 for adjusting score mode conditions and coefficients. The Japanese language analysis server 500 can thereby use the match modes determined by the match profile. Consequently, the match dictionary data in the match dictionary storage unit 503 need not be created for each match mode, and matching in different match modes can be performed using a single set of match dictionary data. This reduces both the storage capacity needed to hold multiple sets of match dictionary data and the labor of creating match dictionary data for each match mode. In addition, when the user selects a search service, the associated match mode is uniquely determined, so the user can easily change the match mode simply by changing the search service.

In addition, the search processing unit 501 of the Japanese language analysis server 500 according to the present embodiment obtains the single sentences or sentences that satisfy the matching condition using the match mode information, and then calculates a score for each matched sentence or single sentence obtained. As a result, among the matched sentences and single sentences, those whose meaning is closer to the search key sentence can be identified by the score.
Therefore, the match mode information is used to narrow down the search results according to the matching condition, a score reflecting the attributes and sentence structure (dependency relationships) of the search key sentence is then calculated for the matched search results, and the matched sentences and single sentences can be prioritized based on the obtained score. As a result, sentences or single sentences that match the search key sentence more closely can be obtained.
Furthermore, in the search processing unit 501 of the Japanese language analysis server 500 according to the present embodiment, the match mode information and the score mode information are set independently rather than being determined in association with each other. As a result, a multifaceted search can be realized according to the characteristics of the search key sentence and the purpose and conditions of the search.

Note that the client terminal device 100 according to the present embodiment is, for example, an input terminal device used for input work in a call center, and preferably includes an information processing device such as a workstation or a personal computer. Further, in the client terminal device 100 according to the present embodiment, the match dictionary data may be configured based on vocabulary that frequently appears in telephone responses to users in a call center that performs user support work for mobile phones.
As a result, it is possible to realize more efficient character input in a call center operation or the like where it is necessary to input the contents of reception by telephone in real time.

The operation processes in the information retrieval system 1 described above may be provided as a program to be executed by a computer, or as a computer-readable recording medium storing the program, and the above processes are performed when a computer system reads and executes the program. The “computer system” here includes a CPU, various memories, an OS, and hardware such as peripheral devices.
Further, the “computer system” includes a homepage providing environment (or display environment) if a WWW system is used.
The “computer-readable recording medium” means a storage device such as a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, or a hard disk built into a computer system.

Further, the “computer-readable recording medium” also includes media that hold a program for a certain period of time, such as a volatile memory (for example, DRAM (Dynamic Random Access Memory)) in a computer system that serves as a server or a client when the program is transmitted through a network such as the Internet or a communication line such as a telephone line.
The program may be transmitted from a computer system storing the program in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. Here, the “transmission medium” for transmitting the program refers to a medium having a function of transmitting information, such as a network (communication network) such as the Internet or a communication line (communication line) such as a telephone line.
The program may be for realizing a part of the functions described above. Furthermore, the program may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

1 Information Retrieval System
100 Client Terminal Device
300 WEB Server
500 Japanese Language Analysis Server
700 Database File Server

Claims (8)

  1. An input unit for inputting a search key sentence composed of a plurality of words;
    An analysis unit that analyzes the search key sentence and obtains an analysis result related to the word constituting the search key sentence;
    Information about the clause included in the sentence as a match dictionary information about a sentence constituted by at least one of the clauses as a subtree node in a tree structure including a clause constituted by at least one of the words A match dictionary storage unit for storing rule information representing
    Matching conditions for checking the relationship between the match dictionary information stored in the match dictionary storage unit and the search key sentence are associated with each other, and the search key sentence for words satisfying the matching condition A match profile storage unit that stores match profile information having evaluation criteria for evaluating the degree of matching of
    Based on the match profile information, the search key sentence is matched with the match dictionary information according to the associated matching condition. As a result of the matching, for the sentence satisfying the matching condition, the match profile information A search processing unit that calculates a score representing a degree of matching between the search key sentence and the match dictionary information according to the evaluation criterion associated with
    An information retrieval apparatus comprising:
  2. The evaluation criteria are:
    Represents whether or not to give a score according to the degree of matching for words that meet the matching condition,
    The search processing unit
    The information search device according to claim 1, wherein the score is obtained by calculating the score given to the word satisfying the matching condition for each sentence satisfying the matching condition according to the evaluation criterion.
  3. The match profile storage unit
    The plurality of match profile information associated with at least one matching condition among the plurality of matching conditions having different characteristics, respectively, according to the purpose of the search, according to claim 1 or 2 Information retrieval device.
  4. The match profile storage unit
    The information search apparatus according to claim 1, wherein at least one of word element matching, attribute matching, and dependency matching is associated as the matching condition.
  5. The input unit inputs a search target sentence composed of a plurality of the words,
    the analysis unit analyzes the search target sentence to obtain an analysis result relating to the words constituting the search target sentence, and
    a dictionary creation unit that, based on the analysis result, creates dictionary information in which rule information, including character information relating to the character string of the word and attribute information indicating an attribute of the word, is associated with a clause constituted by at least one of the words and configured in a tree structure as a subtree node, and stores the dictionary information in the match dictionary storage unit as the match dictionary information about a sentence constituted by at least one of the clauses,
    The information search device according to any one of claims 1 to 4, further comprising:
  6. The input section is
    Accept input of search key sentence consisting of multiple words,
    The analysis department
    Analyzing the search key sentence, obtaining an analysis result relating to the word constituting the search key sentence,
    The search processor
    A matching condition for collating the relationship between the match dictionary information and the search key sentence is associated, and an evaluation for evaluating the degree of matching with the search key sentence for a word that satisfies the matching condition Read the match profile information from a match profile storage unit that stores match profile information having a reference;
    Information about the clause included in the sentence as a match dictionary information about a sentence constituted by at least one of the clauses as a subtree node in a tree structure including a clause constituted by at least one of the words Using the match dictionary information in the match dictionary storage unit that stores rule information representing, the matching between the search key sentence and the match dictionary information according to the associated matching condition based on the match profile information And
    As a result of collation, for the sentence satisfying the matching condition, a score representing a degree of collation between the search key sentence and the match dictionary information is calculated according to the evaluation criterion associated with the match profile information. Information search method characterized by
  7. Computer
    An input means for inputting a search key sentence composed of a plurality of words;
    Analyzing means for analyzing the search key sentence and obtaining an analysis result relating to the word constituting the search key sentence;
    A matching condition for collating the relationship between the match dictionary information and the search key sentence is associated, and an evaluation for evaluating the degree of matching with the search key sentence for a word that satisfies the matching condition A dictionary in which the match profile information is read from a match profile storage unit that stores match profile information having a reference, and a clause composed of at least one of the words is configured as a sub-tree node in a tree structure, Based on the match profile information, using the match dictionary information of the match dictionary storage unit that stores rule information representing information about the clause included in the sentence, as match dictionary information about the sentence constituted by the clause, The search key according to the matching condition And the match dictionary information, and as a result of the collation, for the sentence satisfying the matching condition, according to the evaluation criteria associated with the match profile information, the search key sentence and the match dictionary information A program for functioning as a search processing means for calculating a score representing the degree of collation.
  8. The input means inputs a search target sentence composed of a plurality of the words,
    the analyzing means analyzes the search target sentence to obtain an analysis result relating to the words constituting the search target sentence, and
    the program according to claim 7 further causes the computer to function as dictionary creation means that, based on the analysis result, creates dictionary information in which rule information, including character information relating to the character string of the word and attribute information indicating an attribute of the word, is associated with a clause constituted by at least one of the words and configured in a tree structure as a subtree node, and stores the dictionary information in the match dictionary storage unit as the match dictionary information about a sentence constituted by at least one of the clauses.
JP2009116025A 2009-05-12 2009-05-12 Information search apparatus, information search method, and program Active JP5439028B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2009116025A JP5439028B2 (en) 2009-05-12 2009-05-12 Information search apparatus, information search method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2009116025A JP5439028B2 (en) 2009-05-12 2009-05-12 Information search apparatus, information search method, and program

Publications (2)

Publication Number Publication Date
JP2010266970A JP2010266970A (en) 2010-11-25
JP5439028B2 JP5439028B2 (en) 2014-03-12

Family

ID=43363916

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2009116025A Active JP5439028B2 (en) 2009-05-12 2009-05-12 Information search apparatus, information search method, and program

Country Status (1)

Country Link
JP (1) JP5439028B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5699789B2 (en) * 2011-05-10 2015-04-15 ソニー株式会社 Information processing apparatus, information processing method, program, and information processing system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10105555A (en) * 1996-09-26 1998-04-24 Sharp Corp Translation-with-original example sentence retrieving device
JP3879324B2 (en) * 1999-09-14 2007-02-14 富士ゼロックス株式会社 Document summarization apparatus, document summarization method, and recording medium
JP4005343B2 (en) * 2001-12-04 2007-11-07 東京ソフト株式会社 Information retrieval system
JP4654776B2 (en) * 2005-06-03 2011-03-23 富士ゼロックス株式会社 Question answering system, data retrieval method, and computer program

Also Published As

Publication number Publication date
JP2010266970A (en) 2010-11-25

Similar Documents

Publication Publication Date Title
Gupta et al. A survey of text mining techniques and applications
Moldovan et al. Using wordnet and lexical operators to improve internet searches
JP4241934B2 (en) Text processing and retrieval system and method
US6366908B1 (en) Keyfact-based text retrieval system, keyfact-based text index method, and retrieval method
US6415283B1 (en) Methods and apparatus for determining focal points of clusters in a tree structure
US7403938B2 (en) Natural language query processing
US5953718A (en) Research mode for a knowledge base search and retrieval system
KR100820662B1 (en) Search query categorization for business listings search
US7509313B2 (en) System and method for processing a query
US7194455B2 (en) Method and system for retrieving confirming sentences
US7346487B2 (en) Method and apparatus for identifying translations
US6446035B1 (en) Finding groups of people based on linguistically analyzable content of resources accessed
US7739215B2 (en) Cost-benefit approach to automatically composing answers to questions by extracting information from large unstructured corpora
EP0597630B1 (en) Method for resolution of natural-language queries against full-text databases
US7398201B2 (en) Method and system for enhanced data searching
US6678677B2 (en) Apparatus and method for information retrieval using self-appending semantic lattice
US7401077B2 (en) Systems and methods for using and constructing user-interest sensitive indicators of search results
JP4467184B2 (en) Semantic analysis and selection of documents with knowledge creation potential
JP3666004B2 (en) Multilingual document search system
US7107218B1 (en) Method and apparatus for processing queries
US6823325B1 (en) Methods and apparatus for storing and retrieving knowledge
US7912849B2 (en) Method for determining contextual summary information across documents
JP5169816B2 (en) Question answering device, question answering method, and question answering program
US7092936B1 (en) System and method for search and recommendation based on usage mining
JP2012248210A (en) System and method for retrieving content of complicated language such as japanese

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20120425

A711 Notification of change in applicant

Free format text: JAPANESE INTERMEDIATE CODE: A712

Effective date: 20120611

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20120611

RD02 Notification of acceptance of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7422

Effective date: 20130515

RD02 Notification of acceptance of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7422

Effective date: 20130620

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20130625

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20130627

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20130620

RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20130816

RD02 Notification of acceptance of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7422

Effective date: 20130823

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20130823

A711 Notification of change in applicant

Free format text: JAPANESE INTERMEDIATE CODE: A712

Effective date: 20131008

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20131008

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20131119

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20131216

R150 Certificate of patent or registration of utility model

Ref document number: 5439028

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

S111 Request for change of ownership or part of ownership

Free format text: JAPANESE INTERMEDIATE CODE: R313117

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250