WO2016013209A1 - Sentence set extraction system, method, and program - Google Patents
Sentence set extraction system, method, and program
- Publication number
- WO2016013209A1 (PCT/JP2015/003652)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sentence
- sentences
- similar
- sentence set
- specific
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
Definitions
- the present invention relates to a sentence set extraction system, a sentence set extraction method, and a sentence set extraction program for extracting a set into which a sentence to be analyzed is classified.
- Text mining is a data analysis technique that uses text data written in a natural language as input, grasps the overall tendency of the content, and discovers useful knowledge. By using text mining, for example, it is possible to grasp the contents of an inquiry from a call memo in a call center.
- Patent Document 1 describes a text mining system that displays a dependency relationship network structure between words by paying attention to a relationship of three or more words.
- The text mining system described in Patent Document 1 analyzes linguistic information included in a large amount of text data, extracts the relevance of words and their dependency relationships, and visualizes and displays the text mining results for these relationships.
- Patent Document 2 describes a method of determining the synonyms and implications between texts and clustering the texts having the same meaning so that the contents of the texts can be directly comprehended.
- Such an extractor can be realized by constructing an extraction rule and an extraction learning model in advance.
- With such an extractor, the target text can be efficiently extracted from a large amount of text data.
- However, the text that such an extractor can extract is limited to text whose content falls under a classification assumed in advance. That is, because it is difficult to prepare an extractor for content that cannot be anticipated, text with unexpected content may be overlooked.
- FIG. 12 is an explanatory diagram showing an example of a method for extracting a specific opinion by a general method.
- FIG. 12 shows an example of a call center. For example, it is assumed that complaints and requests are classified and extracted from inquiries to a call center.
- The underlined text illustrated in FIG. 12 indicates a complaint or a request.
- An object of the present invention is to provide a sentence set extraction system, a sentence set extraction method, and a sentence set extraction program that can exhaustively and efficiently extract the sentences of each classification even when the set of sentences to be analyzed includes various classifications.
- The sentence set extraction system includes a similar sentence set generation unit that creates a similar sentence set by grouping sentences representing the same concept or event from a set of analysis target sentences, and a similar sentence set extraction unit that, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from the set of analysis target sentences, extracts, as an excluded similar sentence set, one or more sentences that are not extracted by the specific sentence extractors from among the sentences belonging to the similar sentence set.
- Another sentence set extraction system includes an analysis sentence set generation unit that, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from a set of analysis target sentences, generates an analysis sentence set by excluding from the set of analysis target sentences the sentences extracted by the specific sentence extractors, and a similar sentence set specifying unit that creates similar sentence sets by grouping sentences representing the same concept or event from the analysis sentence set and specifies a similar sentence set in which the number of included sentences satisfies a predetermined condition.
- The sentence set extraction method creates a similar sentence set by grouping sentences representing the same concept or event from a set of analysis target sentences and, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from the set of analysis target sentences, extracts, as an excluded similar sentence set, one or more sentences that are not extracted by the specific sentence extractors from among the sentences belonging to the similar sentence set.
- Another sentence set extraction method uses one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from a set of analysis target sentences to generate an analysis sentence set by excluding, from the set of analysis target sentences, the sentences extracted by the specific sentence extractors; creates similar sentence sets by grouping sentences representing the same concept or event from the analysis sentence set; and specifies a similar sentence set in which the number of included sentences satisfies a predetermined condition.
- The sentence set extraction program causes a computer to execute a similar sentence set generation process of creating a similar sentence set by grouping sentences representing the same concept or event from a set of analysis target sentences, and a similar sentence set extraction process of extracting, as an excluded similar sentence set, one or more sentences that are not extracted by the specific sentence extractors from among the sentences belonging to the similar sentence set, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from the set of analysis target sentences.
- Another sentence set extraction program causes a computer to execute an analysis sentence set generation process of generating, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from a set of analysis target sentences, an analysis sentence set by excluding from the set of analysis target sentences the sentences extracted by the specific sentence extractors, and a similar sentence set specifying process of creating similar sentence sets by grouping sentences representing the same concept or event from the analysis sentence set and specifying a similar sentence set in which the number of included sentences satisfies a predetermined condition.
- each classified sentence can be extracted comprehensively and efficiently.
- FIG. 1 is a block diagram showing a configuration example of a first embodiment of a sentence set extraction system according to the present invention.
- the sentence set extraction system of this embodiment includes an analysis target sentence input unit 11, a similar sentence set generation unit 12, and a similar sentence set extraction unit 13.
- the sentence set extraction system extracts a sentence set for each classification from a sentence set in which a content to be analyzed is described.
- The sentence is not limited to a unit delimited by a period or other punctuation mark, and includes a group of words representing a predetermined meaning.
- FIG. 2 is an explanatory diagram showing the relationship of sentences used in the present invention.
- The set of sentences includes sentences that describe content to be analyzed, such as requests and complaints.
- Such a sentence is referred to as an analysis target sentence.
- the analysis target sentence corresponds to a request sentence indicating a request from each user.
- each sentence included in the set of analysis target sentences is classified according to the characteristics of the analysis target sentence.
- Sentences into which the analysis target sentences are classified are referred to as specific sentences. Among the analysis target sentences, sentences classified by the content of a request or complaint can be referred to as specific opinion sentences.
- memos created by call center operators are information that can be used to improve products and services.
- the entire sentence included in the memo or the like corresponds to a set of sentences, and a sentence indicating a request or a complaint corresponds to an analysis target sentence.
- The analysis target sentences are divided into a plurality of items such as “I want you to reduce the price” and “I want you to improve the service contents”, and the sentences of each item correspond to specific sentences (specific opinion sentences).
- the analysis target sentence input unit 11 inputs an analysis target sentence.
- The analysis target sentence input unit 11 may read and input an analysis target sentence stored in a storage device (not shown), or may input an analysis target sentence received from another system or apparatus.
- The analysis target sentence input unit 11 may also extract analysis target sentences, which include the content to be analyzed, from an input sentence set. In this case, the analysis target sentence input unit 11 may extract the analysis target sentences using a generally known extractor.
- The analysis target sentence input unit 11 may input the text entered in an input field as the analysis target sentence. Further, the analysis target sentence input unit 11 may convert the format of the input analysis target sentence as necessary.
- the similar sentence set generation unit 12 creates a similar sentence set by grouping similar sentences from the set of analysis target sentences.
- a method for creating a similar sentence set is arbitrary.
- The similar sentence set generation unit 12 may calculate the similarity between sentences based on the words and syntax included in each sentence and group similar sentences together.
- the similar sentence set generation unit 12 may generate a similar sentence set using a general clustering technique. Each sentence included in the similar sentence set classified in this way corresponds to a specific sentence.
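- The grouping step can be sketched as follows. This is a minimal illustration, not the method of the patent: it assumes word-overlap (Jaccard) similarity and a greedy single-pass grouping, and the names `jaccard`, `group_similar`, and the threshold of 0.5 are hypothetical choices made for the example.

```python
def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity between two word sets (assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(sentences, threshold=0.5):
    """Greedy grouping: a sentence joins the first set whose first member
    is similar enough; otherwise it starts a new similar sentence set."""
    groups = []
    for s in sentences:
        words = set(s.lower().split())
        for g in groups:
            if jaccard(words, set(g[0].lower().split())) >= threshold:
                g.append(s)
                break
        else:
            groups.append([s])
    return groups


# Hypothetical analysis target sentences.
sentences = [
    "the price is high",
    "the price is too high",
    "the UI is bad",
    "the UI is really bad",
]
similar_sets = group_similar(sentences)
for group in similar_sets:
    print(len(group), group)
```

- Any general clustering technique could replace the greedy loop; the only property the later steps rely on is that each sentence ends up in exactly one similar sentence set.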
- FIG. 3 is an explanatory diagram showing an example of processing for generating a similar sentence set.
- the analysis target sentence input unit 11 performs analysis target sentence extraction processing from ten pieces of text data indicating an inquiry to the call center, and extracts eight analysis target sentences.
- the similar sentence set generation unit 12 creates a similar sentence set from the set of analysis target sentences.
- each row indicated in the similar sentence count result corresponds to a similar sentence set.
- The specific sentences “high charge” and “high price”, which indicate the same event, belong to the same similar sentence set, and the specific sentences “UI is bad” and “useless” likewise belong to the same similar sentence set.
- the similar sentence set in which the analysis target sentences are classified preferably has a semantic group (the same concept) so that the classified contents can be understood. Therefore, it is desirable that the similar sentence set generation unit 12 generates a similar sentence set by grouping semantically similar sentences from the set of analysis target sentences.
- As a method of grouping semantically similar sentences, clustering based on synonyms or implications is known.
- the similar sentence set generation unit 12 may generate a similar sentence set from a set of analysis target sentences using, for example, a method described in Patent Document 2. By clustering based on synonyms or implications, the contents of a set of similar sentences can be tabulated in a form that can be directly understood.
- the similar sentence set generation unit 12 may specify a sentence (hereinafter referred to as a representative sentence) indicating the contents of the similar sentence set.
- the similar sentence set generation unit 12 may specify text indicating content implied by a large number of sentences included in the similar sentence set as a representative sentence.
- the similar sentence set generation unit 12 may specify a cluster-centered text as a representative sentence.
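- One way to pick a cluster-centered representative sentence is to choose the member most similar to all the others (a medoid). The sketch below assumes Jaccard word overlap as the similarity measure; `representative` and the sample sentences are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity between two word sets (assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def representative(similar_set):
    """Return the sentence with the highest total similarity to the
    other members of the similar sentence set (the medoid)."""
    word_sets = [set(s.lower().split()) for s in similar_set]

    def total_similarity(i):
        return sum(
            jaccard(word_sets[i], w)
            for j, w in enumerate(word_sets)
            if j != i
        )

    best = max(range(len(similar_set)), key=total_similarity)
    return similar_set[best]


similar_set = [
    "the price is high",
    "the price is too high",
    "the monthly price is too high",
]
print(representative(similar_set))
```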
- The similar sentence set extraction unit 13 uses an extractor capable of extracting a specific sentence from a set of analysis target sentences (hereinafter referred to as a specific sentence extractor) to identify, among the sentences belonging to the similar sentence set, sentences that cannot be extracted by the specific sentence extractor.
- The specific sentence extractor is prepared in advance according to the extraction target. The specific sentence extractor may take any form as long as it can extract specific sentences indicating the desired content from the set of analysis target sentences.
- the similar sentence set extraction unit 13 may use, for example, a specific sentence extractor that extracts text that matches a regular expression including a word indicating a desired content.
- The method by which the specific sentence extractor extracts specific sentences is not limited to regular expressions; for example, a method of extracting specific sentences based on an extraction rule or an extraction learning model may be used.
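- A regular-expression-based specific sentence extractor might look like the sketch below. The factory `make_regex_extractor`, the pattern, and the sample sentences are assumptions for illustration; the patent does not prescribe a particular pattern.

```python
import re


def make_regex_extractor(pattern: str):
    """Build a specific sentence extractor that keeps the sentences
    matching the given regular expression (case-insensitive)."""
    compiled = re.compile(pattern, re.IGNORECASE)

    def extract(sentences):
        return [s for s in sentences if compiled.search(s)]

    return extract


# Hypothetical extractor for "dissatisfaction with charge".
price_extractor = make_regex_extractor(
    r"\b(price|charge|fee)s?\b.*\b(high|expensive)\b"
)

sentences = [
    "The price is high",
    "The charge is too expensive",
    "The UI is bad",
]
extracted = price_extractor(sentences)
print(extracted)
```

- An extractor based on extraction rules or a learned model could expose the same interface, a function from a list of sentences to the extracted subset, so the later counting steps would not need to change.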
- the similar sentence set extraction unit 13 extracts a specific sentence for each similar sentence set using one or more specific sentence extractors.
- the similar sentence set extraction unit 13 may count the number of specific sentences extracted from each similar sentence set for each specific sentence extractor.
- The similar sentence set extraction unit 13 identifies, for each similar sentence set, the sentences that were not extracted by the specific sentence extractor.
- the similar sentence set extraction unit 13 may specify a sentence that has not been extracted, for example, by excluding the specific sentence extracted by the specific sentence extractor from the entire similar sentence set.
- the similar sentence set extraction unit 13 counts the number of sentences not extracted for each similar sentence set. Then, the similar sentence set extraction unit 13 extracts one or more sentences not extracted by the specific sentence extractor from the sentences belonging to the similar sentence set as a similar sentence set. At this time, the similar sentence set extraction unit 13 extracts a similar sentence set according to the number of extracted specific sentences. Specifically, the similar sentence set extraction unit 13 extracts a similar sentence set in which the number of sentences included in the specified similar sentence set satisfies a predetermined condition.
- the similar sentence set extraction unit 13 may extract, for example, a similar sentence set in which the number of specified sentences is equal to or greater than a predetermined threshold.
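- The identification, counting, and threshold steps described above can be sketched as follows. The function names, the dictionary layout keyed by a set label, and the threshold of 2 are hypothetical; extractors are assumed to be functions that return the extracted subset of a sentence list.

```python
import re


def unextracted_by_set(similar_sets, extractors):
    """For each similar sentence set, keep only the sentences that no
    specific sentence extractor extracted."""
    result = {}
    for label, sentences in similar_sets.items():
        extracted = set()
        for extract in extractors:
            extracted.update(extract(sentences))
        result[label] = [s for s in sentences if s not in extracted]
    return result


def select_sets(unextracted, min_count):
    """Labels of sets whose number of unextracted sentences is at least
    the predetermined threshold."""
    return [label for label, rest in unextracted.items() if len(rest) >= min_count]


def price_extractor(sentences):
    # Hypothetical extractor for complaints about price.
    return [s for s in sentences if re.search(r"price|charge", s)]


similar_sets = {
    "high price": ["the price is high", "the charge is high"],
    "other companies are better": [
        "company B is better",
        "other companies offer more",
        "the rival is better",
    ],
}
unextracted = unextracted_by_set(similar_sets, [price_extractor])
selected = select_sets(unextracted, min_count=2)
print(selected)
```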
- The similar sentence set extraction unit 13 may, for example, determine a threshold according to the ratio between “the number of sentences extracted by the specific sentence extractor” and “the number of sentences not extracted by the specific sentence extractor”, and extract a similar sentence set in which the number of specified sentences is equal to or greater than the determined threshold.
- In this case, the threshold is set lower the more the “number of sentences not extracted by the specific sentence extractor” exceeds the “number of sentences extracted by the specific sentence extractor”.
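- The text does not give a formula for the ratio-dependent threshold, so the following is only one possible rule, stated as an assumption: the threshold shrinks linearly with the share of unextracted sentences. `ratio_threshold`, `base`, and `floor` are hypothetical names and values.

```python
def ratio_threshold(n_extracted, n_not_extracted, base=10.0, floor=1.0):
    """Assumed rule: lower the threshold as the share of sentences not
    extracted by the specific sentence extractors grows."""
    total = n_extracted + n_not_extracted
    if total == 0:
        return base
    share = n_not_extracted / total
    return max(floor, base * (1.0 - share))


def should_extract(n_extracted, n_not_extracted):
    """Extract the set when its unextracted count reaches the threshold."""
    return n_not_extracted >= ratio_threshold(n_extracted, n_not_extracted)


# Mostly covered by existing extractors: high threshold, not extracted.
print(ratio_threshold(8, 2), should_extract(8, 2))
# Mostly uncovered: low threshold, extracted.
print(ratio_threshold(2, 8), should_extract(2, 8))
```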
- The classification of a similar sentence set extracted in this way is one to which many of the analysis target sentences belong but for which no extractor exists to individually extract those sentences. Therefore, by separately creating an extractor for sentences belonging to this similar sentence set, specific sentences can be extracted efficiently from the analysis target sentences, and the comprehensiveness of the classifications extracted from the analysis target sentences can also be improved.
- the extracted similar sentence set can be used as learning data for generating an extractor.
- By extracting a similar sentence set, the similar sentence set extraction unit 13 can identify the similar sentence sets for which an extractor should be individually generated and can efficiently collect learning data for generating that extractor.
- the similar sentence set extraction unit 13 may count the number of sentences extracted using the specific sentence extractor for each similar sentence set and display it in a table format.
- FIG. 4 is an explanatory diagram showing an example in which the number of sentences to be extracted is displayed in a table format.
- Similar sentence sets are placed in the row headings of the table, and the specific sentence extractors used for extraction are placed in the column headings.
- the rightmost column of the table indicates the number of sentences that have not been extracted by the specific sentence extractor.
- In the example shown in FIG. 4, the table shows the number of sentences in the similar sentence set with the content “high charge, high price” that were extracted using the specific sentence extractor for “dissatisfaction with charge”, and shows that five were extracted using the specific sentence extractor for “dissatisfaction regarding service contents”. It also shows that the number of sentences in this similar sentence set that were not extracted by either of these two extractors is 0.
- The similar sentence set extraction unit 13 can extract two similar sentence sets: “other companies have better benefits, other companies are better” and “cannot be used on my own terminal”.
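- A table of this shape can be assembled as in the sketch below. The data, the extractor names, and the helper functions are all hypothetical and do not reproduce the figures in FIG. 4; the point is only the layout: one row per similar sentence set, one column per specific sentence extractor, and a final column for sentences that no extractor matched.

```python
def keyword_extractor(*keywords):
    """Hypothetical extractor: keep sentences containing any keyword."""
    def extract(sentences):
        return [s for s in sentences if any(k in s for k in keywords)]
    return extract


def count_table(similar_sets, extractors):
    """Rows: similar sentence sets. Columns: one count per extractor,
    plus 'not extracted' for sentences matched by no extractor."""
    table = {}
    for label, sentences in similar_sets.items():
        row = {}
        matched = set()
        for name, extract in extractors.items():
            hits = extract(sentences)
            row[name] = len(hits)
            matched.update(hits)
        row["not extracted"] = sum(1 for s in sentences if s not in matched)
        table[label] = row
    return table


extractors = {
    "dissatisfaction with charge": keyword_extractor("charge", "price"),
    "dissatisfaction with service": keyword_extractor("service"),
}
similar_sets = {
    "high charge, high price": ["the charge is high", "the price is high"],
    "other companies are better": ["company B is better", "the rival is better"],
}
table = count_table(similar_sets, extractors)
for label, row in table.items():
    print(label, row)
```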
- The condition used by the similar sentence set extraction unit 13 to extract a similar sentence set is not limited to the number of sentences included in one similar sentence set.
- the similar sentence set extraction unit 13 may use the number of sentences included in a new similar sentence set obtained by combining a plurality of identified similar sentence sets as a condition for extracting the similar sentence set.
- The similar sentence set extraction unit 13 may extract a similar sentence set in which the number of sentences included in a new similar sentence set, formed by combining one or more similar sentence sets that include sentences not extracted by the specific sentence extractor, satisfies a predetermined condition (a ratio or a number of sentences).
- the similar sentence set generation unit 12 may determine whether or not to extract a new similar sentence set obtained by combining a plurality of similar sentence sets.
- A method for combining a plurality of similar sentence sets is arbitrary.
- the similar sentence set generation unit 12 may combine a plurality of similar sentence sets designated by the user.
- the similar sentence set generation unit 12 may combine the similar sentence sets determined to be similar using any method for determining the similarity between the similar sentence sets.
- As in the method described above, the similar sentence set generation unit 12 may extract a similar sentence set according to the number of sentences included in the similar sentence set and the ratio of sentences extracted by the specific sentence extractor to sentences not extracted.
- The similar sentence set generation unit 12 may compare with a threshold a value calculated according to the similarity between the combined similar sentence sets, instead of using the number of sentences included in each combined similar sentence set as it is.
- For example, the similar sentence set generation unit 12 may extract the new similar sentence set generated by the combination when the value obtained by adding or multiplying the numbers of sentences included in the two combined similar sentence sets and then multiplying the result by the similarity exceeds a predetermined threshold.
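- The similarity-weighted score for a combined set can be written directly from the description: add or multiply the two sentence counts, multiply by the similarity, and compare with a threshold. The function names, the `mode` parameter, and the numbers below are hypothetical.

```python
def combined_score(n_a, n_b, similarity, mode="add"):
    """Sum or product of the two sets' sentence counts, weighted by the
    similarity between the two similar sentence sets."""
    if mode == "add":
        size = n_a + n_b
    elif mode == "multiply":
        size = n_a * n_b
    else:
        raise ValueError("mode must be 'add' or 'multiply'")
    return size * similarity


def should_extract_combined(n_a, n_b, similarity, threshold, mode="add"):
    """Extract the combined set when the weighted score exceeds the threshold."""
    return combined_score(n_a, n_b, similarity, mode) > threshold


# Two sets with 3 and 4 unextracted sentences and similarity 0.5.
print(combined_score(3, 4, 0.5))              # additive variant
print(combined_score(3, 4, 0.5, "multiply"))  # multiplicative variant
print(should_extract_combined(3, 4, 0.5, threshold=5.0))
```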
- the analysis target sentence input unit 11, the similar sentence set generation unit 12, and the similar sentence set extraction unit 13 are realized by a CPU of a computer that operates according to a program (sentence set extraction program).
- The program is stored in a storage unit (not shown) included in the information processing apparatus that implements the sentence set extraction system, and the CPU may read the program and operate as the analysis target sentence input unit 11, the similar sentence set generation unit 12, and the similar sentence set extraction unit 13 according to the program.
- the analysis target sentence input unit 11, the similar sentence set generation unit 12, and the similar sentence set extraction unit 13 may each be realized by dedicated hardware.
- FIG. 5 is a flowchart showing an operation example of the sentence set extraction system of this embodiment.
- the analysis target sentence input unit 11 inputs an analysis target sentence (step S11).
- the similar sentence set generation unit 12 creates a similar sentence set by grouping sentences having similar semantic contents from the set of input analysis target sentences (step S12).
- The similar sentence set extraction unit 13 specifies sentences that are not extracted by the specific sentence extractor from sentences belonging to the similar sentence set (step S13), and counts the number of sentences specified for each similar sentence set (step S14). Then, the similar sentence set extraction unit 13 extracts a similar sentence set in which the number of specified sentences satisfies a predetermined condition (step S15).
- The similar sentence set generation unit 12 creates a similar sentence set by grouping similar sentences from the set of analysis target sentences, and the similar sentence set extraction unit 13 uses one or more specific sentence extractors to extract, as a similar sentence set, one or more sentences that are not extracted by the specific sentence extractors from the sentences belonging to the similar sentence set.
- FIG. 6 is a block diagram showing a configuration example of the second embodiment of the sentence set extraction system according to the present invention.
- Components identical to those of the first embodiment are given the same reference signs as in FIG. 1, and their description is omitted.
- the sentence set extraction system of this embodiment includes an analysis target sentence input unit 11, an analysis sentence set generation unit 22, and a similar sentence set specification unit 23.
- The sentence set extraction system of the present exemplary embodiment includes an analysis sentence set generation unit 22 and a similar sentence set specifying unit 23 in place of the similar sentence set generation unit 12 and the similar sentence set extraction unit 13 of the first embodiment.
- the analysis sentence set generation unit 22 generates a set excluding the sentence extracted by the specific sentence extractor (hereinafter referred to as an analysis sentence set) from the set of analysis target sentences.
- the contents of the specific sentence extractor used by the analysis sentence set generation unit 22 are the same as those of the specific sentence extractor used by the similar sentence set extraction unit 13 in the first embodiment.
- The analysis sentence set generation unit 22 extracts specific sentences from the analysis target sentences using one or more specific sentence extractors and generates an analysis sentence set by excluding the extracted specific sentences from the analysis target sentences.
- the similar sentence set specifying unit 23 creates a similar sentence set by grouping similar sentences from the generated analysis sentence set.
- the method for creating the similar sentence set is the same as the method for creating the similar sentence set by the similar sentence set generation unit 12 of the first embodiment.
- the similar sentence set specifying unit 23 counts the number of sentences included in each similar sentence set, and specifies a similar sentence set in which the number of sentences included in the similar sentence set satisfies a predetermined condition.
- The similar sentence set specifying unit 23 may specify a similar sentence set in which the number of included sentences is equal to or greater than a predetermined threshold, or may specify a similar sentence set by comparing the ratio used by the similar sentence set extraction unit 13 of the first embodiment with a threshold.
- The classification of a similar sentence set specified in this way is one to which many of the analysis target sentences belong but for which no extractor exists to individually extract those sentences. Therefore, by separately creating an extractor for sentences belonging to this similar sentence set, specific sentences can be extracted efficiently from the analysis target sentences, and the comprehensiveness of the classifications extracted from the analysis target sentences can also be increased.
- the similar sentence set specifying unit 23 may display the number of sentences included in each similar sentence set in a table format.
- FIG. 7 is an explanatory diagram showing an example in which the number of sentences included in the extracted similar sentence set is displayed in a table format.
- the number of sentences included in each similar sentence set illustrated in FIG. 7 corresponds to the number of sentences not extracted by the specific sentence extractor in FIG.
- the analysis target sentence input unit 11, the analysis sentence set generation unit 22, and the similar sentence set specification unit 23 are realized by a CPU of a computer that operates according to a program (sentence set extraction program).
- the analysis target sentence input unit 11, the analysis sentence set generation unit 22, and the similar sentence set specification unit 23 may each be realized by dedicated hardware.
- FIG. 8 is a flowchart showing an operation example of the sentence set extraction system of this embodiment.
- the analysis target sentence input unit 11 inputs an analysis target sentence (step S11).
- the analysis sentence set generation unit 22 generates an analysis sentence set by excluding the sentence extracted by the specific sentence extractor from the set of analysis target sentences (step S22).
- the similar sentence set specifying unit 23 creates a similar sentence set by grouping sentences having similar semantic contents from the analysis sentence set (step S23).
- The similar sentence set specifying unit 23 counts the number of sentences included in each similar sentence set (step S24), and specifies a similar sentence set in which the number of sentences included in the similar sentence set satisfies a predetermined condition (step S25).
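- The flow of steps S22 to S25 can be sketched end to end as follows. This is an illustration under stated assumptions, not the implementation of the patent: grouping uses greedy Jaccard word overlap, the predetermined condition is a minimum count of 2, and all function names and sample sentences are hypothetical.

```python
import re


def analysis_sentence_set(sentences, extractors):
    """Step S22: drop every sentence extracted by any specific sentence extractor."""
    extracted = set()
    for extract in extractors:
        extracted.update(extract(sentences))
    return [s for s in sentences if s not in extracted]


def group_similar(sentences, threshold=0.5):
    """Step S23: greedy grouping by word overlap (assumed similarity measure)."""
    groups = []
    for s in sentences:
        words = set(s.lower().split())
        for g in groups:
            first = set(g[0].lower().split())
            if len(words & first) / len(words | first) >= threshold:
                g.append(s)
                break
        else:
            groups.append([s])
    return groups


def specify_sets(groups, min_count):
    """Steps S24 and S25: count sentences per set and keep sets meeting the condition."""
    return [g for g in groups if len(g) >= min_count]


def price_extractor(sentences):
    # Hypothetical existing extractor for complaints about price.
    return [s for s in sentences if re.search(r"price", s)]


sentences = [
    "the price is high",
    "the price is too high",
    "company B is better",
    "company B is much better",
    "the UI is bad",
]
remaining = analysis_sentence_set(sentences, [price_extractor])
specified = specify_sets(group_similar(remaining), min_count=2)
print(specified)
```

- Because the excluded sentences never reach `group_similar`, the grouping step handles three sentences here instead of five, which is the processing-time advantage this embodiment trades against per-extractor counts.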
- The analysis sentence set generation unit 22 generates an analysis sentence set by excluding sentences extracted by one or more specific sentence extractors from a set of analysis target sentences, and the similar sentence set specifying unit 23 creates similar sentence sets by grouping similar sentences from the analysis sentence set. Then, the similar sentence set specifying unit 23 specifies a similar sentence set in which the number of included sentences satisfies a predetermined condition.
- In the sentence set extraction system of this embodiment, because sentences extracted by the specific sentence extractors are excluded before similar sentence sets are created, the number of sentences from which similar sentence sets are created can be reduced. Processing time can therefore be shortened compared with the sentence set extraction system of the first embodiment.
- In the sentence set extraction system of the first embodiment, by contrast, the sentences extracted by each specific sentence extractor can be identified before they are excluded. Therefore, unlike the sentence set extraction system of the second embodiment, the number of sentences extracted by each of the plurality of specific sentence extractors can be determined.
- FIG. 9 is a block diagram showing an outline of a sentence set extraction system according to the present invention.
- A sentence set extraction system according to the present invention includes a similar sentence set generation unit 81 (for example, the similar sentence set generation unit 12) that creates a similar sentence set (for example, a set of specific sentences) by grouping sentences representing the same concept or event from a set of analysis target sentences, and a similar sentence set extraction unit 82 (for example, the similar sentence set extraction unit 13) that, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from the set of analysis target sentences, extracts, as an excluded similar sentence set, one or more sentences that are not extracted by the specific sentence extractors from among the sentences belonging to the similar sentence set.
- Such a configuration makes it possible to exhaustively and efficiently extract each classified sentence even when the classification of sentences to be analyzed includes various classifications.
- The similar sentence set extraction unit 82 may extract a similar sentence set in which the number of sentences included in a new similar sentence set, formed by combining one or more similar sentence sets that include sentences not extracted by the specific sentence extractor, satisfies a predetermined condition (for example, the number of sentences or the ratio is equal to or greater than a predetermined threshold). Further, the similar sentence set extraction unit 82 may specify similar sentence sets that include sentences not extracted by the specific sentence extractor, and extract a similar sentence set in which the number of sentences included in the specified similar sentence set satisfies a predetermined condition.
- the similar sentence set generation unit 81 may create a similar sentence set by clustering a set of analysis target sentences based on synonyms or implications between the analysis target sentences. With such a configuration, the contents of a similar sentence set can be tabulated in a form that can be directly understood. Therefore, the contents extracted by the extractor to be newly generated can also be classified into easy-to-understand contents.
- The similar sentence set extraction unit 82 may total, for each similar sentence set, the number of sentences extracted using the specific sentence extractors, and output, for each similar sentence set, the number of sentences extracted by each specific sentence extractor and the number of sentences not extracted by any specific sentence extractor. This makes it easy to grasp the extraction status of the specific sentence extractors currently in use and the similar sentence sets for which an extractor needs to be newly created.
- the sentence set extraction system may include an analysis target sentence input unit (for example, the analysis target sentence input unit 11) that extracts an analysis target sentence from a set of input sentences.
- With such a configuration, information other than the object for which the extractor is to be created can be excluded in advance, so that a specific sentence extractor with high accuracy can be generated.
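The tallying and threshold selection described in this outline can be sketched roughly as follows. This is a minimal illustration, not the implementation disclosed in the application: it assumes that each specific sentence extractor can be modeled as a predicate returning `True` for sentences it extracts, that the similar sentence sets have already been created, and that the predetermined condition is a simple count threshold. The names `Extractor`, `SetTally`, `tally_similar_sets`, `select_uncovered_sets`, and `min_count`, as well as the sample sentences, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: an extractor returns True when it extracts the sentence.
Extractor = Callable[[str], bool]


@dataclass
class SetTally:
    extracted_by: Dict[str, int]  # extractor name -> number of sentences it extracted
    unextracted: List[str]        # sentences that no extractor extracted


def tally_similar_sets(similar_sets: Dict[str, List[str]],
                       extractors: Dict[str, Extractor]) -> Dict[str, SetTally]:
    """Count, for each similar sentence set, hits per extractor and unextracted sentences."""
    report = {}
    for set_id, sentences in similar_sets.items():
        extracted_by = {name: 0 for name in extractors}
        unextracted = []
        for sentence in sentences:
            hits = [name for name, extract in extractors.items() if extract(sentence)]
            for name in hits:
                extracted_by[name] += 1
            if not hits:
                unextracted.append(sentence)
        report[set_id] = SetTally(extracted_by, unextracted)
    return report


def select_uncovered_sets(report: Dict[str, SetTally], min_count: int) -> Dict[str, List[str]]:
    """Keep the sets whose number of unextracted sentences reaches the assumed threshold."""
    return {set_id: tally.unextracted
            for set_id, tally in report.items()
            if len(tally.unextracted) >= min_count}


# Illustrative data only; these sentences and the keyword extractor are made up.
similar_sets = {
    "waiting": ["The waiting time is long",
                "Screen switching keeps me waiting",
                "It took too long"],
    "price": ["The price is high", "It is too expensive"],
}
extractors = {"price": lambda s: "price" in s.lower() or "expensive" in s.lower()}

report = tally_similar_sets(similar_sets, extractors)
uncovered = select_uncovered_sets(report, min_count=2)
```

In this toy run, only the "waiting" set remains in `uncovered`, which would mark it as a candidate for a new extractor. A ratio-based condition could be substituted for the count threshold without changing the overall structure.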
- FIG. 10 is a block diagram showing another outline of the sentence set extraction system according to the present invention.
- Another sentence set extraction system according to the present invention includes an analysis sentence set generation unit 91 (for example, the analysis sentence set generation unit 22) that, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from a set of analysis target sentences, generates an analysis sentence set by excluding from the set of analysis target sentences the sentences extracted by the specific sentence extractors; and a similar sentence set identifying unit 92 (for example, the similar sentence set identifying unit 23) that creates similar sentence sets by grouping sentences in the analysis sentence set that represent the same concept or event, and identifies a similar sentence set in which the number of included sentences satisfies a predetermined condition (for example, is equal to or greater than a predetermined threshold).
- Such a configuration also makes it possible to exhaustively and efficiently extract each classified sentence even when various classifications are included in the set of sentences to be analyzed.
- The similar sentence set identifying unit 92 may create a similar sentence set by clustering the analysis sentence set based on synonyms or implications between the analysis target sentences. Even with such a configuration, the contents of a similar sentence set can be tabulated in a form that can be directly understood. Therefore, the contents extracted by an extractor to be newly generated can also be classified into easy-to-understand contents.
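The exclude-then-cluster flow of this second outline can be sketched as follows, under stated assumptions. The text does not fix how synonymy or implication is decided, so the sketch takes a hypothetical pairwise predicate `same_meaning` and merges sentences with a union-find structure; the word-overlap test used in the example is only a stand-in for a real synonym or implication judgment. All function names, the `min_size` threshold, and the sample sentences are illustrative.

```python
from typing import Callable, Dict, List

Extractor = Callable[[str], bool]         # hypothetical: True if the sentence is extracted
SameMeaning = Callable[[str, str], bool]  # hypothetical synonym/implication judgment


def build_analysis_set(sentences: List[str], extractors: List[Extractor]) -> List[str]:
    """Exclude every sentence that at least one specific sentence extractor extracts."""
    return [s for s in sentences if not any(extract(s) for extract in extractors)]


def group_similar(sentences: List[str], same_meaning: SameMeaning) -> List[List[str]]:
    """Group sentences linked by the pairwise predicate (union-find over indices)."""
    parent = list(range(len(sentences)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if same_meaning(sentences[i], sentences[j]):
                parent[find(j)] = find(i)

    groups: Dict[int, List[str]] = {}
    for i, sentence in enumerate(sentences):
        groups.setdefault(find(i), []).append(sentence)
    return list(groups.values())


def identify_similar_sets(sentences: List[str], extractors: List[Extractor],
                          same_meaning: SameMeaning, min_size: int) -> List[List[str]]:
    """Return the similar sentence sets of the analysis set that reach the size threshold."""
    analysis_set = build_analysis_set(sentences, extractors)
    return [g for g in group_similar(analysis_set, same_meaning) if len(g) >= min_size]


def word_overlap(a: str, b: str) -> bool:
    """Stand-in similarity test (Jaccard overlap of words); not the method of the source."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b) >= 0.5


# Illustrative data only.
sentences = ["the wait is long", "the wait is too long",
             "the price is high", "login failed"]
price_extractor = lambda s: "price" in s
result = identify_similar_sets(sentences, [price_extractor], word_overlap, min_size=2)
```

Here the sentence about price is removed first because an existing extractor already covers it, and only the two waiting-time sentences form a set large enough to be identified. The pairwise comparison is quadratic in the number of sentences, which is acceptable for a sketch but would need a different strategy for large inputs.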
- FIG. 11 is a block diagram showing an outline of the configuration of a computer.
- The computer 1000 includes a CPU 1001, a main storage device 1002, an auxiliary storage device 1003, and an interface 1004.
- The sentence set extraction system described above is implemented in one or more computers 1000.
- The sentence set extraction system according to the present invention may be configured as a single device, or as two or more physically separate devices connected in a wired or wireless manner.
- The operation of each processing unit described above is stored in the auxiliary storage device 1003 in the form of a program (sentence set extraction program).
- The CPU 1001 reads the program from the auxiliary storage device 1003, loads it into the main storage device 1002, and executes the above processing according to the program.
- The auxiliary storage device 1003 is an example of a non-transitory tangible medium.
- Other examples of non-transitory tangible media include a magnetic disk, a magneto-optical disk, a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), and a semiconductor memory connected via the interface 1004.
- The program may realize only a part of the above-described functions. Further, the program may be a so-called difference file (difference program) that realizes the above-described functions in combination with another program already stored in the auxiliary storage device 1003.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
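FIG. 1 is a block diagram showing a configuration example of a first embodiment of the sentence set extraction system according to the present invention. The sentence set extraction system of this embodiment includes an analysis target sentence input unit 11, a similar sentence set generation unit 12, and a similar sentence set extraction unit 13.
Similar sentence set 2 by implication: "The waiting time is long; I am kept waiting when the screen switches"
FIG. 6 is a block diagram showing a configuration example of a second embodiment of the sentence set extraction system according to the present invention. Components that are the same as in the first embodiment are given the same reference numerals as in FIG. 1, and their description is omitted. The sentence set extraction system of this embodiment includes an analysis target sentence input unit 11, an analysis sentence set generation unit 22, and a similar sentence set identifying unit 23.
12 Similar sentence set generation unit
13 Similar sentence set extraction unit
22 Analysis sentence set generation unit
23 Similar sentence set identifying unit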
Claims (16)
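- A sentence set extraction system comprising: a similar sentence set generation unit that creates similar sentence sets by grouping, from a set of analysis target sentences, sentences that represent the same concept or event; and a similar sentence set extraction unit that, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from the set of analysis target sentences, extracts, as an excluded similar sentence set, one or more sentences that belong to the similar sentence sets and are not extracted by the specific sentence extractors.
- The sentence set extraction system according to claim 1, wherein the similar sentence set extraction unit extracts a similar sentence set for which the number of sentences included in a new similar sentence set, formed by combining one or more similar sentence sets that contain sentences not extracted by the specific sentence extractor, satisfies a predetermined condition.
- The sentence set extraction system according to claim 1, wherein the similar sentence set extraction unit identifies each similar sentence set that contains sentences not extracted by the specific sentence extractor, and extracts a similar sentence set for which the number of sentences included in the identified similar sentence set satisfies a predetermined condition.
- The sentence set extraction system according to any one of claims 1 to 3, wherein the similar sentence set generation unit creates the similar sentence sets by clustering the set of analysis target sentences based on synonymy or implication relations between the analysis target sentences.
- The sentence set extraction system according to any one of claims 1 to 4, wherein the similar sentence set extraction unit totals, for each similar sentence set, the number of sentences extracted using the specific sentence extractor, and outputs, for each similar sentence set, the number of sentences extracted by each feature sentence extractor and the number of sentences not extracted by the specific sentence extractor.
- The sentence set extraction system according to any one of claims 1 to 5, further comprising an analysis target sentence input unit that extracts analysis target sentences from a set of input sentences.
- A sentence set extraction system comprising: an analysis sentence set generation unit that, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from a set of analysis target sentences, generates an analysis sentence set by excluding from the set of analysis target sentences the sentences extracted by the specific sentence extractors; and a similar sentence set identifying unit that creates similar sentence sets by grouping, from the analysis sentence set, sentences that represent the same concept or event, and identifies a similar sentence set in which the number of included sentences satisfies a predetermined condition.
- The sentence set extraction system according to claim 7, wherein the similar sentence set identifying unit creates the similar sentence sets by clustering the analysis sentence set based on synonymy or implication relations between the analysis target sentences.
- A sentence set extraction method comprising: creating similar sentence sets by grouping, from a set of analysis target sentences, sentences that represent the same concept or event; and, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from the set of analysis target sentences, extracting, as an excluded similar sentence set, one or more sentences that belong to the similar sentence sets and are not extracted by the specific sentence extractors.
- The sentence set extraction method according to claim 9, comprising extracting a similar sentence set for which the number of sentences included in a new similar sentence set, formed by combining one or more similar sentence sets that contain sentences not extracted by the specific sentence extractor, satisfies a predetermined condition.
- A sentence set extraction method comprising: using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from a set of analysis target sentences, generating an analysis sentence set by excluding from the set of analysis target sentences the sentences extracted by the specific sentence extractors; creating similar sentence sets by grouping, from the analysis sentence set, sentences that represent the same concept or event; and identifying a similar sentence set in which the number of included sentences satisfies a predetermined condition.
- The sentence set extraction method according to claim 11, comprising creating the similar sentence sets by clustering the analysis sentence set based on synonymy or implication relations between the analysis target sentences.
- A sentence set extraction program for causing a computer to execute: a similar sentence set generation process of creating similar sentence sets by grouping, from a set of analysis target sentences, sentences that represent the same concept or event; and a similar sentence set extraction process of, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from the set of analysis target sentences, extracting, as an excluded similar sentence set, one or more sentences that belong to the similar sentence sets and are not extracted by the specific sentence extractors.
- The sentence set extraction program according to claim 13, causing the computer to extract, in the similar sentence set extraction process, a similar sentence set for which the number of sentences included in a new similar sentence set, formed by combining one or more similar sentence sets that contain sentences not extracted by the specific sentence extractor, satisfies a predetermined condition.
- A sentence set extraction program for causing a computer to execute: an analysis sentence set generation process of, using one or more specific sentence extractors capable of extracting a specific sentence belonging to a specific classification from a set of analysis target sentences, generating an analysis sentence set by excluding from the set of analysis target sentences the sentences extracted by the specific sentence extractors; and a similar sentence set identifying process of creating similar sentence sets by grouping, from the analysis sentence set, sentences that represent the same concept or event, and identifying a similar sentence set in which the number of included sentences satisfies a predetermined condition.
- The sentence set extraction program according to claim 15, causing the computer to create, in the similar sentence set identifying process, the similar sentence sets by clustering the analysis sentence set based on synonymy or implication relations between the analysis target sentences.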
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/328,199 US20170220585A1 (en) | 2014-07-23 | 2015-07-21 | Sentence set extraction system, method, and program |
JP2016535794A JP6536580B2 (ja) | 2014-07-23 | 2015-07-21 | Sentence set extraction system, method, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014149425 | 2014-07-23 | ||
JP2014-149425 | 2014-07-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016013209A1 (ja) | 2016-01-28 |
Family
ID=55162753
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/003652 WO2016013209A1 (ja) | Sentence set extraction system, method, and program | 2014-07-23 | 2015-07-21 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20170220585A1 (ja) |
JP (1) | JP6536580B2 (ja) |
WO (1) | WO2016013209A1 (ja) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11328025B1 (en) | 2019-04-26 | 2022-05-10 | Bank Of America Corporation | Validating mappings between documents using machine learning |
US11783005B2 (en) | 2019-04-26 | 2023-10-10 | Bank Of America Corporation | Classifying and mapping sentences using machine learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003141132A (ja) * | 2001-10-30 | 2003-05-16 | Nippon Yunishisu Kk | Information processing apparatus and method therefor |
JP2010267141A (ja) * | 2009-05-15 | 2010-11-25 | Toshiba Corp | Document classification device and program |
WO2013038774A1 (ja) * | 2011-09-15 | 2013-03-21 | Toshiba Corporation | Document classification device, method, and program |
2015
- 2015-07-21 JP JP2016535794A patent/JP6536580B2/ja active Active
- 2015-07-21 US US15/328,199 patent/US20170220585A1/en not_active Abandoned
- 2015-07-21 WO PCT/JP2015/003652 patent/WO2016013209A1/ja active Application Filing
Non-Patent Citations (2)
Title |
---|
KAZUYUKI GOTO ET AL.: "Interactive document classification system to accelerate information and knowledge utilization", TOSHIBA REVIEW, vol. 65, no. 2, 1 February 2010 (2010-02-01), pages 60 - 63 * |
YASUNARI MIYABE: "User no Ito o Han'ei shita Taiwagata Bunsho Bunrui Gijutsu", TOSHIBA REVIEW, vol. 64, no. 2, 1 February 2009 (2009-02-01), pages 58 - 59 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2016013209A1 (ja) | 2017-04-27 |
US20170220585A1 (en) | 2017-08-03 |
JP6536580B2 (ja) | 2019-07-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
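JP6653334B2 (ja) | | Information extraction method and apparatus |
US10546005B2 (en) | | Perspective data analysis and management |
AU2017216520A1 (en) | | Common data repository for improving transactional efficiencies of user interactions with a computing device |
CN104142990A (zh) | | Search method and device |
JP7309811B2 (ja) | | Data annotation method, apparatus, electronic device, and storage medium |
KR20120047622A (ko) | | Digital content management system and method |
CN113627132B (zh) | | Data deduplication marker code generation method, system, electronic device, and storage medium |
WO2016013209A1 (ja) | | Sentence set extraction system, method, and program |
CN109033082B (zh) | | Learning and training method and apparatus for a semantic model, and computer-readable storage medium |
CN112148841B (zh) | | Object classification and classification model construction method and apparatus |
CN110874366A (zh) | | Data processing and query method and apparatus |
US20170139897A1 (en) | | Method, system, and computer program product for dividing a term with appropriate granularity |
EP3370136A1 (en) | | Input data processing method, apparatus and device, and non-volatile computer storage medium |
JP6508327B2 (ja) | | Text visualization system, text visualization method, and program |
WO2015016133A1 (ja) | | Information management device and information management method |
CN103678355B (zh) | | Text mining method and text mining device |
CN106446046B (zh) | | Method for timely and rapid analysis of records in a relational database |
JP6642429B2 (ja) | | Text processing system, text processing method, and text processing program |
CN106462614B (zh) | | Information analysis system, information analysis method, and information analysis program |
JP5162215B2 (ja) | | Data processing device, data processing method, and program |
JP6190341B2 (ja) | | Data generation device, data generation method, and program |
JP5954742B2 (ja) | | Apparatus and method for searching documents |
KR102078541B1 (ko) | | Apparatus and method for evaluating news value based on issue interest, and recording medium recording the same |
CN113656443B (zh) | | Data disassembly method, apparatus, electronic device, and storage medium |
US10909154B2 (en) | | Search system, search method and search program |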
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15825290; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2016535794; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | WIPO information: entry into national phase | Ref document number: 15328199; Country of ref document: US |
122 | Ep: PCT application non-entry in European phase | Ref document number: 15825290; Country of ref document: EP; Kind code of ref document: A1 |