US20190228064A1 - Generation apparatus, generation method, and program
- Publication number
- US20190228064A1 (application US16/371,297)
- Authority
- US
- United States
- Prior art keywords
- target
- keyword
- text
- template
- keywords
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F17/248; G06F17/271; G06F17/2755; G06F17/277; G06F17/2881
- G06F40/186—Templates (G—Physics; G06F—Electric digital data processing; G06F40/00—Handling natural language data; G06F40/10—Text processing; G06F40/166—Editing, e.g. inserting or deleting)
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars (G06F40/20—Natural language analysis; G06F40/205—Parsing)
- G06F40/268—Morphological analysis (G06F40/20—Natural language analysis)
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates (G06F40/20—Natural language analysis; G06F40/279—Recognition of textual entities)
- G06F40/56—Natural language generation (G06F40/40—Processing or translation of natural language; G06F40/55—Rule-based translation)
- G06N20/00—Machine learning (G06N—Computing arrangements based on specific computational models)
Definitions
- the present invention relates to a generation apparatus, a generation method, and a program.
- a report based on statistical information is conventionally made in such a manner that a person (e.g., a reporter) extracts potential features to report on from the statistical information and creates a report text at the reporter's discretion. Extracting predetermined information from statistical data and converting the information to a predetermined structure is one method utilized to create such reports.
- the method includes one or more processors decomposing one or more portions of text into at least one corresponding keyword and at least one corresponding template.
- the method further includes one or more processors learning a classification model associated with selecting a template based on a category of a keyword.
- the method further includes one or more processors identifying a target keyword that is represented by target data.
- the method further includes one or more processors selecting a target template that is used to represent the target data based on a category associated with the identified target keyword utilizing the classification model.
- the method further includes one or more processors generating target text that represents the target data based on the selected target template and the identified target keyword.
- FIG. 1 shows a configuration example of a generation apparatus 100 according to an embodiment.
- FIG. 2 shows an operation flow of the generation apparatus 100 according to the embodiment.
- FIG. 3 shows an example of statistics of events acquired by a text acquisition section 110 according to the embodiment.
- FIG. 4 shows an example of multiple objective variables and explanatory variables generated by a learning processing section 130 according to the embodiment.
- FIG. 5 shows a variation of the generation apparatus 100 according to the embodiment.
- FIG. 6 shows an example of the hardware configuration of a computer 1900 .
- Embodiments of the present invention recognize that, even when predetermined information is extracted from statistical information and converted to a predetermined structure, there can be a difficulty in creating text automatically because a decision by an individual (e.g., a reporter) that has specialized knowledge and the like can be required to create a report text. Further, when statistical information contains a new word, such as a new product that does not exist in a database, it is often desired to create a report for the new word. However, since there is no information on the new word, it is more difficult to create the report automatically.
- Embodiments of the present invention provide a generation apparatus, a generation method, and a program, where the generation apparatus is for generating a target text representing target data, including: a decomposition processing section for decomposing each of multiple texts into a keyword and a template; a learning processing section for learning a classification model to select a template based on the category of a keyword; an identification section for identifying a target keyword representing the target data; a selection section for selecting a target template used to represent the target data based on the category of the target keyword using the classification model; and a generation section for generating the target text representing the target data based on the target template and the target keyword.
- FIG. 1 shows a configuration example of a generation apparatus 100 according to an embodiment of the present invention.
- the generation apparatus 100 automatically creates a text representing the features of target data, where the target data is statistical information, such as graphs and tables, acquired from the outside.
- the text representing the target data is set as a target text.
- the generation apparatus 100 includes a text acquisition section 110 , a decomposition processing section 120 , a learning processing section 130 , a storage section 140 , a target data acquisition section 210 , an identification section 220 , a selection section 230 , and a generation section 240 .
- the text acquisition section 110 acquires multiple texts.
- the multiple texts are texts representing the features of statistical information and the like that have been reported in the past.
- the multiple texts may be created by experts and/or skilled persons, such as texts in which graphs of product-by-product inquiries and the number of complaints are evaluated and described, texts in which graphs indicative of monthly orders for products and selling situations are described, texts in which present and/or future weather information is described from a distribution of weather data on various regions, and texts in which graphs of stock price movement and exchange rate fluctuations with time are described and future activities are predicted.
- the text acquisition section 110 acquires texts, such as “customers do not know the part # of the ink for Printer A01,” and “the order for scanner B02 is improved as compared with three months before.”
- the text acquisition section 110 may acquire multiple texts created by users using a business analysis tool, a marketing tool, and the like.
- the text acquisition section 110 is connected, for example, to a storage device (e.g., an external database) to acquire the multiple texts.
- the text acquisition section 110 may also acquire multiple texts entered by users.
- the text acquisition section 110 may operate integrally with the generation apparatus 100 .
- the text acquisition section 110 may operate separately from the main body of the generation apparatus 100 .
- the text acquisition section 110 may be connected to the main body of the generation apparatus 100 through a network or the like.
- the decomposition processing section 120 decomposes each of multiple texts into a keyword and a template. In another embodiment, the decomposition processing section 120 may decompose each of multiple texts into a set of keywords and a template.
- the decomposition processing section 120 may be connected to the text acquisition section 110 to extract one or more keywords from an acquired text based on a pre-stored list of multiple keywords or the like. The decomposition processing section 120 further acquires a category corresponding to each of the extracted keywords based on the list of multiple keywords or the like.
- the decomposition processing section 120 may also extract a template from the acquired text based on a pre-stored list of multiple templates or the like.
- the decomposition processing section 120 may set, as a template, a text in which a category corresponding to a keyword extracted from the text is applied to the position of the keyword.
- the decomposition processing section 120 may decompose a text into one or more keywords and a template using language processing. Further, the decomposition processing section 120 may decompose the text into a keyword that does not exist in the list of keywords and a template that does not exist in the list of templates by language processing or the like. In this case, the decomposition processing section 120 may add the decomposed keyword and/or template to the corresponding lists, respectively.
- the learning processing section 130 learns a classification model to select a template based on the category of a keyword.
- the learning processing section 130 is connected to the decomposition processing section 120 to acquire multiple keywords and multiple templates respectively decomposed from the multiple texts.
- the learning processing section 130 learns the classification model to select a template decomposed from one text according to a category corresponding to a keyword included in the one text.
- the learning processing section 130 may learn the classification model to select a template based on the category of a keyword and the statistic of an event corresponding to the keyword.
- the learning processing section 130 learns the classification model to select a template based on a set of categories corresponding to the set of keywords and the statistics of events corresponding to the set of keywords. The learning by the learning processing section 130 will be described later.
- the storage section 140 is connected to the learning processing section 130 to receive and store the classification model learned by the learning processing section 130 .
- the storage section 140 may also store intermediate data in the process of learning by the learning processing section 130 , calculation results, and the like, respectively. Further, the storage section 140 may supply the stored data to a requestor in response to a request from each section in the generation apparatus 100 .
- the storage section 140 may have a keyword storage section 142 and a template storage section 144 .
- the storage section 140 may be provided in the main body of the generation apparatus 100 . Alternatively, the storage section 140 may be a database or the like connected to a network.
- the keyword storage section 142 stores keywords belonging to each of multiple categories in association with the category.
- the keyword storage section 142 may store predetermined keywords or may be connected to the decomposition processing section 120 to store keywords decomposed by the decomposition processing section 120 .
- the keyword storage section 142 may be a list of keywords referred to by the decomposition processing section 120 .
- the keyword storage section 142 may be a keyword dictionary and/or a category dictionary to store multiple keywords together with corresponding categories.
- the template storage section 144 stores templates obtained by decomposing multiple texts.
- the template storage section 144 may store predetermined templates or may be connected to the decomposition processing section 120 to store templates decomposed by the decomposition processing section 120 .
- the template storage section 144 may be a list of templates to be referred to by the decomposition processing section 120 .
- the template storage section 144 may be a template dictionary to store multiple templates.
- the target data acquisition section 210 acquires target data.
- the target data can include statistical information, such as graph and tabular forms. Further, the statistical information includes one or more keywords and the like, and a target keyword representing the target data is included in the keywords. In other words, the statistical information includes keywords as candidates for the target keyword.
- the target data acquisition section 210 may acquire the statistic of an event corresponding to a keyword. In other words, the target data acquisition section 210 may acquire a target statistic as the statistic of an event corresponding to a target keyword in the target data.
- the target data acquisition section 210 may be connected to a business analysis tool, a marketing tool, and the like to acquire statistical information used to create a report to be output by the tools and the like. Further, the target data acquisition section 210 may be connected to a storage device (e.g., an external database) to acquire statistical information. Further, the target data acquisition section 210 may acquire statistical information entered by users.
- the identification section 220 identifies a target keyword representing the target data.
- the identification section 220 is connected to the target data acquisition section 210 to set as the target keyword a keyword contained in the statistical information.
- the identification section 220 may identify a set of target keywords. In this case, the identification section 220 may also identify two or more combinations of keywords to be set as target keywords.
- the selection section 230 is connected to the storage section 140 and the identification section 220 to select a target template used to represent the target data using the classification model learned by the learning processing section 130 based on the category of a target keyword identified by the identification section 220 .
- the selection section 230 may select a target template based on the category of a target keyword and a target statistic. Further, when the identification section 220 identifies a set of target keywords, the selection section 230 may select a target template having a set of categories based on the set of categories corresponding to the set of target keywords and target statistics.
- the generation section 240 is connected to the selection section 230 to generate a target text representing the target data based on the target template and the target keyword.
- the generation section 240 applies a target keyword corresponding to a category contained in a target template to the target template to generate a target text.
- the identification section 220 identifies a set of target keywords
- the generation section 240 generates a target text based on the set of target keywords and the target template.
- the generation section 240 applies respectively corresponding target keywords to a set of categories contained in the target template to generate a target text.
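The generation step described above, in which target keywords are applied to the category placeholders of a selected template, can be sketched as follows. This is a minimal illustration; the function name, placeholder notation, and mapping structure are assumptions, not taken from the patent.

```python
def generate_text(template, slot_keywords):
    """Fill each category placeholder (e.g. "[X]") in the template with
    the target keyword identified for that slot."""
    text = template
    for placeholder, keyword in slot_keywords.items():
        text = text.replace(placeholder, keyword)
    return text

# Using the third template from the patent's example:
print(generate_text("The calls for [X] are increasing", {"[X]": "cancellation"}))
# → The calls for cancellation are increasing
```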
- the above generation apparatus 100 of the embodiment acquires multiple texts created based on statistical information and the like, and learns a classification model to create each of the texts based on a keyword(s) contained in the text. Then, the generation apparatus 100 generates a target text representing target data using the learned classification model based on the keyword(s) contained in the target data. The operation of the generation apparatus 100 will be described in further detail with reference to FIG. 2 .
- FIG. 2 shows an operation flow of the generation apparatus 100 according to an embodiment of the present invention.
- the generation apparatus 100 executes processing steps S 310 to S 380 to learn a classification model in order to generate a target text representing target data.
- an example in which the generation apparatus 100 generates a text representing statistical information on the number of product-by-product inquiries will be described.
- the text acquisition section 110 acquires multiple texts. For example, the text acquisition section 110 acquires texts created in the past, such as “Do not know the part # of the ink for printer A01” as a first text, “LED bulb L2 went out in a month” as a second text, and “The calls for cancellation are increasing” as a third text.
- the decomposition processing section 120 decomposes each of multiple texts into keywords and a template. For example, the decomposition processing section 120 extracts a keyword from the acquired text using a keyword dictionary stored for each category in the keyword storage section 142 .
- the keyword storage section 142 stores keywords such as “printer A01” and “LED bulb L2” in a “Product” category, a keyword such as “cancellation” in a “Contract” category, keywords such as “do not know . . . ink” in a “Question for a component” category, keywords such as “in a month” in a “Duration of service” category, and keywords such as “went out” in a “Problem” category, respectively.
- the Product category is set as a first category
- the Contract category as a second category
- the Question for a component category as a third category
- the Duration of service category as a fourth category
- the Problem category as a fifth category.
- the decomposition processing section 120 extracts, as keywords, “printer A01” and “do not know . . . ink” from the first text. Similarly, the decomposition processing section 120 extracts, as keywords, “LED bulb L2,” “went out,” and “in a month” from the second text, and “cancellation” from the third text, respectively.
- the decomposition processing section 120 detects keywords stored in the keyword storage section 142 from each of the multiple texts. Then, the decomposition processing section 120 determines a template based on portions of each text other than the keyword portions. For example, the decomposition processing section 120 sets, as a template, a text in which categories corresponding to keywords extracted from each text are applied to the positions of the keywords. In other words, the decomposition processing section 120 searches each text for each keyword included in the keyword dictionary. Then, when the keyword is hit, the decomposition processing section 120 replaces a keyword portion in the text by a corresponding category.
- the decomposition processing section 120 sets, as a first template, “[Y] the part # of the [Y] for [X]” (where [X] is the Product category and [Y] is the Question for a component category) based on the first text. Similarly, based on the second and third texts, the decomposition processing section 120 sets “[X] [Z] [Y]” (where [X] is the Product category, [Y] is the Duration of service category, and [Z] is the Problem category), and “The calls for [X] are increasing” (where [X] is the Contract category) as second and third templates, respectively.
- the decomposition processing section 120 decomposes the acquired multiple texts into the keywords and templates, respectively.
- the decomposition processing section 120 may store the decomposed templates in the template storage section 144 .
- the decomposition processing section 120 replaces a keyword in each text by a corresponding category to decompose the text into a template.
- the decomposition processing section 120 may use the template dictionary stored in the template storage section 144 to decompose the acquired text into a template.
- the template storage section 144 stores, in the template dictionary, “[Y] the part # of the [Y] for [X],” “[X] [Z] [Y],” “The calls for [X] are increasing,” and the like.
- the decomposition processing section 120 may perform decomposition by matching the remaining text, obtained by removing the extracted keywords from each text, against the templates stored in the template storage section 144 , and setting the matched template as the template for the text.
- the decomposition processing section 120 may perform decomposition processing for keywords and a template by using known language processing or the like. For example, the decomposition processing section 120 performs morphological analysis by natural language processing to recognize separation between words, and performs syntax analysis to recognize a text structure. On that basis, the decomposition processing section 120 searches a dictionary for a predetermined part of speech such as noun, and when the part of speech is hit, replaces the part of speech by a corresponding category. In this case, the decomposition processing section 120 may store, in the corresponding keyword storage section 142 or template storage section 144 , information on a decomposed keyword, a category corresponding to the keyword, and/or a template.
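The dictionary-based decomposition described above can be sketched as follows. This is an illustrative simplification: the dictionary entries are taken from the patent's example, but assigning placeholders [X], [Y], [Z] in order of appearance in the text is an assumption (the patent's example instead fixes placeholders per category role, e.g. the second template "[X] [Z] [Y]"), and all names are hypothetical.

```python
import re

# Keyword dictionary mapping each keyword to its category (from the example).
KEYWORD_DICT = {
    "printer A01": "Product",
    "LED bulb L2": "Product",
    "cancellation": "Contract",
    "went out": "Problem",
    "in a month": "Duration of service",
}

PLACEHOLDERS = ["[X]", "[Y]", "[Z]"]  # at most three slots, as in the example

def decompose(text):
    """Return (keywords, template): keywords with their categories, and the
    text with each keyword replaced by a category placeholder."""
    # Search longest keywords first to prefer the longest match.
    found = []
    for kw in sorted(KEYWORD_DICT, key=len, reverse=True):
        idx = text.lower().find(kw.lower())
        if idx >= 0:
            found.append((idx, kw))
    found.sort()  # assign placeholders by order of appearance (simplification)
    keywords, template = [], text
    for placeholder, (_, kw) in zip(PLACEHOLDERS, found):
        keywords.append((kw, KEYWORD_DICT[kw]))
        template = re.sub(re.escape(kw), placeholder, template,
                          flags=re.IGNORECASE)
    return keywords, template

kws, tpl = decompose("LED bulb L2 went out in a month")
print(tpl)  # → [X] [Y] [Z]
```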
- the learning processing section 130 learns a classification model for selecting a template based on the category of each keyword.
- the learning processing section 130 learns the classification model to select a template decomposed from one text according to one or more categories corresponding to one or more keywords included in the one text.
- the learning processing section 130 may learn a classification model for selecting a template based on the statistics of events corresponding to one or more keywords.
- the text acquisition section 110 acquires, in addition to the multiple texts, the statistics of events corresponding to one or more keywords, i.e., the statistical information on which the created multiple texts are based.
- FIG. 3 shows an example of the statistics of events acquired by the text acquisition section 110 according to an embodiment of the present invention.
- FIG. 3 shows keywords in the “Question for a component” category in the lateral direction, such as “Where is . . . power button” (Question for the position of a power supply button of . . . ), “Do not know . . . ink,” and “How to change . . . battery” (Charging method for the battery of . . . ).
- FIG. 3 shows keywords in the Product category in the longitudinal direction, such as “Printer A01,” “Printer A02,” and “Note PC A01.”
- Each number shown in FIG. 3 indicates the number of co-occurrences, and each number in parentheses indicates a correlation value.
- FIG. 3 shows that the number of times of co-occurrence of the keywords “Printer A01” and “Do not know . . . ink” in product-by-product inquiries is 35, and the correlation value is 20. Further, for example, FIG. 3 shows that the number of times of co-occurrence of the keywords “Note PC A01” and “How to change . . . battery” in the product-by-product inquiries is 128, and the correlation value is 2.3.
- co-occurrence means that the keywords (“Printer A01” and “Where is . . . power button”) concurrently appear in a text representing one inquiry, evaluation, the description of a phenomenon, or the like.
- the learning processing section 130 can use the number of co-occurrences to associate, with a template, a combination of keywords likely to appear in texts in order to perform learning.
- the correlation value indicates a ratio of the number of co-occurrences of the keywords to the number of product-by-product inquiries.
- the learning processing section 130 can use such a correlation value to associate a combination of keywords highly correlated in the text with a template in order to perform learning.
- the learning processing section 130 can learn the keywords as keywords more likely to appear in a template including categories corresponding to the keywords.
- the keywords “Note PC A01” and “How to change . . . battery” have a higher number of co-occurrences than the others, but their correlation value is not as high.
- the number of appearances of “How to change . . . battery” in inquiries about “Note PC A01” is 128, but in inquiries about the other products, the corresponding values are larger than 128.
- the learning processing section 130 can use the number of co-occurrences and the correlation to learn more accurately whether each keyword is a keyword more likely to appear.
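The co-occurrence counts and correlation values above can be computed from a collection of inquiry texts roughly as follows. The exact correlation formula is described only as a ratio involving the per-product inquiry counts, so the lift-style measure below (observed co-occurrence relative to what independent keywords would give) is an assumption for illustration, and the tiny data set is made up.

```python
from collections import Counter
from itertools import combinations

# Each inquiry is represented by the set of keywords it contains (toy data).
inquiries = [
    {"Printer A01", "Do not know ... ink"},
    {"Printer A01", "Do not know ... ink"},
    {"Printer A01", "Where is ... power button"},
    {"Note PC A01", "How to change ... battery"},
]

cooc = Counter()   # number of co-occurrences per keyword pair
count = Counter()  # number of appearances per keyword
for kws in inquiries:
    for kw in kws:
        count[kw] += 1
    for a, b in combinations(sorted(kws), 2):
        cooc[(a, b)] += 1

def lift(a, b):
    """Co-occurrence count divided by the count expected if a and b
    appeared independently across the inquiries."""
    a, b = sorted((a, b))
    expected = count[a] * count[b] / len(inquiries)
    return cooc[(a, b)] / expected

print(cooc[("Do not know ... ink", "Printer A01")])              # → 2
print(round(lift("Printer A01", "Do not know ... ink"), 3))      # → 1.333
```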
- the text acquisition section 110 may further acquire a statistic indicative of an increase or decrease (time-series variation) in the number of appearances of each keyword. This enables the learning processing section 130 to learn whether an event corresponding to the keyword actually increases.
- the text acquisition section 110 acquires the statistics indicative of the number of co-occurrences of a set of keywords, a correlation between keywords, and an increase or decrease in an event corresponding to each of the keywords. Then, based on the statistics, the learning processing section 130 generates an objective variable and an explanatory variable to learn a classification model capable of obtaining a corresponding objective variable for an explanatory variable.
- the explanatory variable may be generated according to a combination of keywords
- the objective variable may be generated according to a corresponding template.
- the learning processing section 130 sets the dimensions of an explanatory variable to n×k+m according to the maximum value n of categories contained in the template, the types k of categories, and the types m of statistics used.
- the learning processing section 130 uses a column vector having 18 elements as the explanatory variable.
- the learning processing section 130 may make first to fifth elements correspond to a first category to a fifth category, respectively, to set 1 or 0 depending on the category positioned in [X] of the template. For example, when the category positioned in [X] is the first category, the learning processing section 130 sets the first to fifth elements as [1, 0, 0, 0, 0].
- the learning processing section 130 makes sixth to 10th elements correspond to a category positioned in [Y] of the template, and eleventh to 15th elements to a category positioned in [Z] of the template. For example, when the category positioned in [Y] is a third category, the learning processing section 130 sets the sixth to 10th elements as [0, 0, 1, 0, 0], and when there is no category positioned in [Z], sets the 11th to 15th elements to all zeros.
- the learning processing section 130 sets the 16th element as a value of the number of co-occurrences, the 17th element as a correlation value, and the 18th element as a time-series variation of the event.
- the learning processing section 130 may set the values of the 16th to 18th elements in the explanatory variable for the keywords “Printer A01” and “Do not know . . . ink” as [35, 20, 2.3].
- the learning processing section 130 sets the category to be placed in [X] of the first template as the first category corresponding to “Printer A01,” the category to be placed in [Y] as the third category corresponding to “Do not know . . . ink,” and no category to be placed in [Z]. Further, the learning processing section 130 acquires 35, 20, and 2.3 as the values of the statistics (the number of co-occurrences, the correlation, and the time-series variation) in this order corresponding to the keywords “Printer A01” and “Do not know . . . ink” decomposed from the first text. Thus, the learning processing section 130 sets the first to 18th elements of a first explanatory variable corresponding to the first text as [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 35, 20, 2.3] in this order.
- the learning processing section 130 generates a first objective variable indicative of the first template in response to the first explanatory variable. For example, the learning processing section 130 sets, as the objective variable, a column vector having at least as many elements as the number of templates. As an example, the learning processing section 130 uses a column vector having 18 elements as an objective variable. In this case, the learning processing section 130 may associate the first to 18th elements with the first template to the 18th template, respectively, to set 1 or 0 depending on the template. For example, in the case of an objective variable indicative of the first template, the learning processing section 130 generates, as a first objective variable, a column vector in which the first element is set to 1 and the other elements are set to 0.
- the learning processing section 130 can generate the first explanatory variable and the first objective variable according to the keywords and the first template decomposed from the first text. Similarly, the learning processing section 130 generates sets of multiple corresponding explanatory variables and objective variables according to the decomposition results of the multiple texts.
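The construction of the 18-element explanatory variable (n = 3 slots, k = 5 categories, m = 3 statistics, so n×k + m = 18) can be sketched as follows; the function name and data layout are illustrative assumptions.

```python
# The five categories from the example, in their fixed order.
CATEGORIES = ["Product", "Contract", "Question for a component",
              "Duration of service", "Problem"]

def explanatory_variable(slot_categories, stats):
    """Build the explanatory variable.

    slot_categories: categories placed in slots [X], [Y], [Z] (None if unused).
    stats: (number of co-occurrences, correlation, time-series variation).
    """
    vec = []
    for cat in slot_categories:
        one_hot = [0] * len(CATEGORIES)  # one-hot block per slot
        if cat is not None:
            one_hot[CATEGORIES.index(cat)] = 1
        vec.extend(one_hot)
    vec.extend(stats)  # the last three elements carry the statistics
    return vec

# First text: [X] = Product, [Y] = Question for a component, [Z] unused,
# statistics (35, 20, 2.3).
x1 = explanatory_variable(
    ["Product", "Question for a component", None], (35, 20, 2.3))
print(x1)
# → [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 35, 20, 2.3]
```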
- FIG. 4 shows an example of multiple objective variables and explanatory variables generated by the learning processing section 130 according to the embodiment.
- FIG. 4 shows an example where sets of objective variables corresponding to explanatory variables are arranged in the horizontal direction (in rows).
- FIG. 4 shows templates and texts corresponding to the objective variables in columns of the objective variables.
- the first template and the first text are shown in a first row.
- each text is displayed in parentheses, and keywords to be decomposed are indicated by underlines added to the text.
- the learning processing section 130 generates the Nth objective variable with the Nth element set to 1 and the other elements set to 0 to show the Nth template.
- FIG. 4 shows an example where categories to be placed in positions [X], [Y], and [Z] of each template, and three values of statistics (the number of co-occurrences, the correlation, and the time-series variation of an event) are arranged as an explanatory variable.
- the learning processing section 130 sets the first element of the explanatory variable to 1 and the second to fifth elements to 0.
- the learning processing section 130 sets the second element of the explanatory variable to 1, and the first, and third to fifth elements to 0.
- the learning processing section 130 sets the eighth element of the explanatory variable to 1, and the sixth, seventh, ninth, and 10th elements to 0. Further, in response to the fact that the category Y is the fourth category (Duration of service), the learning processing section 130 sets the ninth element of the explanatory variable to 1, and the sixth to eighth, and 10th elements to 0. Further, in response to the fact that the category Z is the fifth category (Problem), the learning processing section 130 sets the 15th element of the explanatory variable to 1, and the 11th to 14th elements to 0.
- the learning processing section 130 generates sets of corresponding explanatory variables and objective variables based on the multiple texts. Then, the learning processing section 130 learns a classification model so that the one objective variable corresponding to a given explanatory variable will be selected (predicted). For example, the learning processing section 130 performs learning using a classification model by regression analysis. As an example, the learning processing section 130 performs learning by using a known generalized linear model, such as logistic regression, as the classification model.
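The learning step above can be sketched with a minimal softmax-regression trainer. The patent only names logistic regression as one example of a generalized linear model, so this pure-Python gradient-descent trainer is an illustrative stand-in, not the patented implementation; the toy data and names are assumptions.

```python
import math

def train(xs, ys, n_classes, epochs=200, lr=0.1):
    """Fit softmax regression: xs are explanatory vectors, ys class indices."""
    n_feat = len(xs[0])
    w = [[0.0] * n_feat for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # class scores and softmax probabilities
            scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            total = sum(exps)
            probs = [e / total for e in exps]
            # gradient step: predicted probability minus one-hot target
            for c in range(n_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_feat):
                    w[c][j] -= lr * grad * x[j]
    return w

def predict(w, x):
    """Select the template index whose score is highest."""
    scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return scores.index(max(scores))

# Toy data: the category pattern of the explanatory variable decides the template.
xs = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
ys = [0, 1, 0, 1]
model = train(xs, ys, n_classes=2)
```

After training, `predict(model, x)` plays the role of selecting the objective variable (template) for a new explanatory variable.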
- the generation apparatus 100 may continue learning by returning to the text acquisition step (S 310 ).
- the generation apparatus 100 may complete the learning.
- the learning processing section 130 stores the learned classification model in the storage section 140 . Further, when the learning process does not converge even after a predetermined time has elapsed, the generation apparatus 100 may interrupt the learning and give a notice, such as a warning, to a user.
- the target data acquisition section 210 acquires target data. Similarly to the text acquisition section 110 , the target data acquisition section 210 may acquire statistics on the target data. In other words, the target data acquisition section 210 can acquire a target statistic indicative of at least one of the number of co-occurrences of a set of target keywords, a correlation between keywords in the set of target keywords, and an increase or decrease in an event corresponding to each of the target keywords.
- the target data acquisition section 210 may acquire a target statistic in response to receiving information on the target keyword from the identification section 220 .
- the identification section 220 identifies a target keyword representing the target data.
- the identification section 220 may identify a target keyword using the keyword dictionary stored in the keyword storage section 142 .
- the identification section 220 may identify the target keyword by language processing or the like.
- a target keyword contained in the target data, such as the name of a new product, may not exist in the dictionary or the like.
- the identification section 220 is capable of identifying the target keyword using language processing or the like.
- the identification section 220 identifies a category of the target keyword by using the keyword dictionary or the like.
- the identification section 220 may detect a keyword similar to the target keyword to set a category corresponding to the similar keyword as the category of the target keyword.
- the identification section 220 can identify that the target keyword is in the product category from keywords registered in the keyword dictionary, such as “Printer A01,” “Printer A,” and “Printer.”
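The category identification described above (inferring a new keyword's category from similar dictionary entries) can be sketched as follows. `difflib` stands in for whatever similarity measure the identification section actually uses, and the dictionary contents here are illustrative assumptions.

```python
import difflib

# Assumed keyword dictionary: keyword -> category
keyword_dict = {
    "Printer A01": "product",
    "Printer A": "product",
    "Printer": "product",
    "ink": "consumable",
}

def identify_category(target_keyword, dictionary, cutoff=0.6):
    """Return the category of the closest known keyword, or None if no
    dictionary entry is similar enough."""
    matches = difflib.get_close_matches(
        target_keyword, list(dictionary.keys()), n=1, cutoff=cutoff)
    return dictionary[matches[0]] if matches else None
```

For a new product name such as "Printer B05", the closest registered keywords are the "Printer" entries, so the target keyword is placed in the product category even though it is not itself in the dictionary.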
- the selection section 230 uses the classification model learned by the learning processing section 130 to select, based on the category of the target keyword, a target template to be used to represent the target data.
- the selection section 230 generates a corresponding explanatory variable based on the target keyword.
- the selection section 230 defines the values of elements corresponding to the category of the explanatory variable (the first to 15th elements in the above example) according to the category of the target keyword.
- the selection section 230 uses the target statistic acquired by the target data acquisition section 210 to define the values of elements corresponding to the statistics of the explanatory variable (the 16th to 18th elements in the above example).
- the selection section 230 generates the explanatory variable of the target data, and this enables the selection section 230 to predict a target template corresponding to the target data using the learned classification model.
- the selection section 230 may calculate an objective variable from the explanatory variable and the classification model to set, as the target template, a template corresponding to the calculated objective variable (corresponding to an element closest to 1).
- the generation section 240 inserts each target keyword in the predicted target template to generate a target text representing the target data.
- the generation section 240 inserts corresponding target keywords into the positions [X], [Y], and [Z] of the categories of the target template to generate a target text.
- the generation section 240 may alter verbs and the like so that the target text becomes a grammatically correct expression when the target keywords are inserted into the target template. In this case, the generation section 240 may use language processing or the like.
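The insertion step above can be sketched as a simple placeholder substitution. The `[X]`/`[Y]`/`[Z]` placeholder syntax follows the examples earlier in the description; the template and keywords below are illustrative, and the grammatical fix-up step is omitted.

```python
def fill_template(template, keywords):
    """Insert target keywords into the category positions of the template.

    keywords maps a placeholder such as '[X]' to its target keyword."""
    text = template
    for slot, word in keywords.items():
        text = text.replace(slot, word)
    return text

# Hypothetical target template and target keywords
target_text = fill_template(
    "customers do not know the part # of the [Y] for [X]",
    {"[X]": "Printer B05", "[Y]": "toner"})
```

Running this yields a target text with both keywords placed at their category positions.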
- the generation apparatus 100 of the embodiment can learn a classification model for selecting a target template to be used to represent target data based on multiple texts, insert target keywords into a target template, and generate a target text representing the target data.
- the text acquisition section 110 acquires multiple texts included in reports created in the past, and the decomposition processing section 120 decomposes the multiple texts to cause the learning processing section 130 to perform learning so that a classification model corresponding to the reports can be obtained.
- This enables the generation section 240 to generate a target text to be included in a newly created report.
- the generation apparatus 100 can automatically create a new text based on new statistical information.
- the generation apparatus 100 can learn a creator's know-how for reading the characteristic parts of statistical information, based on the knowledge, experience, and ability of the creator, and create a similar text from new statistical information without the creator's intervention. Even when statistical information contains a new word, such as a new product that does not exist in a database or the like, the generation apparatus 100 can generate a proper text.
- the generation apparatus 100 can accumulate the know-how to create texts for each business field, each area, and each text creator. In addition, more specialized texts can also be generated based on the accumulated information.
- FIG. 5 shows a variation of the generation apparatus 100 according to the embodiment.
- In the generation apparatus 100 of the variation, substantially the same components as those of the generation apparatus 100 according to the embodiment shown in FIG. 1 are given the same reference numerals, and description thereof is omitted.
- the generation apparatus 100 of the variation further includes an input section 250 , a correction section 260 , and an addition section 270 .
- the input section 250 is connected to the text acquisition section 110 and the storage section 140 to help the text acquisition section 110 acquire a text when a user enters the text or the like directly or modifies the text.
- the input section 250 presents, as an input candidate, at least either a keyword(s) stored in the keyword storage section 142 or a template stored in the template storage section 144 while the user is entering the target text.
- the input section 250 can predict that a text to be decomposed into the second template is being entered. Therefore, according to the array of the second template, the input section 250 displays, as a candidate for the following text part, a keyword in the fifth category, such as “went out,” following the result of entering “LED bulb L2” in the first category. This enables the input section 250 to facilitate the user's text input and perform the input operation accurately.
- the input section 250 may perform a fuzzy prefix search on the acquired multiple texts to acquire and display a candidate for the following text part.
- the input section 250 can display a keyword, such as “went out” as a candidate for the following text part.
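The candidate display described above can be sketched as a plain prefix search over previously acquired texts (the patent also allows a fuzzy variant). The stored texts and function name below are illustrative assumptions.

```python
# Hypothetical texts acquired in the past
past_texts = [
    "LED bulb L2 went out within one week",
    "LED bulb L2 went out after a month",
    "the order for scanner B02 is improved as compared with three months before",
]

def suggest(entered, texts):
    """Return candidate continuations of the text entered so far."""
    return [t[len(entered):].strip()
            for t in texts
            if t.startswith(entered) and t != entered]

candidates = suggest("LED bulb L2", past_texts)
```

Entering "LED bulb L2" surfaces the continuations beginning with "went out", which the input section can present as candidates for the following text part.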
- Based on at least either a keyword stored in the keyword storage section 142 or a template stored in the template storage section 144 , the correction section 260 corrects the orthographical variants of a keyword and a template obtained by newly decomposing a text.
- the correction section 260 is connected to the decomposition processing section 120 and the storage section 140 to correct orthographical variants that occur when the decomposition processing section 120 decomposes the text into the keyword and the template based on information stored in the storage section 140 .
- the correction section 260 corrects orthographical variants in text parts (not registered in the dictionary) other than the parts identified by referring to the keyword dictionary and the template dictionary.
- For example, when the edit distance between a keyword obtained by newly decomposing a text and a keyword stored in the keyword storage section 142 is less than a predetermined reference distance, the correction section 260 may determine that these keywords are the same. Further, when the edit distance between a template obtained by newly decomposing a text and a template stored in the template storage section 144 is less than a predetermined reference distance, the correction section 260 may determine that these templates are the same.
- For example, the correction section 260 may treat a difference between singular and plural forms, a difference from an abbreviation, a difference between lowercase and uppercase letters, a difference between a hyphen and an underscore, and the like as equivalent, performing fuzzy matching against the keyword dictionary and the template dictionary to determine matches. Further, for example, even if the omission of a predetermined number of letters, misspelling, the addition of letters, and the like are detected, the correction section 260 may still determine a match. In addition, the correction section 260 may use a dictionary, in which words, phrases, and the like considered to match each other are preregistered, to determine matches.
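The correction behaviour above can be sketched in two steps: normalise the differences the section treats as equivalent (case, hyphen vs. underscore), then accept a match when the edit distance falls below the reference distance. The normalisation rules and the threshold value are illustrative assumptions, not the patented ones.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalise(word):
    """Fold differences treated as equivalent: case, hyphen vs. underscore."""
    return word.lower().replace("-", "_")

def same_keyword(new_kw, stored_kw, reference_distance=2):
    """Judge two keywords the same when their normalised edit distance is
    below the reference distance (threshold value is an assumption)."""
    return edit_distance(normalise(new_kw), normalise(stored_kw)) < reference_distance
```

With this sketch, "Printer-A01" and "printer_a01" are judged the same keyword, while clearly different keywords are kept apart.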
- the addition section 270 is connected to the generation section 240 and the text acquisition section 110 to newly add the target text generated by the generation section 240 as one of the multiple texts, and supply it to the text acquisition section 110 .
- the user may edit the target text generated by the generation section 240 , altering the target text or the like, to make proper text adjustments.
- the addition section 270 acquires the text altered by the user and newly adds the text as one of the multiple texts.
- the learning processing section 130 learns a classification model again using the multiple texts with the text added by the addition section 270 . This allows the generation apparatus 100 to further accumulate know-how to create a text. Further, the learning processing section 130 can learn the text altered by the user again to improve learning accuracy.
- FIG. 6 shows an example of the hardware configuration of a computer 1900 functioning as the generation apparatus 100 according to the embodiment.
- the computer 1900 includes: a CPU peripheral section having a CPU 2000 , a RAM 2020 , a graphics controller 2075 , and a display device 2080 , which are interconnected by a host controller 2082 ; an I/O section having a communication interface 2030 , a hard disk drive 2040 , and a DVD drive 2060 , which are connected to the host controller 2082 through an I/O controller 2084 ; and a legacy I/O section having a ROM 2010 , a flexible disk drive 2050 , and an I/O chip 2070 connected to the I/O controller 2084 .
- the host controller 2082 connects the RAM 2020 with the CPU 2000 and the graphics controller 2075 , which access the RAM 2020 at a high transfer rate.
- the CPU 2000 operates based on programs stored in the ROM 2010 and the RAM 2020 to control each section.
- the graphics controller 2075 acquires image data generated on a frame buffer provided in the RAM 2020 by the CPU 2000 or the like, and displays the image on the display device 2080 .
- the graphics controller 2075 may include therein a frame buffer for storing image data generated by the CPU 2000 or the like.
- the I/O controller 2084 connects the host controller 2082 with the communication interface 2030 , the hard disk drive 2040 , and the DVD drive 2060 as relatively high-speed I/O units.
- the communication interface 2030 communicates with other apparatuses through a network.
- the hard disk drive 2040 stores programs and data used by the CPU 2000 in the computer 1900 .
- the DVD drive 2060 reads a program or data from a DVD-ROM 2095 and provides the read program or data to the hard disk drive 2040 through the RAM 2020 .
- Also connected to the I/O controller 2084 are relatively low-speed I/O units, i.e., the ROM 2010 , the flexible disk drive 2050 , and the I/O chip 2070 .
- the ROM 2010 stores a boot program executed when the computer 1900 starts, and/or programs and the like depending on the hardware of the computer 1900 .
- the flexible disk drive 2050 reads a program or data from a flexible disk 2090 , and provides the program or data to the hard disk drive 2040 through the RAM 2020 .
- the I/O chip 2070 connects not only the flexible disk drive 2050 to the I/O controller 2084 , but also various I/O devices to the I/O controller 2084 through a parallel port, a serial port, a keyboard port, and a mouse port, for example.
- a program provided to the hard disk drive 2040 through the RAM 2020 is provided by a user in the form of being stored on a recording medium, such as the flexible disk 2090 , a DVD-ROM 2095 , or an IC card.
- the program is read from the recording medium, installed in the hard disk drive 2040 within the computer 1900 through the RAM 2020 , and executed by the CPU 2000 .
- the program is installed on the computer 1900 to cause the computer 1900 to function as the text acquisition section 110 , the decomposition processing section 120 , the learning processing section 130 , the storage section 140 , the keyword storage section 142 , the template storage section 144 , the target data acquisition section 210 , the identification section 220 , the selection section 230 , the generation section 240 , the input section 250 , the correction section 260 , and the addition section 270 .
- Information processes described in the program are read into the computer 1900 to function as specific means implemented by software in cooperation with the above-mentioned various hardware resources, i.e., as the text acquisition section 110 , the decomposition processing section 120 , the learning processing section 130 , the storage section 140 , the keyword storage section 142 , the template storage section 144 , the target data acquisition section 210 , the identification section 220 , the selection section 230 , the generation section 240 , the input section 250 , the correction section 260 , and the addition section 270 . Then, information is computed or processed by the specific means depending on the intended use of the computer 1900 in the embodiment to build a specific instance of generation apparatus 100 according to the intended use.
- the CPU 2000 executes a communication program loaded on the RAM 2020 to instruct the communication interface 2030 to perform communication processing based on the processing content described in the communication program.
- the communication interface 2030 reads send data stored in a send buffer area or the like provided in a storage device, such as the RAM 2020 , the hard disk drive 2040 , the flexible disk 2090 , or the DVD-ROM 2095 , to send the data to a network, or writes receive data received from the network to a receive buffer area or the like provided in the storage device.
- the communication interface 2030 may transfer data exchanged with the storage device by the DMA (Direct Memory Access) method.
- the CPU 2000 may read data from the storage device or the communication interface 2030 as a source, and write the data to the communication interface 2030 or the storage device as a destination to transfer the send/receive data.
- the CPU 2000 reads, into the RAM 2020 , all or necessary parts from files or databases stored in an external storage device, such as the hard disk drive 2040 , the DVD drive 2060 (DVD-ROM 2095 ), or the flexible disk drive 2050 (flexible disk 2090 ), by means of DMA transfer or the like to perform various processing on the data on the RAM 2020 . Then, the CPU 2000 saves the processed data back to the external storage device by means of DMA transfer or the like. In such processing, the RAM 2020 can be considered to temporarily hold the content of the external storage device. Therefore, in the embodiment, the RAM 2020 , the external storage device, and the like are collectively referred to as the memory, the storage section, the storage device, or the like.
- Various programs and various kinds of information, such as data, tables, and databases, in the embodiment are stored in such a storage device as targets of information processing.
- the CPU 2000 can also hold part of the content of the RAM 2020 in a cache memory to perform reading and writing on the cache memory.
- Since the cache memory serves as part of the function of the RAM 2020 , the cache memory shall be included in the RAM 2020 , the memory, and/or the storage device in the embodiment unless otherwise denoted distinctively.
- the CPU 2000 performs various processing on the data read from the RAM 2020 as specified in a sequence of instructions of a program including various arithmetic operations, information processing, conditional determinations, and searching and replacing information described in the embodiment, and saves the processed data back to the RAM 2020 .
- When a conditional determination is made, the CPU 2000 compares any of various variables shown in the embodiment with any other variable or constant to determine whether it meets a condition, such as larger, smaller, not less than, not more than, or equal to, and when the condition is satisfied (or unsatisfied), the procedure branches to a different sequence of instructions or calls a subroutine.
- the CPU 2000 can retrieve information stored in a file or a database in the storage device. For example, when two or more entries are stored in the storage device in such a manner as to associate the attribute value of a second attribute with the attribute value of a first attribute, the CPU 2000 searches the two or more entries stored in the storage device for an entry with the attribute value of the first attribute matching a specified condition to read the attribute value of the second attribute stored in the entry so that the attribute value of the second attribute associated with the first attribute that meets the predetermined condition can be obtained.
- the programs or modules mentioned above may also be stored on an external recording medium.
- an optical recording medium such as DVD, Blu-ray (registered trademark), or CD
- a magnetooptical recording medium such as MO
- a tape medium
- a semiconductor memory such as an IC card
- a storage device such as a hard disk or a RAM provided in a server system connected to a private communication network or the Internet may also be used as a recording medium to provide a program to the computer 1900 through the network.
Abstract
Description
- This application claims priority under 35 U.S.C. § 119 to Japanese Patent Application No. 2014-221051 filed Oct. 30, 2014, the entire contents of which are incorporated herein by reference.
- The present invention relates to a generation apparatus, a generation method, and a program.
- A report based on statistical information, such as graphs and tables, has conventionally been made by having a person (e.g., a reporter) extract potential features to report on from the statistical information and create a text report at the reporter's discretion. Extracting predetermined information from a statistical report and converting that information to a predetermined structure is one method utilized to create such reports.
- Aspects of the present invention disclose a method, computer program product, and system for generating target text based on target data. The method includes one or more processors decomposing one or more portions of text into at least one corresponding keyword and at least one corresponding template. The method further includes one or more processors learning a classification model associated with selecting a template based on a category of a keyword. The method further includes one or more processors identifying a target keyword that is represented by target data. The method further includes one or more processors selecting a target template that is used to represent the target data based on a category associated with the identified target keyword utilizing the classification model. The method further includes one or more processors generating target text that represents the target data based on the selected target template and the identified target keyword.
- FIG. 1 shows a configuration example of a generation apparatus 100 according to an embodiment.
- FIG. 2 shows an operation flow of the generation apparatus 100 according to the embodiment.
- FIG. 3 shows an example of statistics of events acquired by a text acquisition section 110 according to the embodiment.
- FIG. 4 shows an example of multiple objective variables and explanatory variables generated by a learning processing section 130 according to the embodiment.
- FIG. 5 shows a variation of the generation apparatus 100 according to the embodiment.
- FIG. 6 shows an example of the hardware configuration of a computer 1900.
- Embodiments of the present invention recognize that, even when predetermined information is extracted from statistical information and converted to a predetermined structure, there can be a difficulty in creating text automatically because a decision by an individual (e.g., a reporter) who has specialized knowledge and the like can be required to create a report text. Further, when statistical information contains a new word, such as a new product that does not exist in a database, it is often desired to create a report for the new word. However, since there is no information on the new word, it is more difficult to create the report automatically.
- Embodiments of the present invention provide a generation apparatus, a generation method, and a program, where the generation apparatus is for generating a target text representing target data, including: a decomposition processing section for decomposing each of multiple texts into a keyword and a template; a learning processing section for learning a classification model to select a template based on the category of a keyword; an identification section for identifying a target keyword representing the target data; a selection section for selecting a target template used to represent the target data based on the category of the target keyword using the classification model; and a generation section for generating the target text representing the target data based on the target template and the target keyword.
- The present invention will be described below in connection with a preferred embodiment. It should be noted that the following embodiment is not intended to limit the invention in the appended claims. Further, all the combinations of the features described in the embodiment are not necessarily essential to the means for solving the problem in the present invention.
-
FIG. 1 shows a configuration example of a generation apparatus 100 according to an embodiment of the present invention. The generation apparatus 100 automatically creates text representing the features of target data based on the target data as statistical information, such as graph and tabular forms, acquired from the outside. In the embodiment, the text representing the target data is set as a target text. The generation apparatus 100 includes a text acquisition section 110, a decomposition processing section 120, a learning processing section 130, a storage section 140, a target data acquisition section 210, an identification section 220, a selection section 230, and a generation section 240. - The
text acquisition section 110 acquires multiple texts. Here, the multiple texts are texts representing the features of statistical information and the like that have been reported in the past. The multiple texts may be created by experts and/or skilled persons, such as texts in which graphs of product-by-product inquiries and the number of complaints are evaluated and described, texts in which graphs indicative of monthly orders for products and selling situations are described, texts in which present and/or future weather information is described from a distribution of weather data on various regions, and texts in which graphs of stock price movement and exchange rate fluctuations with time are described and future activities are predicted. For example, the text acquisition section 110 acquires texts, such as “customers do not know the part # of the ink for Printer A01,” and “the order for scanner B02 is improved as compared with three months before.” - The
text acquisition section 110 may acquire multiple texts created by users using a business analysis tool, a marketing tool, and the like. The text acquisition section 110 is connected, for example, to a storage device (e.g., an external database) to acquire the multiple texts. The text acquisition section 110 may also acquire multiple texts entered by users. The text acquisition section 110 may operate integrally with the generation apparatus 100. Alternatively, the text acquisition section 110 may operate separately from the main body of the generation apparatus 100. In this case, the text acquisition section 110 may be connected to the main body of the generation apparatus 100 through a network or the like. - In one embodiment, the
decomposition processing section 120 decomposes each of multiple texts into a keyword and a template. In another embodiment, the decomposition processing section 120 may decompose each of multiple texts into a set of keywords and a template. The decomposition processing section 120 may be connected to the text acquisition section 110 to extract one or more keywords from an acquired text based on a pre-stored list of multiple keywords or the like. The decomposition processing section 120 further acquires a category corresponding to each of the extracted keywords based on the list of multiple keywords or the like. - The
decomposition processing section 120 may also extract a template from the acquired text based on a pre-stored list of multiple templates or the like. Here, the decomposition processing section 120 may set, as a template, a text in which a category corresponding to a keyword extracted from the text is applied to the position of the keyword. - The
decomposition processing section 120 may decompose a text into one or more keywords and a template using language processing. Further, the decomposition processing section 120 may decompose the text into a keyword that does not exist in the list of keywords and a template that does not exist in the list of templates by language processing or the like. In this case, the decomposition processing section 120 may add the decomposed keyword and/or template to the corresponding lists, respectively. - The
learning processing section 130 learns a classification model to select a template based on the category of a keyword. The learning processing section 130 is connected to the decomposition processing section 120 to acquire multiple keywords and multiple templates respectively decomposed from the multiple texts. The learning processing section 130 learns the classification model to select a template decomposed from one text according to a category corresponding to a keyword included in the one text. The learning processing section 130 may learn the classification model to select a template based on the category of a keyword and the statistic of an event corresponding to the keyword. - Further, when the
decomposition processing section 120 decomposes a text into a set of keywords and a template, the learning processing section 130 learns the classification model to select a template based on a set of categories corresponding to the set of keywords and the statistics of events corresponding to the set of keywords. The learning by the learning processing section 130 will be described later. - The
storage section 140 is connected to the learning processing section 130 to receive and store the classification model learned by the learning processing section 130. The storage section 140 may also store intermediate data in the process of learning by the learning processing section 130, calculation results, and the like, respectively. Further, the storage section 140 may supply the stored data to a requestor in response to a request from each section in the generation apparatus 100. The storage section 140 may have a keyword storage section 142 and a template storage section 144. The storage section 140 may be provided in the main body of the generation apparatus 100. Alternatively, the storage section 140 may be a database or the like connected to a network. - The
keyword storage section 142 stores keywords belonging to each of multiple categories in association with the category. The keyword storage section 142 may store predetermined keywords or may be connected to the decomposition processing section 120 to store keywords decomposed by the decomposition processing section 120. The keyword storage section 142 may be a list of keywords referred to by the decomposition processing section 120. The keyword storage section 142 may be a keyword dictionary and/or a category dictionary to store multiple keywords together with corresponding categories. - The
template storage section 144 stores templates obtained by decomposing multiple texts. The template storage section 144 may store predetermined templates or may be connected to the decomposition processing section 120 to store templates decomposed by the decomposition processing section 120. The template storage section 144 may be a list of templates to be referred to by the decomposition processing section 120. The template storage section 144 may be a template dictionary to store multiple templates. - The target
data acquisition section 210 acquires target data. Here, the target data can include statistical information, such as graphs and tables. Further, the statistical information includes one or more keywords and the like, and a target keyword representing the target data is included in the keywords. In other words, the statistical information includes keywords as candidates for the target keyword. The target data acquisition section 210 may acquire the statistic of an event corresponding to a keyword. In other words, the target data acquisition section 210 may acquire a target statistic as the statistic of an event corresponding to a target keyword in the target data. - The target
data acquisition section 210 may be connected to a business analysis tool, a marketing tool, and the like to acquire statistical information used to create a report to be output by the tools and the like. Further, the target data acquisition section 210 may be connected to a storage device (e.g., an external database) to acquire statistical information. Further, the target data acquisition section 210 may acquire statistical information entered by users. - The
identification section 220 identifies a target keyword representing the target data. The identification section 220 is connected to the target data acquisition section 210 to set, as the target keyword, a keyword contained in the statistical information. When the statistical information contains multiple keywords, the identification section 220 may identify a set of target keywords. In this case, the identification section 220 may also identify two or more combinations of keywords to be set as target keywords. - The
selection section 230 is connected to the storage section 140 and the identification section 220 to select a target template used to represent the target data, using the classification model learned by the learning processing section 130, based on the category of a target keyword identified by the identification section 220. The selection section 230 may select a target template based on the category of a target keyword and a target statistic. Further, when the identification section 220 identifies a set of target keywords, the selection section 230 may select a target template based on the set of categories corresponding to the set of target keywords and the target statistics. - The
generation section 240 is connected to the selection section 230 to generate a target text representing the target data based on the target template and the target keyword. As an example, the generation section 240 applies a target keyword corresponding to a category contained in a target template to the target template to generate a target text. Further, when the identification section 220 identifies a set of target keywords, the generation section 240 generates a target text based on the set of target keywords and the target template. As an example, the generation section 240 applies the respectively corresponding target keywords to a set of categories contained in the target template to generate a target text. - The
above generation apparatus 100 of the embodiment acquires multiple texts created based on statistical information and the like, and learns a classification model to create each of the texts based on a keyword(s) contained in the text. Then, the generation apparatus 100 generates a target text representing target data using the learned classification model based on the keyword(s) contained in the target data. The operation of the generation apparatus 100 will be described in further detail with reference to FIG. 2. -
FIG. 2 shows an operation flow of the generation apparatus 100 according to an embodiment of the present invention. In the embodiment, the generation apparatus 100 executes processing steps S310 to S380 to learn a classification model in order to generate a target text representing target data. In the embodiment, an example where the generation apparatus 100 generates a text representing statistical information on the number of product-by-product inquiries will be described. - In process S310, the
text acquisition section 110 acquires multiple texts. For example, the text acquisition section 110 acquires texts created in the past, such as "Do not know the part # of the ink for printer A01" as a first text, "LED bulb L2 went out in a month" as a second text, and "The calls for cancellation are increasing" as a third text. - In process S320, the
decomposition processing section 120 decomposes each of the multiple texts into keywords and a template. For example, the decomposition processing section 120 extracts a keyword from the acquired text using a keyword dictionary stored for each category in the keyword storage section 142. - In an example embodiment, the
keyword storage section 142 stores keywords such as "printer A01" and "LED bulb L2" in a "Product" category, a keyword such as "cancellation" in a "Contract" category, keywords such as "do not know . . . ink" in a "Question for a component" category, keywords such as "in a month" in a "Duration of service" category, and keywords such as "went out" in a "Problem" category, respectively. Note that the Product category is set as a first category, the Contract category as a second category, the Question for a component category as a third category, the Duration of service category as a fourth category, and the Problem category as a fifth category. In this example embodiment, the decomposition processing section 120 extracts, as keywords, "printer A01" and "do not know . . . ink" from the first text. Similarly, the decomposition processing section 120 extracts, as keywords, "LED bulb L2," "went out," and "in a month" from the second text, and "cancellation" from the third text, respectively. - Thus, the
decomposition processing section 120 detects keywords stored in the keyword storage section 142 from each of the multiple texts. Then, the decomposition processing section 120 determines a template based on the portions of each text other than the keyword portions. For example, the decomposition processing section 120 sets, as a template, a text in which the categories corresponding to the keywords extracted from each text are applied to the positions of the keywords. In other words, the decomposition processing section 120 searches each text for each keyword included in the keyword dictionary. Then, when a keyword is hit, the decomposition processing section 120 replaces the keyword portion in the text by the corresponding category. - As an example, the
decomposition processing section 120 sets, as a first template, "[Y] the part # of the [Y] for [X]" (where [X] is the Product category and [Y] is the Question for a component category) based on the first text. Similarly, based on the second and third texts, the decomposition processing section 120 sets "[X] [Z] [Y]" (where [X] is the Product category, [Y] is the Duration of service category, and [Z] is the Problem category), and "The calls for [X] are increasing" (where [X] is the Contract category) as second and third templates, respectively. - Thus, the
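The keyword-to-placeholder substitution described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary contents follow the example texts, while the function name and the bracketed placeholder syntax are assumptions made for readability.

```python
# Minimal sketch of the decomposition step: dictionary keywords found in a
# text are replaced by category placeholders, yielding (keywords, template).
# Keyword/category assignments follow the example in the text; the
# "[slot:Category]" placeholder format is an illustrative assumption.

KEYWORD_DICT = {
    "printer A01": "[X:Product]",
    "LED bulb L2": "[X:Product]",
    "cancellation": "[X:Contract]",
    "went out": "[Z:Problem]",
    "in a month": "[Y:Duration of service]",
}

def decompose(text):
    """Return (keywords, template) by substituting dictionary hits."""
    keywords = []
    template = text
    for kw, placeholder in KEYWORD_DICT.items():
        if kw in template:
            keywords.append(kw)
            template = template.replace(kw, placeholder)
    return keywords, template

kws, tpl = decompose("LED bulb L2 went out in a month")
# kws -> ["LED bulb L2", "went out", "in a month"]
# tpl -> "[X:Product] [Z:Problem] [Y:Duration of service]"
```

Note that a real implementation would also need to handle discontinuous keywords such as "do not know . . . ink" in the first text, which this simple substring match cannot.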
decomposition processing section 120 decomposes the acquired multiple texts into the keywords and templates, respectively. Note that the decomposition processing section 120 may store the decomposed templates in the template storage section 144. - In the above description, the
decomposition processing section 120 replaces a keyword in each text by a corresponding category to decompose the text into a template. Alternatively, the decomposition processing section 120 may use the template dictionary stored in the template storage section 144 to decompose the acquired text into a template. - For example, the
template storage section 144 stores, in the template dictionary, "[Y] the part # of the [Y] for [X]," "[X] [Z] [Y]," "The calls for [X] are increasing," and the like. The decomposition processing section 120 may perform decomposition by matching the remaining text, obtained by removing the extracted keywords from each text, against the templates stored in the template storage section 144, and setting the matched template as the template for the text. - Alternatively, the
decomposition processing section 120 may perform decomposition processing for keywords and a template by using known language processing or the like. For example, the decomposition processing section 120 performs morphological analysis by natural language processing to recognize separation between words, and performs syntax analysis to recognize the text structure. On that basis, the decomposition processing section 120 searches a dictionary for a predetermined part of speech such as a noun, and when the part of speech is hit, replaces it by a corresponding category. In this case, the decomposition processing section 120 may store, in the corresponding keyword storage section 142 or template storage section 144, information on a decomposed keyword, the category corresponding to the keyword, and/or a template. - In process S330, the
learning processing section 130 learns a classification model for selecting a template based on the category of each keyword. The learning processing section 130 learns the classification model to select the template decomposed from one text according to one or more categories corresponding to one or more keywords included in the one text. - The
learning processing section 130 may learn a classification model for selecting a template based on the statistics of events corresponding to one or more keywords. In this case, the text acquisition section 110 acquires, in addition to the multiple texts, the statistics of events corresponding to one or more keywords as the statistical information on which the created multiple texts are based. -
FIG. 3 shows an example of the statistics of events acquired by the text acquisition section 110 according to an embodiment of the present invention. As an example, FIG. 3 shows keywords in the "Question for a component" category in the lateral direction, such as "Where is . . . power button" (question for the position of a power supply button of . . . ), "Do not know . . . ink," and "How to change . . . battery" (charging method for the battery of . . . ). Further, as an example, FIG. 3 shows keywords in the Product category in the longitudinal direction, such as "Printer A01," "Printer A02," and "Note PC A01." - Each number shown in
FIG. 3 indicates the number of co-occurrences, and each number in parentheses indicates a correlation value. For example, FIG. 3 shows that the number of times of co-occurrence of the keywords "Printer A01" and "Do not know . . . ink" in product-by-product inquiries is 35, and the correlation value is 20. Further, for example, FIG. 3 shows that the number of times of co-occurrence of the keywords "Note PC A01" and "How to change . . . battery" in the product-by-product inquiries is 128, and the correlation value is 2.3. - Here, "co-occurrence" means that the keywords ("Printer A01" and "Where is . . . power button") concurrently appear in a text representing one inquiry, evaluation, description of a phenomenon, or the like. The
learning processing section 130 can use the number of co-occurrences to associate, with a template, a combination of keywords likely to appear in texts in order to perform learning. Further, the correlation value indicates a ratio of the number of co-occurrences of the keywords to the number of product-by-product inquiries. The learning processing section 130 can use such a correlation value to associate a combination of keywords highly correlated in the text with a template in order to perform learning. - Here, it is found that the keywords "Printer A01" and "Do not know . . . ink" have higher values than the others in terms of both the number of co-occurrences and the correlation value. Therefore, the
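The two statistics behind FIG. 3 can be sketched as follows. The toy inquiry data and the percentage scaling of the correlation value are illustrative assumptions; the text only defines the correlation as a ratio of the keywords' co-occurrences to the number of product-by-product inquiries.

```python
# Sketch of the co-occurrence statistics: how often two keywords appear in
# the same inquiry, and that count as a share of the inquiries mentioning
# the product keyword. The inquiry data below is invented for illustration.
from collections import Counter
from itertools import combinations

inquiries = [
    {"Printer A01", "Do not know ... ink"},
    {"Printer A01", "Do not know ... ink"},
    {"Printer A01", "Where is ... power button"},
    {"Note PC A01", "How to change ... battery"},
]

cooccur = Counter()
for kws in inquiries:
    for pair in combinations(sorted(kws), 2):   # count each keyword pair once
        cooccur[pair] += 1

def correlation(product, question):
    """Co-occurrence count as a percentage of the product's inquiries."""
    pair = tuple(sorted((product, question)))
    n_product = sum(1 for kws in inquiries if product in kws)
    return 100.0 * cooccur[pair] / n_product
```

Here "Printer A01" co-occurs with "Do not know ... ink" in 2 of its 3 inquiries, so the pair gets both a high count and a high ratio, which is exactly the combination the learning processing section 130 treats as "more likely to appear."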
learning processing section 130 can learn these keywords as keywords more likely to appear in a template including the categories corresponding to the keywords. - On the other hand, it is found that the keywords "Note PC A01" and "How to change . . . battery" have a higher number of co-occurrences than the others, but their correlation value is not so high. In other words, it is found that the number of appearances of "How to change . . . battery" in inquiries about "Note PC A01" is 128, but in inquiries about the other products, the values are larger than 128. Thus, the
learning processing section 130 can use the number of co-occurrences and the correlation to learn more accurately whether each keyword is a keyword more likely to appear. - The
text acquisition section 110 may further acquire a statistic indicative of an increase or decrease (time-series variation) in the number of appearances of each keyword. This enables the learning processing section 130 to learn whether an event corresponding to the keyword actually increases. - Thus, the
text acquisition section 110 acquires the statistics indicative of the number of co-occurrences of a set of keywords, a correlation between keywords, and an increase or decrease in an event corresponding to each of the keywords. Then, based on the statistics, the learning processing section 130 generates an objective variable and an explanatory variable to learn a classification model capable of obtaining the corresponding objective variable for an explanatory variable. Here, the explanatory variable may be generated according to a combination of keywords, and the objective variable may be generated according to a corresponding template. - For example, the
learning processing section 130 sets the dimensions of an explanatory variable to n×k+m according to the maximum number n of categories contained in a template, the number k of category types, and the number m of statistics used. In the embodiment, an example where the learning processing section 130 generates an explanatory variable having 18 dimensions according to n=3, k=5, and m=3 will be described. - As an example, the
learning processing section 130 uses a column vector having 18 elements as the explanatory variable. In this case, the learning processing section 130 may make the first to fifth elements correspond to the first category to the fifth category, respectively, and set each to 1 or 0 depending on the category positioned in [X] of the template. For example, when the category positioned in [X] is the first category, the learning processing section 130 sets the first to fifth elements as [1, 0, 0, 0, 0]. - Similarly, the
learning processing section 130 makes the sixth to 10th elements correspond to the category positioned in [Y] of the template, and the 11th to 15th elements to the category positioned in [Z] of the template. For example, when the category positioned in [Y] is the third category, the learning processing section 130 sets the sixth to 10th elements as [0, 0, 1, 0, 0], and when there is no category positioned in [Z], sets the 11th to 15th elements to all zeros. - Further, for example, the
learning processing section 130 sets the 16th element as the value of the number of co-occurrences, the 17th element as the correlation value, and the 18th element as the time-series variation of the event. As an example, the learning processing section 130 may set the values of the 16th to 18th elements in the explanatory variable for the keywords "Printer A01" and "Do not know . . . ink" as [35, 20, 2.3]. - As an example, based on the keywords and the first template decomposed from the first text, the
learning processing section 130 sets the category to be placed in [X] of the first template as the first category corresponding to "Printer A01," the category to be placed in [Y] as the third category corresponding to "Do not know . . . ink," and no category to be placed in [Z]. Further, the learning processing section 130 acquires 35, 20, and 2.3 as the values of the statistics (the number of co-occurrences, the correlation, and the time-series variation) in this order corresponding to the keywords "Printer A01" and "Do not know . . . ink" decomposed from the first text. Thus, the learning processing section 130 sets the first to 18th elements of a first explanatory variable corresponding to the first text as [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 35, 20, 2.3] in this order. - Then, the
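The construction of the 18-dimensional explanatory variable just described can be sketched as follows; the function and variable names are illustrative, but the layout (three 5-way one-hot slots for [X], [Y], [Z], followed by three statistics) follows the text.

```python
# Sketch of the n*k+m = 3*5+3 = 18-dimensional explanatory variable:
# one 5-way one-hot block per placeholder slot, then the three statistics
# (number of co-occurrences, correlation, time-series variation).

CATEGORIES = ["Product", "Contract", "Question for a component",
              "Duration of service", "Problem"]

def explanatory(slot_categories, stats):
    """slot_categories: categories for [X], [Y], [Z] (None if unused)."""
    vec = []
    for cat in slot_categories:               # 3 slots, each a 5-way one-hot
        onehot = [0] * len(CATEGORIES)
        if cat is not None:
            onehot[CATEGORIES.index(cat)] = 1
        vec.extend(onehot)
    return vec + list(stats)                  # append the m=3 statistics

# First text: [X]=Product, [Y]=Question for a component, no [Z]
x1 = explanatory(["Product", "Question for a component", None], [35, 20, 2.3])
# x1 -> [1,0,0,0,0, 0,0,1,0,0, 0,0,0,0,0, 35, 20, 2.3]
```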
learning processing section 130 generates a first objective variable indicative of the first template in response to the first explanatory variable. For example, the learning processing section 130 sets, as the objective variable, a column vector having at least as many elements as there are templates. As an example, the learning processing section 130 uses a column vector having 18 elements as the objective variable. In this case, the learning processing section 130 may associate the first to 18th elements with the first template to the 18th template, respectively, and set each to 1 or 0 depending on the template. For example, in the case of an objective variable indicative of the first template, the learning processing section 130 generates, as the first objective variable, a column vector in which the first element is set to 1 and the other elements are set to 0. - Thus, the
learning processing section 130 can generate the first explanatory variable and the first objective variable according to the keywords and the first template decomposed from the first text. Similarly, the learning processing section 130 generates multiple sets of corresponding explanatory variables and objective variables according to the decomposition results of the multiple texts. -
FIG. 4 shows an example of multiple objective variables and explanatory variables generated by the learning processing section 130 according to the embodiment. FIG. 4 shows an example where sets of objective variables corresponding to explanatory variables are arranged in the horizontal direction (in rows). -
FIG. 4 shows the templates and texts corresponding to the objective variables in the columns of the objective variables. For example, the first template and the first text are shown in the first row. Note that each text is displayed in parentheses, and the keywords to be decomposed are indicated by underlines added to the text. As described in the above example, for example, the learning processing section 130 generates the Nth objective variable with the Nth element set to 1 and the other elements set to 0 to indicate the Nth template. - Further,
FIG. 4 shows an example where the categories to be placed in positions [X], [Y], and [Z] of each template, and three values of statistics (the number of co-occurrences, the correlation, and the time-series variation of an event), are arranged as an explanatory variable. As described in the above example, in response to the fact that the category X is the first category (Product), the learning processing section 130 sets the first element of the explanatory variable to 1 and the second to fifth elements to 0. Further, in response to the fact that the category X is the second category (Contract), the learning processing section 130 sets the second element of the explanatory variable to 1, and the first and third to fifth elements to 0. - Further, in response to the fact that the category Y is the third category (Question for a component), the
learning processing section 130 sets the eighth element of the explanatory variable to 1, and the sixth, seventh, ninth, and 10th elements to 0. Further, in response to the fact that the category Y is the fourth category (Duration of service), the learning processing section 130 sets the ninth element of the explanatory variable to 1, and the sixth to eighth and 10th elements to 0. Further, in response to the fact that the category Z is the fifth category (Problem), the learning processing section 130 sets the 15th element of the explanatory variable to 1, and the 11th to 14th elements to 0. - As described above, the
learning processing section 130 generates sets of corresponding explanatory variables and objective variables based on the multiple texts. Then, the learning processing section 130 learns a classification model so that the one objective variable corresponding to one explanatory variable will be selected (predicted). For example, the learning processing section 130 performs learning using a classification model by regression analysis. As an example, the learning processing section 130 performs learning by using a known generalized linear model such as logistic regression as the classification model. - When there is a text to be further acquired, i.e., the learning is not completed (process S340, no branch), the
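The mapping from an explanatory variable to a template index can be sketched with a dependency-free stand-in. Note this is a simple multiclass perceptron rather than the logistic regression the text names as one example; the sample vectors follow the category one-hot layout of FIG. 4, with the statistics columns omitted for brevity.

```python
# Stand-in classification model: a multiclass perceptron that learns to map
# an explanatory vector to a template index. Logistic regression (which the
# text names) would play the same role; this sketch just avoids dependencies.

def train(samples, n_classes, dim, epochs=50):
    """samples: list of (explanatory_vector, template_index) pairs."""
    W = [[0.0] * dim for _ in range(n_classes)]   # one weight row per template
    for _ in range(epochs):
        for x, y in samples:
            pred = max(range(n_classes),
                       key=lambda c: sum(w * v for w, v in zip(W[c], x)))
            if pred != y:                          # perceptron update on error
                for i, v in enumerate(x):
                    W[y][i] += v
                    W[pred][i] -= v
    return W

def classify(W, x):
    """Return the template index whose score is highest."""
    return max(range(len(W)), key=lambda c: sum(w * v for w, v in zip(W[c], x)))

samples = [
    ([1,0,0,0,0, 0,0,1,0,0, 0,0,0,0,0], 0),   # first text  -> first template
    ([1,0,0,0,0, 0,0,0,1,0, 0,0,0,0,1], 1),   # second text -> second template
    ([0,1,0,0,0, 0,0,0,0,0, 0,0,0,0,0], 2),   # third text  -> third template
]
W = train(samples, n_classes=3, dim=15)
```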
generation apparatus 100 may continue learning by returning to the text acquisition step (S310). When there is no text to be further acquired, i.e., the learning is completed (process S340, yes branch), the generation apparatus 100 may complete the learning. When the learning is completed, the learning processing section 130 stores the learned classification model in the storage section 140. Further, when the learning process does not converge even after a predetermined time has elapsed, the generation apparatus 100 may interrupt the learning and notify the user with a warning or the like. - In process S350, the target
data acquisition section 210 acquires target data. Similarly to the text acquisition section 110, the target data acquisition section 210 may acquire statistics on the target data. In other words, the target data acquisition section 210 can acquire a target statistic indicative of at least one of the number of co-occurrences of a set of target keywords, a correlation between keywords in the set of target keywords, and an increase or decrease in an event corresponding to each of the target keywords. After the identification section 220 identifies a target keyword, the target data acquisition section 210 may acquire a target statistic in response to receiving information on the target keyword from the identification section 220. - In process S360, the
identification section 220 identifies a target keyword representing the target data. Similarly to the operation of the decomposition processing section 120, the identification section 220 may identify a target keyword using the keyword dictionary stored in the keyword storage section 142. Instead of, or in addition to this, the identification section 220 may identify the target keyword by language processing or the like. Here, a target keyword contained in the target data may not exist in the dictionary or the like, such as the name of a new product. When such a target keyword is included in the target data, the identification section 220 is capable of identifying the target keyword using language processing or the like. - Further, the
identification section 220 identifies a category of the target keyword by using the keyword dictionary or the like. When the target keyword does not exist in the dictionary or the like, the identification section 220 may detect a keyword similar to the target keyword to set the category corresponding to the similar keyword as the category of the target keyword. Thus, for example, when the target keyword is "Printer A02," a new product in the Printer A series, even if the target keyword is not registered in the keyword dictionary, the identification section 220 can identify that the target keyword is in the Product category from keywords registered in the keyword dictionary, such as "Printer A01," "Printer A," and "Printer." - In process S370, the
selection section 230 uses the classification model learned by the learning processing section 130 to select, based on the category of the target keyword, a target template to be used to represent the target data. The selection section 230 generates a corresponding explanatory variable based on the target keyword. In other words, the selection section 230 defines the values of the elements of the explanatory variable corresponding to the categories (the first to 15th elements in the above example) according to the category of the target keyword. Further, the selection section 230 uses the target statistic acquired by the target data acquisition section 210 to define the values of the elements corresponding to the statistics (the 16th to 18th elements in the above example). - The
selection section 230 generates the explanatory variable of the target data, and this enables the selection section 230 to predict a target template corresponding to the target data using the learned classification model. In other words, the selection section 230 may calculate an objective variable from the explanatory variable and the classification model to set, as the target template, the template corresponding to the calculated objective variable (corresponding to the element closest to 1). - In process S380, the
generation section 240 inserts each target keyword in the predicted target template to generate a target text representing the target data. As an example, the generation section 240 inserts the corresponding target keywords into the positions [X], [Y], and [Z] of the categories of the target template to generate a target text. Here, along with the insertion of the target keywords into the target template, the generation section 240 may alter verbs and the like so that the target text becomes a grammatically correct expression. In this case, the generation section 240 may use language processing or the like. - As described above, the
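The generation step can be sketched as a straightforward placeholder substitution; the function name and binding format are illustrative assumptions, and the grammatical adjustment the text mentions (altering verbs and the like) is out of scope for this sketch.

```python
# Sketch of the generation step: target keywords are substituted back into
# the category placeholders of the selected target template.

def generate(template, bindings):
    """bindings maps a placeholder such as "[X]" to a target keyword."""
    text = template
    for slot, keyword in bindings.items():
        text = text.replace(slot, keyword)
    return text

# Third template with a target keyword from the Contract category:
target = generate("The calls for [X] are increasing", {"[X]": "cancellation"})
# target -> "The calls for cancellation are increasing"
```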
generation apparatus 100 of the embodiment can learn a classification model for selecting a target template to be used to represent target data based on multiple texts, insert target keywords into the target template, and generate a target text representing the target data. Thus, the text acquisition section 110 acquires multiple texts included in reports created in the past, and the decomposition processing section 120 decomposes the multiple texts to cause the learning processing section 130 to perform learning so that a classification model corresponding to the reports can be obtained. This enables the generation section 240 to generate a target text to be included in a newly created report. - Thus, from texts created in the past based on past statistical information, the
generation apparatus 100 can automatically create a text to be newly created based on new statistical information. In other words, the generation apparatus 100 can learn the creator's know-how for reading the characteristic parts of statistical information, based on the knowledge, experience, and ability of the creator, to create a similar text from new statistical information without the creator's intervention. Even when statistical information contains a new word, such as the name of a new product that does not exist in a database or the like, the generation apparatus 100 can generate a proper text. - Thus, for example, report preparation work, the transfer of the work, the development of the work, and the like can be carried out smoothly. Further, the
generation apparatus 100 can accumulate the know-how to create texts for each business field, each area, and each text creator. In addition, more specialized texts can also be generated based on the accumulated information. -
FIG. 5 shows a variation of the generation apparatus 100 according to the embodiment. In the generation apparatus 100 of the variation, substantially the same components as those of the generation apparatus 100 according to the embodiment shown in FIG. 1 are given the same reference numerals, and their description is omitted. The generation apparatus 100 of the variation further includes an input section 250, a correction section 260, and an addition section 270. - The
input section 250 is connected to the text acquisition section 110 and the storage section 140 to help the text acquisition section 110 acquire a text when a user enters the text or the like directly or modifies the text. When the user enters or modifies a target text, the input section 250 presents, as an input candidate, at least either a keyword stored in the keyword storage section 142 or a template stored in the template storage section 144 while the user is entering the target text. - For example, when the user enters "LED bulb L2," since it is detected that a keyword in the first category is entered first, the
input section 250 can predict that a text to be decomposed into the second template is being entered. Therefore, according to the array of the second template, the input section 250 displays, as a candidate for the following text part, a keyword in the fifth category such as "went out" following the entered first-category result "LED bulb L2." This enables the input section 250 to facilitate the user's text input and make the input operation accurate. - Instead of, or in addition to this, the
input section 250 may perform a fuzzy prefix search on the acquired multiple texts to acquire and display a candidate for the following text part. Thus, for example, even if the user enters "LED bulb L5" as a new part # that is not registered in the dictionary, the input section 250 can display a keyword such as "went out" as a candidate for the following text part. - Based on at least either a keyword stored in the
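The candidate lookup described above can be sketched as follows. This is a crude stand-in for the fuzzy prefix search: it tolerates a differing part number only by comparing the leading tokens, and the sample text and function names are illustrative assumptions.

```python
# Sketch of the input-candidate lookup: given the text entered so far, find
# a past text sharing a keyword prefix and suggest the part that follows.
# An exact match on the leading tokens stands in for a real fuzzy search.

past_texts = ["LED bulb L2 went out in a month"]

def suggest(entered):
    for text in past_texts:
        # tolerate a differing part number: compare only the leading tokens
        head = " ".join(entered.split()[:2])       # e.g. "LED bulb"
        if head and text.startswith(head):
            return text[len(entered):].strip() or None
    return None
```

For the new part # "LED bulb L5", this returns "went out in a month" as the candidate for the following text part, matching the example in the text.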
keyword storage section 142 or a template stored in the template storage section 144, the correction section 260 corrects the orthographical variants of a keyword and a template obtained by newly decomposing a text. The correction section 260 is connected to the decomposition processing section 120 and the storage section 140 to correct orthographical variants that occur when the decomposition processing section 120 decomposes the text into the keyword and the template, based on information stored in the storage section 140. - For example, when meanings are substantially the same or similar even if the expressions, character strings, or the like are different, such as "Customers cannot find" and "Do not know," "#" and "number," and "Printer A01" and "printer A-01," it is desired to recognize substantially the same keywords and templates. Therefore, the
correction section 260 corrects orthographical variants in text parts (not registered in the dictionary) other than the parts identified by referring to the keyword dictionary and the template dictionary. - For example, when the edit distance between a keyword obtained by newly decomposing a text and a keyword stored in the
keyword storage section 142 is less than a predetermined reference distance, the correction section 260 may determine that these keywords are the same. Further, when the edit distance between a template obtained by newly decomposing a text and a template stored in the template storage section is less than a predetermined reference distance, the correction section 260 may determine that these templates are the same. - Further, the
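The edit-distance check can be sketched with the classic Levenshtein algorithm; the reference threshold value and the lowercasing step are illustrative assumptions (the text leaves the reference distance unspecified, and lists case differences as a separate matching rule).

```python
# Sketch of the orthographical-variant check: two surface forms are treated
# as the same keyword when their Levenshtein edit distance is below a
# reference threshold.

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def same_keyword(a, b, reference=3):
    """Illustrative threshold: fold case first, then compare distance."""
    return edit_distance(a.lower(), b.lower()) < reference
```

With this check, "Printer A01" and "printer A-01" are one hyphen insertion apart after lowercasing, so they are recognized as the same keyword.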
correction section 260 may treat a difference between singular and plural forms, a difference from an abbreviation, a difference between lowercase and uppercase letters, a difference between a hyphen and an underscore, and the like as being in the same domain, and perform fuzzy matching with the keyword dictionary and the template dictionary to determine matches. Further, for example, even if the omission of a predetermined number of letters, misspelling, the addition of letters, and the like are detected, the correction section 260 may determine a match. In addition, the correction section 260 may use a dictionary, in which words, phrases, and the like considered to match each other are preregistered, to determine matches. - The
addition section 270 is connected to the generation section 240 and the text acquisition section 110 to newly add the target text generated by the generation section 240 as one of the multiple texts and supply it to the text acquisition section 110. Here, the user may edit the target text generated by the generation section 240, altering it as needed to make proper adjustments. In such a case, the addition section 270 acquires the text altered by the user and newly adds it as one of the multiple texts. - The
learning processing section 130 retrains the classification model using the multiple texts, including the text added by the addition section 270. This allows the generation apparatus 100 to further accumulate know-how for creating texts. Further, the learning processing section 130 can learn again from the text altered by the user, improving learning accuracy.
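The orthographical-variant correction described above (edit-distance comparison plus fuzzy matching) can be sketched as follows. This is a minimal illustration, not the patented implementation: the reference distance of 3, the alias table, and the normalization rules (case folding, underscore-to-hyphen, crude plural stripping) are all assumptions made for the example.

```python
import re

# Hypothetical preregistered equivalences ("#" vs. "number", etc.)
ALIASES = {"#": "number"}
REFERENCE_DISTANCE = 3  # assumed threshold; the patent leaves it unspecified


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalize(term: str) -> str:
    """Fold case, hyphen/underscore, aliases, and (crudely) plurals."""
    t = term.lower().replace("_", "-")
    t = " ".join(ALIASES.get(w, w) for w in t.split())
    return re.sub(r"(?<=\w)s\b", "", t)  # naive singular/plural folding


def same_keyword(new_kw: str, stored_kw: str) -> bool:
    """Treat keywords as the same if they fuzzy-match or are close in edit distance."""
    if normalize(new_kw) == normalize(stored_kw):
        return True
    return edit_distance(new_kw, stored_kw) < REFERENCE_DISTANCE


print(same_keyword("Printer A01", "printer A-01"))  # → True
print(same_keyword("file_name", "File-Names"))      # → True
print(same_keyword("printer", "scanner"))           # → False
```

In practice the reference distance would need tuning to the keyword lengths involved, since a fixed threshold over-matches very short keywords.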
FIG. 6 shows an example of the hardware configuration of a computer 1900 functioning as the generation apparatus 100 according to the embodiment. The computer 1900 according to the embodiment includes: a CPU peripheral section having a CPU 2000, a RAM 2020, a graphics controller 2075, and a display device 2080, which are interconnected by a host controller 2082; an I/O section having a communication interface 2030, a hard disk drive 2040, and a DVD drive 2060, which are connected to the host controller 2082 through an I/O controller 2084; and a legacy I/O section having a ROM 2010, a flexible disk drive 2050, and an I/O chip 2070 connected to the I/O controller 2084. - The
host controller 2082 connects the RAM 2020 with the CPU 2000 and the graphics controller 2075, which access the RAM 2020 at a high transfer rate. The CPU 2000 operates based on programs stored in the ROM 2010 and the RAM 2020 to control each section. The graphics controller 2075 acquires image data generated on a frame buffer provided in the RAM 2020 by the CPU 2000 or the like, and displays the image on the display device 2080. Alternatively, the graphics controller 2075 may include therein a frame buffer for storing image data generated by the CPU 2000 or the like. - The I/
O controller 2084 connects the host controller 2082 with the communication interface 2030, the hard disk drive 2040, and the DVD drive 2060 as relatively high-speed I/O units. The communication interface 2030 communicates with other apparatuses through a network. The hard disk drive 2040 stores programs and data used by the CPU 2000 in the computer 1900. The DVD drive 2060 reads a program or data from a DVD-ROM 2095 and provides the read program or data to the hard disk drive 2040 through the RAM 2020. - Also connected to the I/
O controller 2084 are relatively low-speed I/O units, i.e., the ROM 2010, the flexible disk drive 2050, and the I/O chip 2070. The ROM 2010 stores a boot program executed when the computer 1900 starts, and/or programs and the like depending on the hardware of the computer 1900. The flexible disk drive 2050 reads a program or data from a flexible disk 2090, and provides the program or data to the hard disk drive 2040 through the RAM 2020. The I/O chip 2070 connects not only the flexible disk drive 2050 to the I/O controller 2084, but also various I/O devices to the I/O controller 2084 through a parallel port, a serial port, a keyboard port, and a mouse port, for example. - A program provided to the
hard disk drive 2040 through the RAM 2020 is provided by a user stored on a recording medium, such as the flexible disk 2090, a DVD-ROM 2095, or an IC card. The program is read from the recording medium, installed in the hard disk drive 2040 within the computer 1900 through the RAM 2020, and executed by the CPU 2000. - The program is installed on the
computer 1900 to cause the computer 1900 to function as the text acquisition section 110, the decomposition processing section 120, the learning processing section 130, the storage section 140, the keyword storage section 142, the template storage section 144, the target data acquisition section 210, the identification section 220, the selection section 230, the generation section 240, the input section 250, the correction section 260, and the addition section 270. - Information processes described in the program are read into the
computer 1900 to function as specific means implemented by software in cooperation with the above-mentioned various hardware resources, i.e., as the text acquisition section 110, the decomposition processing section 120, the learning processing section 130, the storage section 140, the keyword storage section 142, the template storage section 144, the target data acquisition section 210, the identification section 220, the selection section 230, the generation section 240, the input section 250, the correction section 260, and the addition section 270. Then, information is computed or processed by the specific means depending on the intended use of the computer 1900 in the embodiment to build a specific instance of the generation apparatus 100 according to the intended use. - As an example, when the
computer 1900 communicates with an external device or the like, the CPU 2000 executes a communication program loaded on the RAM 2020 to instruct the communication interface 2030 to perform communication processing based on the processing content described in the communication program. Under the control of the CPU 2000, the communication interface 2030 reads send data stored in a send buffer area or the like provided in a storage device, such as the RAM 2020, the hard disk drive 2040, the flexible disk 2090, or the DVD-ROM 2095, to send the data to a network, or writes receive data received from the network to a receive buffer area or the like provided in the storage device. Thus, the communication interface 2030 may transfer data exchanged with the storage device by the DMA (Direct Memory Access) method. Alternatively, the CPU 2000 may read data from the storage device or the communication interface 2030 as a source, and write the data to the communication interface 2030 or the storage device as a destination to transfer the send/receive data. - Further, the
CPU 2000 reads, into the RAM 2020, all or necessary parts from files or databases stored in an external storage device, such as the hard disk drive 2040, the DVD drive 2060 (DVD-ROM 2095), or the flexible disk drive 2050 (flexible disk 2090), by means of DMA transfer or the like to perform various processing on the data on the RAM 2020. Then, the CPU 2000 saves the processed data back to the external storage device by means of DMA transfer or the like. In such processing, the RAM 2020 can be considered to temporarily hold the content of the external storage device. Therefore, in the embodiment, the RAM 2020, the external storage device, and the like are collectively referred to as the memory, the storage section, the storage device, or the like. Various programs and various kinds of information, such as data, tables, and databases, in the embodiment are stored in such a storage device as targets of information processing. Note that the CPU 2000 can also hold part of the content of the RAM 2020 in a cache memory to perform reading and writing on the cache memory. Even in such a form, since the cache memory serves as part of the function of the RAM 2020, the cache memory shall be included in the RAM 2020, the memory, and/or the storage device in the embodiment unless otherwise denoted distinctively. - Further, the
CPU 2000 performs various processing on the data read from the RAM 2020 as specified in a sequence of instructions of a program, including the various arithmetic operations, information processing, conditional determinations, and searching and replacing of information described in the embodiment, and saves the processed data back to the RAM 2020. For example, when a conditional determination is made, the CPU 2000 compares any of the various variables shown in the embodiment with any other variable or constant to determine whether it meets a condition, such as larger, smaller, not less than, not more than, or equal to, and when the condition is satisfied (or unsatisfied), the procedure branches to a different sequence of instructions or calls a subroutine. - Further, the
CPU 2000 can retrieve information stored in a file or a database in the storage device. For example, when two or more entries are stored in the storage device in such a manner as to associate the attribute value of a second attribute with the attribute value of a first attribute, the CPU 2000 searches the two or more entries stored in the storage device for an entry whose first-attribute value matches a specified condition and reads the second-attribute value stored in that entry, so that the attribute value of the second attribute associated with the first attribute meeting the predetermined condition can be obtained. - The programs or modules mentioned above may also be stored on an external recording medium. As the recording medium, an optical recording medium, such as a DVD, Blu-ray (registered trademark), or CD, a magneto-optical recording medium such as an MO, a tape medium, or a semiconductor memory such as an IC card can be used in addition to the
flexible disk 2090 and the DVD-ROM 2095. Further, a storage device, such as a hard disk or a RAM provided in a server system connected to a private communication network or the Internet, may also be used as a recording medium to provide a program to the computer 1900 through the network. - While the present invention has been described with reference to the embodiment, the technical scope of the present invention is not limited to the description of the aforementioned embodiment. It will be obvious to those skilled in the art that various changes and modifications can be made to the aforementioned embodiment. From the appended claims, it will also be obvious that forms to which such changes or modifications are made shall be included in the technical scope of the present invention.
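The entry retrieval described earlier, in which the CPU 2000 finds an entry whose first-attribute value meets a condition and reads the associated second-attribute value, amounts to a keyed search. A minimal sketch follows; the entry layout and the attribute names (`keyword`, `template`) are invented for illustration and are not fixed by the specification.

```python
# Entries associating a first attribute (a keyword) with a second
# attribute (a template id); both attribute names are illustrative.
entries = [
    {"keyword": "printer", "template": "T-01"},
    {"keyword": "password", "template": "T-02"},
]


def second_attribute_for(entries, condition):
    """Return the second-attribute value of the first entry whose
    first-attribute value satisfies the condition, or None."""
    for entry in entries:
        if condition(entry["keyword"]):
            return entry["template"]
    return None


# Condition: exact match on the first attribute.
print(second_attribute_for(entries, lambda k: k == "password"))  # → T-02
```

A real implementation would typically index the first attribute (a dictionary or a database index) rather than scanning linearly, but the associative behavior is the same.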
- The execution sequence of processes, such as operations, procedures, steps, and stages in the apparatus, system, program, and method described in the appended claims and the specification, and shown in the accompanying drawings, is not particularly restricted by terms such as "ahead of," "prior to," or the like. It should be noted that the operations and the like can be carried out in any order unless the output of a previous process is used in a subsequent process. Even when the description uses "first," "next," and the like in the appended claims, the specification, and the operation flows in the drawings for convenience's sake, this does not mean that it is imperative to carry out the operations in this order.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/371,297 US20190228064A1 (en) | 2014-10-30 | 2019-04-01 | Generation apparatus, generation method, and program |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014-221051 | 2014-10-30 | ||
JP2014221051A JP5963328B2 (en) | 2014-10-30 | 2014-10-30 | Generating device, generating method, and program |
US14/868,442 US10289674B2 (en) | 2014-10-30 | 2015-09-29 | Generation apparatus, generation method, and program |
US15/341,147 US10296579B2 (en) | 2014-10-30 | 2016-11-02 | Generation apparatus, generation method, and program |
US16/371,297 US20190228064A1 (en) | 2014-10-30 | 2019-04-01 | Generation apparatus, generation method, and program |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/341,147 Continuation US10296579B2 (en) | 2014-10-30 | 2016-11-02 | Generation apparatus, generation method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
US20190228064A1 true US20190228064A1 (en) | 2019-07-25 |
Family
ID=55852843
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/868,442 Expired - Fee Related US10289674B2 (en) | 2014-10-30 | 2015-09-29 | Generation apparatus, generation method, and program |
US15/341,147 Expired - Fee Related US10296579B2 (en) | 2014-10-30 | 2016-11-02 | Generation apparatus, generation method, and program |
US16/371,297 Pending US20190228064A1 (en) | 2014-10-30 | 2019-04-01 | Generation apparatus, generation method, and program |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/868,442 Expired - Fee Related US10289674B2 (en) | 2014-10-30 | 2015-09-29 | Generation apparatus, generation method, and program |
US15/341,147 Expired - Fee Related US10296579B2 (en) | 2014-10-30 | 2016-11-02 | Generation apparatus, generation method, and program |
Country Status (2)
Country | Link |
---|---|
US (3) | US10289674B2 (en) |
JP (1) | JP5963328B2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110807305A (en) * | 2019-10-11 | 2020-02-18 | 网娱互动科技(北京)股份有限公司 | Manuscript generation method and system for replacing keywords |
CN113361281A (en) * | 2021-08-05 | 2021-09-07 | 北京明略软件系统有限公司 | White paper generation method, device, equipment and storage medium |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5963328B2 (en) | 2014-10-30 | 2016-08-03 | International Business Machines Corporation | Generating device, generating method, and program
US11580589B2 (en) * | 2016-10-11 | 2023-02-14 | Ebay Inc. | System, method, and medium to select a product title |
JP6996360B2 (en) | 2018-03-09 | 2022-01-17 | 富士通株式会社 | Report creation program and report creation method |
CN108573025B (en) * | 2018-03-12 | 2021-07-02 | 云知声智能科技股份有限公司 | Method and device for extracting sentence classification characteristics based on mixed template |
CN108664612A (en) * | 2018-05-11 | 2018-10-16 | 广东电网有限责任公司 | Intelligent long text data classification method based on keyword scoring |
CN108664473A (en) * | 2018-05-11 | 2018-10-16 | 平安科技(深圳)有限公司 | Recognition methods, electronic device and the readable storage medium storing program for executing of text key message |
CN110738031A (en) * | 2018-07-03 | 2020-01-31 | 广州阿里巴巴文学信息技术有限公司 | Method, device and equipment for generating reading note |
JP7303614B2 (en) * | 2018-07-11 | 2023-07-05 | 株式会社野村総合研究所 | making device |
CN109597888A (en) * | 2018-11-19 | 2019-04-09 | 北京百度网讯科技有限公司 | Establish the method, apparatus of text field identification model |
EP3935581A4 (en) | 2019-03-04 | 2022-11-30 | Iocurrents, Inc. | Data compression and communication using machine learning |
CN110113315B (en) * | 2019-04-12 | 2022-06-14 | 平安科技(深圳)有限公司 | Service data processing method and device |
CN112115710B (en) * | 2019-06-03 | 2023-08-08 | 腾讯科技(深圳)有限公司 | Industry information identification method and device |
CN110609991B (en) * | 2019-09-10 | 2023-09-19 | 卓尔智联(武汉)研究院有限公司 | Text generation method, electronic device and storage medium |
CN110738061B (en) * | 2019-10-17 | 2024-05-28 | 北京搜狐互联网信息服务有限公司 | Ancient poetry generating method, device, equipment and storage medium |
KR20210104247A (en) * | 2020-02-17 | 2021-08-25 | 한국과학기술원 | Method and Apparatus for Recommending PowerPoint |
CN112749251B (en) * | 2020-03-09 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Text processing method, device, computer equipment and storage medium |
CN111506726B (en) * | 2020-03-18 | 2023-09-22 | 大箴(杭州)科技有限公司 | Short text clustering method and device based on part-of-speech coding and computer equipment |
US11423219B2 (en) | 2020-03-19 | 2022-08-23 | International Business Machines Corporation | Generation and population of new application document utilizing historical application documents |
WO2021207768A1 (en) * | 2020-04-10 | 2021-10-14 | Square Panda Inc. | Custom text generation based on skill profile |
CN112000777A (en) * | 2020-09-03 | 2020-11-27 | 上海然慧信息科技有限公司 | Text generation method and device, computer equipment and storage medium |
KR102593884B1 (en) * | 2020-11-12 | 2023-10-26 | 주식회사 포스코인재창조원 | System and method for automatically generating documents and computer-readable recording medium storing of the same |
CN112434504B (en) * | 2020-11-23 | 2024-07-16 | 京东科技控股股份有限公司 | Method, apparatus, electronic device and computer readable medium for generating file information |
US11294971B1 (en) | 2021-01-25 | 2022-04-05 | Coupang Corp. | Systems and methods for modeling item similarity using converted image information |
CN113191456A (en) * | 2021-05-26 | 2021-07-30 | 平安信托有限责任公司 | Document generation method, device, equipment and medium based on text recognition technology |
CN113378057A (en) * | 2021-06-29 | 2021-09-10 | 珠海必要工业科技股份有限公司 | Information prompting method and device, computer equipment and storage medium |
CN113704467B (en) * | 2021-07-29 | 2024-07-02 | 大箴(杭州)科技有限公司 | Massive text monitoring method and device based on data template, medium and equipment |
CN113656588B (en) * | 2021-09-01 | 2024-05-10 | 深圳平安医疗健康科技服务有限公司 | Knowledge graph-based data code matching method, device, equipment and storage medium |
CN113962315B (en) * | 2021-10-28 | 2023-12-22 | 北京百度网讯科技有限公司 | Model pre-training method, device, equipment, storage medium and program product |
CN114118041A (en) * | 2021-11-01 | 2022-03-01 | 深圳前海微众银行股份有限公司 | Text generation method and device and storage medium |
CN117332768B (en) * | 2023-10-10 | 2024-03-08 | 北京睿企信息科技有限公司 | Data processing system for acquiring text generation template |
Family Cites Families (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6085201A (en) * | 1996-06-28 | 2000-07-04 | Intel Corporation | Context-sensitive template engine |
US7269545B2 (en) | 2001-03-30 | 2007-09-11 | Nec Laboratories America, Inc. | Method for retrieving answers from an information retrieval system |
JP3790825B2 (en) * | 2004-01-30 | 2006-06-28 | 独立行政法人情報通信研究機構 | Text generator for other languages |
JP4398777B2 (en) * | 2004-04-28 | 2010-01-13 | 株式会社東芝 | Time series data analysis apparatus and method |
JP2006065623A (en) | 2004-08-27 | 2006-03-09 | Toshiba Corp | Consultation and answer server and consultation and answer program |
JP4595590B2 (en) | 2005-03-04 | 2010-12-08 | 三菱電機株式会社 | Text mining method and text mining apparatus |
JP2007102642A (en) | 2005-10-06 | 2007-04-19 | Oki Electric Ind Co Ltd | Information analysis system, information analysis method and information analysis program |
JP2007157058A (en) * | 2005-12-08 | 2007-06-21 | Toshiba Corp | Classification model learning device, classification model learning method, and program for learning classification model |
JP4895645B2 (en) | 2006-03-15 | 2012-03-14 | 独立行政法人情報通信研究機構 | Information search apparatus and information search program |
JP5128154B2 (en) * | 2006-04-10 | 2013-01-23 | 富士フイルム株式会社 | Report creation support apparatus, report creation support method, and program thereof |
JP4833336B2 (en) * | 2007-05-08 | 2011-12-07 | 富士通株式会社 | Keyword output program, apparatus, and method |
JP5033724B2 (en) | 2007-07-12 | 2012-09-26 | 株式会社沖データ | Document search apparatus, image forming apparatus, and document search system |
US9317593B2 (en) * | 2007-10-05 | 2016-04-19 | Fujitsu Limited | Modeling topics using statistical distributions |
JP5022252B2 (en) * | 2008-01-30 | 2012-09-12 | 日本放送協会 | Expression template generation apparatus, method and program thereof |
JP2010128779A (en) | 2008-11-27 | 2010-06-10 | Kansai Electric Power Co Inc:The | Method for extracting multiple regression equation |
JP2011229194A (en) | 2008-12-24 | 2011-11-10 | Oita Univ | Switching power supply, and electronic circuit |
WO2011078194A1 (en) * | 2009-12-25 | 2011-06-30 | 日本電気株式会社 | Text mining system, text mining method, and recording medium |
JP5039159B2 (en) | 2010-02-26 | 2012-10-03 | 株式会社東芝 | Information classification system, information classification method and program |
JP5540335B2 (en) | 2010-10-04 | 2014-07-02 | 独立行政法人情報通信研究機構 | Natural language sentence generation device and computer program |
JP5807891B2 (en) | 2010-10-04 | 2015-11-10 | 国立研究開発法人情報通信研究機構 | Language model learning apparatus and computer program |
JP2012128779A (en) * | 2010-12-17 | 2012-07-05 | Panasonic Corp | Virtual object display device |
JP2012256197A (en) | 2011-06-08 | 2012-12-27 | Toshiba Corp | Orthographical variant detection device and orthographical variant detection program |
JP5620349B2 (en) * | 2011-07-22 | 2014-11-05 | 株式会社東芝 | Dialogue device, dialogue method and dialogue program |
CN104102639B (en) * | 2013-04-02 | 2018-07-27 | 腾讯科技(深圳)有限公司 | Popularization triggering method based on text classification and device |
JP6118838B2 (en) * | 2014-08-21 | 2017-04-19 | 本田技研工業株式会社 | Information processing apparatus, information processing system, information processing method, and information processing program |
JP5963328B2 (en) | 2014-10-30 | 2016-08-03 | International Business Machines Corporation | Generating device, generating method, and program
2014
- 2014-10-30 JP JP2014221051A patent/JP5963328B2/en active Active

2015
- 2015-09-29 US US14/868,442 patent/US10289674B2/en not_active Expired - Fee Related

2016
- 2016-11-02 US US15/341,147 patent/US10296579B2/en not_active Expired - Fee Related

2019
- 2019-04-01 US US16/371,297 patent/US20190228064A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
US20170052945A1 (en) | 2017-02-23 |
JP2016091078A (en) | 2016-05-23 |
US10289674B2 (en) | 2019-05-14 |
US20160124933A1 (en) | 2016-05-05 |
US10296579B2 (en) | 2019-05-21 |
JP5963328B2 (en) | 2016-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10296579B2 (en) | Generation apparatus, generation method, and program | |
Arras et al. | "What is relevant in a text document?": An interpretable machine learning approach | |
US20240028651A1 (en) | System and method for processing documents | |
CN111444320B (en) | Text retrieval method and device, computer equipment and storage medium | |
JP5356197B2 (en) | Word semantic relation extraction device | |
US20150081277A1 (en) | System and Method for Automatically Classifying Text using Discourse Analysis | |
JP6187877B2 (en) | Synonym extraction system, method and recording medium | |
US20160189057A1 (en) | Computer implemented system and method for categorizing data | |
US20040181527A1 (en) | Robust system for interactively learning a string similarity measurement | |
JP6260294B2 (en) | Information search device, information search method, and information search program | |
Singh et al. | A decision tree based word sense disambiguation system in Manipuri language | |
JPWO2014002775A1 (en) | Synonym extraction system, method and recording medium | |
US20220366346A1 (en) | Method and apparatus for document evaluation | |
Kaur et al. | Comparative analysis of algorithmic approaches for auto-coding with ICD-10-AM and ACHI | |
Iqbal et al. | Bias-aware lexicon-based sentiment analysis | |
CN104699844A (en) | Method and device for determining video tags for advertisements | |
JP2005181928A (en) | System and method for machine learning, and computer program | |
JP2017068862A (en) | Information processing device, information processing method, and information processing program | |
WO2019085118A1 (en) | Topic model-based associated word analysis method, and electronic apparatus and storage medium | |
US11868313B1 (en) | Apparatus and method for generating an article | |
JP2005182696A (en) | Machine learning system and method, and computer program | |
JP2007172260A (en) | Document rule preparation support apparatus, document rule preparation support method and document rule preparation support program | |
WO2021250950A1 (en) | Method, system, and device for evaluating performance of document search | |
JP4567025B2 (en) | Text classification device, text classification method, text classification program, and recording medium recording the program | |
Bouhoun et al. | Information Retrieval Using Domain Adapted Language Models: Application to Resume Documents for HR Recruitment Assistance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TAKEUCHI, EMIKO;TAKUMA, DAISUKE;TOYOSHIMA, HIROBUMI;REEL/FRAME:048752/0638 Effective date: 20150924
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER
STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER
STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED
STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED
STCV | Information on status: appeal procedure | Free format text: APPEAL BRIEF (OR SUPPLEMENTAL BRIEF) ENTERED AND FORWARDED TO EXAMINER