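WO2011090036A1 - Trend information search device, trend information search method, and recording medium - Google Patents
Trend information search device, trend information search method, and recording medium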
- Publication number
- WO2011090036A1 (PCT/JP2011/050783)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- trend information
- trend
- search
- expression
- document
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
Definitions
- The present invention relates to a trend information search device, a trend information search method, and a recording medium.
- Patent Document 1 discloses a data judgment support system that supports the investment decisions of investors and the like.
- This data judgment support system includes an asset price database (DB) that stores time-series data such as company stock prices and exchange rates, an economic index DB that stores time-series data such as gross domestic product and crude oil prices, and a DB that stores news articles.
- Using these databases, the data judgment support system displays exchange rate fluctuations and changes in Dubai crude oil prices, together with related news from the same period.
- Patent Document 2 describes an information gathering and analysis system that analyzes what general investors expect and, based on the analysis result, determines which pieces of information about stock prices are intended to influence stock prices.
- Techniques for supporting the analysis of information are disclosed in Patent Documents 3 to 6.
- The document data providing device extracts words from dated document data, totals the occurrences of each word for each field and period to obtain their appearance frequencies, and extracts a fixed number of the most frequent words for each field and period as feature words.
- The document data providing device displays the feature words of the document data for a period and, when a specific feature word is selected, displays the document headers of the document data that include that feature word.
- The information analysis system according to Patent Document 4 merges collected information with the geographical condition information associated with it, and analyzes the associated information as merged information.
- Patent Document 5 describes a data processing apparatus that displays changes in trend information and the factors behind them.
- The trend information extraction unit of the data processing apparatus extracts the trend information to be processed from an acquired corpus.
- The factor information extraction unit extracts information presumed to be a factor in the change of the extracted trend information.
- The key word extraction unit extracts key words presumed to be useful for analyzing the trend information.
- The trend information display unit generates a graph indicating the fluctuation of the extracted trend information.
- The factor information display unit displays, in addition to the graph generated by the trend information display unit, the factor information that caused the fluctuation of the trend information.
- The factor information display unit extracts and displays factor information useful for analyzing the trend information according to a predetermined condition.
- Patent Document 6 describes a technique for providing a user with feedback information for improving a query.
- The query inspection device inspects a query using the selectivity of the meaning and appearance features of an image object, and provides feedback information to the user.
- The feedback information includes the maximum and minimum number of matches for the query, alternatives to the elements of the query (meaning and appearance features), and the estimated number of images matching the query.
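- Patent Document 1: Japanese Patent Application Laid-Open No. 2007-087354
- Patent Document 2: Japanese Patent Application Laid-Open No. 2009-163598
- Patent Document 3: Japanese Patent Application Laid-Open No. 2000-172701
- Patent Document 4: Japanese Patent Application Laid-Open No. 2005-128893
- Patent Document 5: Japanese Patent Application Laid-Open No. 2007-241905
- Patent Document 6: Japanese Patent Application Laid-Open No. 11-328185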
- The first problem with the techniques according to Patent Documents 1 to 6 is that the system needs to hold in advance a database of the statistical values to be analyzed, such as company performance and economic indicators. Therefore, statistics that are not held in a database cannot be analyzed.
- One conceivable method of acquiring arbitrary statistical data from an external corpus such as the Web is to submit to an Internet search engine a query of AND conditions consisting of a plurality of keywords, such as "2001 AND N company AND sales".
- However, documents containing these keywords do not necessarily contain information on the desired statistics.
- For example, the documents that hit "2001 AND N company AND sales" may include noise such as job postings and company outlines in news releases.
- The present invention has been made in view of the above circumstances, and aims to provide a trend information search device, a trend information search method, and a recording medium capable of automatically acquiring documents that include trend information of statistics from an external corpus.
- A trend information search device according to the present invention is a trend information search device for searching trend information of statistics, comprising:
- expanded query generation means for generating an expanded query by adding, as a search condition, a trend information element, which is a character string characteristically appearing in a document including trend information, to the input search condition;
- search means for searching external data using the query generated by the expanded query generation means; and
- trend information evaluation means for evaluating, based on the appearance mode of the trend information element in a document retrieved by the search means, the degree to which the document includes trend information of a statistic matching the input condition.
- A trend information search method according to the present invention is a trend information search method for searching for a document including trend information of a statistic, comprising:
- an expanded query generation step of generating an expanded query by adding a trend information element, which is a character string characteristically appearing in a sentence representing trend information, to the input search condition;
- a search step of searching external data using the query generated in the expanded query generation step; and
- a trend information evaluation step of evaluating, based on the appearance mode of the trend information element in a document retrieved in the search step, the degree to which the document includes trend information of a statistic matching the input condition.
- A computer-readable recording medium according to the present invention stores a trend information search program that causes a computer to execute:
- an expanded query generation step of generating an expanded query by adding a trend information element, which is a character string characteristically appearing in a sentence representing trend information, to the input condition;
- a search step of searching external data using the query generated in the expanded query generation step; and
- a trend information evaluation step of evaluating, based on the appearance mode of the trend information element in a document retrieved in the search step, the degree to which the document includes trend information of a statistic matching the input condition.
- According to the present invention, it is possible to automatically acquire trend information of statistics on topics the user is interested in from an external corpus such as the Web, even if the statistics are not held by the system.
- A diagram showing an example of a screen for inputting a search condition according to the first embodiment.
- A diagram showing an example of a screen for inputting a search condition according to the first embodiment.
- A diagram showing an example of data stored in the trend information storage unit in the first embodiment.
- A flowchart illustrating an example of trend information search processing according to the first embodiment.
- A block diagram showing a configuration example of the search device according to the second embodiment of the present invention.
- A diagram showing an example of data stored in the cause sentence storage unit in the second embodiment.
- A flowchart illustrating an example of trend information search processing according to the second embodiment.
- A block diagram showing a configuration example of the search device according to the third embodiment of the present invention.
- A flowchart illustrating an example of trend information search processing according to the third embodiment.
- A diagram showing an example of data stored in the cause sentence storage unit in the third embodiment.
- A block diagram showing a configuration example of the search device according to the fourth embodiment of the present invention.
- A diagram showing an example of data stored in the reputation information storage unit in the fourth embodiment.
- A flowchart illustrating an example of trend information search processing according to the fourth embodiment.
- A block diagram showing an example of the hardware configuration of the search device according to the first to fourth embodiments of the present invention.
- A sentence that describes the trend of a statistic is characterized in that the expressions serving as elements for describing that trend appear in relation to one another. Each such element is called a "trend information element".
- The trend information elements include topic words, statistic names, period expressions, trend expressions, comparative expressions, unit expressions, and the like.
- A topic word is an expression that represents the topic that is the target of the statistic. In "the sales of N company in 2001", "N company" is the topic word.
- The statistic name is an expression that represents the type of the statistic. In "the sales of N company in 2001", "sales" is the statistic name.
- A period expression is an expression that represents the period over which the statistic is measured. In "the sales of N company in 2001", "2001" is the period expression.
- A trend expression is an expression that represents an increase or decrease in the value of the statistic. Examples of trend expressions include "increase", "decrease", "level", "under and over", "peak", and "bottoming".
- A comparative expression is an expression used to compare a statistic against some reference. Specific examples of comparative expressions include "year-on-year", "previous-year comparison", "year-on-year comparison", and "transition".
- A unit expression is an expression used to describe the value of a statistic. For statistics relating to monetary amounts, such as "sales", "net profit", "GDP", and "family income", the unit expressions are "trillion yen", "billion yen", "1000 yen", "yen", and so on. For statistics relating to quantities, such as "shipment number" or "sales number", the unit expressions are "1 billion units", "1000 units", "100 units", "units", and so on. For statistics relating to the number of people, such as "total population" and "number of users", the unit expressions are "1 billion people", "1 million people", "1000 people", "people", and so on.
- The search device 100 (trend information search device) according to the first embodiment of the present invention includes, as shown in FIG. 1, a storage device 1, a data processing device 2, an input unit 3, and an output unit 4.
- The storage device 1 physically comprises a hard disk, a flash memory, or the like, and functionally includes a trend information storage unit 11.
- The data processing device 2 physically comprises a CPU or the like, and functionally includes an expanded query generation unit 21, a trend information search unit 22, and a trend information determination unit 23.
- The input unit 3 includes a keyboard and a pointing device such as a mouse.
- The input unit 3 receives information input by the user and transmits it to the data processing device 2.
- As a search condition, the input unit 3 receives from the user a keyword representing the topic to be searched, a statistic name relating to the topic, and the period that is the target of the statistic.
- The output unit 4 comprises a display or the like.
- The output unit 4 displays the screen transmitted from the data processing device 2.
- FIG. 2 shows an example of a screen on which the user inputs search conditions.
- The search condition input screen C1 of FIG. 2 includes a form C11 for receiving an input of a topic, a form C12 for receiving an input of a statistic name, a form C13 for receiving an input of a year, and a search button C14.
- When the user presses the search button C14, a search is executed under the search conditions entered in the forms C11 to C13 at that time.
- In this example, "N company" is input as the topic word, "sales" as the statistic name, and "2001" as the year.
- The screen for inputting the search conditions is not limited to the above example.
- The period expression is not limited to a year, and may be a quarter, a month, a week, and so on.
- The period expression may also be input by specifying the date and time of the beginning and the end of the period.
- Alternatively, the user may input a certain event, and the designated period may be the period before or after the date and time at which the event occurred.
- The expanded query generation unit 21 generates a query for searching for documents that are likely to include trend information related to the topic word, the statistic name, and the period expression input by the user.
- A simple method of generating a query is to connect the topic word, the statistic name, and the period expression with the AND operator. With this method, for example, the query "N company AND sales AND 2001" is generated for the search condition described above.
- However, a document containing "N company", "sales", and "2001" does not necessarily describe, for example, the fact that the sales of N company decreased in 2001. Therefore, in order to obtain the target trend information with higher probability, the expanded query generation unit 21 expands the query.
- Query expansion includes synonym expansion, trend expression expansion, comparison expression expansion, unit expression expansion, and the like.
- Expansion of the query by synonyms generates a query in which a plurality of synonyms registered in advance in a synonym dictionary are connected by the OR operator.
- Expansion by synonyms includes expansion by synonyms of the topic word, of the statistic name, of the period expression, of the trend expression, and so on. For example, when the query is expanded with the official name "NXXX" of N company, which is a synonym for the topic word "N company", the query becomes "(N company OR NXXX)". When the query is expanded with the synonym "income" for the statistic name "sales", the query becomes "(sales OR income)". When the query is expanded with the synonym "2001" for the period expression "2001", the query becomes "(2001 OR 2001)". When all the words entered as the search conditions in FIG. 2 are expanded with the above synonyms, the expanded query becomes "(N company OR NXXX) AND (sales OR income) AND (2001 OR 2001)".
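The synonym expansion described above can be illustrated with a minimal sketch. The dictionary contents and the names `SYNONYMS`, `or_group`, and `expand_with_synonyms` are hypothetical and are introduced only for illustration; the text does not specify how the synonym dictionary is implemented.

```python
from typing import Dict, List

# Hypothetical synonym dictionary; the entries are illustrative only.
SYNONYMS: Dict[str, List[str]] = {
    "N company": ["NXXX"],
    "sales": ["income"],
}


def or_group(term: str, synonyms: Dict[str, List[str]]) -> str:
    """Join a term and its registered synonyms with the OR operator."""
    variants = [term] + [s for s in synonyms.get(term, []) if s != term]
    if len(variants) == 1:
        return term  # no synonym registered: leave the term as it is
    return "(" + " OR ".join(variants) + ")"


def expand_with_synonyms(terms: List[str], synonyms: Dict[str, List[str]]) -> str:
    """Connect the OR group of every search term with the AND operator."""
    return " AND ".join(or_group(term, synonyms) for term in terms)


if __name__ == "__main__":
    print(expand_with_synonyms(["N company", "sales", "2001"], SYNONYMS))
```

In this sketch a term without a registered synonym is passed through unchanged, which is one possible design choice rather than something stated in the text.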
- Expansion of the query by trend expressions generates a query in which typical expressions used to describe an increase or decrease of a statistic are connected by the OR operator.
- Typical expressions used to describe an increase or decrease of a statistic are "increase", "decrease", and the like.
- Synonyms for "increase" are "expansion", "growth", etc.
- Synonyms for "decrease" are "decline", "reduction", etc.
- With this expansion, the expanded query becomes "(N company OR NXXX) AND (sales OR income) AND (2001 OR 2001) AND (increase OR expansion OR growth OR decrease OR decline OR reduction)".
- The query expansion method using trend expressions is not limited to the above example.
- A method is also possible in which the user limits the range of expansion by trend expressions.
- When this method is used, the search condition input screen also lets the user specify the trend of interest.
- For example, when the user specifies a decrease, the expanded query generation unit 21 expands the query by trend expressions using only the expressions that mean a decrease.
- In this case, the expanded query is "(N company OR NXXX) AND (sales OR income) AND (2001 OR 2001) AND (decrease OR decline OR reduction)".
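As a rough sketch of trend expression expansion, including the variant in which the user limits the direction, the following could be used. The table `TREND_EXPRESSIONS` reuses the expressions from the example queries; the function name and the `direction` parameter are assumptions made for illustration.

```python
from typing import Optional

# Trend expressions taken from the example queries in the text.
TREND_EXPRESSIONS = {
    "increase": ["increase", "expansion", "growth"],
    "decrease": ["decrease", "decline", "reduction"],
}


def add_trend_expressions(query: str, direction: Optional[str] = None) -> str:
    """Append an OR group of trend expressions to an existing query.

    direction=None uses every registered expression; "increase" or
    "decrease" restricts the expansion to that direction only.
    """
    if direction is None:
        words = [w for group in TREND_EXPRESSIONS.values() for w in group]
    else:
        words = TREND_EXPRESSIONS[direction]
    return f"{query} AND ({' OR '.join(words)})"


if __name__ == "__main__":
    base = "(N company OR NXXX) AND (sales OR income)"
    print(add_trend_expressions(base))
    print(add_trend_expressions(base, "decrease"))
```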
- Expansion of the query by comparison expressions generates a query in which typical expressions used to compare the temporal transition of a statistic are connected by the OR operator.
- Typical expressions used to compare the temporal transition of a statistic are "transition", "year-on-year", "previous-year comparison", and "year-on-year comparison".
- With this expansion, the expanded query is "(N company OR NXXX) AND (sales OR income) AND (2001 OR 2001) AND (decrease OR decline OR reduction) AND (transition OR year-on-year OR previous-year comparison OR year-on-year comparison)".
- Expansion of the query by unit expressions generates a query in which the units of the statistic are connected by the OR operator.
- The unit is determined by the statistic. Which unit expressions correspond to which statistic is defined and stored.
- For example, the units corresponding to the statistic "sales" are "trillion yen", "billion yen", "million yen", etc.
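The stored correspondence between statistics and unit expressions can be sketched as a simple lookup table. The table name `UNITS_BY_STATISTIC`, its contents, and the behaviour for unknown statistics are assumptions for illustration only.

```python
from typing import Dict, List

# Hypothetical table of unit expressions per statistic name.
UNITS_BY_STATISTIC: Dict[str, List[str]] = {
    "sales": ["trillion yen", "billion yen", "million yen"],
    "number of users": ["million people", "people"],
}


def add_unit_expressions(query: str, statistic: str,
                         table: Dict[str, List[str]] = UNITS_BY_STATISTIC) -> str:
    """Append an OR group of the units registered for the statistic."""
    units = table.get(statistic)
    if not units:
        return query  # assumption: no registered unit means no expansion
    return f"{query} AND ({' OR '.join(units)})"


if __name__ == "__main__":
    print(add_unit_expressions("N company AND sales AND 2001", "sales"))
```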
- The trend information search unit 22 searches the external data 5 using the expanded query generated by the expanded query generation unit 21, and passes the document group of the search result to the trend information determination unit 23.
- The external data 5 consists of documents on the Internet, documents stored in a document database on an intranet, or the like.
- The trend information search unit 22 may have its own search means, or may have means for executing a search using an external search engine.
- The trend information determination unit 23 determines whether each document of the search result passed from the trend information search unit 22 is a document including the trend information intended by the user. For this determination, the trend information determination unit 23 evaluates the degree to which the document includes trend information. This evaluation is performed based on the appearance mode of the trend information elements in the document.
- The appearance mode of the trend information elements in the document means, for example, the frequency with which the trend information elements appear in the document, the frequency with which predetermined language patterns appear, and the frequency with which the trend information elements appear in the document title.
- A language pattern here is a type of word arrangement used to express a certain meaning in a document including trend information. Specific examples of language patterns are "<topic word> <year>", "<year> <topic word>", "<year> <statistic>", and "<statistic> <year>".
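One way to count occurrences of such a language pattern is a regular expression that looks for an expression of one element type followed closely by an expression of another. This is only a sketch: the function name `count_pattern` and the `max_gap` parameter (the number of characters allowed between the two expressions) are assumptions, and the text does not state how adjacency is judged.

```python
import re
from typing import List


def count_pattern(text: str, first: List[str], second: List[str],
                  max_gap: int = 3) -> int:
    """Count places where an expression from `first` is followed, within
    `max_gap` characters, by an expression from `second`."""
    a = "|".join(re.escape(x) for x in first)
    b = "|".join(re.escape(x) for x in second)
    pattern = re.compile(rf"(?:{a}).{{0,{max_gap}}}?(?:{b})")
    return len(pattern.findall(text))


if __name__ == "__main__":
    text = "N company 2001 results. In 2001 N company grew."
    # "<topic word> <year>" pattern
    print(count_pattern(text, ["N company", "NXXX"], ["2001"]))
    # "<year> <topic word>" pattern
    print(count_pattern(text, ["2001"], ["N company", "NXXX"]))
```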
- The degree to which the document includes the trend information elements is represented by the integrated score S.
- The integrated score S is calculated from any one or a combination of a topic score TS, a statistic score SS, a period score PS, a trend score MS, a comparison score CS, and a unit score US.
- The trend information determination unit 23 creates data that brings together the search keywords designated by the user, the document ID, and the sentences subjected to the determination, and stores the data in the trend information storage unit 11.
- The topic score TS is a score that quantifies whether the document relates to the topic word input by the user.
- The topic score TS can be calculated using the number ts1 of topic words appearing in the document title and the number ts2 of topic words appearing in the text.
- The method of calculating the topic score TS is not limited to this.
- Another method of calculating the topic score TS is, for example, to add to the topic score TS the appearance frequency of words related to the topic word, or the product of that appearance frequency and the degree of association.
- The words related to the topic word can be obtained as follows. (1) Let G1 be the set of documents retrieved by the trend information search unit 22 using the expanded query generated by the expanded query generation unit 21. (2) Let G2 be the set of documents retrieved by the trend information search unit 22 using the expanded query with the topic word and its synonyms removed.
- (3) Let F_G1(t) be the appearance frequency of the word t in the document set G1.
- (4) Let F_G2(t) be the appearance frequency of the word t in the document set G2.
- (5) The value R(t) = F_G1(t) / F_G2(t) is taken as the degree of association between the word t and the topic word. R(t) is calculated for every word t included in the documents.
- (6) The words included in the documents are sorted in descending order of R(t), and the top N words are taken as the words related to the topic word. Here, N is a predetermined natural number, and R(t) is the degree of association of each word.
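The related-word procedure above can be sketched as follows. Whitespace tokenization, lower-casing, and the skipping of words that never occur in G2 (to avoid division by zero) are assumptions of this sketch; the text does not say how these cases are handled.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def word_frequencies(documents: Iterable[str]) -> Counter:
    """Count word occurrences over a document set (whitespace tokens)."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts


def related_words(g1: Iterable[str], g2: Iterable[str],
                  n: int) -> List[Tuple[str, float]]:
    """Return the top-n words ranked by R(t) = F_G1(t) / F_G2(t)."""
    f_g1 = word_frequencies(g1)
    f_g2 = word_frequencies(g2)
    scores = {}
    for word, freq in f_g1.items():
        if f_g2[word] > 0:  # assumption: skip words absent from G2
            scores[word] = freq / f_g2[word]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    g1 = ["laptop laptop sales", "laptop rose"]
    g2 = ["laptop sales rose", "sales car"]
    print(related_words(g1, g2, 2))
```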
- The statistic score SS is a score that quantifies whether the retrieved document contains a description related to the statistic input by the user.
- The statistic score SS can be calculated from the number ss1 of occurrences of the language pattern "<statistic> of <topic word>", the number ss2 of statistic names appearing in the document title, and the number ss3 of statistic names appearing in the text.
- The period score PS is a score that quantifies whether the retrieved document contains a description regarding the period input by the user.
- The year score YS can be calculated using, for example, ys1, ys2, and ys3.
- ys1 is the number of times the language patterns "<topic word> <year>", "<year> <topic word>", "<year> <statistic>", and "<statistic> <year>" (combination patterns of trend information elements) appear in the text.
- ys2 is the number of year expressions appearing in the document title.
- ys3 is the number of year expressions appearing in the text.
- The period score PS can be defined by extending the method of calculating the year score YS to general period expressions.
- When the input period represents a quarter or a month, the expression representing the year that includes the period (including, of course, its synonyms) is also a target of calculation in obtaining PS. For example, first, a value is calculated for the input period element in the same manner as the year score YS. Next, a value indicating whether the expression for the year that includes the period appears is calculated, again in the same manner as the year score YS. Finally, the period score PS is calculated by weighting and adding the two values.
- The trend score MS is a score that quantifies whether the trend expression input by the user appears in the retrieved document.
- The trend score MS can be calculated based on ms1, ms2, and ms3.
- ms1 is the number of occurrences of the language pattern "<statistic> is <trend expression>" in the text.
- ms2 is the number of trend expressions appearing in the document title.
- ms3 is the number of trend expressions appearing in the text.
- The comparison score CS is a score that quantifies whether the search result document contains a comparison expression such as "year-on-year" or "transition".
- The comparison score CS can be calculated from cs1, cs2, and cs3.
- cs1 is the number of occurrences of the language patterns "<statistic> is <comparison expression>" and "<statistic> <comparison expression>" in the text.
- cs2 is the number of comparison expressions appearing in the document title.
- cs3 is the number of comparison expressions appearing in the text.
- The comparison score CS is a weighted linear sum of cs1, cs2, and cs3.
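The weighted linear sum used for the comparison score can be written out directly. The weight values below are purely hypothetical placeholders; the text does not give concrete weights for cs1, cs2, and cs3.

```python
from typing import Sequence


def weighted_linear_sum(counts: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Return sum(w_i * c_i) over paired weights and counts."""
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")
    return sum(w * c for w, c in zip(weights, counts))


# Hypothetical weights for (pattern matches, title hits, text hits).
CS_WEIGHTS = (3.0, 2.0, 1.0)


def comparison_score(cs1: int, cs2: int, cs3: int,
                     weights: Sequence[float] = CS_WEIGHTS) -> float:
    """Comparison score CS as a weighted linear sum of cs1, cs2, cs3."""
    return weighted_linear_sum((cs1, cs2, cs3), weights)


if __name__ == "__main__":
    print(comparison_score(cs1=1, cs2=0, cs3=4))
```

The same helper could serve for the other element scores if they are also taken to be weighted sums of their three counts, although the text states this form explicitly only for CS.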
- The unit score US is a score that quantifies whether the search result document contains a unit expression related to the statistic input by the user.
- The unit score US can be calculated from us1, us2, and us3.
- us1 is the number of occurrences of the language pattern "<statistic> is <number> <unit>" in the text.
- us2 is the number of unit expressions appearing in the document title.
- us3 is the number of unit expressions appearing in the text.
- The trend information determination unit 23 performs the determination using the integrated score S.
- The integrated score S is calculated using the topic score TS, the statistic score SS, the year score YS, the trend expression score MS, the comparison expression score CS, and the unit expression score US.
- The integrated score S is a numerical value that evaluates the degree to which the document includes trend information of a statistic that matches the search condition.
- The weights W1 to W6 are numerical values determined as appropriate based on experiments.
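Since six weights W1 to W6 are paired with six element scores, one natural reading is that S is a weighted sum of the six scores. The sketch below assumes that form, and additionally assumes that the determination compares S with a threshold; neither the exact formula nor the threshold is stated in the text, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ElementScores:
    ts: float  # topic score
    ss: float  # statistic score
    ys: float  # year score
    ms: float  # trend expression score
    cs: float  # comparison expression score
    us: float  # unit expression score


def integrated_score(scores: ElementScores, weights: Sequence[float]) -> float:
    """Assumed form: S = W1*TS + W2*SS + W3*YS + W4*MS + W5*CS + W6*US."""
    components = (scores.ts, scores.ss, scores.ys,
                  scores.ms, scores.cs, scores.us)
    if len(weights) != len(components):
        raise ValueError("exactly six weights (W1 to W6) are required")
    return sum(w * c for w, c in zip(weights, components))


def includes_trend_information(scores: ElementScores,
                               weights: Sequence[float],
                               threshold: float) -> bool:
    """Assumed decision rule: the document qualifies when S >= threshold."""
    return integrated_score(scores, weights) >= threshold


if __name__ == "__main__":
    s = ElementScores(ts=1, ss=2, ys=3, ms=4, cs=5, us=6)
    print(integrated_score(s, (1, 1, 1, 1, 1, 1)))
```

Setting a weight to zero drops the corresponding score, which matches the statement that S may be computed from any one or a combination of the element scores.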
- The trend information determination unit 23 stores the documents determined to include trend information in the trend information storage unit 11. In addition, it counts the number of trend information elements appearing in each paragraph of the document, and stores the paragraph in which the trend information elements appear most frequently in the trend information list in the trend information storage unit 11.
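Selecting the paragraph in which the trend information elements appear most often can be sketched as below. Splitting paragraphs on blank lines and counting plain substring occurrences are assumptions of this sketch; the function name is hypothetical.

```python
from typing import List


def best_paragraph(document: str, elements: List[str]) -> str:
    """Return the paragraph containing the most element occurrences.

    Paragraphs are assumed to be separated by blank lines. An empty
    string is returned for a document without any paragraph.
    """
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    if not paragraphs:
        return ""

    def element_count(paragraph: str) -> int:
        return sum(paragraph.count(e) for e in elements)

    return max(paragraphs, key=element_count)


if __name__ == "__main__":
    doc = ("About N company.\n\n"
           "N company sales in 2001 decreased year-on-year.\n\n"
           "Contact us.")
    print(best_paragraph(doc, ["N company", "sales", "2001", "decreased"]))
```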
- The methods of calculating the topic score TS, the statistic score SS, the year score YS, the trend expression score MS, the comparison expression score CS, and the unit expression score US, and the language patterns of each expression, are not limited to those described above.
- The method of determining whether a document in the search result contains the trend information intended by the user is also not limited to the above example.
- The determination may be performed, for example, by a pattern recognition method.
- In that case, a classifier is trained by supervised learning on documents known to include trend information, using as the feature vector the number of matches of each expression to the language patterns, the appearance frequency in the title, and the appearance frequency in the text.
- The determination is then performed using this classifier.
- Examples of classifiers that can be used include support vector machines and neural networks.
- The trend information storage unit 11 stores the trend information that was retrieved by the trend information search unit 22 and determined to be trend information by the trend information determination unit 23, in association with information on the original document.
- An example of the data stored in the trend information storage unit 11 is shown in the drawings.
- The document ID is identification information (ID: IDentifier) for distinguishing individual documents. An address indicating the location of the document body, such as a URL (Uniform Resource Locator) or a file path, may also be used as the document ID.
- In this example, the data stored in the trend information storage unit 11 consists of the topic word, the statistic name, the year (period expression), the document ID, and the trend information list. In addition, the content of the document body indicated by the document ID, the creation date and update date of the document, the creator, and other information may be stored; the stored data is not limited to the content described in the present embodiment.
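The stored fields listed above map naturally onto a small record type. The class and field names below are hypothetical, and the example URL is a placeholder; only the set of fields comes from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrendInformationRecord:
    """One entry of the trend information storage unit (illustrative)."""
    topic_word: str
    statistic_name: str
    period: str                      # year or other period expression
    document_id: str                 # e.g. a URL or a file path
    trend_information: List[str] = field(default_factory=list)


if __name__ == "__main__":
    record = TrendInformationRecord(
        topic_word="N company",
        statistic_name="sales",
        period="2001",
        document_id="https://example.com/doc1",  # placeholder address
    )
    record.trend_information.append(
        "N company sales in 2001 decreased year-on-year.")
    print(record)
```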
- the output unit 4 displays the trend information list (FIG. 4) stored in the trend information storage unit 11 as a search result for the user.
- an example of processing (trend information search processing 1) in which the search device 100 generates an expanded query, searches, and evaluates the acquired documents will be described with reference to FIG.
- the expanded query generation unit 21 expands the input search condition to generate a query (S11).
- the expansion of the search condition is one or more expansion processes selected from an expansion by a synonymous element, an expansion by a trend element, an expansion by a comparison element, and an expansion by a unit element.
- the generated query is passed to the trend information search unit 22.
- the process of S11 will be specifically described by taking, as an example, the case where the topic word "N company", the statistic name "sales amount", and the year expression "2001" are input on the search condition input screen C1 of FIG.
- the case where the synonym extension, the trend expression extension, the comparison expression extension, and the unit expression extension are all performed will be described as an example.
- in this case, the generated query is "(N company OR NXXX) AND (sales amount OR income) AND (2001 OR 2001) AND (increase OR expansion OR growth OR decrease OR decrease OR reduction) AND (transition OR year-on-year OR previous year Year-on-year comparison OR year-on-year comparison) AND (Trillion yen OR 1 billion yen OR 1 million yen)".
- the combination of query expansion processing may be any combination determined in advance or a combination set by the user.
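The query expansion of S11 can be sketched as below. This is only an illustration under stated assumptions: the helper names `or_group` and `build_expanded_query`, the dictionary-based synonym lookup, and the short term lists are hypothetical and are not taken from the specification.

```python
def or_group(terms):
    """Join alternative terms into one parenthesized OR group."""
    return "(" + " OR ".join(terms) + ")"


def build_expanded_query(conditions, synonyms, extra_groups):
    """Build an AND query of OR groups.

    conditions   -- terms entered by the user (topic word, statistic name, year)
    synonyms     -- assumed lookup table: term -> list of synonymous elements
    extra_groups -- trend, comparison and unit expression lists to append
    """
    groups = [or_group([term] + synonyms.get(term, [])) for term in conditions]
    groups += [or_group(terms) for terms in extra_groups]
    return " AND ".join(groups)


query = build_expanded_query(
    conditions=["N company", "sales amount", "2001"],
    synonyms={"N company": ["NXXX"], "sales amount": ["income"]},
    extra_groups=[["increase", "decrease"], ["year-on-year"], ["1 million yen"]],
)
```

Leaving an entry out of `extra_groups` corresponds to skipping that expansion, which matches the statement that any combination of expansion processes may be used.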
- the trend information search unit 22 searches the external data 5 using the expanded query passed from the expanded query generation unit 21, and passes the document group of the search result to the trend information determination unit 23 (S12).
- the trend information determination unit 23 determines, for each document of the search result, whether it describes trend information of statistics matching the search condition designated by the user (S13). The determination is performed based on any one of the topic score TS, the statistic score SS, the year score YS, the trend expression score MS, the comparison expression score CS, the unit expression score US, or a combination thereof. The score to be used may be a predetermined score or a score selected by the user. Then, the trend information determination unit 23 creates the data shown in FIG. 4 based on the determination result, and stores the data in the trend information storage unit 11.
- the data processing device 2 displays the trend information list stored in the trend information storage unit 11 as a search result on the output unit 4 (S14), and ends the process.
- the search device 100 generates an expanded query using trend information elements based on the topic words, statistic names, and period expressions input by the user, and searches the external data for documents containing relevant trend information.
- based on how trend information elements such as the topic word, statistic name, year (period expression), trend expression, comparison expression, and unit expression appear in each retrieved document, the search device 100 determines whether the text contains trend information that conforms to the search condition input by the user.
- the search device 100 can therefore automatically acquire trend information of statistics on a topic that the user is interested in from an external corpus such as the Web.
- the reason is that an expanded query is generated using trend information elements based on the topic word and statistic name input by the user, documents including matching trend information are retrieved from the external data, and the degree to which each retrieved document includes trend information matching the search condition input by the user is evaluated based on the appearance mode of the trend information elements in the document.
- the search device 200 according to the second embodiment is characterized in that it has a function of extracting and storing a “cause statement” that explains the cause of the trend of statistics, as compared with the first embodiment.
- the search device 200 includes a cause sentence storage unit 12, a cause sentence candidate extraction unit 24, and a cause sentence determination unit 25.
- the cause sentence storage unit 12 stores cause sentences that are extracted by the cause sentence candidate extraction unit 24 from the documents in the trend information storage unit 11 and determined by the cause sentence determination unit 25 to be sentences explaining the cause of the trend information.
- FIG. 7 shows an example of data stored in the cause sentence storage unit. Referring to FIG. 7, for the statistic name "sales amount" of the topic word "N company", whose trend in fiscal 2001 is "decreased", the cause sentence recorded for the document D01 reads "personal products centered on personal computers: 25.8% ... due to the decrease."
- a combination of a topic word, a statistic name, a term expression, a trend expression, a document ID, and a cause sentence list is used as an example of data stored in the cause sentence storage unit 12.
- information such as the content of the document body indicated by the document ID, the creation date of the document, the update date, and the creator may be stored, and the present invention is not limited to the content described in this embodiment.
- the cause sentence candidate extraction unit 24 extracts, from each document of the document group stored in the trend information storage unit 11, sentences including a language pattern representing a cause, such as "influence", "cause", "for", or "with". The cause sentence candidate extraction unit 24 passes the extracted sentences to the cause sentence determination unit 25 as cause sentence candidates for explaining the cause of the trend information specified by the user.
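A minimal sketch of cause sentence candidate extraction is shown below. The sentence splitter, the function name `extract_cause_candidates`, and the English cue list are assumptions for illustration; in particular "due to" is added here as an English stand-in, and short cues such as "for" or "with" are omitted because plain substring matching would over-match them.

```python
import re

# Assumed English cue patterns; the specification lists Japanese patterns.
CAUSE_PATTERNS = ("influence", "cause", "reason", "due to")


def extract_cause_candidates(document, patterns=CAUSE_PATTERNS):
    """Return the sentences of `document` that contain a cause pattern."""
    # Naive split after ., ! or ? followed by whitespace (an assumption).
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [s for s in sentences if any(p in s.lower() for p in patterns)]


doc = (
    "Sales decreased in fiscal 2001. "
    "This was due to weak demand for personal computers. "
    "The outlook is unchanged."
)
candidates = extract_cause_candidates(doc)
```

Only the second sentence is kept as a candidate; deciding whether it really is a cause sentence is left to the separate determination step.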
- the cause sentence determination unit 25 determines whether each cause sentence candidate passed from the cause sentence candidate extraction unit 24 is a cause sentence. The determination is performed using the following numerical values.
- the numerical values are the appearance frequency FT, in the sentence, of the topic word input by the user or its related terms, the appearance frequency FS of the statistic expression in the sentence, the appearance frequency FY of the year expression in the sentence, the appearance frequency FM of the trend expression in the sentence, the appearance frequency FC of the comparison expression in the sentence, and the appearance frequency FU of the unit expression in the sentence.
- based on these values, the cause sentence determination unit 25 determines whether the sentence of the cause sentence candidate is a cause sentence explaining the cause of the trend information specified by the user.
- the appearance frequency FY of the year expression can generally be replaced with the appearance frequency of the term expression.
- the cause sentence determination unit 25 stores the search condition and the document ID designated by the user, and the list of sentences determined as the cause sentence in the cause sentence storage unit 12.
- the determination is performed by the integrated score F.
- the integrated score F is a score obtained by evaluating the degree to which the cause sentence candidate is the cause sentence.
- the weights V1 to V6 and the threshold value ω are predetermined values obtained empirically.
- the combination of the score to be used may be a predetermined arbitrary combination, and may be a combination set by the user.
- the method of calculating the integrated score F as a weighted linear sum of FT, FS, FY, FM, FC, and FU is described.
- the method of determining the integrated score F is not limited to this.
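The weighted linear sum for the integrated score F can be sketched as follows. The weight values and the threshold are made-up numbers for illustration only, and the function names are hypothetical; the rule that a candidate is accepted when F exceeds the threshold ω follows the description of the determination.

```python
def integrated_score(frequencies, weights):
    """F = V1*FT + V2*FS + V3*FY + V4*FM + V5*FC + V6*FU (missing counts are 0)."""
    return sum(weights[name] * frequencies.get(name, 0) for name in weights)


def is_cause_sentence(frequencies, weights, omega):
    """A candidate is accepted when F strictly exceeds the threshold omega."""
    return integrated_score(frequencies, weights) > omega


# Illustrative weights V1..V6, keyed by the frequency they multiply (assumed values).
weights = {"FT": 2.0, "FS": 2.0, "FY": 1.0, "FM": 1.5, "FC": 1.0, "FU": 0.5}
# Frequencies counted in one candidate sentence (assumed values).
freqs = {"FT": 1, "FS": 1, "FY": 0, "FM": 1, "FC": 0, "FU": 1}
score = integrated_score(freqs, weights)
```

Using only a subset of the scores corresponds to passing a `weights` dictionary with fewer keys.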
- the method of determining whether the sentence of the cause sentence candidate is the cause sentence is not limited to the above example.
- the determination method may be performed using, for example, a method of pattern recognition.
- for example, supervised learning is performed on sentences including known trend information, using as feature vectors the number of matches of each expression to the language patterns, the appearance frequency in the title, and the appearance frequency in the text.
- discrimination is then performed using the trained classifier.
- examples of classifiers used include support vector machines and neural networks.
- the output unit 4 integrates the trend information list stored in the trend information storage unit 11 and the cause sentence list stored in the cause sentence storage unit 12 and displays the result as a search result.
- FIG. 8 shows an example of a screen displaying a search result.
- the search result screen C3 in the example of FIG. 8 displays a list of documents determined to include trend information and a cause sentence. Also, the document ID portion is a link, and by clicking, the document body can be accessed.
- the trend information search process 2 differs from the trend information search process 1 of the first embodiment shown in FIG. 5 in that it includes a cause sentence candidate extraction process (S24) and a cause sentence determination process (S25).
- the processes of S21 to S23 are the same as the processes of S11 to S13 of the trend information search process 1 shown in FIG.
- the cause sentence candidate extraction unit 24 extracts cause sentence candidates from each document of the document group stored in the trend information storage unit 11.
- the sentences to be extracted are those including a language pattern that indicates a cause, such as "influence", "cause", "reason", "for", "in conjunction with", and the like.
- the cause sentence candidate extraction unit 24 passes the extracted cause sentence candidate to the cause sentence determination unit 25 (S24).
- the cause sentence determination unit 25 determines whether each of the cause sentence candidate sentences extracted by the cause sentence candidate extraction unit 24 is a cause sentence (S25). Discrimination is performed using the integrated score F calculated using the following numerical values.
- the numerical values are one of, or a combination of, the following frequencies measured in the document: the frequency of occurrence FT of the topic word input by the user or its related words, the frequency of occurrence FS of the statistic expression, the frequency of occurrence FY of the year expression, the frequency of occurrence FM of the trend expression, the frequency of occurrence FC of the comparison expression, and the frequency of occurrence FU of the unit expression.
- the combination of numerical values to be used may be any combination determined in advance, or may be a combination set by the user.
- the cause sentence determination unit 25 creates the list shown in FIG. 7 from the determination result, and stores the list in the cause sentence storage unit 12.
- the data processing device 2 integrates the trend information list stored in the trend information storage unit 11 and the cause sentence list stored in the cause sentence storage unit 12 and displays the result on the output unit 4 as a search result (S27), and the process ends.
- the search device 200 extracts candidates for cause sentences explaining the cause of the trend information, using language patterns representing a cause as clues, and determines whether each candidate is a cause sentence from the appearance frequency of the trend information elements. Thus, for trend information automatically acquired from an external corpus such as the Web, it is possible to extract cause sentences explaining that trend information.
- the search device 300 according to the third embodiment is characterized in that it includes a year expression expansion unit 26 in addition to the configuration described in the second embodiment.
- the other configuration is the same as that of the second embodiment.
- the year expression expansion unit 26 generates a year expression query for each year in the range of Y years before and after the year input by the user, and instructs the downstream units to repeat the trend information search processing, trend information determination processing, cause sentence candidate extraction processing, and cause sentence determination processing for each year.
- FIG. 11 is a flowchart illustrating an example of the operation of trend information search according to the third embodiment.
- this operation differs from the trend information search process of the second embodiment in that it includes the year expression expansion process (S30) and the process of confirming whether or not the search process has ended for all the expanded years (S36).
- the search target is the period from fiscal 1998 to fiscal 2004.
- the search process is performed for seven years from fiscal 1998 to fiscal 2004.
- the fiscal year query used for the first search is "fiscal year 1998" and the second is "fiscal year 1999".
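The year expansion can be sketched as below. The input year 2001 and the span Y = 3 are assumptions chosen so that the result covers fiscal 1998 to fiscal 2004 as in the example; the function name `expand_years` and the query string format are likewise hypothetical.

```python
def expand_years(year, span):
    """Return one year query per year from `year - span` to `year + span`."""
    return [f"fiscal year {y}" for y in range(year - span, year + span + 1)]


# Assumed input: the user entered 2001 and Y is 3.
queries = expand_years(2001, 3)
```

Each element of `queries` would then drive one pass of the search, determination, and cause sentence steps.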
- the trend expression expansion unit 21 generates an expansion query using the year query generated by the year expression expansion unit 26 (S31).
- the trend information search unit 22, the trend information determination unit 23, the cause sentence candidate extraction unit 24, and the cause sentence determination unit 25 then perform trend information search (S32), trend information determination (S33), cause sentence candidate extraction (S34), and cause sentence determination (S35), respectively.
- the processes of steps S32 to S35 are the same as the processes of steps S22 to S25 of FIG.
- in step S36, the year expression expansion unit 26 checks whether or not the process has been performed for all the years included in the expanded period. If an unprocessed year remains (step S36; NO), the process target is set to the next year, and the process returns to step S30 to repeat the processing from the trend expression expansion onward. If the process has ended for all the years included in the expanded period (step S36; YES), the process is ended.
- an example of data stored in the cause sentence storage unit in the third embodiment is shown in FIG. 12. It can be seen from FIG. 12 that the sales of N company fluctuated from 1998 to 2004 due to different causes.
- the unit of the period for searching the trend information by the year has been described as an example.
- the unit of the period is not limited to the year.
- the term expression may be in units of quarters, months, weeks, etc., or an expression specifying the date and time of the beginning and the end of the term.
- in that case, a period expansion unit is used instead of the year expression expansion unit 26, and it extends the search target period to a predetermined range before and after the designated period.
- the search device 300 repeatedly generates the expanded query over a predetermined range before and after the period input by the user, and searches for the trend information and the cause sentence. Therefore, the user can grasp the trend of statistics and the transition of the cause of the trend before and after the period in which the user is interested.
- a configuration example of the search device 400 according to the fourth embodiment will be described with reference to FIG. 13.
- the configuration of the search device 400 differs from the configuration of the search device 300 shown in FIG. 10 in that the reputation information extraction unit 27 and the reputation information storage unit 13 are provided.
- the other configuration is the same as that of the third embodiment.
- the reputation information extraction unit 27 extracts the sender information of the document from which the cause sentence is extracted, and determines whether the reputation in the document is positive or negative.
- the reputation information extraction unit 27 stores the determination result in the reputation information storage unit 13.
- the sender information is the domain name of the web site, the meta information of the document, the signature described in the news article, and the like.
- the positive expression dictionary stores positive expressions such as “wonderful”, “good”, and “good”.
- the negative expression dictionary stores negative expressions such as "slowness", "deterioration", "slowness" and the like. In this example, if the ratio FP / FN of the appearance frequency FP of positive expressions to the appearance frequency FN of negative expressions in the document is 1 or more, the reputation is determined to be positive, and if it is less than 1, the reputation is determined to be negative.
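The FP / FN rule can be sketched as follows. The function names, the tiny dictionaries, and the substring counting are assumptions for illustration. The specification does not say what happens when FN is zero; this sketch compares the counts directly (FP >= FN), which equals FP / FN >= 1 whenever FN is positive and, as an assumption, treats FN = 0 as positive.

```python
def count_hits(text, expressions):
    """Count occurrences of dictionary expressions in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(lowered.count(expression) for expression in expressions)


def judge_reputation(text, positive, negative):
    """Return "positive" if FP / FN >= 1, otherwise "negative"."""
    fp = count_hits(text, positive)
    fn = count_hits(text, negative)
    # Comparing counts avoids dividing by zero when FN == 0 (assumed handling).
    return "positive" if fp >= fn else "negative"


positive_dict = ["wonderful", "good"]
negative_dict = ["deterioration", "slowness"]
first = judge_reputation(
    "Results were good, a wonderful year despite some deterioration.",
    positive_dict, negative_dict)
second = judge_reputation(
    "Deterioration and slowness continued.", positive_dict, negative_dict)
```

In this example the first document has FP = 2 and FN = 1, and the second has FP = 0 and FN = 2.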
- the reputation information storage unit 13 stores information on the year, the document ID, the sender ID, and the reputation as additional information related to the document stored in the cause sentence storage unit 12.
- FIG. 14 shows an example of data stored in the reputation information storage unit.
- it can be seen that the sender P01 sends documents with positive and negative reputations depending on the year, whereas the sender P02 always sends negative documents regardless of the year, and the sender P03 always sends positive documents.
- an example of processing (trend information search processing 4) performed in the search device 400 will be described with reference to FIG.
- the operation of the trend information search according to the fourth embodiment differs from the trend information search process 3 shown in FIG. 11 in that it includes a reputation information extraction process (S46).
- trend information search processing 4 is executed when the user presses a search execution button.
- the processing contents from the year expression expansion process (S40) in FIG. 15 to the cause sentence discrimination (S45) are the same as the operations in S30 to S35 in FIG.
- the reputation information extraction unit 27 extracts the sender information of the document from which the cause sentence is extracted. Next, the reputation information extraction unit 27 determines whether the reputation in this document is positive or negative. Then, the reputation information extraction unit 27 stores the determination result in the reputation information storage unit 13 (S46).
- if the process has not been completed for all the years included in the expanded period (step S47; NO), the process returns to step S40, the process target is set to the next year, and the processing from the trend expression expansion onward is repeated. If the process has ended for all the years included in the expanded period (step S47; YES), the process ends.
- the search device 400 extracts the sender information of the document from which the cause sentence is extracted, and determines whether the reputation in the document is positive or negative.
- the user can grasp the transition of what kind of reputation document a certain sender sends out each year.
- FIG. 16 shows an example of the hardware configuration of the search device (the search device 100, the search device 200, the search device 300, and the search device 400) according to the embodiment of the present invention.
- the search device (the search device 100, the search device 200, the search device 300, and the search device 400) includes, as shown in FIG. 16, a control unit 31, a main storage unit 32, an external storage unit 33, an operation unit 34, a display unit 35, and a transmission / reception unit 36.
- the main storage unit 32, the external storage unit 33, the operation unit 34, the display unit 35, and the transmission / reception unit 36 are all connected to the control unit 31 via the internal bus 38.
- the control unit 31 is configured of a CPU (Central Processing Unit) or the like.
- the control unit 31 executes processing in accordance with the trend information search program 37 stored in the external storage unit 33.
- the main storage unit 32 is configured by a RAM (Random-Access Memory) or the like.
- the main storage unit 32 loads the trend information search program 37 stored in the external storage unit 33, and is used as a work area of the control unit 31.
- the external storage unit 33 includes a flash memory, a hard disk, a DVD-RAM (Digital Versatile Disc Random-Access Memory), a DVD-RW (Digital Versatile Disc Rewritable), and the like.
- the external storage unit 33 stores the trend information search program 37 in advance. Further, the external storage unit 33 supplies the stored data to the control unit 31 according to the instruction of the control unit 31 and stores the data supplied from the control unit 31.
- the trend information storage unit 11, the cause sentence storage unit 12 and the reputation information storage unit 13 are configured by storage areas secured in the external storage unit 33.
- a part or all of the trend information storage unit 11, the cause sentence storage unit 12 and the reputation information storage unit 13 may be temporarily configured as a part of the storage area of the main storage unit 32.
- the operation unit 34 includes a keyboard and a pointing device such as a mouse, and an interface device for connecting the keyboard and the pointing device to the internal bus 38.
- the user uses the operation unit 34 to input keywords of trend information and the like.
- the display unit 35 is configured of a CRT (Cathode Ray Tube) or an LCD (Liquid Crystal Display).
- the display unit 35 displays a screen for inputting a search keyword or a search result.
- the display unit 35 may also be configured of a printer and its interface device.
- the transmission / reception unit 36 is configured of communication devices and a serial interface or LAN (Local Area Network) interface connected to them.
- the transmitting and receiving unit 36 transmits a query to a search engine on the Internet, a document database in an intranet, and the like via a network (not shown), and receives document data of a search result.
- the functions of the expanded query generation unit 21, the trend information search unit 22, the trend information determination unit 23, the cause sentence candidate extraction unit 24, the cause sentence determination unit 25, the year expression expansion unit 26 and the reputation information extraction unit 27 are realized by executing the trend information search program 37 using the control unit 31, the main storage unit 32, the external storage unit 33, the operation unit 34, the display unit 35, the transmission / reception unit 36, and the like.
- the main part that performs the processing of the search device, including the control unit 31, the main storage unit 32, the external storage unit 33, the transmission / reception unit 36, and the like, can be realized using an ordinary computer system rather than a dedicated system.
- for example, a computer program for executing the above-mentioned operation may be stored and distributed on a computer readable recording medium (flexible disc, CD-ROM, DVD-ROM, etc.), and a search device that executes the above process may be configured by installing the computer program on a computer.
- alternatively, the computer program may be stored in a storage device of a server device on a communication network such as the Internet, and the search device may be configured by an ordinary computer system downloading the program.
- the computer program may be posted on a bulletin board (BBS: Bulletin Board System) on a communication network, and the computer program may be distributed via the network. Then, the computer program may be activated and executed in the same manner as other application programs under the control of the OS so that the above-described processing can be executed.
- the search device of the present invention can be used to collect decision materials in analyzing the transition of company performance and stock prices or the transition of macroeconomic indicators.
Abstract
Description
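The document data providing device according to Patent Document 3 extracts words from dated document data, tallies the count of each word for each field and period, obtains the appearance frequencies of these words, and extracts a fixed number of words with high appearance frequency in each field and each period as feature words. When the user specifies a field and a period, this document data providing device displays the feature words of the document data for that period, and when a specific feature word is selected, it displays the document headings and the like of the document data containing that feature word.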
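A trend information search device for searching for trend information of statistics, characterized by comprising:
expanded query generation means for generating an expanded query by adding, to an input search condition, trend information elements, which are character strings that characteristically appear in documents containing trend information, as search conditions;
search means for searching external data using the query generated by the expanded query generation means; and
trend information evaluation means for evaluating the degree to which a document retrieved by the search means contains trend information of statistics matching the input condition, based on the manner in which the trend information elements appear in the document.
A trend information search method for searching for documents containing trend information of statistics, characterized by comprising:
an expanded query generation step of generating an expanded query by adding, to an input search condition, trend information elements, which are character strings that characteristically appear in text expressing trend information;
a search step of searching external data using the query generated in the expanded query generation step; and
a trend information evaluation step of evaluating the degree to which a document retrieved in the search step contains trend information of statistics matching the input condition, based on the manner in which the trend information elements appear in the document.
A recording medium on which is recorded a program characterized by causing a computer to execute:
an expanded query generation step of generating an expanded query by adding, to an input condition, trend information elements, which are character strings that characteristically appear in text expressing trend information;
a search step of searching external data using the query generated in the expanded query generation step; and
a trend information evaluation step of evaluating the degree to which a document retrieved in the search step contains trend information of statistics matching the input condition, based on the manner in which the trend information elements appear in the document.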
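As shown in FIG. 1, the search device 100 (trend information search device) according to Embodiment 1 of the present invention includes a storage device 1, a data processing device 2, an input unit 3, and an output unit 4.
The storage device 1 is physically composed of a hard disk, a flash memory, or the like, and functionally includes the trend information storage unit 11.
The data processing device 2 is physically composed of a CPU or the like, and functionally consists of the expanded query generation unit 21, the trend information search unit 22, and the trend information determination unit 23.
The input unit 3 accepts from the user, as search conditions, a keyword representing the topic to be searched, the name of a statistic related to that topic, and the period covered by the statistic.
With this method, for example, the query "N company AND sales amount AND 2001" is generated for the search condition of FIG. 2. However, as described above, a document that merely contains "N company", "sales amount", and "2001" is not necessarily a document describing the fact that the sales amount of N company decreased in 2001. Therefore, to obtain the intended trend information with higher probability, the expanded query generation unit 21 expands the query. Query expansion includes expansion by synonyms, expansion by trend expressions, expansion by comparison expressions, expansion by units, and so on.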
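TS = W11·ts1 + W12·ts2
The topic score TS can be calculated from this expression. Here, the weights W11 and W12 are values determined arbitrarily on the basis of experiments, but W11 > W12 is preferable.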
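(1) Let G1 be the set of documents retrieved by the trend information search unit 22 using the expanded query generated by the trend expression expansion unit 21.
(2) Let G2 be the set of documents retrieved by the trend information search unit 22 using the expanded query generated by the trend expression expansion unit 21 with the topic word and its synonyms removed.
(3) Let F_G1(t) be the appearance frequency of a word t in the document set G1, and F_G2(t) the appearance frequency of the word t in the document set G2.
(4) The value R(t) = F_G1(t) / F_G2(t) is taken as the degree of relevance between the word t and the topic element. R(t) is calculated for every word t contained in the text. The words contained in the documents are sorted in descending order of R(t), and the top N words are taken as related words of the topic word. N is a predetermined natural number, and R(t) is the degree of relevance of each related word.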
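Steps (1) to (4) can be sketched as follows, assuming the two document sets have already been retrieved. The function name `related_words`, the whitespace tokenization, and the sample documents are assumptions for illustration (Japanese text would need morphological analysis), and words that never appear in G2 are skipped here to avoid dividing by zero.

```python
from collections import Counter


def related_words(docs_g1, docs_g2, top_n):
    """Rank words by R(t) = F_G1(t) / F_G2(t) and return the top N (word, R) pairs."""
    f_g1 = Counter(word for doc in docs_g1 for word in doc.split())
    f_g2 = Counter(word for doc in docs_g2 for word in doc.split())
    # Assumed handling: ignore words with zero frequency in G2.
    relevance = {t: f_g1[t] / f_g2[t] for t in f_g1 if f_g2[t] > 0}
    ranked = sorted(relevance.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# G1: documents found with the full query; G2: documents found without the topic word.
g1 = ["laptop sales decreased", "laptop sales increased"]
g2 = g1 + ["car sales decreased", "car sales flat"]
top = related_words(g1, g2, 2)
```

Words such as "sales" that are common regardless of the topic receive a low R(t), while words concentrated in the topic-specific set G1 rank higher.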
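SS = W21·ss1 + W22·ss2 + W23·ss3
The statistic score SS can be calculated by this expression. Here, the weights W21, W22, and W23 are values determined arbitrarily on the basis of experiments, but W21 > W22 > W23 is preferable.
YS = W31·ys1 + W32·ys2 + W33·ys3
The year score YS can be calculated by this expression. Here, the weights W31, W32, and W33 are values determined arbitrarily on the basis of experiments, but W31 > W32 > W33 is preferable.
MS = W41·ms1 + W42·ms2 + W43·ms3
The trend expression score MS can be calculated by this expression. Here, the weights W41, W42, and W43 are values determined arbitrarily on the basis of experiments, but W41 > W42 > W43 is preferable.
CS = W51·cs1 + W52·cs2 + W53·cs3
The comparison expression score CS can be calculated by this expression. Here, the weights W51, W52, and W53 are values determined arbitrarily on the basis of experiments, but W51 > W52 > W53 is preferable.
US = W61·us1 + W62·us2 + W63·us3
The unit expression score US can be calculated by this expression. Here, the weights W61, W62, and W63 are values determined arbitrarily on the basis of experiments, but W61 > W62 > W63 is preferable.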
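S = W1·TS + W2·SS + W3·YS + W4·MS + W5·CS + W6·US
The integrated score S can be calculated by this expression. The trend information determination unit 23 determines that a document contains trend information when the integrated score S exceeds a predetermined threshold θ. Here, the weights W1 to W6 are values determined arbitrarily on the basis of experiments.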
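Next, Embodiment 2 of the present invention will be described. Compared with Embodiment 1, the search device 200 according to Embodiment 2 is characterized by having a function of extracting and storing "cause sentences" that explain the cause of a trend in statistics.
The cause sentence determination unit 25 stores, in the cause sentence storage unit 12, the search condition and document ID specified by the user together with the list of sentences determined to be cause sentences.
F = V1·FT + V2·FS + V3·FY + V4·FM + V5·FC + V6·FU
The integrated score F is calculated from this expression. When the integrated score F exceeds a predetermined threshold ω, the cause sentence determination unit 25 determines that the candidate sentence is a cause sentence. Here, the weights V1 to V6 and the threshold ω are predetermined values obtained empirically. The combination of scores used may be any predetermined combination or a combination set by the user.
Next, Embodiment 3 will be described. As shown in FIG. 10, the search device 300 according to Embodiment 3 is characterized by including the year expression expansion unit 26 in addition to the configuration described in Embodiment 2. The other configuration is the same as in Embodiment 2.
Next, Embodiment 4 of the present invention will be described. First, a configuration example of the search device 400 according to Embodiment 4 will be described with reference to FIG. 13. The configuration of the search device 400 differs from that of the search device 300 shown in FIG. 10 in that it includes the reputation information extraction unit 27 and the reputation information storage unit 13. The other configuration is the same as in Embodiment 3.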
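2 Data processing device
3 Input unit
4 Output unit
11 Trend information storage unit
12 Cause sentence storage unit
13 Reputation information storage unit
21 Expanded query generation unit
22 Trend information search unit
23 Trend information determination unit
24 Cause sentence candidate extraction unit
25 Cause sentence determination unit
26 Year expression expansion unit
27 Reputation information extraction unit
31 Control unit
32 Main storage unit
33 External storage unit
34 Operation unit
35 Display unit
36 Transmission / reception unit
37 Trend information search program
38 Internal bus
100 Search device
200 Search device
300 Search device
400 Search device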
Claims (10)
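- A trend information search device for searching for trend information of statistics, characterized by comprising:
expanded query generation means for generating an expanded query by adding, to an input search condition, trend information elements, which are character strings that characteristically appear in documents containing trend information, as search conditions;
search means for searching external data using the query generated by the expanded query generation means; and
trend information evaluation means for evaluating the degree to which a document retrieved by the search means contains trend information of statistics matching the input condition, based on the manner in which the trend information elements appear in the document.
- The trend information search device according to claim 1, characterized in that the trend information elements include at least one of a topic word, a statistic name, a period expression, a trend expression, a comparison expression, or a unit expression, or a combination thereof, and
the expanded query generation means generates the query using synonyms of the trend information elements.
- The trend information search device according to claim 1 or 2, characterized in that the trend information elements include at least one of a topic word, a statistic name, a period expression, a trend expression, a comparison expression, or a unit expression, or a combination thereof, and
the trend information evaluation means evaluates the degree to which trend information of statistics matching the input condition is contained, based on the manner in which synonyms of the trend information elements appear.
- The trend information search device according to claim 3, characterized in that the trend information evaluation means evaluates the degree to which trend information of statistics matching the input condition is contained by means of a score calculated from the frequency with which the trend information elements and their synonyms, and predetermined language patterns, appear in the document.
- The trend information search device according to any one of claims 1 to 4, characterized by further comprising:
cause sentence candidate extraction means for extracting, from a document retrieved by the search means, one or more sentences containing a language pattern expressing a cause, as candidates for cause sentences explaining the cause of the trend of the statistics matching the input condition; and
cause sentence evaluation means for evaluating the degree to which each cause sentence candidate is a cause sentence explaining the cause of the trend of the statistics, based on the appearance frequency of the trend information elements.
- The trend information search device according to claim 5, characterized in that the trend information elements include at least one of a topic word, a statistic name, a period expression, a trend expression, a comparison expression, or a unit expression, or a combination thereof.
- The trend information search device according to claim 5 or 6, characterized by further comprising reputation information extraction means for extracting, for a document from which a cause sentence candidate has been extracted by the cause sentence candidate extraction means, sender information of the document, and evaluating whether the reputation in the document is positive or negative.
- The trend information search device according to any one of claims 1 to 7, characterized by further comprising period expression expansion means for generating queries expanded to the periods before and after, and including, the period of the input condition.
- A trend information search method for searching for documents containing trend information of statistics, characterized by comprising:
an expanded query generation step of generating an expanded query by adding, to an input search condition, trend information elements, which are character strings that characteristically appear in text expressing trend information;
a search step of searching external data using the query generated in the expanded query generation step; and
a trend information evaluation step of evaluating the degree to which a document retrieved in the search step contains trend information of statistics matching the input condition, based on the manner in which the trend information elements appear in the document.
- A computer-readable recording medium on which is recorded a trend information search program characterized by causing a computer to execute:
an expanded query generation step of generating an expanded query by adding, to an input condition, trend information elements, which are character strings that characteristically appear in text expressing trend information;
a search step of searching external data using the query generated in the expanded query generation step; and
a trend information evaluation step of evaluating the degree to which a document retrieved in the search step contains trend information of statistics matching the input condition, based on the manner in which the trend information elements appear in the document.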
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/574,148 US20120284305A1 (en) | 2010-01-19 | 2011-01-18 | Trend information search device, trend information search method and recording medium |
JP2011550913A JP5786718B2 (ja) | 2010-01-19 | 2011-01-18 | Trend information search device, trend information search method and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010009085 | 2010-01-19 | ||
JP2010-009085 | 2010-01-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011090036A1 true WO2011090036A1 (ja) | 2011-07-28 |
Family
ID=44306838
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/050783 WO2011090036A1 (ja) | 2010-01-19 | 2011-01-18 | Trend information search device, trend information search method and recording medium |
Country Status (3)
Country | Link |
---|---|
US (1) | US20120284305A1 (ja) |
JP (1) | JP5786718B2 (ja) |
WO (1) | WO2011090036A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
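CN104331493A (zh) * | 2014-11-17 | 2015-02-04 | 百度在线网络技术(北京)有限公司 | Computer-implemented method and apparatus for generating trend explanation data |
JP6155409B1 (ja) * | 2017-01-23 | 2017-06-28 | 株式会社xenodata lab. | Financial results analysis system and financial results analysis program |
JP2018120567A (ja) * | 2017-01-23 | 2018-08-02 | 株式会社xenodata lab. | Financial results analysis system and financial results analysis program |
JP2020129232A (ja) * | 2019-02-07 | 2020-08-27 | 株式会社日本総合研究所 | Machine learning device, program, and machine learning method |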
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10922363B1 (en) * | 2010-04-21 | 2021-02-16 | Richard Paiz | Codex search patterns |
US11048765B1 (en) | 2008-06-25 | 2021-06-29 | Richard Paiz | Search engine optimizer |
US11809506B1 (en) | 2013-02-26 | 2023-11-07 | Richard Paiz | Multivariant analyzing replicating intelligent ambience evolving system |
US11741090B1 (en) | 2013-02-26 | 2023-08-29 | Richard Paiz | Site rank codex search patterns |
US20140280017A1 (en) * | 2013-03-12 | 2014-09-18 | Microsoft Corporation | Aggregations for trending topic summarization |
US9244952B2 (en) | 2013-03-17 | 2016-01-26 | Alation, Inc. | Editable and searchable markup pages automatically populated through user query monitoring |
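KR102425770B1 (ko) * | 2020-04-13 | 2022-07-28 | 네이버 주식회사 | Method and system for providing rapidly rising search terms |
CN113642974A (zh) * | 2020-05-10 | 2021-11-12 | 张孟强 | Cyclic two-way bidding matching method and system based on the needs of both job seekers and recruiters |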
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002329056A (ja) * | 2001-04-27 | 2002-11-15 | Mitsubishi Electric Corp | Information processing device and information processing method |
JP2004192374A (ja) * | 2002-12-12 | 2004-07-08 | Ricoh Co Ltd | Document search device, program, and recording medium |
JP2006146802A (ja) * | 2004-11-24 | 2006-06-08 | Mitsubishi Electric Corp | Text mining device and text mining method |
JP2008541233A (ja) * | 2005-05-04 | 2008-11-20 | グーグル・インコーポレーテッド | Suggesting and refining user input based on original user input |
Family Cites Families (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5675819A (en) * | 1994-06-16 | 1997-10-07 | Xerox Corporation | Document information retrieval using global word co-occurrence patterns |
US6581056B1 (en) * | 1996-06-27 | 2003-06-17 | Xerox Corporation | Information retrieval system providing secondary content analysis on collections of information objects |
US6038560A (en) * | 1997-05-21 | 2000-03-14 | Oracle Corporation | Concept knowledge base search and retrieval system |
US6201884B1 (en) * | 1999-02-16 | 2001-03-13 | Schlumberger Technology Corporation | Apparatus and method for trend analysis in graphical information involving spatial data |
US7194483B1 (en) * | 2001-05-07 | 2007-03-20 | Intelligenxia, Inc. | Method, system, and computer program product for concept-based multi-dimensional analysis of unstructured information |
US7069263B1 (en) * | 2002-02-19 | 2006-06-27 | Oracle International Corporation | Automatic trend analysis data capture |
US8375286B2 (en) * | 2002-09-19 | 2013-02-12 | Ancestry.com Operations, Inc. | Systems and methods for displaying statistical information on a web page |
US7240049B2 (en) * | 2003-11-12 | 2007-07-03 | Yahoo! Inc. | Systems and methods for search query processing using trend analysis |
US8375048B1 (en) * | 2004-01-20 | 2013-02-12 | Microsoft Corporation | Query augmentation |
US7958115B2 (en) * | 2004-07-29 | 2011-06-07 | Yahoo! Inc. | Search systems and methods using in-line contextual queries |
US20060047636A1 (en) * | 2004-08-26 | 2006-03-02 | Mohania Mukesh K | Method and system for context-oriented association of unstructured content with the result of a structured database query |
US8135694B2 (en) * | 2006-03-13 | 2012-03-13 | Adobe Systems Incorporated | Augmenting the contents of an electronic document with data retrieved from a search |
US7877381B2 (en) * | 2006-03-24 | 2011-01-25 | International Business Machines Corporation | Progressive refinement of a federated query plan during query execution |
US7475063B2 (en) * | 2006-04-19 | 2009-01-06 | Google Inc. | Augmenting queries with synonyms selected using language statistics |
US8126874B2 (en) * | 2006-05-09 | 2012-02-28 | Google Inc. | Systems and methods for generating statistics from search engine query logs |
US7860886B2 (en) * | 2006-09-29 | 2010-12-28 | A9.Com, Inc. | Strategy for providing query results based on analysis of user intent |
KR100837751B1 (ko) * | 2006-12-12 | 2008-06-13 | 엔에이치엔(주) | Method for measuring the degree of association between words based on a document set, and system for performing the method |
US8166026B1 (en) * | 2006-12-26 | 2012-04-24 | uAffect.org LLC | User-centric, user-weighted method and apparatus for improving relevance and analysis of information sharing and searching |
US10394771B2 (en) * | 2007-02-28 | 2019-08-27 | International Business Machines Corporation | Use of search templates to identify slow information server search patterns |
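JP4810469B2 (ja) * | 2007-03-02 | 2011-11-09 | 株式会社東芝 | Search support device, program, and search support system |
JP5168961B2 (ja) * | 2007-03-19 | 2013-03-27 | 富士通株式会社 | Latest reputation information notification program, recording medium, device, and method |
JP4359787B2 (ja) * | 2007-07-02 | 2009-11-04 | ソニー株式会社 | Information processing device, content reputation search method, and content reputation search system |
CN101339551B (zh) * | 2007-07-05 | 2013-01-30 | 日电(中国)有限公司 | Natural language query demand expansion device and method |
JP5309543B2 (ja) * | 2007-12-06 | 2013-10-09 | 日本電気株式会社 | Information retrieval server, information retrieval method, and program |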
US20110246889A1 (en) * | 2008-12-10 | 2011-10-06 | Herman Moore | Statistical and visual sports analysis system |
US8756229B2 (en) * | 2009-06-26 | 2014-06-17 | Quantifind, Inc. | System and methods for units-based numeric information retrieval |
2011
- 2011-01-18: JP application JP2011550913A, published as JP5786718B2 (ja), status: Active
- 2011-01-18: US application US13/574,148, published as US20120284305A1 (en), status: Abandoned
- 2011-01-18: WO application PCT/JP2011/050783, published as WO2011090036A1 (ja), status: Application Filing
Non-Patent Citations (1)
Title |
---|
YASUHIRO UENISHI ET AL.: "Sotai Hyogen ni Motozuita Doko Joho Chushutsu System no Kochiku", PROCEEDINGS OF THE 15TH ANNUAL MEETING OF THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING, 2 March 2009 (2009-03-02), pages 160 - 163 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
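CN104331493A (zh) * | 2014-11-17 | 2015-02-04 | 百度在线网络技术(北京)有限公司 | Computer-implemented method and apparatus for generating trend explanation data |
CN104331493B (zh) * | 2014-11-17 | 2017-07-07 | 百度在线网络技术(北京)有限公司 | Computer-implemented method and apparatus for generating trend explanation data |
JP6155409B1 (ja) * | 2017-01-23 | 2017-06-28 | 株式会社xenodata lab. | Financial results analysis system and financial results analysis program |
JP2018120284A (ja) * | 2017-01-23 | 2018-08-02 | 株式会社xenodata lab. | Financial results analysis system and financial results analysis program |
JP2018120567A (ja) * | 2017-01-23 | 2018-08-02 | 株式会社xenodata lab. | Financial results analysis system and financial results analysis program |
JP2020129232A (ja) * | 2019-02-07 | 2020-08-27 | 株式会社日本総合研究所 | Machine learning device, program, and machine learning method |
JP7280705B2 (ja) | 2019-02-07 | 2023-05-24 | 株式会社日本総合研究所 | Machine learning device, program, and machine learning method |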
Also Published As
Publication number | Publication date |
---|---|
US20120284305A1 (en) | 2012-11-08 |
JPWO2011090036A1 (ja) | 2013-05-23 |
JP5786718B2 (ja) | 2015-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
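JP5786718B2 (ja) | Trend information search device, trend information search method, and program | |
CN107111614B (zh) | Machine translation between different languages using statistical streaming data | |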
US8849789B2 (en) | System and method for searching for documents | |
US8082247B2 (en) | Best-bet recommendations | |
US11195050B2 (en) | Machine learning to generate and evaluate visualizations | |
EP2289007B1 (en) | Search results ranking using editing distance and document information | |
US8117177B2 (en) | Apparatus and method for searching information based on character strings in documents | |
EP1522933B1 (en) | Computer aided query to task mapping | |
CN108460082B (zh) | Recommendation method and apparatus, and electronic device | |
US20130066887A1 (en) | Determining relevant information for domains of interest | |
US20070198459A1 (en) | System and method for online information analysis | |
US20110213761A1 (en) | Searchable web site discovery and recommendation | |
US20060288038A1 (en) | Generation of a blended classification model | |
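JP5329540B2 (ja) | User-centric information search method, computer-readable recording medium, and user-centric information search system | |
CN102722498A (zh) | Search engine and method for implementing the same | |
JP4896132B2 (ja) | Information retrieval method reflecting information value, and device therefor | |
CN102737021A (zh) | Search engine and method for implementing the same | |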
TWI461942B (zh) | An ad management apparatus, an advertisement selecting apparatus, an advertisement management method, an advertisement management program, and a recording medium on which an advertisement management program is recorded | |
US20100169316A1 (en) | Search query concept based recommendations | |
US9552415B2 (en) | Category classification processing device and method | |
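KR102107474B1 (ko) | System for deriving social issues through crawling, and derivation method therefor | |
JP5048852B2 (ja) | Search device, search method, search program, and computer-readable recording medium storing the program | |
JP2006268690A (ja) | FAQ presentation and improvement method, FAQ presentation and improvement device, and FAQ presentation and improvement program | |
JP2010146366A (ja) | Information providing server | |
WO2021250950A1 (ja) | Method, system, and device for evaluating document retrieval performance |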
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 11734642; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | WIPO information: entry into national phase | Ref document number: 2011550913; Country of ref document: JP |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWE | WIPO information: entry into national phase | Ref document number: 13574148; Country of ref document: US |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 11734642; Country of ref document: EP; Kind code of ref document: A1 |