WO2021106141A1 - Information processing device, information processing method, and information processing program - Google Patents
- Publication number
- WO2021106141A1 (PCT/JP2019/046557)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- search target
- query
- search
- information processing
- processing unit
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
- G06F40/157—Transformation using dictionaries or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3338—Query expansion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Definitions
- The present invention relates to an information processing device, an information processing method, and an information processing program.
- A document concept search device has been proposed (see Patent Document 1).
- The document concept search device accepts correct answer information.
- The correct answer information is a set of pairs, each consisting of a search query and a set of correct answer documents, i.e., search target documents that conceptually match the search query.
- The correct answer information needs to be created in advance.
- The correct answer information is created by a user's computer operation.
- The amount of data is increasing.
- The number of documents to be searched is increasing.
- The increase in the amount of data increases the burden on the user who creates the correct answer information.
- An object of the present invention is to reduce the burden on the user.
- The information processing device has an acquisition unit that acquires a plurality of search target documents, and a processing unit that extracts a character string from a first search target document among the plurality of search target documents, creates a query based on the character string, searches the plurality of search target documents for the search target of the query, and creates correct answer data including the query and one or more search target documents that are the search results.
- According to the present invention, the burden on the user can be reduced.
- Embodiment 1. FIG. 1 is a functional block diagram of the information processing apparatus of Embodiment 1. FIG. 2 is a diagram showing the hardware configuration of the information processing apparatus of Embodiment 1. FIG. 3 is a flowchart showing an example of the correct answer data creation process of Embodiment 1. FIG. 4 is a flowchart showing an example of the learning process of Embodiment 1. FIG. 5 is a diagram showing an example of the learning model of Embodiment 1. FIG. 6 is a flowchart showing an example of the update process of Embodiment 1. FIG. 7 is a functional block diagram of the information processing apparatus of Embodiment 2. FIG. 8 is a flowchart showing an example of the correct answer data creation process of Embodiment 2. FIG. 9 is a functional block diagram of the information processing apparatus of Embodiment 3. FIG. 10 is a flowchart showing an example of the correct answer data creation process of Embodiment 3.
- FIG. 1 is a functional block diagram of the information processing apparatus of the first embodiment.
- The information processing device 100 is a device that executes an information processing method.
- The information processing device 100 includes a storage unit 110, a processing unit 120, a learning processing unit 130, an acquisition unit 140, a search unit 150, an update processing unit 160, and an output unit 170.
- FIG. 2 is a diagram showing a hardware configuration of the information processing apparatus according to the first embodiment.
- The information processing device 100 includes a processor 101, a volatile storage device 102, and a non-volatile storage device 103.
- The processor 101 controls the entire information processing device 100.
- The processor 101 is a CPU (Central Processing Unit), an FPGA (Field Programmable Gate Array), or the like.
- The processor 101 may be a multiprocessor.
- The information processing apparatus 100 may be realized by a processing circuit, or by software, firmware, or a combination thereof.
- The processing circuit may be a single circuit or a composite circuit.
- The volatile storage device 102 is the main storage device of the information processing device 100.
- The volatile storage device 102 is, for example, a RAM (Random Access Memory).
- The non-volatile storage device 103 is an auxiliary storage device of the information processing device 100.
- The non-volatile storage device 103 is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
- The input device 11 and the display device 12 are connected to the information processing device 100.
- The input device 11 is a mouse, a keyboard, or the like.
- The display device 12 is a display.
- The storage unit 110 is realized as a storage area reserved in the volatile storage device 102 or the non-volatile storage device 103.
- Part or all of the processing unit 120, the learning processing unit 130, the acquisition unit 140, the search unit 150, the update processing unit 160, and the output unit 170 may be realized by the processor 101.
- Part or all of these units may be realized as modules of a program executed by the processor 101.
- The program executed by the processor 101 is also called an information processing program.
- The information processing program is recorded on a recording medium.
- The storage unit 110 includes a search target document group 111, a correct answer data storage unit 112, and a learning model storage unit 113.
- The search target document group 111 is a plurality of search target documents.
- The correct answer data storage unit 112 stores the correct answer data created by the processing unit 120.
- The correct answer data may be regarded as correct answer information.
- The information stored in the learning model storage unit 113 will be described later.
- The information stored in the storage unit 110 may instead be stored in an external device.
- The external device is, for example, a cloud server.
- The acquisition unit 140 acquires a plurality of search target documents (that is, the search target document group 111). For example, the acquisition unit 140 acquires them from the storage unit 110 or from an external device.
- The processing unit 120 extracts a character string from one of the plurality of search target documents.
- The one search target document is also referred to as the first search target document.
- The processing unit 120 creates a query based on the character string.
- The processing unit 120 uses the query to search the plurality of search target documents for the search target of the query.
- The processing unit 120 creates correct answer data including the query and one or more search target documents that are the search results.
- Alternatively, the processing unit 120 creates correct answer data including the query, one or more search target documents that are the search results, and numbers corresponding to the one or more search target documents.
- The numbers may be expressed as a ranking.
- The learning processing unit 130, the acquisition unit 140, the search unit 150, the update processing unit 160, and the output unit 170 will be described later.
- FIG. 3 is a flowchart showing an example of the process of creating correct answer data according to the first embodiment.
- The process of FIG. 3 is started, for example, by a user input operation, or at a preset time.
- (Step S11) The processing unit 120 selects one search target document from the search target document group 111.
- The selected search target document may be regarded as the first search target document.
- (Step S12) The processing unit 120 extracts a character string from the selected search target document. For example, the processing unit 120 extracts a sentence or a word in the selected search target document as the character string. Alternatively, the processing unit 120 extracts a character string from the selected search target document based on a rule that segments the text at a preset character string length. (Step S13) The processing unit 120 creates a query based on the character string.
- (Step S14) The processing unit 120 uses the query to search the search target document group 111 for the search target of the query.
- Search methods include keyword search, text search based on the importance of words such as TF-IDF or Okapi BM25, and similarity search using the similarity between the query character string and the character strings in the search target documents.
- The similarity may be calculated using the difference in character length, the edit distance, the degree of overlap of morphologically analyzed word strings, the degree of overlap of dependency-analyzed phrase units, the degree of overlap of dependency relations, the Euclidean distance between multidimensional vectors obtained by the method described in Non-Patent Document 1, the cosine similarity between vectors, and the like. The similarity may also be calculated using a machine learning model. Further, the search target may be the plurality of search target documents with the extracted character string deleted.
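Among the similarity measures listed above, the cosine similarity between TF-IDF-weighted term vectors is straightforward to sketch. The following is a minimal illustration, not the patent's implementation; the whitespace tokenization and the smoothed IDF formula are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf_vectors(tokenized_docs):
    """Build smoothed TF-IDF vectors (dict: term -> weight), one per document."""
    n = len(tokenized_docs)
    df = Counter()  # document frequency of each term
    for doc in tokenized_docs:
        df.update(set(doc))
    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * (1.0 + math.log((1 + n) / (1 + df[t])))
                        for t in tf})
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A query character string can be mapped into the same vector space and compared against each search target document, ranking documents by descending similarity.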
- (Step S15) The processing unit 120 creates correct answer data including the query, one or more search target documents that are the search results, and a ranking corresponding to the one or more search target documents.
- The ranking may be based on the importance or the similarity, or may be the order in which the documents were retrieved. The selected search target document may be ranked first.
- (Step S16) The processing unit 120 stores the correct answer data in the correct answer data storage unit 112.
- (Step S17) The processing unit 120 determines whether all the search target documents in the search target document group 111 have been selected. When all of them have been selected, the process ends. If any search target document in the search target document group 111 remains unselected, the processing unit 120 returns the process to step S11.
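The loop of steps S11 to S17 can be sketched as follows. This is a simplified outline under assumed interfaces: `make_query` and `search` are hypothetical stand-ins for the query creation of steps S12-S13 and the search of step S14.

```python
def create_correct_answer_data(documents, make_query, search):
    """Steps S11-S17 in outline: for each search target document, derive a
    query, search all documents with it, and record the ranked results."""
    dataset = []
    for document in documents:           # S11: select a document
        query = make_query(document)     # S12-S13: extract a string, build a query
        hits = search(query, documents)  # S14: search, best match first
        dataset.append({                 # S15-S16: store correct answer data
            "query": query,
            "ranking": [doc for doc, score in hits],
        })
    return dataset                       # S17: all documents processed

# Toy stand-ins: the query is the first word, the score is its occurrence count.
docs = ["apple pie recipe", "banana bread recipe"]
data = create_correct_answer_data(
    docs,
    make_query=lambda d: d.split()[0],
    search=lambda q, ds: sorted(((d, d.count(q)) for d in ds),
                                key=lambda pair: -pair[1]),
)
```

With these stand-ins, each document's own derived query ranks that document first, matching the note above that the selected search target document may be ranked first.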
- FIG. 4 is a flowchart showing an example of the learning process of the first embodiment.
- The process of FIG. 4 is started after the correct answer data creation process is completed.
- (Step S21) The learning processing unit 130 executes a learning process of calculating, using the correct answer data, the weights used in the neural network of the learning model.
- This may also be expressed as follows.
- The learning processing unit 130 executes a learning process of calculating, using the correct answer data, the weights of the nodes included in the neural network of the learning model.
- Alternatively, the learning processing unit 130 executes a learning process of changing, using the correct answer data, the weights of the nodes included in the neural network of the learning model.
- For the learning process, the learning algorithm described in Non-Patent Document 2, or a learning algorithm such as SVM (Support Vector Machine) or a decision tree, may be used.
- The learning process will now be explained concretely.
- A learning model is used. For example, the learning model takes as input a query of the correct answer data and two search target documents, and outputs information indicating which of the two search target documents ranks higher in the search results.
- The query of the correct answer data is the query Q.
- The query Q is associated with the search target documents A, B, and C included in the correct answer data.
- The search target document A is ranked first.
- The search target document B is ranked second.
- The search target document C is ranked third.
- FIG. 5 is a diagram showing an example of the learning model of the first embodiment.
- FIG. 5 shows two neural networks (NN: Neural Network).
- The two neural networks are denoted NN1 and NN2.
- The learning data is a combination of the query Q and the search target document A, and a combination of the query Q and the search target document B.
- The combination of the query Q and the search target document A is input to NN1.
- The combination of the query Q and the search target document B is input to NN2.
- This learning data is referred to as learning data 1.
- Another set of learning data is a combination of the query Q and the search target document C, and a combination of the query Q and the search target document B.
- The combination of the query Q and the search target document C is input to NN1.
- The combination of the query Q and the search target document B is input to NN2.
- This learning data is referred to as learning data 2.
- Score 1 and score 2 are compared.
- The difference between score 1 and score 2 is calculated using equation (1).
- The result of the calculation is called a difference score. For example, score 2 is subtracted from score 1.
- The difference score is input to the sigmoid function.
- The sigmoid function is defined by equation (2).
- Using the error backpropagation method (backpropagation), the learning processing unit 130 calculates the weights of the nodes included in NN1 and the weights of the nodes included in NN2 so as to minimize the error between the above expectation and the determination result.
- The learning processing unit 130 stores the trained learning model in the learning model storage unit 113. Alternatively, the learning processing unit 130 may store the weights of the nodes included in NN1 and the weights of the nodes included in NN2 in the learning model storage unit 113.
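The pairwise scheme described above (two tied networks, a difference score passed through a sigmoid, and backpropagation of the error against the expected ordering) can be illustrated with a linear scorer standing in for NN1/NN2. This is a sketch of the general pairwise technique, not the patent's actual network; the feature representation, learning rate, and epoch count are assumptions for the example.

```python
import math

def score(weights, features):
    # Stand-in for NN1/NN2: both "networks" share the same weights.
    return sum(w * f for w, f in zip(weights, features))

def train_pairwise(pairs, dim, lr=0.1, epochs=200):
    """For each (better, worse) pair, push sigmoid(score1 - score2) toward 1
    by gradient descent on the cross-entropy error; for this one-layer
    stand-in, backpropagation reduces to the update below."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = score(weights, better) - score(weights, worse)  # difference score
            p = 1.0 / (1.0 + math.exp(-diff))                      # sigmoid
            grad = p - 1.0                                         # target probability is 1
            for i in range(dim):
                weights[i] -= lr * grad * (better[i] - worse[i])
    return weights
```

With learning data 1 encoded as a (document A features, document B features) pair, the trained scorer learns to assign document A the higher score.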
- FIG. 6 is a flowchart showing an example of the update process of the first embodiment.
- (Step S31) The acquisition unit 140 acquires a new query input to the information processing device 100.
- The new query is also referred to as the first query.
- (Step S32) The search unit 150 uses the new query to search the search target document group 111 for the search target of the new query.
- The search method is, for example, a keyword search.
- The search unit 150 calculates a score using the keywords included in the new query and each search target document of the search target document group 111. For example, a search target document containing many of the keywords included in the new query has a high score.
- The search unit 150 ranks the searched documents based on the score. In this way, the new query is associated with one or more retrieved search target documents and their ranking.
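The keyword scoring of step S32 can be sketched as a simple overlap count. This is a hedged illustration; the embodiment only states that documents containing many of the query's keywords score high, so the whitespace tokenization and exact-match counting here are assumptions.

```python
def keyword_score(query, document):
    """Count how many distinct query keywords occur in the document."""
    return sum(1 for keyword in set(query.split()) if keyword in document.split())

def keyword_rank(query, documents):
    """Rank search target documents by descending keyword score (step S32)."""
    return sorted(documents, key=lambda d: keyword_score(query, d), reverse=True)
```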
- (Step S33) The update processing unit 160 selects the top N search target documents from the one or more ranked search target documents that are the result of the search by the search unit 150.
- N is a predetermined integer of 1 or more. In this way, the update processing unit 160 selects a predetermined number of top-ranked search target documents.
- The update processing unit 160 calculates score 1 using the new query, the top N search target documents, and NN1 with the learned weights. For example, the update processing unit 160 inputs the new query and one of the top N search target documents into NN1, and the calculated score 1 is used as that document's new rank. The update processing unit 160 calculates score 1 in the same way for each of the N search target documents and updates its rank. In this way, the rank of each of the N search target documents is updated to a new rank.
- The update processing unit 160 may instead use the average of the original rank of the search target document and score 1 as the new rank.
- The above shows the case where NN1 is used. NN1 and NN2 are equivalent models, so NN2 may be used instead.
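Re-ranking only the top N keyword-search hits with the learned model, while leaving the tail in its original order, can be sketched as follows. `model_score` is a hypothetical callable standing in for NN1 with the learned weights; the interface is an assumption for illustration.

```python
def rerank_top_n(query, ranked_documents, n, model_score):
    """Re-rank the top-N documents by the model's score (higher is better).
    Documents beyond the top N keep their original keyword-search order,
    which is what limits the processing load."""
    head = sorted(ranked_documents[:n],
                  key=lambda doc: model_score(query, doc),
                  reverse=True)
    return head + ranked_documents[n:]
```

Restricting the model to the head of the list is the design choice noted in this embodiment: the expensive neural scoring runs N times instead of once per retrieved document.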
- (Step S34) The output unit 170 outputs a combination including the new ranks. For example, the output unit 170 outputs a combination of the new query, the N search target documents, and the updated new ranking, for example to the display device 12. As a result, the N search target documents are displayed on the display device 12 in ranking order.
- The user looks at the display device 12.
- The user can select, from the N search target documents, a search target document that conceptually matches the new query.
- When the user selects a search target document that conceptually matches the new query, the user performs a selection operation on the information processing device 100.
- (Step S35) The acquisition unit 140 determines whether a search target document conceptually matching the new query has been acquired by the selection operation. In other words, the acquisition unit 140 determines whether the selection operation has been performed. When the selection operation has been performed, the acquisition unit 140 advances the process to step S36. If the selection operation has not been performed, the acquisition unit 140 ends the process.
- (Step S36) The acquisition unit 140 stores the combination of the new query and the search target document conceptually matching the new query in the correct answer data storage unit 112 as correct answer data.
- The information processing device 100 creates the correct answer data. Therefore, the user does not have to create it, and the information processing device 100 can reduce the burden on the user.
- The information processing apparatus 100 uses NN1 to update the ranking of the search target documents.
- The information processing apparatus 100 can thereby provide the user with a ranking of the search target documents that cannot be determined from the search results of the search unit 150 alone.
- The information processing apparatus 100 updates the ranking of only N documents among the search target documents retrieved by the search unit 150.
- That is, the information processing device 100 does not update the ranking of all the search target documents retrieved by the search unit 150. By narrowing down the number of documents in this way, the information processing device 100 can reduce its own processing load.
- Embodiment 2. Next, the second embodiment will be described. The second embodiment mainly describes matters that differ from the first embodiment, and omits matters common to the first embodiment. FIGS. 1 to 6 are referred to in the description of the second embodiment.
- FIG. 7 is a functional block diagram of the information processing apparatus according to the second embodiment.
- Components in FIG. 7 that are the same as those shown in FIG. 1 are given the same reference numerals as in FIG. 1.
- The information processing device 100a has a processing unit 120a.
- The processing unit 120a will be described later.
- FIG. 8 is a flowchart showing an example of the process of creating correct answer data according to the second embodiment.
- In the process of FIG. 8, step S12 is not executed.
- The process of FIG. 8 differs from the process of FIG. 3 in that step S13a is executed. Therefore, step S13a will be described with reference to FIG. 8.
- Steps that are the same as in FIG. 3 are given the same step numbers, and their description is omitted. Each step in FIG. 8 is executed by the processing unit 120a.
- (Step S13a) The processing unit 120a creates a query based on the summary sentence of the search target document selected in step S11. Specifically, the processing unit 120a uses the summary sentence itself as the query. Alternatively, the processing unit 120a may extract a character string from the summary sentence and create a query based on the character string.
- The summary sentence is stored in advance in the storage unit 110 or an external device. The summary sentence is acquired by the acquisition unit 140.
- The summary sentence may be created by the method described in Non-Patent Document 3.
- The information processing device 100a creates the correct answer data. Therefore, the user does not have to create it, and the information processing device 100a can reduce the burden on the user.
- Embodiment 3. Next, the third embodiment will be described. The third embodiment mainly describes matters that differ from the first embodiment, and omits matters common to the first embodiment. FIGS. 1 to 6 are referred to in the description of the third embodiment.
- FIG. 9 is a functional block diagram of the information processing apparatus according to the third embodiment.
- Components in FIG. 9 that are the same as those shown in FIG. 1 are given the same reference numerals as in FIG. 1.
- The information processing device 100b has a processing unit 120b.
- The processing unit 120b will be described later.
- FIG. 10 is a flowchart showing an example of the process of creating correct answer data according to the third embodiment.
- In the process of FIG. 10, step S12 is not executed.
- The process of FIG. 10 differs from the process of FIG. 3 in that step S13b is executed. Therefore, step S13b will be described with reference to FIG. 10.
- Steps that are the same as in FIG. 3 are given the same step numbers, and their description is omitted. Each step in FIG. 10 is executed by the processing unit 120b.
- (Step S13b) The processing unit 120b creates a query based on the paraphrase sentence of the search target document selected in step S11. Specifically, the processing unit 120b uses the paraphrase sentence itself as the query. Alternatively, the processing unit 120b may extract a character string from the paraphrase sentence and create a query based on the character string.
- The processing unit 120b may also create a query based on a paraphrase sentence of the summary sentence of the search target document selected in step S11, or may extract a character string from the paraphrase sentence of the summary sentence and create a query based on the character string.
- The paraphrase sentence of the search target document, or the paraphrase sentence of its summary sentence, is stored in advance in the storage unit 110 or an external device.
- The paraphrase sentence of the search target document, or the paraphrase sentence of its summary sentence, is acquired by the acquisition unit 140.
- The paraphrase sentence may be created by word replacement using a synonym dictionary, or by the method described in Non-Patent Document 4.
- The information processing device 100b creates the correct answer data. Therefore, the user does not have to create it, and the information processing device 100b can reduce the burden on the user.
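The synonym-dictionary word replacement mentioned above can be sketched in a few lines. The whitespace tokenization and the tiny dictionary are assumptions for illustration; a real implementation for Japanese text would need morphological analysis to segment words first.

```python
def paraphrase(sentence, synonym_dict):
    """Replace each word that has an entry in the synonym dictionary (Embodiment 3)."""
    return " ".join(synonym_dict.get(word, word) for word in sentence.split())
```

The paraphrased sentence (or the paraphrase of the summary sentence) then serves as the query in step S13b.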
Abstract
Description
Concept search is known as one search method. A document concept search device has been proposed (see Patent Document 1). For example, the document concept search device accepts correct answer information. The correct answer information is a set of pairs of a search query and a set of correct answer documents, which are search target documents that conceptually match the search query.
FIG. 1 is a functional block diagram of the information processing device of Embodiment 1. The information processing device 100 is a device that executes an information processing method. The information processing device 100 has a storage unit 110, a processing unit 120, a learning processing unit 130, an acquisition unit 140, a search unit 150, an update processing unit 160, and an output unit 170.
FIG. 2 is a diagram showing the hardware configuration of the information processing device of Embodiment 1. The information processing device 100 has a processor 101, a volatile storage device 102, and a non-volatile storage device 103.
An input device 11 and a display device 12 are connected to the information processing device 100. The input device 11 is, for example, a mouse or a keyboard. The display device 12 is, for example, a display.
The storage unit 110 is realized as a storage area reserved in the volatile storage device 102 or the non-volatile storage device 103.
The information stored in the storage unit 110 may be stored in an external device. The external device is, for example, a cloud server.
FIG. 3 is a flowchart showing an example of the correct answer data creation process of Embodiment 1. The process of FIG. 3 is started, for example, by a user input operation, or at a preset time.
(Step S11) The processing unit 120 selects one search target document from the search target document group 111. The selected search target document may be regarded as the first search target document.
(Step S13) The processing unit 120 creates a query based on the character string.
The search target may also be the plurality of search target documents with the extracted character string deleted.
(Step S16) The processing unit 120 stores the correct answer data in the correct answer data storage unit 112.
(Step S21) The learning processing unit 130 uses the correct answer data to execute a learning process that calculates the weights used in the neural network of the learning model. This may also be expressed as follows: the learning processing unit 130 uses the correct answer data to execute a learning process that calculates, or changes, the weights of the nodes included in the neural network of the learning model.
For the learning process, the learning algorithm described in Non-Patent Document 2, or a learning algorithm such as SVM (Support Vector Machine) or a decision tree, may be used.
For example, the learning data is a combination of the query Q and the search target document A, and a combination of the query Q and the search target document B. The combination of the query Q and the search target document A is input to NN1. The combination of the query Q and the search target document B is input to NN2. This learning data is called learning data 1.
In the case of learning data 1, the search target document A is expected to rank higher than the search target document B. In the case of learning data 2, the search target document B is expected to rank higher than the search target document C.
Using the error backpropagation method (backpropagation), the learning processing unit 130 calculates the weights of the nodes included in NN1 and the weights of the nodes included in NN2 so as to minimize the error between the above expectation and the determination result.
(Step S31) The acquisition unit 140 acquires a new query input to the information processing device 100. The new query is also called the first query.
(Step S32) The search unit 150 uses the new query to search the search target document group 111 for the search target of the new query. The search method is, for example, a keyword search.
In this way, the new query is associated with one or more retrieved search target documents and their ranking.
The above shows the case where NN1 is used. NN1 and NN2 are equivalent models, so NN2 may be used instead.
When the selection operation is performed, the acquisition unit 140 advances the process to step S36. When the selection operation has not been performed, the acquisition unit 140 ends the process.
Next, Embodiment 2 will be described. In Embodiment 2, matters that differ from Embodiment 1 are mainly described, and matters common to Embodiment 1 are omitted. FIGS. 1 to 6 are referred to in the description of Embodiment 2.
The information processing device 100a has a processing unit 120a, which will be described later.
The summary sentence is stored in advance in the storage unit 110 or an external device, and is acquired by the acquisition unit 140. The summary sentence may be created by the method described in Non-Patent Document 3.
Next, Embodiment 3 will be described. In Embodiment 3, matters that differ from Embodiment 1 are mainly described, and matters common to Embodiment 1 are omitted. FIGS. 1 to 6 are referred to in the description of Embodiment 3.
The information processing device 100b has a processing unit 120b, which will be described later.
Claims (10)
- An information processing device comprising: an acquisition unit that acquires a plurality of search target documents; and a processing unit that extracts a character string from a first search target document among the plurality of search target documents, creates a query based on the character string, searches the plurality of search target documents for the search target of the query, and creates correct answer data including the query and one or more search target documents that are the search results.
- The information processing device according to claim 1, wherein the acquisition unit acquires a summary sentence of the first search target document, and the processing unit creates a query based on the summary sentence.
- The information processing device according to claim 2, wherein the processing unit extracts a character string from the summary sentence and creates a query based on the extracted character string.
- The information processing device according to claim 2, wherein the acquisition unit acquires a paraphrase sentence of the summary sentence, and the processing unit creates a query based on the paraphrase sentence.
- The information processing device according to claim 4, wherein the processing unit extracts a character string from the paraphrase sentence and creates a query based on the extracted character string.
- The information processing device according to claim 1, wherein the acquisition unit acquires a paraphrase sentence of the first search target document, and the processing unit creates a query based on the paraphrase sentence.
- The information processing device according to claim 6, wherein the processing unit extracts a character string from the paraphrase sentence and creates a query based on the extracted character string.
- The information processing device according to any one of claims 1 to 7, further comprising a learning processing unit, a search unit, an update processing unit, and an output unit, wherein the processing unit creates correct answer data including the query, one or more search target documents that are the search results, and numbers corresponding to the one or more search target documents; the learning processing unit uses the correct answer data created by the processing unit to execute a learning process that calculates the weights used in the neural network of a learning model; the acquisition unit acquires a first query; the search unit searches the plurality of search target documents for the search target of the first query; the update processing unit selects, from the one or more ranked search target documents that are the result of the search by the search unit, a predetermined number of top-ranked search target documents, and updates the rank of the selected one or more search target documents by using the first query, the selected one or more search target documents, and the neural network using the weights; and the output unit outputs the selected one or more search target documents and the updated ranks.
- An information processing method in which an information processing device acquires a plurality of search target documents, extracts a character string from a first search target document among the plurality of search target documents, creates a query based on the character string, searches the plurality of search target documents for the search target of the query, and creates correct answer data including the query and one or more search target documents that are the search results.
- An information processing program that causes an information processing device to execute processing of acquiring a plurality of search target documents, extracting a character string from a first search target document among the plurality of search target documents, creating a query based on the character string, searching the plurality of search target documents for the search target of the query, and creating correct answer data including the query and one or more search target documents that are the search results.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020227016332A KR102452777B1 (ko) | 2019-11-28 | 2019-11-28 | 정보 처리 장치, 정보 처리 방법, 및 기록 매체 |
DE112019007834.8T DE112019007834T5 (de) | 2019-11-28 | 2019-11-28 | Informationsverarbeitungsvorrichtung, informationsverarbeitungsverfahren und informationsverarbeitungsprogramm |
CN201980102347.8A CN114730318A (zh) | 2019-11-28 | 2019-11-28 | 信息处理装置、信息处理方法以及信息处理程序 |
JP2020529656A JP6840293B1 (ja) | 2019-11-28 | 2019-11-28 | 情報処理装置、情報処理方法、及び情報処理プログラム |
PCT/JP2019/046557 WO2021106141A1 (ja) | 2019-11-28 | 2019-11-28 | 情報処理装置、情報処理方法、及び情報処理プログラム |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2019/046557 WO2021106141A1 (ja) | 2019-11-28 | 2019-11-28 | 情報処理装置、情報処理方法、及び情報処理プログラム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021106141A1 true WO2021106141A1 (ja) | 2021-06-03 |
Family
ID=74845349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2019/046557 WO2021106141A1 (ja) | 2019-11-28 | 2019-11-28 | 情報処理装置、情報処理方法、及び情報処理プログラム |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP6840293B1 (ja) |
KR (1) | KR102452777B1 (ja) |
CN (1) | CN114730318A (ja) |
DE (1) | DE112019007834T5 (ja) |
WO (1) | WO2021106141A1 (ja) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007011891A (ja) * | 2005-07-01 | 2007-01-18 | Nippon Telegr & Teleph Corp <Ntt> | 情報検索方法及び装置及びプログラム及びプログラムを格納した記憶媒体 |
JP2019125124A (ja) * | 2018-01-16 | 2019-07-25 | ヤフー株式会社 | 抽出装置、抽出方法、及び抽出プログラム |
JP2019200449A (ja) * | 2018-05-14 | 2019-11-21 | 株式会社日立製作所 | 案件振分支援システム、案件振分支援装置、及び案件振分支援方法 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4711761B2 (ja) * | 2005-07-08 | 2011-06-29 | 株式会社ジャストシステム | データ検索装置、データ検索方法、データ検索プログラムおよびコンピュータに読み取り可能な記録媒体 |
US9348912B2 (en) | 2007-10-18 | 2016-05-24 | Microsoft Technology Licensing, Llc | Document length as a static relevance feature for ranking search results |
US8812493B2 (en) * | 2008-04-11 | 2014-08-19 | Microsoft Corporation | Search results ranking using editing distance and document information |
KR101649146B1 (ko) * | 2015-01-15 | 2016-08-19 | 주식회사 카카오 | 검색 방법 및 검색 서버 |
US11675795B2 (en) * | 2015-05-15 | 2023-06-13 | Yahoo Assets Llc | Method and system for ranking search content |
WO2016187705A1 (en) * | 2015-05-22 | 2016-12-01 | Coveo Solutions Inc. | System and method for ranking search results |
JP6495206B2 (ja) | 2016-07-13 | 2019-04-03 | 日本電信電話株式会社 | 文書概念ベース生成装置、文書概念検索装置、方法、及びプログラム |
US20180232434A1 (en) * | 2017-02-16 | 2018-08-16 | Microsoft Technology Licensing, Llc | Proactive and retrospective joint weight attribution in a streaming environment |
US10832131B2 (en) | 2017-07-25 | 2020-11-10 | Microsoft Technology Licensing, Llc | Semantic similarity for machine learned job posting result ranking model |
KR102088435B1 (ko) * | 2017-09-29 | 2020-03-12 | 인하대학교 산학협력단 | 검색 결과 다양성 인덱스 기반의 효율적 검색 장치 및 그 방법 |
JP6985181B2 (ja) * | 2018-02-28 | 2021-12-22 | ヤフー株式会社 | 情報処理装置、情報処理方法、およびプログラム |
2019
- 2019-11-28 CN CN201980102347.8A patent/CN114730318A/zh active Pending
- 2019-11-28 WO PCT/JP2019/046557 patent/WO2021106141A1/ja active Application Filing
- 2019-11-28 DE DE112019007834.8T patent/DE112019007834T5/de active Pending
- 2019-11-28 JP JP2020529656A patent/JP6840293B1/ja active Active
- 2019-11-28 KR KR1020227016332A patent/KR102452777B1/ko active IP Right Grant
Also Published As
Publication number | Publication date |
---|---|
JP6840293B1 (ja) | 2021-03-10 |
DE112019007834T5 (de) | 2022-07-14 |
JPWO2021106141A1 (ja) | 2021-12-09 |
CN114730318A (zh) | 2022-07-08 |
KR20220073850A (ko) | 2022-06-03 |
KR102452777B1 (ko) | 2022-10-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lin et al. | Choosing transfer languages for cross-lingual learning | |
Zhai et al. | Online latent Dirichlet allocation with infinite vocabulary | |
US8918348B2 (en) | Web-scale entity relationship extraction | |
US8499008B2 (en) | Mixing knowledge sources with auto learning for improved entity extraction | |
Vijayanarasimhan et al. | Deep networks with large output spaces | |
US20140229476A1 (en) | System for Information Discovery & Organization | |
US8812504B2 (en) | Keyword presentation apparatus and method | |
Yang et al. | xMoCo: Cross momentum contrastive learning for open-domain question answering | |
US20110022598A1 (en) | Mixing knowledge sources for improved entity extraction | |
WO2016015267A1 (en) | Rank aggregation based on markov model | |
CN115374362A (zh) | 多路召回模型训练方法、多路召回方法、装置及电子设备 | |
US20220222442A1 (en) | Parameter learning apparatus, parameter learning method, and computer readable recording medium | |
Zhang et al. | Semantic table retrieval using keyword and table queries | |
US9286289B2 (en) | Ordering a lexicon network for automatic disambiguation | |
Wang et al. | Reproducibility, Replicability, and Insights into Dense Multi-Representation Retrieval Models: from ColBERT to Col | |
González et al. | ELiRF-UPV at SemEval-2019 task 3: Snapshot ensemble of hierarchical convolutional neural networks for contextual emotion detection | |
JP6840293B1 (ja) | 情報処理装置、情報処理方法、及び情報処理プログラム | |
Xie et al. | Joint entity linking for web tables with hybrid semantic matching | |
Zhai et al. | Online topic models with infinite vocabulary | |
CN114328820A (zh) | 信息搜索方法以及相关设备 | |
Tamang et al. | Adding smarter systems instead of human annotators: re-ranking for system combination | |
JP2010009237A (ja) | 多言語間類似文書検索装置及び方法及びプログラム及びコンピュータ読取可能な記録媒体 | |
Zheng et al. | An improved focused crawler based on text keyword extraction | |
JP4314271B2 (ja) | 単語間関連度算出装置、単語間関連度算出方法及び単語間関連度算出プログラム並びにそのプログラムを記録した記録媒体 | |
Tepper et al. | LeanVec: Search your vectors faster by making them fit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2020529656 Country of ref document: JP Kind code of ref document: A |
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19954597 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 20227016332 Country of ref document: KR Kind code of ref document: A |
122 | Ep: pct application non-entry in european phase |
Ref document number: 19954597 Country of ref document: EP Kind code of ref document: A1 |