WO2022123695A1 - Learning device, search device, learning method, search method, and program - Google Patents


Info

Publication number
WO2022123695A1
WO2022123695A1 PCT/JP2020/045898 JP2020045898W WO2022123695A1 WO 2022123695 A1 WO2022123695 A1 WO 2022123695A1 JP 2020045898 W JP2020045898 W JP 2020045898W WO 2022123695 A1 WO2022123695 A1 WO 2022123695A1
Authority
WO
WIPO (PCT)
Prior art keywords
sparse
learning
vector
feature
search
Prior art date
Application number
PCT/JP2020/045898
Other languages
French (fr)
Japanese (ja)
Inventor
拓 長谷川
京介 西田
宗一郎 加来
準二 富田
仙 吉田
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2020/045898
Publication of WO2022123695A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology

Definitions

  • the present invention relates to a learning device, a search device, a learning method, a search method, and a program.
  • Document search requires high-speed retrieval of documents related to search queries from a large number of documents.
  • A technique is known in which an inverted index is created in advance and documents are searched using the words contained in a search query by means of this inverted index.
  • A technique is also known (for example, Non-Patent Document 1) that can perform document retrieval even when words do not exactly match, by regarding the vector obtained by a neural network as a latent word vector, creating an inverted index from it, and performing document retrieval with that index.
  • In Non-Patent Document 1, in order to perform a high-speed search using an inverted index, a constraint term on the vector norm is added to the loss function at training time, thereby realizing a sparse vector in a high-dimensional space. For this reason, it is often difficult to explicitly control the sparsity of the obtained vector, and the feature space may end up being represented by a specific low-dimensional subspace.
  • One embodiment of the present invention has been made in view of the above points, and an object thereof is to acquire, by a neural network, a vector that can be regarded as pseudo-sparse in a document search using an inverted index.
  • To achieve the above object, the learning device according to one embodiment includes: a feature amount generation unit that takes as input a plurality of training data, each composed of a search query, a first document related to the search query, and a second document not related to the search query, and, using the model parameters of a first neural network, generates a plurality of first feature amount vectors representing features of the search queries, a plurality of second feature amount vectors representing features of the first documents, and a plurality of third feature amount vectors representing features of the second documents; a conversion unit that, using the model parameters of a second neural network, converts the plurality of first feature amount vectors, the plurality of second feature amount vectors, and the plurality of third feature amount vectors into a plurality of first learning sparse feature amount vectors, a plurality of second learning sparse feature amount vectors, and a plurality of third learning sparse feature amount vectors that have been sparsified by adjusting, through normalization and mean shift, the ratio of elements that take 0 in each dimension; and an update unit that updates the model parameters of the first neural network and the model parameters of the second neural network using the plurality of first learning sparse feature amount vectors, the plurality of second learning sparse feature amount vectors, and the plurality of third learning sparse feature amount vectors.
  • A vector that can be regarded as pseudo-sparse can thereby be acquired by a neural network.
  • a search device 10 for searching a document related to a search query from among the documents to be searched will be described using a vector obtained by a neural network and an inverted index. Further, the inverted index generation device 20 for generating (or creating) the inverted index and the learning device 30 for learning the neural network will also be described.
  • The search device 10, the inverted index generation device 20, and the learning device 30 are described as different devices, but two or more of these devices may be realized by the same device.
  • the search device 10 and the inverted index generation device 20 may be realized by the same device
  • the inverted index generation device 20 and the learning device 30 may be realized by the same device
  • the learning device 30 and the search device 10 may be realized by the same device, or the search device 10, the inverted index generation device 20, and the learning device 30 may all be realized by the same device.
  • the search target document set is ⁇ D 1 , ..., D m ⁇
  • The search device 10 takes the search query Q as input and outputs an ordered set {D1, ..., Dk} of documents related to the search query Q together with their relevance degrees {S1, ..., Sk}.
  • m is the number of documents to be searched
  • k (where k ≤ m) is the number of documents related to the search query Q.
  • FIG. 1 is a diagram showing an example of the overall configuration of the search device 10 according to the first embodiment.
  • the search device 10 has a context coding unit 101, a pseudo sparse coding unit 102, an inverted index utilization unit 103, and a ranking unit 104.
  • the context coding unit 101 and the pseudo-sparse coding unit 102 are realized by a neural network, and their parameters have been learned in advance.
  • the parameters of the neural network that realizes the context coding unit 101 and the pseudo sparse coding unit 102 are referred to as “model parameters”.
  • the trained model parameters are stored in an auxiliary storage device such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), for example.
  • the context coding unit 101 takes the search query Q as an input and outputs the feature amount U of the search query Q using the trained model parameters.
  • As the neural network that realizes the context coding unit 101, for example, BERT (Bidirectional Encoder Representations from Transformers) can be used.
  • BERT is a context-aware pre-trained model using the Transformer, which takes text as input and outputs d-dimensional features. By converting this feature quantity with one fully connected layer, it demonstrates high performance in various natural language processing tasks.
  • Reference 1: J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
  • the CLS tag is added to the beginning of the search query Q and the SEP tag is added to the end of the sentence, and then input to the context coding unit 101.
  • BERT is an example, and another context-aware pre-trained model using the Transformer may be used as the neural network that realizes the context coding unit 101. More generally, any neural network capable of encoding text may be used. However, by realizing the context coding unit 101 with a context-aware pre-trained model such as BERT, it is possible to obtain a feature quantity that takes the entire context into consideration.
  • the context coding unit 101 is realized by BERT, and the feature quantity U is a d-dimensional vector.
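  • As an illustration of the above, the context coding unit 101 could be realized with an off-the-shelf BERT implementation as in the sketch below. The use of the HuggingFace transformers library and the bert-base-uncased checkpoint are assumptions made for this sketch only; the patent does not specify a particular implementation.

```python
# A sketch of the context coding unit (assumed implementation, not the patent's).
from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def encode_query(query: str) -> torch.Tensor:
    # The tokenizer adds the CLS tag at the beginning and the SEP tag at the end.
    inputs = tokenizer(query, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    # Use the CLS position as the d-dimensional feature U (d = 768 for the base model).
    return outputs.last_hidden_state[:, 0, :]
```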
  • The pseudo-sparse coding unit 102 takes the feature amount U of the search query Q as input and, using the trained model parameters, outputs the pseudo-sparse feature amount U' of the search query Q (that is, a feature amount U' of the search query Q that can be regarded as pseudo-sparse).
  • As the neural network that realizes the pseudo-sparse coding unit 102, for example, the fully connected layer model described in Non-Patent Document 1 can be used. More specifically, a model can be used in which several fully connected layers (for example, about 3 to 5 layers) are stacked so that the dimension number d' of the pseudo-sparse feature amount U' is larger than the dimension number d of the feature amount U, a general activation function such as the ReLU function is applied in the final layer, and each output value of the activation function is then divided by the L2 norm of the entire output so as to project it onto a hypersphere of radius 1.
  • The model described in Non-Patent Document 1 is an example; as a neural network realizing the pseudo-sparse coding unit 102, any model can be used as long as its output dimension is higher than its input dimension and its final layer uses a general activation function f: R → R that satisfies all of the following conditions 1-1 to 1-3.
  • Condition 1-1: f(x) ≥ 0 for all x
  • Condition 1-3: there exists a ∈ R such that f(a) = 0
  • It is preferable that the dimension number d' of the pseudo-sparse feature quantity U' is as high as possible. This is because the higher the number of dimensions d', the higher the expressive power of the pseudo-sparse feature quantity U'; on the other hand, the calculation cost of computing U' and the learning cost of training the model parameters also become higher.
  • Note that the amount of information in the document set to be searched and the allowable calculation cost may differ depending on the situation, and the dimension number d' does not always match the dimension of the space spanned by the codomain of the map realized by the neural network of the pseudo-sparse coding unit 102 (that is, the rank of the representation matrix of the map). Therefore, an appropriate value of d' may differ depending on, for example, the amount of information possessed by the document set to be searched, the available computational resources, and the like.
  • The projection onto the above-mentioned hypersphere is not essential and may be omitted. However, performing this projection is preferable because it is expected to promote the learning of pseudo-sparse features.
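  • A minimal sketch of such a pseudo-sparse encoder is shown below. The layer count and sizes (768 → 1000 → 30000, taken from the experiment described later) are illustrative defaults, not a definitive implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PseudoSparseEncoder(nn.Module):
    """Sketch of the pseudo-sparse coding unit: fully connected layers expanding
    the d-dimensional feature to d' > d dimensions, a ReLU in the final layer
    (which satisfies conditions 1-1 and 1-3), and L2 normalization projecting
    the output onto a hypersphere of radius 1."""

    def __init__(self, d: int = 768, hidden: int = 1000, d_prime: int = 30000):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(d, hidden),
            nn.ReLU(),
            nn.Linear(hidden, d_prime),
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.layers(u))          # final-layer activation, non-negative output
        return F.normalize(x, p=2, dim=-1)  # projection onto the unit hypersphere
```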
  • The context coding unit 101 and the pseudo-sparse coding unit 102 are expressed as different functional units, but this is for convenience, and the context coding unit 101 and the pseudo-sparse coding unit 102 may be one functional unit.
  • the context coding unit 101 and the pseudo-sparse coding unit 102 may be collectively referred to as the coding unit 100.
  • The inverted index utilization unit 103 takes the pseudo-sparse feature quantity U' as input and, using the inverted index generated in advance, obtains a subset {V'i | i ∈ K} of the pseudo-sparse feature quantities of the search target documents. Here, K is a set of indexes (or document numbers, document IDs, etc.) of documents related to the search query Q, and |K| = k.
  • the pseudo-sparse feature quantity V'i of the search target document is a d'dimensional vector obtained by inputting the search target document Di to the context coding unit 101 and the pseudo sparse coding unit 102.
  • That is, V'i = (v'i1, v'i2, ..., v'id').
  • the index of a document is also referred to as a "document index”.
  • the inverted index is stored in an auxiliary storage device such as an HDD or SSD, for example.
  • The inverted index is information in which each dimension r (r = 1, ..., d') is used as a key and the set Cr = {i ∈ {1, ..., m} | v'ir is included in the top t% of Wr} is set as the value.
  • Here, Wr is the set {v'1r, v'2r, ..., v'mr} collecting the r-th dimension elements of the pseudo-sparse feature quantities V'1, V'2, ..., V'm.
  • t is a preset threshold value (where 0 < t ≤ 100), and may be a value different from the threshold values t_pas and t_que described later, or may be the same value.
  • In the following, the subset {V'i | i ∈ K} of the pseudo-sparse feature quantities of the search target documents obtained by the inverted index utilization unit 103 is also expressed as {V'1, ..., V'k}.
  • The ranking unit 104 takes as inputs the pseudo-sparse feature quantity U' of the search query Q and the subset {V'i | i ∈ K} = {V'1, ..., V'k} of the pseudo-sparse feature quantities of the search target documents, calculates the relevance degree Si of each document Di (i ∈ K) to the search query Q using a similarity function s, and outputs the set {Di | i ∈ K} ordered in ascending or descending order of the relevance degree Si together with the relevance degrees. By renumbering the document indexes, these can be expressed as {D1, ..., Dk} and {S1, ..., Sk}, respectively.
  • As the similarity function s, for example, an inner product or the like can be used.
  • More generally, as the similarity function s, any function capable of measuring the similarity between vectors can be used.
  • Note that each V'i = (v'i1, v'i2, ..., v'id') may be converted to V''i = (v''i1, v''i2, ..., v''ik) before the relevance degree is calculated.
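  • The exact lookup rule of the inverted index utilization unit 103 is not spelled out in the lines above; the sketch below assumes the common approach of taking the union of the posting sets registered under the non-zero dimensions of U' and then ranking the candidates by the inner product (one example of the similarity function s mentioned above).

```python
from typing import Dict, Set
import numpy as np

def search(u_prime: np.ndarray,
           inverted_index: Dict[int, Set[int]],
           doc_feats: Dict[int, np.ndarray],
           descending: bool = True):
    """Sketch of the inverted index utilization unit 103 and the ranking unit 104."""
    # Candidate document indexes K: union of the posting sets C_r registered
    # under the dimensions r where U' is non-zero.
    candidates: Set[int] = set()
    for r in np.nonzero(u_prime)[0]:
        candidates |= inverted_index.get(int(r), set())
    # Relevance degree S_i = s(U', V'_i); here s is the inner product.
    scored = [(i, float(u_prime @ doc_feats[i])) for i in candidates]
    scored.sort(key=lambda x: x[1], reverse=descending)
    return scored  # ordered list of (document index, relevance degree)
```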
  • FIG. 2 is a flowchart showing an example of the search process according to the first embodiment.
  • Step S101 First, the context coding unit 101 inputs the search query Q and outputs the feature amount U of the search query Q using the trained model parameters.
  • Step S102 Next, the pseudo-sparse coding unit 102 takes the feature amount U obtained in step S101 as input and outputs the pseudo-sparse feature amount U' of the search query Q using the trained model parameters.
  • Step S103 Next, the inverted index utilization unit 103 takes the pseudo-sparse feature amount U' obtained in step S102 as input and, using the inverted index generated in advance, obtains a subset {V'i | i ∈ K} of the pseudo-sparse feature quantities of the search target documents.
  • Step S104 Then, the ranking unit 104 takes the pseudo-sparse feature amount U' obtained in step S102 and the subset {V'i | i ∈ K} obtained in step S103 as inputs, and outputs the ordered set {Di | i ∈ K} of related documents and their relevance degrees.
  • As described above, the search device 10 obtains an ordered set {Di | i ∈ K} of documents related to the search query Q together with their relevance degrees.
  • In this way, the search device 10 according to the present embodiment uses the pseudo-sparse feature amount U' of the search query Q and the inverted index generated in advance by the inverted index generation device 20, and can obtain related documents and their relevance degrees in consideration of the context of the search query Q and of the entire search target documents, while satisfying the search speed required for document retrieval without depending on the order of the number of search target documents.
  • the inverted index generation device 20 inputs a set of documents to be searched ⁇ D 1 , ..., D m ⁇ and outputs an inverted index.
  • FIG. 3 is a diagram showing an example of the overall configuration of the inverted index generation device 20 according to the first embodiment.
  • the inverted index generation device 20 has a context coding unit 101, a pseudo sparse coding unit 102, and an inverted index generation unit 105.
  • The context coding unit 101 and the pseudo-sparse coding unit 102 are realized by the same neural networks as the context coding unit 101 and the pseudo-sparse coding unit 102 described above for the search, and their model parameters have been trained in advance.
  • the context coding unit 101 takes the search target document Di as an input and outputs the feature amount Vi of the search target document Di using the trained model parameters.
  • the pseudo-sparse coding unit 102 inputs the feature amount V i of the search target document Di and outputs the pseudo sparse feature amount V'i of the search target document Di using the trained model parameters.
  • The search speed at the time of document retrieval is determined according to the number of elements (that is, the number of values) of each set Cr of the inverted index, and this number of elements can be adjusted by the value of the threshold t. Therefore, if the calculation speed of the processor or the like is known, it is possible to adjust the search speed (in other words, the search amount) so as to satisfy the search time required for document retrieval by adjusting the value of t.
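  • A sketch of generating such an inverted index is shown below: each dimension r is used as a key, and the set Cr of documents whose r-th element falls in the top t% of Wr is stored as the value. The interpretation of the threshold t follows the description above; it is an assumption of this sketch, not the patent's exact rule.

```python
from typing import Dict, Set
import numpy as np

def build_inverted_index(doc_feats: np.ndarray, t: float) -> Dict[int, Set[int]]:
    """doc_feats: (m, d') pseudo-sparse feature vectors of the search target documents.
    Returns a mapping from dimension r to the set C_r of document indexes."""
    m, d_prime = doc_feats.shape
    index: Dict[int, Set[int]] = {}
    for r in range(d_prime):
        w_r = doc_feats[:, r]
        # smallest value still inside the top t% of this dimension
        cutoff = np.quantile(w_r, 1.0 - t / 100.0)
        index[r] = set(np.nonzero(w_r >= cutoff)[0].tolist())
    return index

# A smaller t keeps fewer documents per key, which shortens the search time,
# matching the adjustment of the search amount described above.
```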
  • FIG. 4 is a flowchart showing an example of the inverted index generation process according to the first embodiment.
  • the inverted index generation process is executed after the learning process described later is completed and before the search process described above is executed.
  • Step S201 First, the context coding unit 101 inputs the search target document Di and outputs the feature amount Vi of the search target document Di using the trained model parameters.
  • Step S202 Next, the pseudo-sparse coding unit 102 inputs the feature amount Vi of the search target document Di and uses the trained model parameters to generate the pseudo-sparse feature amount V'i of the search target document Di. Output.
  • the inverted index generation device 20 can generate an inverted index from the set of input search target documents ⁇ D 1 , ..., D m ⁇ .
  • By using the generated inverted index, the search device 10 can obtain related documents and their relevance degrees in consideration of the context of the search query Q and of the entire search target documents, while satisfying the search speed required for document retrieval without depending on the order of the number of search target documents (that is, documents related to the search query Q can be searched).
  • a training data set is a set of training data used for training (training) model parameters.
  • Let Di+ be one document randomly extracted from the set Gi of documents related to the search query Qi, let Di- be one document randomly extracted from the set G ∖ Gi of documents not related to the search query Qi, and let (Qi, Di+, Di-) be training data (that is, data composed of the search query Qi, a positive example, and a negative example). Then, the set of these training data {(Qi, Di+, Di-) | i = 1, ..., c} is used as the training data set.
  • FIG. 5 is a diagram showing an example of the overall configuration of the learning device 30 according to the first embodiment.
  • The learning device 30 has a context coding unit 101, a pseudo-sparse coding unit 102, a ranking unit 104, a division unit 106, an update unit 107, and a determination unit 108.
  • the context coding unit 101 and the pseudo-sparse coding unit 102 are realized by the same neural network as the context coding unit 101 and the pseudo-sparse coding unit 102 described in the above-mentioned search time and inverted index generation time. However, it is assumed that the model parameters have not been trained.
  • the division unit 106 takes the training data set as an input and randomly divides this training data set into a plurality of mini-batch. In this embodiment, it is assumed that the model parameters are repeatedly updated (learned) for each mini-batch.
  • the determination unit 108 determines whether or not the end condition for ending the repeated update of the model parameter is satisfied.
  • One pass of repeatedly learning the training data set is called an epoch, and the number of such repetitions is called the number of epochs.
  • The context coding unit 101 takes the training data (Qi, Di+, Di-) as input and, using the model parameters that have not yet been trained, outputs the feature quantities (Ui, Vi+, Vi-) of the training data (Qi, Di+, Di-). That is, the context coding unit 101 takes the search query Qi, the positive example Di+, and the negative example Di- as inputs and outputs the respective feature quantities Ui, Vi+, and Vi-.
  • The pseudo-sparse coding unit 102 takes the feature quantities (Ui, Vi+, Vi-) of the training data (Qi, Di+, Di-) as input and, using the model parameters that have not yet been trained, obtains the pseudo-sparse feature quantities (U'i, V'i+, V'i-) of the training data (Qi, Di+, Di-). That is, the pseudo-sparse coding unit 102 takes the feature quantities Ui, Vi+, and Vi- as inputs and obtains the respective pseudo-sparse feature quantities U'i, V'i+, and V'i-.
  • Further, the pseudo-sparse coding unit 102 converts the pseudo-sparse feature quantities U'i, V'i+, and V'i- into learning pseudo-sparse feature quantities U''i, V''i+, and V''i-, respectively.
  • Here, W'1r is the set {u'1r, u'2r, ..., u'm''r} collecting the r-th dimension elements of the pseudo-sparse feature quantities contained in Z^tr_1, and the conversion uses the subset of W'1r consisting of the top t_que% of elements in descending order of value. This means that only the elements with large values are used for learning.
  • m'' is an arbitrary natural number satisfying m'' ≤ c, and is, for example, the number of training data included in the mini-batch.
  • t_que is a preset threshold value (where 0 < t_que ≤ 100).
  • Similarly, the pseudo-sparse coding unit 102 converts the pseudo-sparse feature quantity V'i+ into the learning pseudo-sparse feature quantity V''i+ by the following equation (3).
  • Here, W'2r is the set {v'+1r, v'+2r, ..., v'+m''r} collecting the r-th dimension elements of the pseudo-sparse feature quantities contained in Z^tr_2, and the conversion uses the subset of W'2r consisting of the top t_pas% of elements in descending order of value.
  • t_pas is a preset threshold value (where 0 < t_pas ≤ 100).
  • t que and t pas may have the same value or different values.
  • Further, the pseudo-sparse coding unit 102 converts the pseudo-sparse feature quantity V'i- into the learning pseudo-sparse feature quantity V''i- by the following equation (4).
  • Here, W'3r is the set {v'-1r, v'-2r, ..., v'-m''r} collecting the r-th dimension elements of the pseudo-sparse feature quantities contained in Z^tr_3, and the conversion uses the subset of W'3r consisting of the top t_pas% of elements in descending order of value.
  • Each element of the above subset Z^tr_1 is a pseudo-sparse feature obtained with the same model parameters, and Z^tr_1 can be realized by, for example, a set of pseudo-sparse features obtained in the same mini-batch. This also applies to Z^tr_2 and Z^tr_3.
  • Note that if t_que ≤ (1 / m'') × 100 or t_pas ≤ (1 / m'') × 100, the subset collecting the top t_que% or the top t_pas% of elements can be an empty set; to avoid this, it is necessary that t_que > (1 / m'') × 100 and t_pas > (1 / m'') × 100.
  • Moreover, when the norm of the output value is 1 (that is, when the output value of the activation function of the final layer of the neural network realizing the pseudo-sparse coding unit 102 is projected onto a hypersphere of radius 1), it is necessary to satisfy t_que > (2 / m'') × 100 and t_pas > (2 / m'') × 100.
  • Since, as long as the learning rate is not large, pseudo-sparse features obtained within the most recent fixed number of learning steps can be regarded as having been obtained with substantially the same model parameters, such pseudo-sparse features may also be added to the subsets.
  • When the pseudo-sparse features obtained within the most recent fixed number of learning steps are added to the subsets and the computation graph cannot be held in memory, the pseudo-sparse features obtained in past learning steps may be used only for computing the top t_que% (or t_pas%).
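  • A minimal sketch of this conversion is shown below (equations (2) to (4) are not reproduced in this text): in each dimension, elements that are not among the top t% of that dimension's values across the mini-batch are set to 0, so only elements with large values keep a gradient.

```python
import torch

def top_t_mask(feats: torch.Tensor, t_percent: float) -> torch.Tensor:
    """feats: (m'', d') pseudo-sparse feature vectors from one mini-batch.
    Returns the learning pseudo-sparse features with non-top-t% elements zeroed."""
    m = feats.size(0)
    k = max(1, int(m * t_percent / 100.0))   # size of the top-t% subset per dimension
    top_vals, _ = feats.topk(k, dim=0)       # top-k values in each dimension
    thresh = top_vals[-1:, :]                # smallest value still inside the top t%
    # Zeroed elements receive no gradient, which the third embodiment later relaxes.
    return torch.where(feats >= thresh, feats, torch.zeros_like(feats))

# Example usage (names are illustrative):
# U2  = top_t_mask(U1, t_que)   # queries
# Vp2 = top_t_mask(Vp1, t_pas)  # positive documents
# Vn2 = top_t_mask(Vn1, t_pas)  # negative documents
```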
  • In the present embodiment, a case where the training data set is divided into mini-batches and the model parameters are repeatedly learned for each mini-batch (that is, mini-batch learning) will be described; however, it is not always necessary to use mini-batch learning, and the model parameters may be learned by any other learning method such as online learning or batch learning. However, as described above, since a subset of pseudo-sparse features is important, it is preferable to learn the model parameters by mini-batch learning.
  • The ranking unit 104 takes the learning pseudo-sparse feature quantities U''i, V''i+, and V''i- as inputs, and outputs the relevance degree Si+ of the positive example Di+ to the search query Qi and the relevance degree Si- of the negative example Di- to the search query Qi.
  • That is, Si+ = s(U''i, V''i+) and Si- = s(U''i, V''i-) are calculated using the similarity function s.
  • the update unit 107 updates the model parameters by the supervised learning method with the relevance degrees S i + and S i ⁇ as inputs.
  • As the error function of the supervised learning, an error function used in ranking learning may be used.
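  • As an illustration, a ranking-learning error function of the kind used in the experiments below (hinge loss with margin 1.0, inner product as the similarity function s) can be sketched as follows; these choices are examples given elsewhere in the text, not the only possibility.

```python
import torch

def relevance(u2: torch.Tensor, v2: torch.Tensor) -> torch.Tensor:
    # Relevance degree via the inner product, one choice of the similarity function s.
    return (u2 * v2).sum(dim=-1)

def ranking_hinge_loss(s_pos: torch.Tensor, s_neg: torch.Tensor,
                       margin: float = 1.0) -> torch.Tensor:
    # Hinge loss on (S_i+, S_i-); the positive example should outrank the negative one.
    return torch.clamp(margin - (s_pos - s_neg), min=0.0).mean()
```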
  • FIG. 6 is a flowchart showing an example of the learning process according to the first embodiment. It is assumed that the model parameters are initialized with appropriate values.
  • Step S301 First, the division unit 106 takes the training data set as an input and randomly divides this training data set into a plurality of mini-batch.
  • Step S302 Next, the learning device 30 executes model parameter update processing for each mini-batch. As a result, the model parameters are updated by the model parameter update process. Details of the model parameter update process will be described later. This model parameter update process is also called a learning step.
  • Step S303 Then, the determination unit 108 determines whether or not the predetermined end condition is satisfied.
  • The learning device 30 ends the learning process when it is determined that the end condition is satisfied (YES in step S303); on the other hand, when it is determined that the end condition is not satisfied (NO in step S303), the process returns to step S301.
  • steps S301 to S302 are repeatedly executed until a predetermined end condition is satisfied.
  • Examples of the predetermined end condition include that the number of epochs has become equal to or greater than a predetermined first threshold value, and that the error function has converged (for example, that the value of the error function has become less than a predetermined second threshold value, or that the amount of change in the error function before and after updating the model parameters has become less than a predetermined third threshold value).
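  • A minimal sketch of this learning loop is given below. Here `encoder` bundles the context coding unit 101 and the pseudo-sparse coding unit 102, `compute_loss` is a hypothetical helper standing in for steps S401 to S405 of the model parameter update process, Adam is used only as one example of an arbitrary optimization method, and a fixed epoch count stands in for the end condition of step S303.

```python
import random
import torch

def train(encoder, training_data, batch_size: int, num_epochs: int, lr: float = 5e-5):
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(num_epochs):                               # end condition: epoch count
        random.shuffle(training_data)                         # step S301: random division
        batches = [training_data[i:i + batch_size]
                   for i in range(0, len(training_data), batch_size)]
        for batch in batches:                                 # step S302: per mini-batch
            loss = compute_loss(encoder, batch)               # steps S401-S405 (hypothetical helper)
            optimizer.zero_grad()
            loss.backward()                                   # gradient by error back propagation
            optimizer.step()                                  # step S406: update the model parameters
```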
  • FIG. 7 is a flowchart showing an example of the model parameter update process according to the first embodiment. In the following, a case where model parameters are updated using a certain mini-batch will be described.
  • Step S401 First, the context coding unit 101 takes the training data (Qi, Di+, Di-) in the mini-batch as input and, using the model parameters that have not yet been trained, outputs the feature quantities (Ui, Vi+, Vi-) of the training data (Qi, Di+, Di-).
  • Step S402 Next, the pseudo-sparse coding unit 102 takes the feature quantities (Ui, Vi+, Vi-) of the training data (Qi, Di+, Di-) as input and, using the model parameters that have not yet been trained, obtains the pseudo-sparse feature quantities (U'i, V'i+, V'i-) of the training data (Qi, Di+, Di-).
  • Step S403 Next, the pseudo-sparse coding unit 102 converts the pseudo-sparse feature quantities (U'i, V'i+, V'i-) into the learning pseudo-sparse feature quantities (U''i, V''i+, V''i-) and outputs them.
  • Step S404 Next, the ranking unit 104 takes the learning pseudo-sparse feature quantities (U''i, V''i+, V''i-) as input, and outputs the relevance degree Si+ of the positive example Di+ to the search query Qi and the relevance degree Si- of the negative example Di- to the search query Qi.
  • the above steps S401 to S404 are repeatedly executed for all the training data ( Qi , Di + , Di ⁇ ) included in the mini-batch.
  • Step S405 Subsequently, the update unit 107 takes the relevance degrees Si+ and Si- obtained in step S404 as inputs, and calculates the value of the error function (for example, hinge loss) and the gradient of the error function with respect to the model parameters.
  • the gradient of the error function related to the model parameters may be calculated by, for example, the error back propagation method.
  • Step S406 Then, the update unit 107 updates the model parameters by an arbitrary optimization method using the value of the error function calculated in step S405 above and its gradient.
  • the learning device 30 can learn the model parameters of the neural network that realizes the context coding unit 101 and the pseudo-sparse coding unit 102 by using the input training data set.
  • As described above, the learning device 30 according to the present embodiment converts a pseudo-sparse feature quantity into a learning pseudo-sparse feature quantity by setting to 0 those elements whose values are not included in the top t_que% or t_pas%, and learns the model parameters using this learning pseudo-sparse feature quantity (in other words, only the elements with large values among the elements of the pseudo-sparse feature quantity are used for learning). This makes it possible to stably acquire feature quantities that can be regarded as pseudo-sparse at the time of document retrieval.
  • this learning method will also be referred to as "top t learning”.
  • MRR (Mean Reciprocal Rank)
  • the model parameters were initialized according to the normal distribution N (0, 0.02) except for the bias initialized at 0.
  • The learning rate was set to 5 × 10^-5 and was linearly decayed so as to reach 0 in the final step.
  • the gradient was clipped with a maximum norm of 1.
  • BERT used the base model (768 dimensions), and the number of dimensions of the intermediate layer of the two output layers was 1000, and the number of dimensions D of the final layer (that is, the pseudo-sparse feature amount) was 30,000.
  • the margin ⁇ of hinge loss was 1.0.
  • A BERT WordPiece tokenizer (vocabulary size 30K) was used for tokenization.
  • t_que was set to 0.1%.
  • Because the threshold T depends too strongly on the training data in a single mini-batch, the pseudo-sparse feature quantities of the training data for the past 20 steps, including the current learning step, were saved, and by using these together, the calculation of the top T% was stabilized.
  • However, the pseudo-sparse features of the 19 steps excluding the current learning step were used only for determining the threshold value T when calculating the top T%; their computation graphs were not retained, and they were not used for updating the parameters.
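  • A sketch of this stabilization, assuming the features of past steps are kept in a fixed-length buffer without their computation graphs:

```python
from collections import deque
import torch

class ThresholdBuffer:
    """Keep the (detached) pseudo-sparse features of the last 20 learning steps and
    determine the per-dimension top-T% threshold over all of them; only the current
    step's features keep their computation graph and contribute to parameter updates."""

    def __init__(self, num_steps: int = 20):
        self.buffer = deque(maxlen=num_steps)

    def threshold(self, current_feats: torch.Tensor, t_percent: float) -> torch.Tensor:
        self.buffer.append(current_feats.detach())   # past steps: no computation graph
        pooled = torch.cat(list(self.buffer), dim=0)
        k = max(1, int(pooled.size(0) * t_percent / 100.0))
        top_vals, _ = pooled.topk(k, dim=0)
        return top_vals[-1, :]                       # smallest value inside the top T%
```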
  • The 1st top-T is the number of related documents output in the first stage.
  • The 1st RT is the response time in the first stage.
  • The 2nd RT is the response time in the second stage.
  • FIG. 8A and 8B show the comparison results of the pseudo-sparse features of the search target document when top t learning is used and when top t learning is not used.
  • FIG. 8A is the result of calculating the pseudo-sparse features of 80,000 documents out of all the search target documents using the trained model parameters, with dimensions having no non-zero elements excluded. In the left figure of FIG. 8A (without top t learning), 24769 of the 30000 dimensions had no non-zero elements. Further, it can be seen that many of the 80,000 pseudo-sparse features have non-zero elements concentrated in the remaining dimensions, and a very biased pseudo-sparse feature set is obtained. On the other hand, the right figure of FIG. 8A shows the result when top t learning is used.
  • top learning has the effect of suppressing convergence to model parameters that map to a biased subspace.
  • FIG. 9 is a diagram showing an example of the overall configuration of the search device 10 according to the second embodiment.
  • the search device 10 has a context coding unit 101, a normalized sparse coding unit 109, an inverted index utilization unit 103, and a ranking unit 104. That is, the search device 10 according to the present embodiment has a normalized sparse coding unit 109 instead of the pseudo sparse coding unit 102.
  • the context coding unit 101 and the normalized sparse coding unit 109 may be combined into the coding unit 100A.
  • The normalized sparse coding unit 109 takes the feature amount U of the search query Q as input and, using the trained model parameters, outputs the pseudo-sparse feature amount U' of the search query Q (in the present embodiment, this pseudo-sparse feature amount is also called a "normalized sparse feature quantity").
  • Here, the neural network that realizes the normalized sparse coding unit 109 is a model that performs normalization and mean shift before applying the activation function of the final layer of the neural network realizing the pseudo-sparse coding unit 102 described in the first embodiment.
  • When the feature quantity Ui of a search query Qi included in the training data is input to the normalized sparse coding unit 109 at the time of learning, let xi denote the output of the fully connected layer before the activation function of the final layer is applied, and let X = {x1, ..., xi, ..., xs'} be an appropriate subset of these outputs. Note that s' is the number of elements of the subset X.
  • β and δ are hyperparameters set in advance. Since the sum part in the above equation (5) (that is, (1/s')(x1j + ... + xs'j)) can be calculated in advance at the time of learning, the calculation result may be used.
  • In this way, the output of the fully connected layer before the activation function in the final layer of the neural network realizing the normalized sparse coding unit 109 is normalized and its mean is shifted using δ, which makes it possible to adjust the sparseness of the normalized sparse feature quantity (that is, the proportion of non-zero elements).
  • Although the ReLU function is used as an example in the above equation (5), as described in the first embodiment, any general activation function that satisfies all of the conditions 1-1 to 1-3 can be used.
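  • Equation (5) itself is not reproduced in this text; the sketch below shows one plausible reading of the description above: per-dimension normalization of the final fully connected layer's outputs over the subset X (here, the batch), a mean shift controlled by δ (with β as a scale), and then the ReLU activation. The exact form is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def normalized_sparse_activation(x: torch.Tensor, beta: float, delta: float,
                                 eps: float = 1e-6) -> torch.Tensor:
    """x: (s', d') outputs of the final fully connected layer over the subset X.
    Shifting the mean with delta changes the proportion of non-zero elements,
    i.e. the sparseness of the normalized sparse feature quantity."""
    mean = x.mean(dim=0, keepdim=True)   # (1/s') * (x_1j + ... + x_s'j), precomputable
    std = x.std(dim=0, keepdim=True)
    normalized = (x - mean) / (std + eps)
    return F.relu(beta * normalized + delta)
```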
  • FIG. 10 is a flowchart showing an example of the search process according to the second embodiment.
  • Step S501 First, the context coding unit 101 inputs the search query Q and outputs the feature amount U of the search query Q using the trained model parameters.
  • Step S502 Next, the normalized sparse coding unit 109 takes the feature amount U obtained in step S501 as input and, using the trained model parameters, outputs the normalized sparse feature amount U' of the search query Q.
  • Step S503 Next, the inverted index utilization unit 103 takes the normalized sparse feature amount U' obtained in step S502 as input and, using the inverted index generated in advance, obtains a subset {V'i | i ∈ K} of the normalized sparse feature quantities of the search target documents.
  • The inverted index according to the present embodiment is obtained by replacing "pseudo-sparse feature amount" with "normalized sparse feature amount" in the explanation of the inverted index in the first embodiment, and its configuration and the like are the same as in the first embodiment.
  • Step S504 Then, the ranking unit 104 takes the normalized sparse feature quantity U' obtained in step S502 and the subset {V'i | i ∈ K} obtained in step S503 as inputs, and outputs the ordered set of related documents and their relevance degrees.
  • As described above, the search device 10 obtains an ordered set {Di | i ∈ K} of documents related to the search query Q together with their relevance degrees.
  • FIG. 11 is a diagram showing an example of the overall configuration of the inverted index generation device 20 according to the second embodiment.
  • the inverted index generation device 20 has a context coding unit 101, a normalized sparse coding unit 109, and an inverted index generation unit 105. That is, the inverted index generation device 20 according to the present embodiment has a normalized sparse coding unit 109 instead of the pseudo sparse coding unit 102.
  • the normalized sparse coding unit 109 takes the feature amount V i of the search target document Di as an input, and outputs the normalized sparse feature amount V'i of the search target document Di using the trained model parameters.
  • At this time, as explained for the search, the normalized sparse coding unit 109 obtains the normalized sparse feature quantity V'i by applying the activation function after performing normalization and mean shift on the d'-dimensional vector output by the fully connected layer of the final layer.
  • ⁇ and ⁇ are hyperparameters that are set in advance, but they may be different values at the time of search (of course, they may be the same values at the time of search and at the time of inverted index generation).
  • The above-mentioned document Di may be a search target document Di or another document (for example, a document used as training data). Further, the sum part in the above equation (6) (that is, (1/s')(y1j + ... + ys'j)) may be calculated in advance and the calculation result may be used. In this case, wi does not have to be included in Y. However, when wi ∈ W, it is preferable that Y ⊆ W.
  • ⁇ and ⁇ make it possible to adjust the sparseness of the normalized sparse features, so that the amount of documents stored in each index of the inverted index can be adjusted accordingly, and the search can be performed.
  • the speed (that is, the amount of search) can be adjusted.
  • inverted index is similarly generated by replacing “pseudo-sparse feature amount” with “normalized sparse feature amount” in the description of the inverted index in the first embodiment.
  • FIG. 12 is a flowchart showing an example of the inverted index generation process according to the second embodiment.
  • Step S601 First, the context coding unit 101 inputs the search target document Di and outputs the feature amount Vi of the search target document Di using the trained model parameters.
  • Step S602 Next, the normalized sparse coding unit 109 takes the feature amount Vi of the search target document Di as input and, using the trained model parameters, outputs the normalized sparse feature amount V'i of the search target document Di.
  • the inverted index generation device 20 can generate an inverted index from the set of input search target documents ⁇ D 1 , ..., D m ⁇ .
  • At this time, the amount of documents stored under each key of the inverted index can be adjusted by adjusting δ and β, so that the amount of documents can be adjusted so as to satisfy the search time required for document retrieval.
  • the values of ⁇ and ⁇ can be set independently at the time of searching, at the time of generating an inverted index, and at the time of learning described later.
  • FIG. 13 is a diagram showing an example of the overall configuration of the learning device 30 according to the second embodiment.
  • The learning device 30 includes a context coding unit 101, a normalized sparse coding unit 109, a ranking unit 104, a division unit 106, an update unit 107, and a determination unit 108.
  • the context coding unit 101 and the normalized sparse coding unit 109 are realized by the same neural network as the context coding unit 101 and the normalized sparse coding unit 109 described in the search time and the inverted index generation time. However, it is assumed that the model parameters have not been trained.
  • The normalized sparse coding unit 109 takes the feature quantities (Ui, Vi+, Vi-) of the training data (Qi, Di+, Di-) as input and, using the model parameters that have not yet been trained, obtains the normalized sparse feature quantities (U'i, V'i+, V'i-) of the training data (Qi, Di+, Di-).
  • the normalized sparse coding unit 109 takes the feature quantities U i , Vi + , and Vi ⁇ as inputs, and as described above at the time of searching and at the time of generating the inverted index, each normalized sparse feature quantity U ' i , V'i + and V'i- are obtained.
  • the values of ⁇ and ⁇ may be changed according to the learning stage (for example, learning step).
  • Further, as in the first embodiment, the normalized sparse coding unit 109 converts the normalized sparse feature quantities U'i, V'i+, and V'i- into learning pseudo-sparse feature quantities U''i, V''i+, and V''i-, respectively.
  • FIG. 14 is a flowchart showing an example of the model parameter update process according to the second embodiment.
  • Step S701 First, the context coding unit 101 takes the training data (Qi, Di+, Di-) in the mini-batch as input and, using the model parameters that have not yet been trained, outputs the feature quantities (Ui, Vi+, Vi-) of the training data (Qi, Di+, Di-).
  • Step S702 Next, the normalized sparse coding unit 109 takes the feature quantities (Ui, Vi+, Vi-) of the training data (Qi, Di+, Di-) as input and, using the model parameters that have not yet been trained, obtains the normalized sparse feature quantities (U'i, V'i+, V'i-) of the training data (Qi, Di+, Di-).
  • Step S703 Next, the normalized sparse coding unit 109 converts the normalized sparse feature quantities (U'i, V'i+, V'i-) into the learning pseudo-sparse feature quantities (U''i, V''i+, V''i-) and outputs them.
  • Since steps S704 to S706 are the same as steps S404 to S406 in FIG. 7, their description will be omitted.
  • As described above, the learning device 30 can learn the model parameters of the neural networks that realize the context coding unit 101 and the normalized sparse coding unit 109 by using the input training data set.
  • the differences from the first embodiment will be mainly described, and the description of the components substantially the same as those of the first embodiment will be omitted.
  • FIG. 15 is a diagram showing an example of the overall configuration of the search device 10 according to the third embodiment.
  • the search device 10 has a context coding unit 101, a gradient estimation type pseudo-sparse coding unit 110, an inverted index utilization unit 103, and a ranking unit 104. That is, the search device 10 according to the present embodiment has a gradient estimation type pseudo-sparse coding unit 110 instead of the pseudo-sparse coding unit 102.
  • the context coding unit 101 and the gradient estimation type pseudo-sparse coding unit 110 may be collectively referred to as the coding unit 100B.
  • The gradient estimation type pseudo-sparse coding unit 110 takes the feature quantity U of the search query Q as input and, using the trained model parameters, outputs the pseudo-sparse feature amount U' of the search query Q.
  • Although it is named the "gradient estimation type pseudo-sparse coding unit", strictly speaking its processing content differs from that of the pseudo-sparse coding unit 102 in that threshold values such as t_2,r^u, described later and used for gradient estimation during learning, are calculated in the forward propagation process of the neural network by the gradient estimation type pseudo-sparse coding unit 110.
  • Since the gradient estimation type pseudo-sparse coding unit 110 performs the same processing as the pseudo-sparse coding unit 102 described in the first embodiment at the time of search and at the time of inverted index generation, it may be read as the pseudo-sparse coding unit 102 at those times. Therefore, the search process and the inverted index generation process according to the present embodiment are the same as those of the first embodiment.
  • FIG. 16 is a diagram showing an example of the overall configuration of the inverted index generation device 20 according to the third embodiment.
  • the inverted index generation device 20 has a context coding unit 101, a gradient estimation type pseudo-sparse coding unit 110, and an inverted index generation unit 105. That is, the inverted index generation device 20 according to the present embodiment has a gradient estimation type pseudo-sparse coding unit 110 instead of the pseudo-sparse coding unit 102.
  • Since the gradient estimation type pseudo-sparse coding unit 110 performs the same processing as the pseudo-sparse coding unit 102 at the time of inverted index generation, it may be read as the pseudo-sparse coding unit 102.
  • the inverted index generation process according to the present embodiment is the same as that of the first embodiment.
  • FIG. 17 is a diagram showing an example of the overall configuration of the learning device 30 according to the third embodiment.
  • The learning device 30 includes a context coding unit 101, a gradient estimation type pseudo-sparse coding unit 110, a ranking unit 104, a division unit 106, an update unit 107A, and a determination unit 108.
  • The context coding unit 101 and the gradient estimation type pseudo-sparse coding unit 110 are realized by the same neural networks as those described above for the search and inverted index generation, but their model parameters have not been trained.
  • The gradient estimation type pseudo-sparse coding unit 110 takes the feature quantities (Ui, Vi+, Vi-) of the training data (Qi, Di+, Di-) as input and, using the model parameters that have not yet been trained, outputs the learning pseudo-sparse feature quantities (U''i, V''i+, V''i-) of the training data (Qi, Di+, Di-), as in the first embodiment. At this time, the gradient estimation type pseudo-sparse coding unit 110 also calculates threshold values such as t_2,r^u, which will be described later.
  • the transformation in the forward propagation of the neural network that realizes the gradient estimation type pseudo - sparse coding unit 110 is represented by the function g1.
  • Here, t_1,r^u is a threshold value, and t_1,r^u is the minimum of the top-t_que% subset of W'1r (that is, the smallest element among the elements included in that subset).
  • the update unit 107A updates the model parameters by the supervised learning method with the relevance degrees S i + and S i ⁇ as inputs.
  • At this time, when calculating (estimating) the gradient of the error function (for example, hinge loss) by the error back propagation method, the update unit 107A uses, instead of the partial derivative of the function g1 shown in the above equation (8), the partial derivative of the function g2 shown in the following equation (9), and obtains the gradient of the error function by back-propagating the error using this partial derivative.
  • Here, t_2,r^u is a threshold value, set so as to satisfy t_1,r^u > t_2,r^u.
  • The upper left of FIG. 18 is a graph showing the function g1, the lower left shows the function g2, the upper right shows the partial derivative of the function g1, and the lower right shows the partial derivative of the function g2.
  • In the partial derivative of the function g1, elements of b or less become 0, whereas in the partial derivative of the function g2, elements of a or more and b or less do not become 0. Therefore, by using the partial derivative of the function g2 at the time of back propagation, it is possible to reduce the elements for which the gradient of the error function becomes 0 (in other words, to increase the elements to which the error can be back-propagated), and it becomes possible to learn stably and efficiently.
  • For example, it is conceivable to set the threshold value t_2,r^u to the minimum value of the subset obtained by collecting, in descending order of value, the top 2 × t_que% of the set {u'1r, u'2r, ..., u'm''r} of the r-th dimension elements of the pseudo-sparse features contained in Z^tr_1.
  • Such a threshold value t_2,r^u is calculated at the time of forward propagation of the neural network realizing the gradient estimation type pseudo-sparse coding unit 110.
  • Similarly, it is conceivable to set the threshold value t_2,r^{v+} to the minimum value of the subset obtained by collecting, in descending order of value, the top 2 × t_pas% of the set {v'+1r, v'+2r, ..., v'+m''r} of the r-th dimension elements of the pseudo-sparse features contained in Z^tr_2.
  • Such a threshold value t_2,r^{v+} is calculated at the time of forward propagation of the neural network realizing the gradient estimation type pseudo-sparse coding unit 110.
  • Likewise, it is conceivable to set the threshold value t_2,r^{v-} to the minimum value of the subset obtained by collecting, in descending order of value, the top 2 × t_pas% of the set {v'-1r, v'-2r, ..., v'-m''r} of the r-th dimension elements of the pseudo-sparse features contained in Z^tr_3.
  • Such a threshold value t_2,r^{v-} is calculated at the time of forward propagation of the neural network realizing the gradient estimation type pseudo-sparse coding unit 110.
  • By determining the threshold values in this way, t_1,r^u > t_2,r^u, t_1,r^{v+} > t_2,r^{v+}, and t_1,r^{v-} > t_2,r^{v-} hold for the thresholds t_1,r^u, t_1,r^{v+}, and t_1,r^{v-}, which change depending on the dimension r and on how the mini-batch is taken at the time of learning.
  • the above 2 ⁇ t que % and 2 ⁇ t pas % are examples, and an arbitrary value L (where L> 1) can be used instead of 2.
  • The above-mentioned method of determining the threshold values t_1,r^u, t_1,r^{v+}, and t_1,r^{v-} is an example, and the threshold values may be determined by other methods.
  • In FIG. 18, b = t_1,r^u (or t_1,r^{v+} or t_1,r^{v-}) and a = t_2,r^u (or t_2,r^{v+} or t_2,r^{v-}).
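  • A sketch of this idea as a custom autograd function is shown below: the forward pass applies the hard masking g1 with threshold b = t_1,r, while the backward pass uses the derivative of a softer g2 that is also non-zero between a = t_2,r and b. The linear ramp used for g2's derivative is an assumption of this sketch, since the exact form of g2 (equation (9)) is not reproduced here.

```python
import torch

class GradientEstimatedMask(torch.autograd.Function):
    """Forward: g1, i.e. hard top-t masking (elements below b become 0).
    Backward: the derivative of an assumed softer g2, non-zero on [a, b),
    so that more elements receive an error signal."""

    @staticmethod
    def forward(ctx, x, a, b):
        # a, b: per-dimension threshold tensors (detached), with a < b.
        ctx.save_for_backward(x, a, b)
        return torch.where(x >= b, x, torch.zeros_like(x))    # g1

    @staticmethod
    def backward(ctx, grad_out):
        x, a, b = ctx.saved_tensors
        # dg1/dx is 1 above b and 0 at or below b; dg2/dx additionally ramps
        # from 0 to 1 on [a, b) instead of being exactly 0 there (assumption).
        ramp = ((x - a) / (b - a)).clamp(0.0, 1.0)
        surrogate = torch.where(x >= b, torch.ones_like(x), ramp)
        return grad_out * surrogate, None, None

# Usage (thresholds computed during forward propagation, detached):
# masked = GradientEstimatedMask.apply(feats, a, b)
```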
  • FIG. 19 is a flowchart showing an example of the model parameter update process according to the third embodiment. Since steps S801 to S804 are the same as steps S401 to S404 in FIG. 7, the description thereof will be omitted.
  • Step S805 The gradient estimation type pseudo-sparse coding unit 110 calculates the threshold values t_2,r^u, t_2,r^{v+}, and t_2,r^{v-}.
  • Step S806 Subsequently, the update unit 107A takes the relevance degrees Si+ and Si- obtained in step S804 as inputs, and calculates the value of the error function (for example, hinge loss) and the gradient of the error function with respect to the model parameters. At this time, when calculating (estimating) the gradient of the error function by the error back propagation method, the update unit 107A obtains the gradient of the error function by back-propagating the error using the partial derivative of the function g2 instead of the partial derivative of the function g1.
  • Step S807 Then, the update unit 107A updates the model parameters by an arbitrary optimization method using the value of the error function calculated in step S806 and the gradient thereof.
  • As described above, the learning device 30 can learn the model parameters of the neural networks that realize the context coding unit 101 and the gradient estimation type pseudo-sparse coding unit 110 by using the input training data set.
  • In the present embodiment, the fact that the threshold values change depending on how the subsets (that is, Z^tr_1, Z^tr_2, and Z^tr_3) are taken during learning can be taken into consideration, and learning can be further stabilized and promoted. That is, for example, in the first embodiment, depending on how the subsets are taken, the gradient of an element may or may not become 0 (that is, the error may or may not be back-propagated to it).
  • In the present embodiment, the threshold values t_2,r^u, t_2,r^{v+}, and t_2,r^{v-} are calculated at the time of forward propagation of the neural network realizing the gradient estimation type pseudo-sparse coding unit 110, but they may instead be calculated by the update unit 107A in step S806 above, for example.
  • Further, the learning device 30 may have the coding unit 100 instead of the coding unit 100B (that is, it may have the pseudo-sparse coding unit 102 instead of the gradient estimation type pseudo-sparse coding unit 110).
  • The gradient estimation pattern 1 is a case where the threshold values t_2,r^u, t_2,r^{v+}, and t_2,r^{v-} are determined by the above determination method 2.
  • MRR represents the mean reciprocal rank.
  • P represents the recall rate
  • Latency represents the average value of search time (unit is ms).
  • the search device 10, the inverted index generation device 20, and the learning device 30 can be realized by the hardware configuration of a general computer or computer system, and can be realized by, for example, the hardware configuration of the computer 500 shown in FIG.
  • FIG. 21 is a diagram showing an example of the hardware configuration of the computer 500.
  • the computer 500 shown in FIG. 21 has an input device 501, a display device 502, an external I / F 503, a communication I / F 504, a processor 505, and a memory device 506. Each of these hardware is connected so as to be communicable via the bus 507.
  • the input device 501 is, for example, a keyboard, a mouse, a touch panel, or the like.
  • the display device 502 is, for example, a display or the like.
  • the computer 500 may not have at least one of the input device 501 and the display device 502.
  • the external I / F 503 is an interface with an external device.
  • the external device includes a recording medium 503a and the like.
  • the computer 500 can read and write the recording medium 503a via the external I / F 503.
  • In the recording medium 503a, one or more programs that realize each functional unit of the search device 10 (the context coding unit 101, the pseudo-sparse coding unit 102 (or the normalized sparse coding unit 109 or the gradient estimation type pseudo-sparse coding unit 110), the inverted index utilization unit 103, and the ranking unit 104) may be stored.
  • Similarly, in the recording medium 503a, one or more programs that realize each functional unit of the inverted index generation device 20 (the context coding unit 101, the pseudo-sparse coding unit 102 (or the normalized sparse coding unit 109 or the gradient estimation type pseudo-sparse coding unit 110), and the inverted index generation unit 105) may be stored.
  • Further, in the recording medium 503a, one or more programs that realize each functional unit of the learning device 30 (the context coding unit 101, the pseudo-sparse coding unit 102 (or the normalized sparse coding unit 109 or the gradient estimation type pseudo-sparse coding unit 110), the ranking unit 104, the division unit 106, the update unit 107 (or the update unit 107A), and the determination unit 108) may be stored.
  • the recording medium 503a includes, for example, a CD (Compact Disc), a DVD (Digital Versatile Disc), an SD memory card (Secure Digital memory card), a USB (Universal Serial Bus) memory card, and the like.
  • the communication I / F 504 is an interface for connecting the computer 500 to the communication network.
  • One or more programs that realize each functional unit of the search device 10 may be acquired (downloaded) from a predetermined server device or the like via the communication I / F 504.
  • one or more programs that realize each functional unit of the inverted index generation device 20 may be acquired from a predetermined server device or the like via the communication I / F 504.
  • one or more programs that realize each functional unit of the learning device 30 may be acquired from a predetermined server device or the like via the communication I / F 504.
  • the processor 505 is, for example, various arithmetic units such as a CPU (Central Processing Unit) and a GPU.
  • Each functional unit included in the search device 10 is realized, for example, by a process in which one or more programs stored in the memory device 506 are executed by the processor 505.
  • each functional unit of the inverted index generation device 20 is realized by, for example, a process of causing the processor 505 to execute one or more programs stored in the memory device 506.
  • each functional unit included in the learning device 30 is realized by, for example, a process of causing the processor 505 to execute one or more programs stored in the memory device 506.
  • The memory device 506 is, for example, various storage devices such as an HDD, SSD, RAM (Random Access Memory), ROM (Read Only Memory), and flash memory.
  • The search device 10 according to the first to third embodiments can realize the above-mentioned search process by having the hardware configuration of the computer 500 shown in FIG. 21.
  • The inverted index generation device 20 according to the first to third embodiments can realize the above-mentioned inverted index generation process by having the hardware configuration of the computer 500 shown in FIG. 21.
  • The learning device 30 according to the first to third embodiments can realize the above-mentioned learning process by having the hardware configuration of the computer 500 shown in FIG. 21.
  • The hardware configuration of the computer 500 shown in FIG. 21 is an example, and the computer 500 may have another hardware configuration.
  • The computer 500 may have a plurality of processors 505 or a plurality of memory devices 506.
  • Appendix 1: A learning device including a memory and at least one processor connected to the memory, wherein the processor takes as input a plurality of training data each including a search query, a first document related to the search query, and a second document not related to the search query and, using model parameters of a first neural network, generates a plurality of first feature vectors representing features of the search queries, a plurality of second feature vectors representing features of the first documents, and a plurality of third feature vectors representing features of the second documents; using model parameters of a second neural network, converts the plurality of first feature vectors, the plurality of second feature vectors, and the plurality of third feature vectors into a plurality of first learning sparse feature vectors, a plurality of second learning sparse feature vectors, and a plurality of third learning sparse feature vectors made sparse by adjusting, through normalization and mean shift, the ratio of elements that take 0 in each dimension; and updates the model parameters of the first neural network and the model parameters of the second neural network using the plurality of first learning sparse feature vectors, the plurality of second learning sparse feature vectors, and the plurality of third learning sparse feature vectors.
  • Appendix 2: The learning device according to Appendix 1, wherein the processor converts the first feature vector, the second feature vector, and the third feature vector into a first sparse feature vector, a second sparse feature vector, and a third sparse feature vector by, for each of them, normalizing and mean-shifting the element values in each dimension of the output vector of the fully connected layer included in the final layer of the second neural network and then calculating the value of a firing function of the final layer that satisfies a predetermined condition, and converts the first sparse feature vector, the second sparse feature vector, and the third sparse feature vector into the first learning sparse feature vector, the second learning sparse feature vector, and the third learning sparse feature vector by setting to 0 the value of the elements that satisfy a predetermined condition in each dimension of each of the first sparse feature vector, the second sparse feature vector, and the third sparse feature vector.
  • Appendix 3: The learning device according to Appendix 2, wherein, with two preset parameters, the processor converts the first feature vectors into the first sparse feature vectors by, in each dimension of the plurality of output vectors for the plurality of first feature vectors, performing the normalization using a subset of the set of the elements of that dimension and the first parameter, performing the mean shift using the second parameter, and then calculating the value of the firing function; converts the second feature vectors into the second sparse feature vectors by, in each dimension of the plurality of output vectors for the plurality of second feature vectors, performing the normalization using a subset of the set of the elements of that dimension and the first parameter, performing the mean shift using the second parameter, and then calculating the value of the firing function; and converts the third feature vectors into the third sparse feature vectors by, in each dimension of the plurality of output vectors for the plurality of third feature vectors, performing the normalization using a subset of the set of the elements of that dimension and the first parameter, performing the mean shift using the second parameter, and then calculating the value of the firing function.
  • A search device wherein the processor obtains, as values from the inverted index, the set of second sparse feature vectors of the documents related to the search query; converts each second sparse feature vector into a third sparse feature vector by setting to 0 every element that is not included in the top t% of the set of the elements of the same dimension of the second sparse feature vectors, where t is a preset value satisfying 0 < t ≤ 100; and calculates the degree of relevance between the search query and the documents related to the search query using the third sparse feature vectors.
  • A non-temporary storage medium storing a program executable by a computer to perform a learning process, the learning process taking as input a plurality of training data each including a search query, a first document related to the search query, and a second document not related to the search query, using the model parameters of a first neural network and a second neural network to obtain learning sparse feature vectors, and updating the model parameters of the first neural network and the model parameters of the second neural network.
  • A non-temporary storage medium storing a program executable by a computer to perform a search process, the search process comprising: taking a search query as input and generating a feature vector representing the features of the search query using trained model parameters of a first neural network; normalizing and mean-shifting, in each dimension, the output vector of the fully connected layer with respect to the feature vector and then sparsifying it with a firing function satisfying a predetermined condition to obtain a first sparse feature vector; obtaining, as values from an inverted index created in advance, the set of second sparse feature vectors of the documents related to the search query, using as keys the indexes of the dimensions corresponding to the non-zero elements included in the first sparse feature vector; and converting each second sparse feature vector into a third sparse feature vector by setting to 0 every element that is not included in the top t% of the set of the elements of the same dimension of the second sparse feature vectors, where t is a preset value satisfying 0 < t ≤ 100.
  • 10 Search device
  • 20 Inverted index generation device
  • 30 Learning device
  • 100 Coding unit
  • 100A Coding unit
  • 100B Coding unit
  • 101 Context coding unit
  • 102 Pseudo-sparse coding unit
  • 103 Inverted index utilization unit
  • 104 Ranking unit
  • 105 Inverted index generation unit
  • 106 Division unit
  • 107 Update unit
  • 107A Update unit
  • 108 Determination unit
  • 109 Normalized sparse coding unit
  • 110 Gradient estimation type pseudo-sparse coding unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A learning device according to one embodiment is characterized by having: a feature quantity generation unit that accepts, as input, a plurality of training data in which a search query, a first document associated with the search query, and a second document not associated with the search query are included, and that, by using model parameters of a first neural network, generates a plurality of first feature quantity vectors which respectively represent a plurality of features of the search query, a plurality of second feature quantity vectors which respectively represent a plurality of features of the first document, and a plurality of third feature quantity vectors which respectively represent a plurality of features of the second document; a conversion unit that, by using the model parameters of a second neural network, converts the plurality of first feature quantity vectors, the plurality of second feature quantity vectors, and the plurality of third feature quantity vectors into a plurality of first sparse feature quantity vectors for learning, a plurality of second sparse feature quantity vectors for learning, and a plurality of third sparse feature quantity vectors for learning, respectively, the sparse feature quantity vectors for learning being made sparse by adjusting the ratio of elements that take a 0 in each dimension by normalization and mean shift; and an update unit for updating the model parameters of the first neural network and the model parameters of the second neural network by using the plurality of first sparse feature quantity vectors for learning, the plurality of second sparse feature quantity vectors for learning, and the plurality of third sparse feature quantity vectors for learning.

Description

Learning device, search device, learning method, search method, and program
 本発明は、学習装置、検索装置、学習方法、検索方法及びプログラムに関する。 The present invention relates to a learning device, a search device, a learning method, a search method, and a program.
 文書検索では、大量の文書の中から検索クエリに関連する文書を高速に取り出すことが要求される。この要求を実現する技術として、例えば、文書内に含まれる単語をキー、その単語が含まれる文書の文書番号をバリューとする転置インデックスを作成した上で、この転置インデックスを利用して、検索クエリに含まれる単語で文書検索を行う技術が知られている。 Document search requires high-speed retrieval of documents related to search queries from a large number of documents. As a technology to realize this requirement, for example, after creating an inverted index whose key is a word contained in a document and whose value is the document number of the document containing the word, a search query is performed using this inverted index. A technique for searching a document using the words contained in is known.
 また、単語の完全一致で文書検索を行う場合、語彙の曖昧性や表記ゆれ等により検索漏れが起こり得る。このため、単語が完全一致しなくても文書検索を行うことができる技術として、ニューラルネットワークで得られたベクトルを潜在的な単語ベクトルとみなして、転置インデックスを作成し、文書検索を行う技術が知られている(例えば、非特許文献1)。 In addition, when a document search is performed with an exact word match, search omission may occur due to vocabulary ambiguity or notational fluctuations. For this reason, as a technique that can perform document retrieval even if words do not exactly match, there is a technique that regards the vector obtained by the neural network as a potential word vector, creates an inverted index, and performs document retrieval. It is known (for example, Non-Patent Document 1).
 しかしながら、上記の非特許文献1等に記載されている従来技術では、転置インデックスを利用した高速な検索を行うために、学習時にベクトルのノルムに関する制約項を損失関数に追加することによって高次元でスパースなベクトルを実現していた。このため、得られるベクトルのスパース性を明示的にコントロールすることが困難な場合が多く、また特徴空間が特定の低次元の部分空間で表現されてしまう可能性もあった。 However, in the prior art described in Non-Patent Document 1 and the like described above, in order to perform a high-speed search using an inverted index, a constraint term relating to the vector norm is added to the loss function at the time of training in a high dimension. It realized a sparse vector. For this reason, it is often difficult to explicitly control the sparsity of the obtained vector, and there is a possibility that the feature space is represented by a specific low-dimensional subspace.
 本発明の一実施形態は、上記の点に鑑みてなされたもので、転置インデックスを利用した文書検索において、疑似的にスパースとみなすことができるベクトルをニューラルネットワークにより獲得することを目的とする。 One embodiment of the present invention has been made in view of the above points, and an object thereof is to acquire a vector that can be regarded as a pseudo sparse in a document search using an inverted index by a neural network.
 上記目的を達成するため、一実施形態に係る学習装置は、検索クエリと、前記検索クエリに関連がある第1の文書と、前記検索クエリに関連がない第2の文書とが含まれる複数の訓練データを入力として、第1のニューラルネットワークのモデルパラメータを用いて、複数の前記検索クエリの特徴をそれぞれ表す複数の第1の特徴量ベクトルと、複数の前記第1の文書の特徴をそれぞれ表す複数の第2の特徴量ベクトルと、複数の前記第2の文書の特徴をそれぞれ表す複数の第3の特徴量ベクトルとを生成する特徴量生成部と、第2のニューラルネットワークのモデルパラメータを用いて、複数の前記第1の特徴量ベクトルと複数の前記第2の特徴量ベクトルと複数の前記第3の特徴量ベクトルとのそれぞれについて、正規化及び平均シフトにより各次元で0を取る要素の割合を調整することでスパース化した複数の第1の学習用スパース特徴量ベクトルと複数の第2の学習用スパース特徴量ベクトルと複数の第3の学習用スパース特徴量ベクトルとに変換する変換部と、複数の前記第1の学習用スパース特徴量ベクトルと複数の前記第2の学習用スパース特徴量ベクトルと複数の前記第3の学習用スパース特徴量ベクトルとを用いて、前記第1のニューラルネットワークのモデルパラメータと前記第2のニューラルネットワークのモデルパラメータとを更新する更新部と、を有することを特徴とする。 In order to achieve the above object, the learning device according to the embodiment includes a search query, a first document related to the search query, and a plurality of second documents not related to the search query. Using the training data as input and using the model parameters of the first neural network, the features of the plurality of first features and the features of the first document are represented, respectively, which represent the features of the plurality of search queries. Using a feature amount generator that generates a plurality of second feature amount vectors and a plurality of third feature amount vectors representing the features of the second document, and model parameters of the second neural network. The elements that take 0 in each dimension by normalization and average shift for each of the plurality of the first feature quantity vectors, the plurality of the second feature quantity vectors, and the plurality of the third feature quantity vectors. A conversion unit that converts a plurality of first learning sparse feature quantities vectors, a plurality of second learning sparse feature quantities vectors, and a plurality of third learning sparse feature quantity vectors that have been sparsed by adjusting the ratio. And the first neural using the plurality of the first learning sparse feature quantity vectors, the plurality of the second learning sparse feature quantity vectors, and the plurality of the third learning sparse feature quantity vectors. It is characterized by having an update unit for updating the model parameters of the network and the model parameters of the second neural network.
 転置インデックスを利用した文書検索において、疑似的にスパースとみなすことができるベクトルをニューラルネットワークにより獲得することができる。 In a document search using an inverted index, a vector that can be regarded as a pseudo sparse can be acquired by a neural network.
A diagram showing an example of the overall configuration of the search device according to the first embodiment.
A flowchart showing an example of the search process according to the first embodiment.
A diagram showing an example of the overall configuration of the inverted index generation device according to the first embodiment.
A flowchart showing an example of the inverted index generation process according to the first embodiment.
A diagram showing an example of the overall configuration of the learning device according to the first embodiment.
A flowchart showing an example of the learning process according to the first embodiment.
A flowchart showing an example of the model parameter update process according to the first embodiment.
A diagram showing a comparative example of frequency distributions.
A diagram showing a comparative example of frequency distributions.
A diagram showing an example of the overall configuration of the search device according to the second embodiment.
A flowchart showing an example of the search process according to the second embodiment.
A diagram showing an example of the overall configuration of the inverted index generation device according to the second embodiment.
A flowchart showing an example of the inverted index generation process according to the second embodiment.
A diagram showing an example of the overall configuration of the learning device according to the second embodiment.
A flowchart showing an example of the model parameter update process according to the second embodiment.
A diagram showing an example of the overall configuration of the search device according to the third embodiment.
A diagram showing an example of the overall configuration of the inverted index generation device according to the third embodiment.
A diagram showing an example of the overall configuration of the learning device according to the third embodiment.
A diagram showing an example of the functions g1 and g2 and their partial derivatives.
A flowchart showing an example of the model parameter update process according to the third embodiment.
A diagram showing a modification of the overall configuration of the learning device according to the third embodiment.
A diagram showing an example of the hardware configuration of a computer.
Hereinafter, an embodiment of the present invention will be described.
[First Embodiment]
In the present embodiment, a search device 10 that uses a vector obtained by a neural network and an inverted index to retrieve, from among the documents to be searched, the documents related to a search query will be described. An inverted index generation device 20 that generates (or creates) the inverted index and a learning device 30 that trains the neural network will also be described.
In the present embodiment, the search device 10, the inverted index generation device 20, and the learning device 30 are described as separate devices, but two or more of these devices may be realized as a single device. For example, the search device 10 and the inverted index generation device 20 may be realized as the same device, the inverted index generation device 20 and the learning device 30 may be realized as the same device, the learning device 30 and the search device 10 may be realized as the same device, or the search device 10, the inverted index generation device 20, and the learning device 30 may all be realized as the same device.
- At the time of search
First, the case where a document search is performed by the search device 10 will be described. Let {D_1, ..., D_m} be the set of documents to be searched. The search device 10 receives a search query Q as input and outputs an ordered set {D_1, ..., D_k} of documents related to the search query Q together with their degrees of relevance {S_1, ..., S_k}, where m is the number of documents to be searched and k (k ≤ m) is the number of documents related to the search query Q.
The search query Q and each search target document D_i (i = 1, ..., m) are texts (character strings). A document related to the search query Q is a document obtained as a search result for the search query Q.
<Overall configuration of the search device 10>
The overall configuration of the search device 10 according to the present embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the overall configuration of the search device 10 according to the first embodiment.
As shown in FIG. 1, the search device 10 according to the present embodiment has a context coding unit 101, a pseudo-sparse coding unit 102, an inverted index utilization unit 103, and a ranking unit 104. Here, the context coding unit 101 and the pseudo-sparse coding unit 102 are realized by neural networks whose parameters are assumed to have been trained in advance. Hereinafter, the parameters of the neural networks that realize the context coding unit 101 and the pseudo-sparse coding unit 102 are referred to as "model parameters". The trained model parameters are stored in an auxiliary storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
The context coding unit 101 takes the search query Q as input and, using the trained model parameters, outputs a feature U of the search query Q.
Here, as the neural network that realizes the context coding unit 101, for example, BERT (Bidirectional Encoder Representations from Transformers) can be used. BERT is a context-aware pre-trained model based on the Transformer; it takes text as input and outputs a d-dimensional feature. By transforming this feature with a single fully connected layer, BERT achieves high performance on various natural language processing tasks. For details on BERT, see, for example, Reference 1 "J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018." For details on the Transformer, see, for example, Reference 2 "Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Attention Is All You Need. arXiv preprint arXiv:1706.03762, 2017."
When BERT is used as the neural network that realizes the context coding unit 101, a CLS tag is added to the beginning of the search query Q and a SEP tag is added to its end before the query is input to the context coding unit 101.
Note that BERT is only an example; another context-aware pre-trained model based on the Transformer may be used as the neural network that realizes the context coding unit 101. More generally, any neural network capable of encoding text may be used. However, by realizing the context coding unit 101 with a context-aware pre-trained model such as BERT, it becomes possible to obtain features that take the entire context into account. In the following, it is assumed that the context coding unit 101 is realized by BERT and that the feature U is a d-dimensional vector.
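As one concrete illustration, the context encoding step can be sketched as follows with the Hugging Face transformers library; the checkpoint name and the use of the CLS-position hidden state as the d-dimensional feature U are assumptions made for illustration, not values specified by this embodiment.

    import torch
    from transformers import AutoTokenizer, AutoModel

    # Minimal sketch of the context coding unit, assuming a BERT-base encoder.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")

    def encode_query(query: str) -> torch.Tensor:
        # The tokenizer inserts the [CLS] and [SEP] tags automatically.
        inputs = tokenizer(query, return_tensors="pt", truncation=True)
        with torch.no_grad():
            outputs = encoder(**inputs)
        # Use the hidden state at the [CLS] position as the feature U (shape: (d,)).
        return outputs.last_hidden_state[0, 0]

    U = encode_query("example search query")  # d-dimensional feature U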
The pseudo-sparse coding unit 102 takes the feature U of the search query Q as input and, using the trained model parameters, outputs a pseudo-sparse feature U' of the search query Q (that is, a feature U' of the search query Q that can be regarded as pseudo-sparse).
Here, as the neural network that realizes the pseudo-sparse coding unit 102, for example, the fully connected model described in Non-Patent Document 1 above can be used. More specifically, several fully connected layers (for example, about 3 to 5 layers) are stacked so that the dimensionality d' of the pseudo-sparse feature U' is larger than the dimensionality d of the feature U, a common firing function such as the ReLU function is used as the firing function of the final fully connected layer, and each output value of the firing function is then divided by the L2 norm of the whole output so that the output is projected onto a hypersphere of radius 1. By using the ReLU function as the firing function of the final layer, a pseudo-sparse feature U' having 0-valued elements can be obtained (that is, using the ReLU function in the final layer makes it possible to acquire a sparser representation than with other functions).
Note that the model described in Non-Patent Document 1 above is only an example; any model can be used as the neural network that realizes the pseudo-sparse coding unit 102, as long as its output dimension is higher than its input dimension and its final layer uses a general firing function f: R → R satisfying all of the following conditions 1-1 to 1-3.
Condition 1-1: f(x) ≥ 0 for all x
Condition 1-2: f is monotonically increasing
Condition 1-3: there exists a ∈ R such that f(a) = 0
Further, the dimensionality d' of the pseudo-sparse feature U' is preferably as high as possible. However, the higher d' is, the higher the expressive power of the pseudo-sparse feature U' becomes, while the computational cost of computing U' and the learning cost of training the model parameters also become higher. Furthermore, the amount of information held by the document set to be searched and the acceptable computational cost may differ depending on the situation, and d' does not necessarily coincide with the dimensionality of the space spanned by the codomain of the mapping realized by the neural network of the pseudo-sparse coding unit 102 (that is, the rank of the representation matrix of that mapping). Therefore, how large d' should be made may differ depending on, for example, the amount of information held by the document set to be searched and the available computational resources.
The projection onto the hypersphere described above is not essential and may be omitted. However, since it is highly expected to promote the learning of the pseudo-sparse features, it is preferable to perform this projection.
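A minimal sketch of such a pseudo-sparse encoder is shown below, assuming PyTorch; the layer sizes, the three-layer depth, and the small epsilon added before normalization are illustrative assumptions, not values specified by this embodiment.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PseudoSparseEncoder(nn.Module):
        # Sketch of the pseudo-sparse coding unit: stacked fully connected layers whose
        # output dimension d' is larger than the input dimension d, a ReLU firing
        # function in the final layer, and projection onto the radius-1 hypersphere.
        def __init__(self, d: int = 768, d_prime: int = 30000, hidden: int = 1000):
            super().__init__()
            self.layers = nn.Sequential(
                nn.Linear(d, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, d_prime),
            )

        def forward(self, u: torch.Tensor) -> torch.Tensor:
            x = F.relu(self.layers(u))  # ReLU firing function: elements can become exactly 0
            return x / (x.norm(p=2, dim=-1, keepdim=True) + 1e-12)  # unit-hypersphere projection

For example, encoder = PseudoSparseEncoder() followed by u_prime = encoder(U.unsqueeze(0)) would produce the d'-dimensional pseudo-sparse feature U' for the query feature U.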
In the present embodiment, the context coding unit 101 and the pseudo-sparse coding unit 102 are expressed as different functional units, but this is for convenience; they may be a single functional unit. For example, the context coding unit 101 and the pseudo-sparse coding unit 102 may be collectively referred to as the coding unit 100.
The inverted index utilization unit 103 takes the pseudo-sparse feature U' as input and, using an inverted index generated in advance, obtains a subset {V'_i | i ∈ K} of the pseudo-sparse features of the search target documents. Here, K is the set of indexes (or document numbers, document IDs, etc.) of the documents related to the search query Q, and |K| = k. The pseudo-sparse feature V'_i of a search target document is the d'-dimensional vector obtained by inputting the search target document D_i to the context coding unit 101 and the pseudo-sparse coding unit 102. Hereinafter, for i = 1, ..., m, V'_i = (v'_i1, v'_i2, ..., v'_id'). The index of a document is also referred to as a "document index". The inverted index is stored in an auxiliary storage device such as an HDD or an SSD.
Here, the inverted index according to the present embodiment is information in which each dimension 1, 2, ..., d' of the pseudo-sparse feature (that is, the dimension index or dimension number) is used as a key, and the set C_r = {(i, v'_ir) | v'_ir ∈ W_r, i ∈ {1, ..., m}} is set as the value for the key r. W_r is the subset obtained by collecting, in descending order of value, the top t% of the set {v'_1r, v'_2r, ..., v'_mr} of the r-th dimension elements of the pseudo-sparse features V'_1, V'_2, ..., V'_m of the search target documents. Note that t is a preset threshold (0 < t ≤ 100) and may be a value different from, or the same as, the thresholds t_pas and t_que described later.
At this time, the inverted index utilization unit 103 uses, as keys, each dimension r for which u'_r ≠ 0 in the pseudo-sparse feature U' = (u'_1, u'_2, ..., u'_d') and acquires the corresponding values from the inverted index. If a faster document search is desired, the inverted index utilization unit 103 may first approximate to 0 those elements of U' = (u'_1, u'_2, ..., u'_d') whose values are smaller than a predetermined threshold and then acquire the values (in this case, no values are acquired for the dimensions corresponding to the elements approximated to 0).
Then, the inverted index utilization unit 103 obtains the subset {V'_i | i ∈ K} of the pseudo-sparse features of the search target documents, where K is the set of all document indexes included in the acquired values. In the following, by renumbering the document indexes, the subset {V'_i | i ∈ K} is also written as {V'_1, ..., V'_k}.
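The structure of the inverted index and the lookup performed by the inverted index utilization unit 103 can be sketched as follows in plain Python; the dictionary layout and variable names are assumptions made for illustration.

    from collections import defaultdict
    import numpy as np

    def lookup(inverted_index: dict, u_prime: np.ndarray, eps: float = 0.0):
        # inverted_index maps a dimension index r to its value C_r,
        # a list of (document index i, element value v'_ir) pairs.
        # Dimensions of U' whose value is <= eps are treated as 0 and skipped.
        candidates = defaultdict(dict)  # document index i -> {dimension r: v'_ir}
        for r in np.nonzero(u_prime > eps)[0]:
            for i, v in inverted_index.get(int(r), []):
                candidates[i][int(r)] = v
        return candidates  # the document indexes in candidates form the set K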
The ranking unit 104 takes the pseudo-sparse feature U' of the search query Q and the subset {V'_i | i ∈ K} = {V'_1, ..., V'_k} of the pseudo-sparse features of the search target documents as input, and outputs the ordered set {D_i | i ∈ K} of the documents related to the search query Q (hereinafter also referred to as "related documents") and their degrees of relevance {S_i | i ∈ K}. The ordered set {D_i | i ∈ K} of related documents is a set ordered in ascending or descending order of the degree of relevance S_i. By renumbering the document indexes, the ordered set {D_i | i ∈ K} and the degrees of relevance {S_i | i ∈ K} can also be written as {D_1, ..., D_k} and {S_1, ..., S_k}, respectively.
At this time, the ranking unit 104 converts the subset {V'_1, ..., V'_k} of the pseudo-sparse features of the search target documents into {V''_1, ..., V''_k} and then computes the degree of relevance S_i between the search query Q and the document D_i as S_i = s(U', V''_i), using an appropriate similarity function s that measures the similarity between vectors. As the similarity function s, for example, the inner product can be used. However, any function that can measure the similarity between vectors may be used as s. Alternatively, the similarity function may be defined as s = 1/d using any distance function d that can measure the distance between vectors.
Here, with V''_i = (v''_i1, v''_i2, ..., v''_id'), the ranking unit 104 converts V'_i = (v'_i1, v'_i2, ..., v'_id') into V''_i by the following equation (1).
    v''_ir = v'_ir   if (i, v'_ir) ∈ C_r
    v''_ir = 0       otherwise                                  ... (1)

In this way, for each dimension r, the ranking unit 104 converts V'_i into V''_i by setting v''_ir = v'_ir when (i, v'_ir) is included in the value C_r and v''_ir = 0 otherwise. Given how the values C_r of the inverted index are generated, this conversion means that each element of V'_i = (v'_i1, v'_i2, ..., v'_id') is set to 0 when its value is not in the top t%.
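Combining equation (1) with an inner-product similarity, the ranking step might look like the following sketch; it reuses the candidates mapping from the lookup sketch above and is an assumption made for illustration.

    import numpy as np

    def rank(u_prime: np.ndarray, candidates: dict):
        # candidates: document index i -> {dimension r: v'_ir}, i.e. only the elements
        # whose (i, v'_ir) pairs appear in some value C_r of the inverted index.
        # Elements absent from candidates[i] are treated as 0, which realizes equation (1).
        scores = {}
        for i, elems in candidates.items():
            # Inner product between U' and V''_i, summing only over the surviving dimensions.
            scores[i] = sum(u_prime[r] * v for r, v in elems.items())
        # Order the related documents by descending degree of relevance S_i.
        return sorted(scores.items(), key=lambda x: x[1], reverse=True)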
<Search process>
The search process for obtaining the ordered set {D_i | i ∈ K} of documents related to an input search query Q and their degrees of relevance {S_i | i ∈ K} will be described with reference to FIG. 2. FIG. 2 is a flowchart showing an example of the search process according to the first embodiment.
Step S101: First, the context coding unit 101 takes the search query Q as input and, using the trained model parameters, outputs the feature U of the search query Q.
Step S102: Next, the pseudo-sparse coding unit 102 takes the feature U obtained in step S101 as input and, using the trained model parameters, outputs the pseudo-sparse feature U' of the search query Q.
Step S103: Next, the inverted index utilization unit 103 takes the pseudo-sparse feature U' obtained in step S102 as input and, using the inverted index generated in advance, obtains the subset {V'_i | i ∈ K} of the pseudo-sparse features of the search target documents.
Step S104: Then, the ranking unit 104 takes the pseudo-sparse feature U' obtained in step S102 and the set {V'_i | i ∈ K} obtained in step S103 as input, converts {V'_i | i ∈ K} into {V''_i | i ∈ K}, and, using this {V''_i | i ∈ K} and the pseudo-sparse feature U', outputs the ordered set {D_i | i ∈ K} of documents related to the search query Q and their degrees of relevance {S_i | i ∈ K}.
As described above, the search device 10 according to the present embodiment can obtain the ordered set {D_i | i ∈ K} of documents related to the input search query Q and their degrees of relevance {S_i | i ∈ K}. At this time, by using the pseudo-sparse feature U' of the search query Q and the inverted index generated in advance by the inverted index generation device 20, the search device 10 can obtain related documents and their degrees of relevance that take into account the context of the search query Q and of the entire search target documents, while satisfying the search speed required for document search without depending on the order of the amount of the search target documents.
- At the time of inverted index generation
Next, the case where an inverted index is generated by the inverted index generation device 20 will be described. Here, the inverted index generation device 20 takes the set {D_1, ..., D_m} of documents to be searched as input and outputs an inverted index.
<Overall configuration of the inverted index generation device 20>
The overall configuration of the inverted index generation device 20 according to the present embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram showing an example of the overall configuration of the inverted index generation device 20 according to the first embodiment.
As shown in FIG. 3, the inverted index generation device 20 according to the present embodiment has a context coding unit 101, a pseudo-sparse coding unit 102, and an inverted index generation unit 105. Here, the context coding unit 101 and the pseudo-sparse coding unit 102 are realized by the same neural networks as the context coding unit 101 and the pseudo-sparse coding unit 102 described above for the search, and their model parameters are assumed to have been trained in advance.
The context coding unit 101 takes the search target document D_i as input and, using the trained model parameters, outputs the feature V_i of the search target document D_i.
The pseudo-sparse coding unit 102 takes the feature V_i of the search target document D_i as input and, using the trained model parameters, outputs the pseudo-sparse feature V'_i of the search target document D_i.
The inverted index generation unit 105 takes the set {V'_1, ..., V'_m} of the pseudo-sparse features of the search target documents D_i (i = 1, ..., m) as input, and generates and outputs an inverted index. As described above, the inverted index uses the dimension index or dimension number of the pseudo-sparse feature as a key, and the set C_r = {(i, v'_ir) | v'_ir ∈ W_r, i ∈ {1, ..., m}} as the value for the key r. Therefore, the inverted index generation unit 105 determines, for each element v'_ir (r = 1, ..., d') of each pseudo-sparse feature V'_i (i = 1, ..., m), whether it is included in the top t% of {v'_1r, v'_2r, ..., v'_mr}, and if so, adds (i, v'_ir) to the value set C_r whose key is r, thereby generating the inverted index. The search speed of document retrieval is determined by the number of elements (that is, the number of values) of the value sets C_r of the inverted index, and this number can be adjusted by the value of the threshold t. Therefore, if the operation speed of the processor or the like is known, the search speed (in other words, the amount of search) can be adjusted, by adjusting the value of t, so as to satisfy the search time required for document retrieval.
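A sketch of this construction, using NumPy and a per-dimension top-t% cut, is shown below; the percentile-based thresholding and the variable names are illustrative assumptions, and ties at the boundary may keep slightly more than t% of the elements.

    import numpy as np
    from collections import defaultdict

    def build_inverted_index(V_prime: np.ndarray, t: float):
        # V_prime: (m, d') matrix whose i-th row is the pseudo-sparse feature V'_i.
        # For each dimension r, keep only the documents whose value v'_ir falls in
        # the top t% (0 < t <= 100) of that dimension, and store (i, v'_ir) as the value C_r.
        m, d_prime = V_prime.shape
        inverted_index = defaultdict(list)
        for r in range(d_prime):
            column = V_prime[:, r]
            threshold = np.percentile(column, 100.0 - t)  # value at the top-t% boundary
            for i in np.nonzero(column >= threshold)[0]:
                inverted_index[r].append((int(i), float(column[i])))
        return inverted_index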
<Inverted index generation process>
The inverted index generation process for generating an inverted index from the input set {D_1, ..., D_m} of documents to be searched will be described with reference to FIG. 4. FIG. 4 is a flowchart showing an example of the inverted index generation process according to the first embodiment. The inverted index generation process is executed after the learning process described later is completed and before the search process described above is executed.
Step S201: First, the context coding unit 101 takes the search target document D_i as input and, using the trained model parameters, outputs the feature V_i of the search target document D_i.
Step S202: Next, the pseudo-sparse coding unit 102 takes the feature V_i of the search target document D_i as input and, using the trained model parameters, outputs the pseudo-sparse feature V'_i of the search target document D_i.
Steps S201 to S202 above are repeatedly executed for all search target documents D_i (i = 1, ..., m).
Step S203: Then, the inverted index generation unit 105 takes the set {V'_1, ..., V'_m} of the pseudo-sparse features of the search target documents D_i (i = 1, ..., m) as input, and generates and outputs an inverted index.
As described above, the inverted index generation device 20 according to the present embodiment can generate an inverted index from the input set {D_1, ..., D_m} of documents to be searched. By using this inverted index, as described above, the search device 10 can obtain related documents and their degrees of relevance that take into account the context of the search query Q and of the entire search target documents, while satisfying the search speed required for document search without depending on the order of the amount of the search target documents (that is, it can search for documents related to the search query Q).
- At the time of learning
Next, the case where the learning device 30 trains the neural networks (the neural networks that realize the context coding unit 101 and the pseudo-sparse coding unit 102) will be described. At the time of learning, the model parameters are assumed not to have been trained yet; the learning device 30 takes a training data set as input and learns these model parameters. The training data set is the set of training data used for learning (training) the model parameters.
In the present embodiment, for example, a training data set is created in advance from the data set described in Reference 3 "Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. arXiv preprint arXiv:1611.09268, 2018."
The data set described in Reference 3 above consists of a search query set R = {Q_1, ..., Q_c} and a set of search target documents G = {D_1, ..., D_m'}, where c is the number of search queries and m' is the number of search target documents. Either m' = m or m' ≠ m may hold; however, m' ≥ m is preferable.
Further, for each search query Q_i (i = 1, ..., c), a set of documents related to this search query, G_i = {D_j | D_j is a document related to Q_i}, is assumed to be labeled as correct answer data.
At this time, let D_i^+ be one document randomly extracted from the set G_i of documents related to the search query Q_i, and let D_i^- be one document randomly extracted from the set G \ G_i of documents not related to the search query Q_i. Then (Q_i, D_i^+, D_i^-) is used as training data (that is, data consisting of the search query Q_i, one of its positive examples, and one of its negative examples is used as training data). The set {(Q_i, D_i^+, D_i^-) | i = 1, ..., c} of these training data is used as the training data set.
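The construction of such query/positive/negative triples can be sketched as follows; the data structures (dictionaries keyed by query id and document id sets) are assumptions made for illustration.

    import random

    def build_training_set(queries: dict, relevant: dict, all_docs: set):
        # queries: query id -> query text; relevant: query id -> set of related document ids;
        # all_docs: set of all document ids. For each query, sample one positive document
        # from its related set G_i and one negative document from G \ G_i.
        training_set = []
        for qid, query in queries.items():
            pos = random.choice(sorted(relevant[qid]))
            neg = random.choice(sorted(all_docs - relevant[qid]))
            training_set.append((query, pos, neg))
        return training_set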
<Overall configuration of the learning device 30>
The overall configuration of the learning device 30 according to the present embodiment will be described with reference to FIG. 5. FIG. 5 is a diagram showing an example of the overall configuration of the learning device 30 according to the first embodiment.
As shown in FIG. 5, the learning device 30 according to the present embodiment has a context coding unit 101, a pseudo-sparse coding unit 102, a ranking unit 104, a division unit 106, an update unit 107, and a determination unit 108. Here, the context coding unit 101 and the pseudo-sparse coding unit 102 are realized by the same neural networks as those described above for the search and for the inverted index generation, but their model parameters are assumed not to have been trained yet.
The division unit 106 takes the training data set as input and randomly divides it into a plurality of mini-batches. In the present embodiment, the model parameters are repeatedly updated (learned) for each mini-batch.
The determination unit 108 determines whether or not an end condition for terminating the repeated updating of the model parameters is satisfied. The number of times each piece of training data is repeatedly used for learning is called an epoch, and that number of repetitions is called the number of epochs.
The context coding unit 101 takes the training data (Q_i, D_i^+, D_i^-) as input and, using the model parameters that have not yet been trained, outputs the features (U_i, V_i^+, V_i^-) of the training data (Q_i, D_i^+, D_i^-). That is, the context coding unit 101 takes the search query Q_i, the positive example D_i^+, and the negative example D_i^- as input and outputs the respective features U_i, V_i^+, and V_i^-.
The pseudo-sparse coding unit 102 takes the features (U_i, V_i^+, V_i^-) of the training data (Q_i, D_i^+, D_i^-) as input and, using the model parameters that have not yet been trained, obtains the pseudo-sparse features (U'_i, V'_i^+, V'_i^-) of the training data (Q_i, D_i^+, D_i^-). That is, the pseudo-sparse coding unit 102 takes the features U_i, V_i^+, and V_i^- as input and obtains the respective pseudo-sparse features U'_i, V'_i^+, and V'_i^-. Here, U'_i = (u'_i1, u'_i2, ..., u'_id'), V'_i^+ = (v'^+_i1, v'^+_i2, ..., v'^+_id'), and V'_i^- = (v'^-_i1, v'^-_i2, ..., v'^-_id').
Then, the pseudo-sparse coding unit 102 converts the pseudo-sparse features U'_i, V'_i^+, and V'_i^- into learning pseudo-sparse features U''_i, V''_i^+, and V''_i^-, respectively.
Here, let Z_tr^1 = {U'_tr,1, U'_tr,2, ..., U'_tr,m''} be a subset of the set of pseudo-sparse features U'_i (i = 1, ..., c), with U'_i = U'_tr,i = (u'_i1, u'_i2, ..., u'_id'). The pseudo-sparse coding unit 102 converts the pseudo-sparse feature U'_i into the learning pseudo-sparse feature U''_i by the following equation (2).
    u''_ir = u'_ir   if u'_ir ∈ W'_1r
    u''_ir = 0       otherwise                                  ... (2)

W'_1r is the subset obtained by collecting, in descending order of value, the top t_que% of the set {u'_1r, u'_2r, ..., u'_m''r} of the r-th dimension elements of the pseudo-sparse features U'_tr,i included in Z_tr^1. This means that only elements with large values are used for learning. Also, m'' is any natural number satisfying m'' ≤ c, for example the number of training data included in a mini-batch. Note that t_que is a preset threshold (0 < t_que ≤ 100).
Similarly, let Z_tr^2 = {V'^+_tr,1, V'^+_tr,2, ..., V'^+_tr,m''} be a subset of the set of pseudo-sparse features V'_i^+ (i = 1, ..., c), with V'_i^+ = V'^+_tr,i = (v'^+_i1, v'^+_i2, ..., v'^+_id'). The pseudo-sparse coding unit 102 converts the pseudo-sparse feature V'_i^+ into the learning pseudo-sparse feature V''_i^+ by the following equation (3).
    v''^+_ir = v'^+_ir   if v'^+_ir ∈ W'_2r
    v''^+_ir = 0         otherwise                              ... (3)

W'_2r is the subset obtained by collecting, in descending order of value, the top t_pas% of the set {v'^+_1r, v'^+_2r, ..., v'^+_m''r} of the r-th dimension elements of the pseudo-sparse features V'^+_tr,i included in Z_tr^2. This means that only elements with large values are used for learning. Note that t_pas is a preset threshold (0 < t_pas ≤ 100); t_que and t_pas may be the same value or different values. Both t_que = 100 and t_pas = 100 are also possible, but in this case the learning is equivalent to ordinary learning. Alternatively, only one of t_que and t_pas may be set to 100; in this case, it has been found experimentally that good results are obtained with t_pas = 100.
Similarly, let Z_tr^3 = {V'^-_tr,1, V'^-_tr,2, ..., V'^-_tr,m''} be a subset of the set of pseudo-sparse features V'_i^- (i = 1, ..., c), with V'_i^- = V'^-_tr,i = (v'^-_i1, v'^-_i2, ..., v'^-_id'). The pseudo-sparse coding unit 102 converts the pseudo-sparse feature V'_i^- into the learning pseudo-sparse feature V''_i^- by the following equation (4).
    v''^-_ir = v'^-_ir   if v'^-_ir ∈ W'_3r
    v''^-_ir = 0         otherwise                              ... (4)

W'_3r is the subset obtained by collecting, in descending order of value, the top t_pas% of the set {v'^-_1r, v'^-_2r, ..., v'^-_m''r} of the r-th dimension elements of the pseudo-sparse features V'^-_tr,i included in Z_tr^3.
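Equations (2) to (4) all apply the same per-dimension, top-percentage mask within a batch of pseudo-sparse features; a sketch of that operation in PyTorch, with illustrative tensor shapes and names, is given below.

    import torch

    def topk_mask(features: torch.Tensor, t: float) -> torch.Tensor:
        # features: (m'', d') batch of pseudo-sparse features (e.g. the U'_tr,i in Z_tr^1).
        # For each dimension r, keep an element only if it lies in the top t% of that
        # dimension's values within the batch; other elements are set to 0 (equations (2)-(4)).
        m, _ = features.shape
        k = max(1, int(m * t / 100.0))
        kth_values = torch.topk(features, k, dim=0).values[-1]  # per-dimension top-t% boundary
        return torch.where(features >= kth_values, features, torch.zeros_like(features))

    # For example, U'' = topk_mask(U_batch, t_que) and V''^+ = topk_mask(V_pos_batch, t_pas).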
Note that each element of the above subset Z_tr^1 is preferably a pseudo-sparse feature obtained with the same model parameters, which can be realized, for example, by the set of pseudo-sparse features obtained within the same mini-batch. The same applies to Z_tr^2 and Z_tr^3. However, if, for example, t_que < (1/m'') × 100 or t_pas < (1/m'') × 100, the subset collecting the top t_que% or the top t_pas% can become the empty set; to avoid this, t_que > (1/m'') × 100 and t_pas > (1/m'') × 100 must hold. Furthermore, when the norm of the output values is set to 1 (that is, when the output values of the firing function of the final layer of the neural network realizing the pseudo-sparse coding unit 102 are projected onto the hypersphere of radius 1), t_que > (2/m'') × 100 and t_pas > (2/m'') × 100 must be satisfied. For this reason, if it is difficult to obtain, with the same model parameters, a subset of pseudo-sparse features large enough to satisfy these conditions, pseudo-sparse features obtained in the most recent fixed number of learning steps may be added to the subset, provided that the learning coefficient is not large.
 また、直近の一定の学習ステップ間で得られた疑似スパース特徴量を部分集合に加える場合に、メモリに計算グラフが保存できないときは、過去の学習ステップで得られた疑似スパース特徴量は上位tque%(又はtpas%)を計算するためだけに使用すればよい。 In addition, when the pseudo-sparse features obtained between the most recent fixed learning steps are added to the subset and the calculation graph cannot be saved in the memory, the pseudo-sparse features obtained in the past learning steps are in the upper t. It may only be used to calculate que % (or tpas %).
 更に、本実施形態では、訓練データセットをミニバッチ単位に分割して、ミニバッチ毎にモデルパラメータを繰り返し学習する場合(つまり、ミニバッチ学習)について説明するが、必ずしもミニバッチ学習である必要はなく、オンライン学習やバッチ学習等の他の任意の学習手法でモデルパラメータが学習されてもよい。ただし、上述したように、疑似スパース特徴量の部分集合が重要となるため、ミニバッチ学習でモデルパラメータを学習することが好ましい。 Further, in the present embodiment, a case where the training data set is divided into mini-batch units and model parameters are repeatedly learned for each mini-batch (that is, mini-batch learning) will be described, but it is not always necessary to use mini-batch learning, and online learning is performed. The model parameters may be learned by any other learning method such as batch learning or batch learning. However, as described above, since a subset of pseudo-sparse features is important, it is preferable to learn model parameters by mini-batch learning.
The ranking unit 104 takes the learning pseudo-sparse features U''_i, V''_i^+, and V''_i^- as input, and outputs the relevance S_i^+ of the positive example D_i^+ to the search query Q_i and the relevance S_i^- of the negative example D_i^- to the search query Q_i. Here, the relevances S_i^+ and S_i^- are computed as S_i^+ = s(U''_i, V''_i^+) and S_i^- = s(U''_i, V''_i^-), respectively, using the similarity function s described above for the search.
The update unit 107 takes the relevances S_i^+ and S_i^- as input and updates the model parameters by a supervised learning method. Here, an error function used in ranking learning may be used as the error function for the supervised learning.
More specifically, the hinge loss described in Non-Patent Document 1 above (that is, equation (3) described in Non-Patent Document 1) may be used. The hinge loss is expressed as hinge loss = max{0, ε - (S_i^+ - S_i^-)}, where ε is an arbitrarily set parameter.
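As a reference, a minimal PyTorch-style sketch of this pairwise hinge loss is shown below; the tensor names are illustrative and not taken from the specification.

```python
import torch

def pairwise_hinge_loss(s_pos: torch.Tensor, s_neg: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    """hinge loss = max{0, eps - (S+ - S-)}, averaged over the mini-batch."""
    return torch.clamp(eps - (s_pos - s_neg), min=0.0).mean()
```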
<Learning process>
The learning process for learning the model parameters from an input training data set will be described with reference to FIG. 6. FIG. 6 is a flowchart showing an example of the learning process according to the first embodiment. It is assumed that the model parameters have been initialized with appropriate values.
Step S301: First, the division unit 106 takes the training data set as input and randomly divides this training data set into a plurality of mini-batches.
Step S302: Next, the learning device 30 executes the model parameter update process for each mini-batch, whereby the model parameters are updated. Details of the model parameter update process will be described later. This model parameter update process is also called a learning step.
Step S303: Then, the determination unit 108 determines whether or not a predetermined end condition is satisfied. The learning device 30 ends the learning process when it is determined that the end condition is satisfied (YES in step S303), and returns to step S301 when it is determined that the end condition is not satisfied (NO in step S303). As a result, steps S301 to S302 are repeatedly executed until the predetermined end condition is satisfied.
The predetermined end condition may be, for example, that the number of epochs has reached a predetermined first threshold or more, or that the error function has converged (for example, that the value of the error function has become less than a predetermined second threshold, or that the change in the error function before and after updating the model parameters has become less than a predetermined third threshold).
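A minimal sketch of this outer loop (steps S301 to S303) is shown below; `model.update_on(batch)` is a hypothetical stand-in for the model parameter update process described next and is assumed to return the mini-batch loss.

```python
import random

def train(dataset, model, batch_size, max_epochs, tol=1e-4):
    """Outer learning loop: random mini-batch split, update, end-condition check."""
    prev_loss = float("inf")
    for epoch in range(max_epochs):                    # end condition: epoch count
        random.shuffle(dataset)                        # step S301: random division
        batches = [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]
        epoch_loss = sum(model.update_on(b) for b in batches) / len(batches)   # step S302
        if abs(prev_loss - epoch_loss) < tol:          # step S303: convergence check
            return
        prev_loss = epoch_loss
```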
<Model parameter update process>
The model parameter update process of step S302 above will be described with reference to FIG. 7. FIG. 7 is a flowchart showing an example of the model parameter update process according to the first embodiment. In the following, the case where the model parameters are updated using a certain mini-batch will be described.
Step S401: First, the context coding unit 101 takes the training data (Q_i, D_i^+, D_i^-) in the mini-batch as input and, using the model parameters that are not yet trained, outputs the features (U_i, V_i^+, V_i^-) of this training data (Q_i, D_i^+, D_i^-).
Step S402: Next, the pseudo-sparse coding unit 102 takes the features (U_i, V_i^+, V_i^-) of the training data (Q_i, D_i^+, D_i^-) as input and, using the model parameters that are not yet trained, obtains the pseudo-sparse features (U'_i, V'_i^+, V'_i^-) of the training data (Q_i, D_i^+, D_i^-).
Step S403: Next, the pseudo-sparse coding unit 102 converts the pseudo-sparse features (U'_i, V'_i^+, V'_i^-) into the learning pseudo-sparse features (U''_i, V''_i^+, V''_i^-) and outputs the learning pseudo-sparse features (U''_i, V''_i^+, V''_i^-).
Step S404: Next, the ranking unit 104 takes the learning pseudo-sparse features (U''_i, V''_i^+, V''_i^-) as input and outputs the relevance S_i^+ of the positive example D_i^+ to the search query Q_i and the relevance S_i^- of the negative example D_i^- to the search query Q_i.
The above steps S401 to S404 are repeatedly executed for all the training data (Q_i, D_i^+, D_i^-) included in the mini-batch.
Step S405: Subsequently, the update unit 107 takes the relevances S_i^+ and S_i^- obtained in step S404 above as input and computes the value of the error function (for example, the hinge loss) and the gradient of the error function with respect to the model parameters. The gradient of the error function with respect to the model parameters may be computed by, for example, the error backpropagation method.
Step S406: Then, the update unit 107 updates the model parameters by an arbitrary optimization method using the value of the error function computed in step S405 above and its gradient.
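As a rough sketch only, one update step (steps S401 to S406) could look as follows; `context_encoder` and `sparse_encoder` stand in for the context coding unit 101 and the pseudo-sparse coding unit 102, the similarity function s is assumed here to be an inner product, and the top-t conversion of step S403 is omitted because it is computed per dimension over the whole mini-batch (see the sketch following equation (4) above).

```python
import torch

def update_step(batch, context_encoder, sparse_encoder, optimizer, eps=1.0):
    """One model parameter update on a mini-batch of (query, positive, negative) triples."""
    losses = []
    for query, pos_doc, neg_doc in batch:
        u = sparse_encoder(context_encoder(query))        # steps S401-S402
        v_pos = sparse_encoder(context_encoder(pos_doc))
        v_neg = sparse_encoder(context_encoder(neg_doc))
        s_pos = (u * v_pos).sum()                          # step S404 (inner product assumed)
        s_neg = (u * v_neg).sum()
        losses.append(torch.clamp(eps - (s_pos - s_neg), min=0.0))   # hinge loss
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()                                        # step S405: backpropagation
    optimizer.step()                                       # step S406: parameter update
    return loss.item()
```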
As described above, the learning device 30 according to the present embodiment can learn the model parameters of the neural network realizing the context coding unit 101 and the pseudo-sparse coding unit 102 using the input training data set. At this time, the learning device 30 according to the present embodiment converts the pseudo-sparse features into learning pseudo-sparse features by setting to 0 those elements of each pseudo-sparse feature whose values are not included in the top t_que% or t_pas%, and then learns the model parameters using these learning pseudo-sparse features (in other words, only the elements of each pseudo-sparse feature with large values are used for learning). This makes it possible to stably obtain features that may be regarded as pseudo-sparse at document search time. Hereinafter, this learning method is also referred to as "top t learning".
<Evaluation experiment>
Next, an evaluation experiment for evaluating the method of the present embodiment (hereinafter referred to as the "proposed method") will be described.
≪Data set≫
The experiment was conducted on the MS MARCO Passage and Document Retrieval task described in Reference 3 above. In this task, ReRanking and Full Ranking exist for each of the passage level and the document level, and in this experiment the evaluation was performed using Passage Ranking. Details of the Passage Ranking data are shown in Table 1 below.
Figure JPOXMLDOC01-appb-T000005
In the ReRanking task, the top 1000 passages narrowed down in advance using BM25 are given, whereas in Full Ranking it is required to search from about 8.8 million target documents. Mean Reciprocal Rank (MRR) is used as the ranking metric. For details of MRR, see, for example, Reference 4 "Craswell, N.: Mean Reciprocal Rank, in Encyclopedia of Database Systems, p. 1703 (2009)".
≪Settings during learning≫
Using the Train Triples Small set, learning was performed on four GPUs (Graphics Processing Units) with a batch size (the number of training data included in a mini-batch) of 100 and 1 epoch. Adam (Adaptive Moment Estimation) was used as the optimization method, with β_1 = 0.9, β_2 = 0.999, and ε = 10^-8. For details of Adam, see, for example, Reference 5 "Kingma, D. P. and Ba, J.: Adam: A Method for Stochastic Optimization, in ICLR (2015)".
The model parameters were initialized according to the normal distribution N(0, 0.02), except for the biases, which were initialized to 0. The learning rate was set to 5 × 10^-5 and decayed linearly so as to reach 0 at the final step. Gradients were clipped with a maximum norm of 1. For BERT, the base model (768 dimensions) was used; the number of dimensions of the intermediate layer of the two output layers was 1000, and the number of dimensions D of the final layer (that is, of the pseudo-sparse features) was 30000. The margin ε of the hinge loss was 1.0. The BERT WordPiece tokenizer (vocabulary size 30K) was used for tokenization.
As a hyperparameter for top t learning, t_que = 0.1% was used. With a batch size of 100, if the top T% (for a certain threshold T) is computed within a single mini-batch, the threshold T depends too strongly on the training data in that mini-batch. To stabilize the computation, the pseudo-sparse features of the training data for the past 20 steps, including the current learning step, were stored and used together, which stabilized the computation of the top T%. Here, the pseudo-sparse features of the 19 steps excluding the current learning step were used only for determining the threshold T when computing the top T%; their computation graphs were not retained and they were not used for parameter updates.
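A rough sketch of this stabilization, assuming a PyTorch implementation and illustrative names, is shown below; features from past steps are stored detached so that, as described above, they affect only the threshold and not the parameter updates.

```python
from collections import deque
import torch

class TopTThreshold:
    """Keeps the pseudo-sparse features of the last 20 steps to stabilize the top-T% threshold."""

    def __init__(self, history_steps: int = 20, top_percent: float = 0.1):
        self.history = deque(maxlen=history_steps)
        self.top_percent = top_percent

    def threshold(self, current: torch.Tensor) -> torch.Tensor:
        # current: (batch, D) pseudo-sparse features of the current learning step
        self.history.append(current.detach())              # no computation graph is kept
        pooled = torch.cat(list(self.history), dim=0)       # up to 20 * batch rows
        k = max(1, int(pooled.shape[0] * self.top_percent / 100))
        return torch.topk(pooled, k, dim=0).values[-1]      # per-dimension threshold, shape (D,)
```

The mask for the current step can then be taken as `current * (current >= thr).float()`, so that gradients flow only through the surviving elements.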
≪Evaluation of ranking accuracy and search speed≫
In the following, the proposed method is assumed to perform a two-stage search at document search time. That is, in the proposed method, the set of the top k related documents {D_1, ..., D_k} is output in the first stage, and then the final set of the top k' related documents is output in the second stage with t = 100.
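A rough sketch of this two-stage search is shown below; the sparse vectors are represented as dictionaries of non-zero dimensions, the inverted index as a dictionary from dimension to document ids, and the way t selects query dimensions is an assumption for illustration.

```python
def top_t(vec, t):
    """Keep only the top-t% largest non-zero dimensions of a sparse vector (dict r -> value)."""
    k = max(1, int(len(vec) * t / 100))
    return dict(sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:k])

def score(q, d):
    return sum(value * d.get(r, 0.0) for r, value in q.items())

def two_stage_search(query_vec, inverted_index, doc_vectors, t_first=0.1, k=1000, k_prime=10):
    # first stage: use only the top-t% query dimensions and the inverted index
    q1 = top_t(query_vec, t_first)
    candidates = {i for r in q1 for i in inverted_index.get(r, [])}
    first = sorted(candidates, key=lambda i: score(q1, doc_vectors[i]), reverse=True)[:k]
    # second stage: re-rank the k candidates with t = 100, i.e. the full query vector
    return sorted(first, key=lambda i: score(query_vec, doc_vectors[i]), reverse=True)[:k_prime]
```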
・Does the proposed method exceed the accuracy of the conventional method in Passage Full Ranking?
The search accuracy of the proposed method and the conventional method is shown in Table 2 below. For the proposed method, k = 1000 and k' = 10 were used.
Figure JPOXMLDOC01-appb-T000006
As shown in Table 2 above, the proposed method greatly exceeds the accuracy of BM25 in MRR@10, and it was confirmed that increasing t at search time also increases the search accuracy. Furthermore, since the result with t_que = 0.1% during training was better than the result with t_que = t_pas = 0.5%, it was found that the effect can be obtained without applying the top-T% operation to both the pseudo-sparse features of the search query and the pseudo-sparse features of the search target documents.
・Can the proposed method search a large number of documents quickly and accurately?
Table 3 below shows the average search speed of the proposed method with t_que = 0.1%. Here, 1st top-T is the number of related documents output in the first stage, 1st RT is the response time of the first stage, and 2nd RT is the response time of the second stage.
Figure JPOXMLDOC01-appb-T000007
As shown in Table 3 above, with t = 0.1% the total computation time for searching about 8.8 million passages is very fast, less than 0.4 seconds, so the proposed method can search a large number of documents quickly and with good accuracy. With t = 0.5%, the search takes about 8.4 seconds, but the accuracy is correspondingly better. From these results, the accuracy can be maximized by setting t according to the time that can be tolerated for the search. Since the pseudo-sparse features do not need to be recomputed in the second-stage search (that is, the pseudo-sparse features computed in the first stage can be reused), it was confirmed that the second stage can be computed very quickly.
・Does top t learning contribute to speeding up?
FIGS. 8A and 8B show a comparison of the pseudo-sparse features of the search target documents with and without top t learning. FIG. 8A shows the result of computing the pseudo-sparse features of 80,000 documents out of all search target documents using the trained model parameters, excluding the dimensions that have no non-zero elements. In the left part of FIG. 8A (without top t learning), 24769 of the 30000 dimensions had no non-zero elements. In addition, many dimensions had non-zero elements in all of the 80,000 pseudo-sparse features, showing that a very biased set of pseudo-sparse features was obtained. In contrast, in the right part of FIG. 8A (with top t learning), only 4438 dimensions had no non-zero elements, and the distribution of the number of non-zero elements is less biased than without top t learning. This result shows that top t learning has the effect of suppressing convergence to model parameters that map to a biased subspace.
FIG. 8B shows, for the case where the pseudo-sparse features of all search target documents are computed using the trained model parameters and an inverted index is created with t = 0.1%, the frequency distribution of the number of documents stored in each index (that is, each dimension r). As in FIG. 8A, dimensions with no non-zero elements are excluded. As shown in the left part of FIG. 8B (without top t learning), when the pseudo-sparse features are computed without top t learning and an inverted index is created, most documents are not stored in any index of the inverted index. On the other hand, as shown in the right part of FIG. 8B (with top t learning), when the pseudo-sparse features are computed using top t learning and an inverted index is created, documents are stored in 90% or more of the indexes, and since the number of documents per index is less than 8900, high-speed search is possible. From the above, it can be said that top t learning contributes to the speed-up.
≪Summary of the evaluation experiment≫
In this evaluation experiment, using a neural network capable of taking context into account, search over a large-scale collection of 8.84 million documents in the passage ranking task of MS MARCO 2.1 was performed in less than 0.4 seconds per query while achieving performance exceeding BM25.
Conventional keyword search models such as BM25 enable large-scale document search by using an inverted index, but have the problem that context cannot be taken into account. SNRM (standalone neural ranking model), a conventional technique, attempted to realize a fast and accurate search model by creating sparse vectors with a neural network and building an inverted index, but the sparsity constraint based on the L1 norm has the problem that the set of vectors becomes biased toward a specific subspace. To address these problems, the proposed method realizes a large-scale neural search capable of taking context into account by introducing top t learning.
[Second embodiment]
Next, the second embodiment will be described. In the present embodiment, a case will be described in which, when the features of the search query Q and the search target documents D_i are sparsified, the sparsity can be adjusted by performing normalization and a mean shift.
In the second embodiment, mainly the differences from the first embodiment will be described, and the description of components substantially the same as in the first embodiment will be omitted.
・At search time
First, the case where a document search is performed by the search device 10 will be described.
<Overall configuration of search device 10>
The overall configuration of the search device 10 according to the present embodiment will be described with reference to FIG. 9. FIG. 9 is a diagram showing an example of the overall configuration of the search device 10 according to the second embodiment.
As shown in FIG. 9, the search device 10 according to the present embodiment has a context coding unit 101, a normalized sparse coding unit 109, an inverted index utilization unit 103, and a ranking unit 104. That is, the search device 10 according to the present embodiment has the normalized sparse coding unit 109 instead of the pseudo-sparse coding unit 102. Note that, for example, the context coding unit 101 and the normalized sparse coding unit 109 may be combined into a coding unit 100A.
The normalized sparse coding unit 109 takes the feature U of the search query Q as input and, using the trained model parameters, outputs the pseudo-sparse feature U' of the search query Q (in the present embodiment, this pseudo-sparse feature is also referred to as a "normalized sparse feature").
Here, the neural network realizing the normalized sparse coding unit 109 is a model that performs normalization and a mean shift before applying the firing function of the final layer of the neural network realizing the pseudo-sparse coding unit 102 described in the first embodiment.
Specifically, let x_i = (x_i1, ..., x_ij, ..., x_id') be the d'-dimensional vector output by the fully connected layer before the firing function of the final layer is applied when the feature U_i of a search query Q_i included in the training data at learning time is input to the normalized sparse coding unit 109, and let X = {x_1, ..., x_i, ..., x_s'} be an appropriate subset of these vectors, where s' is the number of elements of the subset X. Such a subset may be determined arbitrarily, but it is preferably sampled uniformly from the x_i (i = 1, 2, ...). For example, the x_i obtained from the training data included in a certain mini-batch at learning time may be used as the subset X.
Then, letting z = (z_1, ..., z_j, ..., z_d') be the d'-dimensional vector output by the fully connected layer before the firing function of the final layer is applied when the feature U of the search query Q is input to the normalized sparse coding unit 109, the normalized sparse feature U' = (u'_1, ..., u'_j, ..., u'_d') of the search query Q is computed by the following equation (5).
Figure JPOXMLDOC01-appb-M000008
Here, μ and σ are hyperparameters set in advance. Since the sum part in equation (5) above (that is, (1/s')(x_1j + ... + x_s'j)) can be computed in advance at learning time, that precomputed result may be used.
In this way, by normalizing the output of the fully connected layer before the firing function is applied in the final layer of the neural network realizing the normalized sparse coding unit 109 and shifting the mean using μ, the sparsity of the normalized sparse features (that is, the proportion of non-zero elements) can be adjusted.
For example, when μ = -3 and σ = 1, if the value of each element of the vector output by the fully connected layer before the firing function of the final layer follows a normal distribution with μ = -3 and σ = 1, the proportion of elements for which the output of the ReLU function takes a positive value is expected to be about 0.3% of the whole.
Although the ReLU function is used as an example in equation (5) above, as described in the first embodiment, any general firing function satisfying all of conditions 1-1 to 1-3 can be used.
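Since equation (5) itself is given only as an image in this publication, the following PyTorch-style sketch reflects the surrounding description (per-dimension standardization over the subset X, rescaling by σ, mean shift by μ, then ReLU) and should be read as an assumption rather than the exact formula.

```python
import torch

def normalized_sparse_encode(z: torch.Tensor, x_subset: torch.Tensor,
                             mu: float = -3.0, sigma: float = 1.0) -> torch.Tensor:
    """z: (d',) output of the last fully connected layer for one query.
    x_subset: (s', d') the same outputs for an appropriate subset X (e.g. one mini-batch)."""
    mean = x_subset.mean(dim=0)                        # (1/s') * (x_1j + ... + x_s'j)
    std = x_subset.std(dim=0, unbiased=False) + 1e-8   # avoid division by zero
    shifted = sigma * (z - mean) / std + mu            # roughly N(mu, sigma^2) per dimension
    return torch.relu(shifted)                         # sparsity controlled by mu and sigma
```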
<Search process>
The search process for obtaining the ordered set {D_i | i ∈ K} of documents related to an input search query Q and their relevances {S_i | i ∈ K} will be described with reference to FIG. 10. FIG. 10 is a flowchart showing an example of the search process according to the second embodiment.
Step S501: First, the context coding unit 101 takes the search query Q as input and outputs the feature U of this search query Q using the trained model parameters.
Step S502: Next, the normalized sparse coding unit 109 takes the feature U obtained in step S501 above as input and outputs the normalized sparse feature U' of the search query Q using the trained model parameters.
Step S503: Next, the inverted index utilization unit 103 takes the normalized sparse feature U' obtained in step S502 above as input and, using the inverted index generated in advance, obtains the subset {V'_i | i ∈ K} of normalized sparse features of the search target documents. The inverted index according to the present embodiment is obtained by reading "pseudo-sparse feature" as "normalized sparse feature" in the description of the inverted index in the first embodiment, and its structure and the like are the same as in the first embodiment.
Step S504: Then, the ranking unit 104 takes the normalized sparse feature U' obtained in step S502 above and the set {V'_i | i ∈ K} obtained in step S503 above as input, converts {V'_i | i ∈ K} into {V''_i | i ∈ K}, and then, using this {V''_i | i ∈ K} and the normalized sparse feature U', outputs the ordered set {D_i | i ∈ K} of documents related to the search query Q and their relevances {S_i | i ∈ K}.
As described above, the search device 10 according to the present embodiment can obtain the ordered set {D_i | i ∈ K} of documents related to the input search query Q and their relevances {S_i | i ∈ K}.
・At inverted index generation time
Next, the case where an inverted index is generated by the inverted index generation device 20 will be described.
<Overall configuration of the inverted index generation device 20>
The overall configuration of the inverted index generation device 20 according to the present embodiment will be described with reference to FIG. 11. FIG. 11 is a diagram showing an example of the overall configuration of the inverted index generation device 20 according to the second embodiment.
As shown in FIG. 11, the inverted index generation device 20 according to the present embodiment has a context coding unit 101, a normalized sparse coding unit 109, and an inverted index generation unit 105. That is, the inverted index generation device 20 according to the present embodiment has the normalized sparse coding unit 109 instead of the pseudo-sparse coding unit 102.
The normalized sparse coding unit 109 takes the feature V_i of the search target document D_i as input and outputs the normalized sparse feature V'_i of the search target document D_i using the trained model parameters.
Here, as described above for the search, the normalized sparse coding unit 109 obtains the normalized sparse feature V'_i by applying the firing function after performing normalization and a mean shift on the d'-dimensional vector output by the fully connected layer of the final layer.
Specifically, let y_i = (y_i1, ..., y_ij, ..., y_id') be the d'-dimensional vector output by the fully connected layer before the firing function of the final layer is applied when a certain document D_i is input to the normalized sparse coding unit 109, and let Y = {y_1, ..., y_i, ..., y_s'} be an appropriate subset of these vectors, where s' is set arbitrarily in advance.
Then, letting w_i = (w_i1, ..., w_ij, ..., w_id') be the d'-dimensional vector output by the fully connected layer before the firing function of the final layer is applied when the feature V_i of the search target document D_i is input to the normalized sparse coding unit 109, the normalized sparse feature V'_i = (v'_i1, ..., v'_ij, ..., v'_id') of the search target document D_i is computed by the following equation (6).
Figure JPOXMLDOC01-appb-M000009
Here, μ and σ are hyperparameters set in advance, but they may take values different from those used at search time (of course, the same values may also be used at search time and at inverted index generation time).
The above document D_i may be a search target document D_i or another document (for example, a document used as training data). The sum part in equation (6) above (that is, (1/s')(y_1j + ... + y_s'j)) may be computed in advance and the precomputed result may be used; in this case, w_i does not have to be included in Y. However, if there exists a set W with w_i ∈ W, it is preferable that Y ⊂ W.
As described above, μ and σ make it possible to adjust the sparsity of the normalized sparse features, so the amount of documents stored in each index of the inverted index can be adjusted accordingly, and the search speed (that is, the search volume) can be adjusted.
Note that an inverted index is generated in the same manner by reading "pseudo-sparse feature" as "normalized sparse feature" in the description of the inverted index in the first embodiment.
<Inverted index generation process>
The inverted index generation process for generating an inverted index from an input set of search target documents {D_1, ..., D_m} will be described with reference to FIG. 12. FIG. 12 is a flowchart showing an example of the inverted index generation process according to the second embodiment.
Step S601: First, the context coding unit 101 takes the search target document D_i as input and outputs the feature V_i of this search target document D_i using the trained model parameters.
Step S602: Next, the normalized sparse coding unit 109 takes the feature V_i of the search target document D_i as input and outputs the normalized sparse feature V'_i of the search target document D_i using the trained model parameters.
The above steps S601 to S602 are repeatedly executed for all the search target documents D_i (i = 1, ..., m).
Step S603: Then, the inverted index generation unit 105 takes the set {V'_1, ..., V'_m} of normalized sparse features of the search target documents D_i (i = 1, ..., m) as input, and generates and outputs an inverted index.
As described above, the inverted index generation device 20 according to the present embodiment can generate an inverted index from the input set of search target documents {D_1, ..., D_m}. As described above, in the present embodiment, the amount of documents stored in each index of the inverted index can be adjusted by adjusting μ and σ, so the amount of documents can be adjusted to satisfy the search time required for document retrieval. Note that the values of μ and σ can be set independently at search time, at inverted index generation time, and at learning time described later.
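A minimal sketch of step S603 is shown below; the exact index layout is described in the first embodiment, and here it is simply assumed that each dimension holds a posting list of the documents whose element in that dimension is non-zero.

```python
from collections import defaultdict

def build_inverted_index(sparse_features):
    """sparse_features: iterable of (doc_id, vector), vector being a d'-dimensional sequence."""
    index = defaultdict(list)              # dimension r -> [(doc_id, value), ...]
    for doc_id, vec in sparse_features:
        for r, value in enumerate(vec):
            if value > 0.0:                # only non-zero elements are indexed
                index[r].append((doc_id, value))
    return index
```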
・At learning time
Next, the case where the learning device 30 trains the neural network (the neural network realizing the context coding unit 101 and the normalized sparse coding unit 109) will be described.
<Overall configuration of the learning device 30>
The overall configuration of the learning device 30 according to the present embodiment will be described with reference to FIG. 13. FIG. 13 is a diagram showing an example of the overall configuration of the learning device 30 according to the second embodiment.
As shown in FIG. 13, the learning device 30 according to the present embodiment has a context coding unit 101, a normalized sparse coding unit 109, a ranking unit 104, a division unit 106, an update unit 107, and a determination unit 108. Here, the context coding unit 101 and the normalized sparse coding unit 109 are realized by the same neural networks as the context coding unit 101 and the normalized sparse coding unit 109 described above for the search time and the inverted index generation time, but their model parameters are assumed to be not yet trained.
The normalized sparse coding unit 109 takes the features (U_i, V_i^+, V_i^-) of the training data (Q_i, D_i^+, D_i^-) as input and, using the model parameters that are not yet trained, obtains the normalized sparse features (U'_i, V'_i^+, V'_i^-) of the training data (Q_i, D_i^+, D_i^-). That is, the normalized sparse coding unit 109 takes the features U_i, V_i^+, and V_i^- as input and obtains the respective normalized sparse features U'_i, V'_i^+, and V'_i^- as described above for the search time and the inverted index generation time. Note that the values of μ and σ may be changed according to the learning stage (for example, the learning step).
Then, the normalized sparse coding unit 109 converts the normalized sparse features U'_i, V'_i^+, and V'_i^- into the learning pseudo-sparse features U''_i, V''_i^+, and V''_i^-, respectively.
<Learning process>
Next, the learning process will be described. In the present embodiment, as shown in equations (5) and (6) above, the mean and variance are computed per dimension from an arbitrary subset, so it is basically preferable to use mini-batch learning. However, mini-batch learning is not strictly required as long as statistics such as the per-dimension mean and variance can be approximated or estimated. When mini-batch learning is used, the flow of the process is the same as in FIG. 6, so only the parameter update process of step S302 in FIG. 6 will be described below.
<Model parameter update process>
The parameter update process according to the present embodiment will be described with reference to FIG. 14. FIG. 14 is a flowchart showing an example of the model parameter update process according to the second embodiment.
Step S701: First, the context coding unit 101 takes the training data (Q_i, D_i^+, D_i^-) in the mini-batch as input and, using the model parameters that are not yet trained, outputs the features (U_i, V_i^+, V_i^-) of this training data (Q_i, D_i^+, D_i^-).
Step S702: Next, the normalized sparse coding unit 109 takes the features (U_i, V_i^+, V_i^-) of the training data (Q_i, D_i^+, D_i^-) as input and, using the model parameters that are not yet trained, obtains the normalized sparse features (U'_i, V'_i^+, V'_i^-) of the training data (Q_i, D_i^+, D_i^-).
Step S703: Next, the normalized sparse coding unit 109 converts the normalized sparse features (U'_i, V'_i^+, V'_i^-) into the learning pseudo-sparse features (U''_i, V''_i^+, V''_i^-) and outputs the learning pseudo-sparse features (U''_i, V''_i^+, V''_i^-).
Subsequent steps S704 to S706 are the same as steps S404 to S406 in FIG. 7, respectively, and their description is therefore omitted.
As described above, the learning device 30 according to the present embodiment can learn the model parameters of the neural network realizing the context coding unit 101 and the normalized sparse coding unit 109 using the input training data set.
[Third embodiment]
Next, the third embodiment will be described. In the present embodiment, a case will be described in which, when the model parameters are updated by a gradient-estimation type error backpropagation method, the number of elements whose gradient is 0 is reduced so that learning proceeds stably and efficiently.
In the third embodiment, mainly the differences from the first embodiment will be described, and the description of components substantially the same as in the first embodiment will be omitted.
・At search time
First, the case where a document search is performed by the search device 10 will be described.
<Overall configuration of search device 10>
The overall configuration of the search device 10 according to the present embodiment will be described with reference to FIG. 15. FIG. 15 is a diagram showing an example of the overall configuration of the search device 10 according to the third embodiment.
As shown in FIG. 15, the search device 10 according to the present embodiment has a context coding unit 101, a gradient-estimation type pseudo-sparse coding unit 110, an inverted index utilization unit 103, and a ranking unit 104. That is, the search device 10 according to the present embodiment has the gradient-estimation type pseudo-sparse coding unit 110 instead of the pseudo-sparse coding unit 102. Note that, for example, the context coding unit 101 and the gradient-estimation type pseudo-sparse coding unit 110 may be combined into a coding unit 100B.
Similar to the pseudo-sparse coding unit 102 described in the first embodiment, the gradient-estimation type pseudo-sparse coding unit 110 takes the feature U of the search query Q as input and outputs the pseudo-sparse feature U' of the search query Q using the trained model parameters.
The name "gradient-estimation type pseudo-sparse coding unit" is used for distinction because, when gradient estimation is performed at learning time, the gradient-estimation type pseudo-sparse coding unit 110 computes thresholds such as t_{2,r}^u (described later) in the forward propagation of the neural network, so that, strictly speaking, its processing differs from that of the pseudo-sparse coding unit 102. At search time and at inverted index generation time, the gradient-estimation type pseudo-sparse coding unit 110 performs the same processing as the pseudo-sparse coding unit 102 described in the first embodiment, so the gradient-estimation type pseudo-sparse coding unit 110 at search time and at inverted index generation time may be regarded as the "pseudo-sparse coding unit 102". Therefore, the search process and the inverted index generation process according to the present embodiment are the same as in the first embodiment.
・At inverted index generation time
Next, the case where an inverted index is generated by the inverted index generation device 20 will be described.
<Overall configuration of the inverted index generation device 20>
The overall configuration of the inverted index generation device 20 according to the present embodiment will be described with reference to FIG. 16. FIG. 16 is a diagram showing an example of the overall configuration of the inverted index generation device 20 according to the third embodiment.
As shown in FIG. 16, the inverted index generation device 20 according to the present embodiment has a context coding unit 101, a gradient-estimation type pseudo-sparse coding unit 110, and an inverted index generation unit 105. That is, the inverted index generation device 20 according to the present embodiment has the gradient-estimation type pseudo-sparse coding unit 110 instead of the pseudo-sparse coding unit 102. However, as described above, the gradient-estimation type pseudo-sparse coding unit 110 at inverted index generation time performs the same processing as the pseudo-sparse coding unit 102, so it may be regarded as the "pseudo-sparse coding unit 102". Also, as described above, the inverted index generation process according to the present embodiment is the same as in the first embodiment.
・At learning time
Next, the case where the learning device 30 trains the neural network (the neural network realizing the context coding unit 101 and the gradient-estimation type pseudo-sparse coding unit 110) will be described.
<Overall configuration of the learning device 30>
The overall configuration of the learning device 30 according to the present embodiment will be described with reference to FIG. 17. FIG. 17 is a diagram showing an example of the overall configuration of the learning device 30 according to the third embodiment.
As shown in FIG. 17, the learning device 30 according to the present embodiment has a context coding unit 101, a gradient-estimation type pseudo-sparse coding unit 110, a ranking unit 104, a division unit 106, an update unit 107A, and a determination unit 108. Here, the context coding unit 101 and the gradient-estimation type pseudo-sparse coding unit 110 are realized by the same neural networks as the context coding unit 101 and the gradient-estimation type pseudo-sparse coding unit 110 described above for the search time and the inverted index generation time, but their model parameters are assumed to be not yet trained.
As in the first embodiment, the gradient-estimation type pseudo-sparse coding unit 110 takes the features (U_i, V_i^+, V_i^-) of the training data (Q_i, D_i^+, D_i^-) as input and, using the model parameters that are not yet trained, outputs the learning pseudo-sparse features (U''_i, V''_i^+, V''_i^-) of the training data (Q_i, D_i^+, D_i^-). At this time, the gradient-estimation type pseudo-sparse coding unit 110 also computes thresholds such as t_{2,r}^u described later.
Here, in the present embodiment, the transformation performed in the forward propagation of the neural network realizing the gradient-estimation type pseudo-sparse coding unit 110 is denoted by the function g_1. With this notation, for example, the transformation g_1 that takes the element u_ir (r = 1, ..., d') of the feature U_i of the search query Q_i as input and yields the element u''_ir = g_1(u_ir) of the learning pseudo-sparse feature U''_i of the search query Q_i is expressed by the following equation (7).
Figure JPOXMLDOC01-appb-M000010
Here, t_{1,r}^u is a threshold, with t_{1,r}^u = min W'^1_r (that is, the smallest of the elements contained in W'^1_r).
With V_i^+ = (v_{i1}^+, ..., v_{ir}^+, ..., v_{id'}^+), reading "u_ir" as "v_ir^+" and "t_{1,r}^u" as "t_{1,r}^{v+}" in equation (7) above gives the transformation (function) that yields the element v''_{ir}^+ of the learning pseudo-sparse feature V''_i^+, where t_{1,r}^{v+} = min W'^2_r.
Similarly, with V_i^- = (v_{i1}^-, ..., v_{ir}^-, ..., v_{id'}^-), reading "u_ir" as "v_ir^-" and "t_{1,r}^u" as "t_{1,r}^{v-}" in equation (7) above gives the transformation (function) that yields the element v''_{ir}^- of the learning pseudo-sparse feature V''_i^-, where t_{1,r}^{v-} = min W'^3_r.
As in the first embodiment, the update unit 107A takes the relevances S_i^+ and S_i^- as input and updates the model parameters by a supervised learning method. At this time, when computing (estimating) the gradient of the error function (for example, the hinge loss) by the error backpropagation method, the update unit 107A obtains the gradient of the error function by backpropagating the error using the partial derivative of the function g_2 described later instead of the partial derivative of the function g_1 shown in equation (7) above.
The partial derivative of the function g_1 shown in equation (7) above is given by the following equation (8).
Figure JPOXMLDOC01-appb-M000011
In the present embodiment, when the error is backpropagated by the error backpropagation method, the partial derivative of the function g_2 shown in the following equation (9) is used instead of the partial derivative of the function g_1 shown in equation (8) above.
Figure JPOXMLDOC01-appb-M000012
Here, t_{2,r}^u is a threshold and should be set so as to satisfy t_{1,r}^u > t_{2,r}^u.
FIG. 18 shows graphs of the functions g_1 and g_2 and their partial derivatives for b = t_{1,r}^u and a = t_{2,r}^u. The upper left of FIG. 18 shows the function g_1, the lower left the function g_2, the upper right the partial derivative of g_1, and the lower right the partial derivative of g_2. That is, for a < u < b, the partial derivative of the function g_2 is a linear function that takes the value 0 at u = a and the value 1 at u = b.
As shown in the upper right and lower right of FIG. 18, the partial derivative of the function g_1 is 0 for elements less than or equal to b, whereas the partial derivative of the function g_2 is not 0 for elements between a and b. Therefore, by using the partial derivative of the function g_2 at backpropagation time, the number of elements for which the gradient of the error function becomes 0 can be reduced (in other words, the number of elements through which the error can be backpropagated is increased), and learning can proceed stably and efficiently.
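A rough sketch of this gradient estimation, assuming a PyTorch custom autograd function, is shown below. The forward pass implements the hard threshold of equation (7); the backward pass uses a surrogate derivative that, following the description of FIG. 18, is 0 below a = t_{2,r}, rises linearly from 0 to 1 on [t_{2,r}, t_{1,r}], and is 1 above b = t_{1,r}. The exact form of equation (9) is given only as an image, so this is an assumption.

```python
import torch

class GradientEstimatedThreshold(torch.autograd.Function):
    """Forward: g1 (hard threshold at t1). Backward: assumed surrogate derivative of g2."""

    @staticmethod
    def forward(ctx, u, t1, t2):
        ctx.save_for_backward(u, t1, t2)
        return torch.where(u >= t1, u, torch.zeros_like(u))     # equation (7)

    @staticmethod
    def backward(ctx, grad_output):
        u, t1, t2 = ctx.saved_tensors
        slope = (u - t2) / (t1 - t2 + 1e-8)                     # linear part on [t2, t1]
        surrogate = torch.clamp(slope, 0.0, 1.0)                # 0 below t2, 1 above t1
        return grad_output * surrogate, None, None              # no gradients for the thresholds
```

Here u, t1, and t2 are broadcastable tensors, for example u of shape (batch, d') and per-dimension thresholds of shape (d',), and the function is applied as `GradientEstimatedThreshold.apply(u, t1, t2)`.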
For the transformation (function) that yields the element v''_{ir}^+ of the learning pseudo-sparse feature V''_i^+, the partial derivative obtained by reading "u_ir" as "v_ir^+", "t_{1,r}^u" as "t_{1,r}^{v+}", and "t_{2,r}^u" as "t_{2,r}^{v+}" in equation (9) above is used instead of its own partial derivative, where t_{2,r}^{v+} is a threshold that should be set so as to satisfy t_{1,r}^{v+} > t_{2,r}^{v+}.
Similarly, for the transformation (function) that yields the element v''_{ir}^- of the learning pseudo-sparse feature V''_i^-, the partial derivative obtained by reading "u_ir" as "v_ir^-", "t_{1,r}^u" as "t_{1,r}^{v-}", and "t_{2,r}^u" as "t_{2,r}^{v-}" in equation (9) above is used instead of its own partial derivative, where t_{2,r}^{v-} is a threshold that should be set so as to satisfy t_{1,r}^{v-} > t_{2,r}^{v-}.
 Here, how to determine the thresholds t_{2,r}^u, t_{2,r}^{v+}, and t_{2,r}^{v-} will be described. For example, the threshold t_{2,r}^u can be set to the minimum value of the subset obtained by collecting the top 2 × t_que % (in descending order of value) of the set {u'_{1r}, u'_{2r}, ..., u'_{m''r}} of the r-th-dimension elements of the pseudo-sparse features contained in Z_1^tr. Such a threshold t_{2,r}^u is computed during forward propagation of the neural network that implements the gradient-estimation pseudo-sparse coding unit 110.
 Similarly, the threshold t_{2,r}^{v+} can be set to the minimum value of the subset obtained by collecting the top 2 × t_pas % (in descending order of value) of the set {v'^+_{1r}, v'^+_{2r}, ..., v'^+_{m''r}} of the r-th-dimension elements of the pseudo-sparse features contained in Z_2^tr. Such a threshold t_{2,r}^{v+} is computed during forward propagation of the neural network that implements the gradient-estimation pseudo-sparse coding unit 110.
 Similarly, the threshold t_{2,r}^{v-} can be set to the minimum value of the subset obtained by collecting the top 2 × t_pas % (in descending order of value) of the set {v'^-_{1r}, v'^-_{2r}, ..., v'^-_{m''r}} of the r-th-dimension elements of the pseudo-sparse features contained in Z_3^tr. Such a threshold t_{2,r}^{v-} is computed during forward propagation of the neural network that implements the gradient-estimation pseudo-sparse coding unit 110.
 In this way, thresholds t_{2,r}^u, t_{2,r}^{v+}, and t_{2,r}^{v-} satisfying t_{1,r}^u > t_{2,r}^u, t_{1,r}^{v+} > t_{2,r}^{v+}, and t_{1,r}^{v-} > t_{2,r}^{v-} can be computed and determined automatically with respect to the thresholds t_{1,r}^u, t_{1,r}^{v+}, and t_{1,r}^{v-}, which change depending on the dimension r, on how the mini-batches are taken during training, and so on (this way of determining them is referred to as "determination method 1"). Note that the above 2 × t_que % and 2 × t_pas % are examples, and an arbitrary value L (where L > 1) can be used instead of 2.
 However, the above way of determining these thresholds is only an example, and they may be determined by other methods. For example, with b = t_{1,r}^u (or t_{1,r}^{v+} or t_{1,r}^{v-}) and a = t_{2,r}^u (or t_{2,r}^{v+} or t_{2,r}^{v-}), one may set a = 2b − c using the maximum value c of the r-th-dimension elements in the mini-batch (this way of determining them is referred to as "determination method 2"). Here, with m'' denoting the number of training data contained in the mini-batch, c = max{u'_{1r}, u'_{2r}, ..., u'_{m''r}} when b = t_{1,r}^u and a = t_{2,r}^u. Similarly, c = max{v'^+_{1r}, v'^+_{2r}, ..., v'^+_{m''r}} when b = t_{1,r}^{v+} and a = t_{2,r}^{v+}, and c = max{v'^-_{1r}, v'^-_{2r}, ..., v'^-_{m''r}} when b = t_{1,r}^{v-} and a = t_{2,r}^{v-}.
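 The following is a small NumPy sketch of the two determination methods above, computing a per-dimension threshold t_{2,r} from a mini-batch of pseudo-sparse features. The array name U, the values of t_que and L, and the stand-in for t_{1,r} are illustrative assumptions.

```python
import numpy as np


def t2_method_1(U, t1, t_que=1.0, L=2.0):
    """t_{2,r} = minimum of the top (L * t_que)% r-th-dimension elements of the mini-batch."""
    m, d = U.shape
    k = max(1, int(np.ceil(m * L * t_que / 100.0)))  # size of the top-(L*t_que)% subset
    top_k = -np.sort(-U, axis=0)[:k, :]              # largest k values per dimension
    t2 = top_k.min(axis=0)
    return np.minimum(t2, t1)                        # keep t_{2,r} no larger than t_{1,r}


def t2_method_2(U, t1):
    """t_{2,r} = 2*b - c with b = t_{1,r} and c = max of the r-th dimension in the mini-batch."""
    c = U.max(axis=0)
    return 2.0 * t1 - c


if __name__ == "__main__":
    U = np.abs(np.random.randn(32, 16))   # toy mini-batch (m'' = 32 examples, 16 dimensions)
    t1 = np.quantile(U, 0.9, axis=0)      # stand-in for the first-stage thresholds t_{1,r}
    print(t2_method_1(U, t1))
    print(t2_method_2(U, t1))
```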
 <Learning process>
 Next, the learning process will be described. In this embodiment, as in the first embodiment, the case of using mini-batch learning is described; however, learning methods other than mini-batch learning can also be used. Since the flow of mini-batch learning is the same as in FIG. 6, only the parameter update process of step S302 in FIG. 6 is described below.
 <Model parameter update process>
 The model parameter update process according to the present embodiment will be described with reference to FIG. 19. FIG. 19 is a flowchart showing an example of the model parameter update process according to the third embodiment. Steps S801 to S804 are the same as steps S401 to S404 in FIG. 7, and their description is therefore omitted.
 Step S805: The gradient-estimation pseudo-sparse coding unit 110 computes the thresholds t_{2,r}^u, t_{2,r}^{v+}, and t_{2,r}^{v-}.
 Step S806: Next, the update unit 107A takes the relevance scores S_i^+ and S_i^- obtained in step S804 as input and computes the value of the error function (for example, the hinge loss) and the gradient of the error function with respect to the model parameters. At this time, when computing (estimating) the gradient of the error function by the error backpropagation method, the update unit 107A backpropagates the error using the partial derivative of the function g_2 instead of the partial derivative of the function g_1, and thereby obtains the gradient of the error function.
 Step S807: The update unit 107A then updates the model parameters by an arbitrary optimization method using the value and the gradient of the error function computed in step S806.
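 As a hedged sketch of steps S806 and S807, the following computes relevance scores as inner products of the sparse features for learning, evaluates a margin-1 hinge loss, and performs one optimizer step. The inner-product scoring, the specific hinge form, and the stand-in encoder are assumptions for illustration; the specification only requires some error function such as the hinge loss and an arbitrary optimization method.

```python
import torch


def training_step(encoder, optimizer, query, doc_pos, doc_neg, margin=1.0):
    # encoder is assumed to map inputs to sparse features for learning, with the
    # surrogate gradient of g_2 applied inside it during backpropagation.
    u, v_pos, v_neg = encoder(query), encoder(doc_pos), encoder(doc_neg)
    s_pos = (u * v_pos).sum(dim=-1)        # relevance S_i^+
    s_neg = (u * v_neg).sum(dim=-1)        # relevance S_i^-
    loss = torch.clamp(margin - (s_pos - s_neg), min=0.0).mean()  # hinge loss
    optimizer.zero_grad()
    loss.backward()                        # step S806: gradient via backpropagation
    optimizer.step()                       # step S807: arbitrary optimizer update
    return loss.item()


if __name__ == "__main__":
    enc = torch.nn.Linear(16, 32)          # stand-in encoder for this sketch
    opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
    q, dp, dn = (torch.randn(8, 16) for _ in range(3))
    print(training_step(enc, opt, q, dp, dn))
```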
 As described above, the learning device 30 according to the present embodiment can learn the model parameters of the neural networks that implement the context coding unit 101 and the gradient-estimation pseudo-sparse coding unit 110 using the input training data set. In this embodiment, using the gradient-estimation type of error backpropagation makes it possible to take into account the instability of the thresholds, which depend on how the subsets (that is, Z_1^tr, Z_2^tr, and Z_3^tr) are taken during training, and thus to further stabilize and promote learning. That is, for example, for elements whose gradient may or may not become 0 depending on how the subset is taken in the first embodiment, the present embodiment makes it possible to backpropagate the error.
 In the present embodiment, the thresholds t_{2,r}^u, t_{2,r}^{v+}, and t_{2,r}^{v-} are computed during forward propagation of the neural network that implements the gradient-estimation pseudo-sparse coding unit 110, but they may instead be computed by the update unit 107A in step S806 above, for example. In this case, as shown in FIG. 21, the learning device 30 according to the present embodiment may have the coding unit 100 instead of the coding unit 100B (that is, it may have the pseudo-sparse coding unit 102 instead of the gradient-estimation pseudo-sparse coding unit 110).
 <Evaluation experiment>
 Next, an evaluation experiment for evaluating the method of the present embodiment (hereinafter referred to as the "proposed method") will be described. Unless otherwise noted below, the settings (data set, training settings, and so on) are the same as in the evaluation experiment described in the first embodiment.
 The proposed method performs a two-stage search as in the first embodiment; no gradient estimation is performed in the first stage, and three variants are evaluated for the second stage: "no gradient estimation", "gradient estimation pattern 1", and "gradient estimation pattern 2". In gradient estimation pattern 1, the thresholds t_{2,r}^u, t_{2,r}^{v+}, and t_{2,r}^{v-} are determined by determination method 2 above, whereas in gradient estimation pattern 2 they are determined by determination method 1 above with L = 2.
 BM25 was used as the conventional method. The evaluation results are shown below.
[Table 4]
 Here, MRR denotes the mean reciprocal rank, P denotes the recall, and Latency denotes the average search time (in ms).
 As shown in Table 4 above, the proposed method (in particular, gradient estimation pattern 1) achieves higher performance than the conventional method.
 [Hardware configuration]
 Finally, the hardware configurations of the search device 10, the inverted index generation device 20, and the learning device 30 according to the first to third embodiments will be described. The search device 10, the inverted index generation device 20, and the learning device 30 can each be realized by the hardware configuration of a general computer or computer system, for example, the hardware configuration of the computer 500 shown in FIG. 21. FIG. 21 is a diagram showing an example of the hardware configuration of the computer 500.
 The computer 500 shown in FIG. 21 has an input device 501, a display device 502, an external I/F 503, a communication I/F 504, a processor 505, and a memory device 506. These pieces of hardware are communicably connected to one another via a bus 507.
 The input device 501 is, for example, a keyboard, a mouse, a touch panel, or the like. The display device 502 is, for example, a display. The computer 500 does not have to include at least one of the input device 501 and the display device 502.
 The external I/F 503 is an interface to external devices, such as a recording medium 503a. The computer 500 can read from and write to the recording medium 503a via the external I/F 503. The recording medium 503a may store one or more programs that implement the functional units of the search device 10 (the context coding unit 101, the pseudo-sparse coding unit 102 (or the normalized sparse coding unit 109 or the gradient-estimation pseudo-sparse coding unit 110), the inverted index utilization unit 103, and the ranking unit 104). Similarly, the recording medium 503a may store one or more programs that implement the functional units of the inverted index generation device 20 (the context coding unit 101, the pseudo-sparse coding unit 102 (or the normalized sparse coding unit 109 or the gradient-estimation pseudo-sparse coding unit 110), and the inverted index generation unit 105). Similarly, the recording medium 503a may store one or more programs that implement the functional units of the learning device 30 (the context coding unit 101, the pseudo-sparse coding unit 102 (or the normalized sparse coding unit 109 or the gradient-estimation pseudo-sparse coding unit 110), the ranking unit 104, the division unit 106, the update unit 107 (or the update unit 107A), and the determination unit 108).
 The recording medium 503a is, for example, a CD (Compact Disc), a DVD (Digital Versatile Disc), an SD memory card (Secure Digital memory card), a USB (Universal Serial Bus) memory card, or the like.
 The communication I/F 504 is an interface for connecting the computer 500 to a communication network. One or more programs that implement the functional units of the search device 10 may be acquired (downloaded) from a predetermined server device or the like via the communication I/F 504. Similarly, one or more programs that implement the functional units of the inverted index generation device 20, or of the learning device 30, may be acquired from a predetermined server device or the like via the communication I/F 504.
 The processor 505 is one of various arithmetic devices such as a CPU (Central Processing Unit) or a GPU. Each functional unit of the search device 10 is implemented, for example, by processing that one or more programs stored in the memory device 506 cause the processor 505 to execute. The same applies to the functional units of the inverted index generation device 20 and of the learning device 30.
 The memory device 506 is one of various storage devices such as an HDD, an SSD, a RAM (Random Access Memory), a ROM (Read Only Memory), or a flash memory.
 The search device 10 according to the first to third embodiments can realize the search process described above by having the hardware configuration of the computer 500 shown in FIG. 21. Similarly, by having the same hardware configuration, the inverted index generation device 20 according to the first to third embodiments can realize the inverted index generation process described above, and the learning device 30 according to the first to third embodiments can realize the learning process described above. The hardware configuration of the computer 500 shown in FIG. 21 is an example, and the computer 500 may have other hardware configurations; for example, it may have a plurality of processors 505 or a plurality of memory devices 506.
 With regard to the above embodiments, the following appendices are further disclosed.
 (Appendix 1)
 A learning device comprising:
 a memory; and
 at least one processor connected to the memory,
 wherein the processor:
 receives, as input, a plurality of training data each including a search query, a first document related to the search query, and a second document not related to the search query, and generates, using model parameters of a first neural network, a plurality of first feature vectors each representing features of one of the search queries, a plurality of second feature vectors each representing features of one of the first documents, and a plurality of third feature vectors each representing features of one of the second documents;
 converts, using model parameters of a second neural network, the plurality of first feature vectors, the plurality of second feature vectors, and the plurality of third feature vectors into a plurality of first sparse feature vectors for learning, a plurality of second sparse feature vectors for learning, and a plurality of third sparse feature vectors for learning, respectively, which are sparsified by adjusting, through normalization and mean shifting, the proportion of elements that take 0 in each dimension; and
 updates the model parameters of the first neural network and the model parameters of the second neural network using the plurality of first sparse feature vectors for learning, the plurality of second sparse feature vectors for learning, and the plurality of third sparse feature vectors for learning.
 (Appendix 2)
 The learning device according to appendix 1, wherein the processor:
 converts each of the first feature vectors, the second feature vectors, and the third feature vectors into a first sparse feature vector, a second sparse feature vector, and a third sparse feature vector, respectively, by normalizing and mean-shifting the element values in each dimension of an output vector of a fully connected layer included in a final layer of the second neural network and then computing the value of a firing function of the final layer that satisfies a predetermined condition; and
 converts each of the first sparse feature vectors, the second sparse feature vectors, and the third sparse feature vectors into the first sparse feature vector for learning, the second sparse feature vector for learning, and the third sparse feature vector for learning, respectively, by setting to 0 the values of elements that satisfy a predetermined condition in each dimension.
 (Appendix 3)
 The learning device according to appendix 2, wherein, with μ and σ as preset parameters, the processor:
 converts the first feature vector into the first sparse feature vector by, in each dimension of the output vectors for the plurality of first feature vectors, performing the normalization using a subset of the set of elements of that dimension and the parameter σ, performing the mean shift using the parameter μ, and then computing the value of the firing function;
 converts the second feature vector into the second sparse feature vector by, in each dimension of the output vectors for the plurality of second feature vectors, performing the normalization using a subset of the set of elements of that dimension and the parameter σ, performing the mean shift using the parameter μ, and then computing the value of the firing function; and
 converts the third feature vector into the third sparse feature vector by, in each dimension of the output vectors for the plurality of third feature vectors, performing the normalization using a subset of the set of elements of that dimension and the parameter σ, performing the mean shift using the parameter μ, and then computing the value of the firing function.
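 As a supplementary illustration of appendices 2 and 3, the following is a minimal sketch of the per-dimension "normalize, mean-shift, then fire" conversion, under the assumption that the normalization standardizes each dimension over a subset of the mini-batch, rescales it by σ, shifts it by −μ, and uses ReLU as the firing function. The exact normalization and firing function of the embodiments may differ; every name and constant here is an illustrative assumption.

```python
import torch


def normalized_sparse_encode(H, mu=1.0, sigma=1.0, eps=1e-6):
    """H: output of the final fully connected layer, shape (batch, dims)."""
    mean = H.mean(dim=0, keepdim=True)         # per-dimension statistics over the subset
    std = H.std(dim=0, keepdim=True)
    z = sigma * (H - mean) / (std + eps) - mu  # normalization and mean shift
    return torch.relu(z)                       # firing function: larger mu -> more zeros


if __name__ == "__main__":
    H = torch.randn(64, 128)                   # toy mini-batch of final-layer outputs
    V = normalized_sparse_encode(H, mu=1.5)
    print((V == 0).float().mean())             # fraction of zero elements (sparsity)
```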
 (Appendix 4)
 A search device comprising:
 a memory; and
 at least one processor connected to the memory,
 wherein the processor:
 receives a search query as input and generates, using trained model parameters of a first neural network, a feature vector representing features of the search query;
 converts the feature vector into a first sparse feature vector by, using trained model parameters of a second neural network, normalizing and mean-shifting an output vector of a fully connected layer with respect to the feature vector in each dimension and then sparsifying it with a firing function that satisfies a predetermined condition;
 obtains, using an inverted index created in advance with the indices of the dimensions corresponding to the non-zero elements of the first sparse feature vector as keys, a set of second sparse feature vectors, each being a sparsified representation of the features of a document related to the search query, as the values; and
 calculates, with t being a preset value satisfying 0 < t ≤ 100, the degree of relevance between the search query and the documents related to the search query using third sparse feature vectors obtained by setting to 0 those elements that are not included in the top t% of the set of elements of the same dimension of the second sparse feature vectors.
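 As a supplementary illustration of appendix 4, the following is a hedged sketch of the search flow: the non-zero dimensions of the query's first sparse feature vector are used as keys into the inverted index, the candidate second sparse feature vectors are gathered, elements outside the top t% per dimension are zeroed to form third sparse feature vectors, and the degree of relevance is computed as an inner product with the query vector. The dictionary-based index layout, the inner-product scoring, and all names are illustrative assumptions rather than the exact data structures of the embodiments.

```python
import numpy as np


def search(query_sparse, inverted_index, doc_vectors, t=10.0):
    # 1. Gather candidate documents from the inverted index (dimension id -> doc ids).
    candidate_ids = sorted({d for r in np.nonzero(query_sparse)[0]
                            for d in inverted_index.get(r, [])})
    if not candidate_ids:
        return []
    V2 = doc_vectors[candidate_ids]                     # second sparse feature vectors
    # 2. Per dimension, keep only the top-t% of candidate values (third sparse vectors).
    thresh = np.percentile(V2, 100.0 - t, axis=0)
    V3 = np.where(V2 >= thresh, V2, 0.0)
    # 3. Degree of relevance: inner product with the query's sparse feature vector.
    scores = V3 @ query_sparse
    order = np.argsort(-scores)
    return [(candidate_ids[i], float(scores[i])) for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = np.maximum(rng.normal(size=(100, 32)), 0.0)  # toy sparse document vectors
    index = {r: list(np.nonzero(docs[:, r])[0]) for r in range(32)}
    q = np.maximum(rng.normal(size=32), 0.0)            # toy sparse query vector
    print(search(q, index, docs, t=20.0)[:5])
```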
 (Appendix 5)
 A non-transitory storage medium storing a program executable by a computer so as to execute a learning process, the learning process comprising:
 receiving, as input, a plurality of training data each including a search query, a first document related to the search query, and a second document not related to the search query, and generating, using model parameters of a first neural network, a plurality of first feature vectors each representing features of one of the search queries, a plurality of second feature vectors each representing features of one of the first documents, and a plurality of third feature vectors each representing features of one of the second documents;
 converting, using model parameters of a second neural network, the plurality of first feature vectors, the plurality of second feature vectors, and the plurality of third feature vectors into a plurality of first sparse feature vectors for learning, a plurality of second sparse feature vectors for learning, and a plurality of third sparse feature vectors for learning, respectively, which are sparsified by adjusting, through normalization and mean shifting, the proportion of elements that take 0 in each dimension; and
 updating the model parameters of the first neural network and the model parameters of the second neural network using the plurality of first sparse feature vectors for learning, the plurality of second sparse feature vectors for learning, and the plurality of third sparse feature vectors for learning.
 (Appendix 6)
 A non-transitory storage medium storing a program executable by a computer so as to execute a search process, the search process comprising:
 receiving a search query as input and generating, using trained model parameters of a first neural network, a feature vector representing features of the search query;
 converting the feature vector into a first sparse feature vector by, using trained model parameters of a second neural network, normalizing and mean-shifting an output vector of a fully connected layer with respect to the feature vector in each dimension and then sparsifying it with a firing function that satisfies a predetermined condition;
 obtaining, using an inverted index created in advance with the indices of the dimensions corresponding to the non-zero elements of the first sparse feature vector as keys, a set of second sparse feature vectors, each being a sparsified representation of the features of a document related to the search query, as the values; and
 calculating, with t being a preset value satisfying 0 < t ≤ 100, the degree of relevance between the search query and the documents related to the search query using third sparse feature vectors obtained by setting to 0 those elements that are not included in the top t% of the set of elements of the same dimension of the second sparse feature vectors.
 The present invention is not limited to the specifically disclosed embodiments described above, and various modifications, changes, combinations with known techniques, and the like are possible without departing from the scope of the claims.
 10 Search device
 20 Inverted index generation device
 30 Learning device
 100 Coding unit
 100A Coding unit
 100B Coding unit
 101 Context coding unit
 102 Pseudo-sparse coding unit
 103 Inverted index utilization unit
 104 Ranking unit
 105 Inverted index generation unit
 106 Division unit
 107 Update unit
 107A Update unit
 108 Determination unit
 109 Normalized sparse coding unit
 110 Gradient-estimation pseudo-sparse coding unit

Claims (7)

  1.  A learning device comprising:
      a feature generation unit that receives, as input, a plurality of training data each including a search query, a first document related to the search query, and a second document not related to the search query, and generates, using model parameters of a first neural network, a plurality of first feature vectors each representing features of one of the search queries, a plurality of second feature vectors each representing features of one of the first documents, and a plurality of third feature vectors each representing features of one of the second documents;
      a conversion unit that converts, using model parameters of a second neural network, the plurality of first feature vectors, the plurality of second feature vectors, and the plurality of third feature vectors into a plurality of first sparse feature vectors for learning, a plurality of second sparse feature vectors for learning, and a plurality of third sparse feature vectors for learning, respectively, which are sparsified by adjusting, through normalization and mean shifting, the proportion of elements that take 0 in each dimension; and
      an update unit that updates the model parameters of the first neural network and the model parameters of the second neural network using the plurality of first sparse feature vectors for learning, the plurality of second sparse feature vectors for learning, and the plurality of third sparse feature vectors for learning.
  2.  The learning device according to claim 1, wherein the conversion unit:
      converts each of the first feature vectors, the second feature vectors, and the third feature vectors into a first sparse feature vector, a second sparse feature vector, and a third sparse feature vector, respectively, by normalizing and mean-shifting the element values in each dimension of an output vector of a fully connected layer included in a final layer of the second neural network and then computing the value of a firing function of the final layer that satisfies a predetermined condition; and
      converts each of the first sparse feature vectors, the second sparse feature vectors, and the third sparse feature vectors into the first sparse feature vector for learning, the second sparse feature vector for learning, and the third sparse feature vector for learning, respectively, by setting to 0 the values of elements that satisfy a predetermined condition in each dimension.
  3.  The learning device according to claim 2, wherein, with μ and σ as preset parameters, the conversion unit:
      converts the first feature vector into the first sparse feature vector by, in each dimension of the output vectors for the plurality of first feature vectors, performing the normalization using a subset of the set of elements of that dimension and the parameter σ, performing the mean shift using the parameter μ, and then computing the value of the firing function;
      converts the second feature vector into the second sparse feature vector by, in each dimension of the output vectors for the plurality of second feature vectors, performing the normalization using a subset of the set of elements of that dimension and the parameter σ, performing the mean shift using the parameter μ, and then computing the value of the firing function; and
      converts the third feature vector into the third sparse feature vector by, in each dimension of the output vectors for the plurality of third feature vectors, performing the normalization using a subset of the set of elements of that dimension and the parameter σ, performing the mean shift using the parameter μ, and then computing the value of the firing function.
  4.  A search device comprising:
      a feature generation unit that receives a search query as input and generates, using trained model parameters of a first neural network, a feature vector representing features of the search query;
      a conversion unit that converts the feature vector into a first sparse feature vector by, using trained model parameters of a second neural network, normalizing and mean-shifting an output vector of a fully connected layer with respect to the feature vector in each dimension and then sparsifying it with a firing function that satisfies a predetermined condition;
      an inverted index utilization unit that obtains, using an inverted index created in advance with the indices of the dimensions corresponding to the non-zero elements of the first sparse feature vector as keys, a set of second sparse feature vectors, each being a sparsified representation of the features of a document related to the search query, as the values; and
      a calculation unit that calculates, with t being a preset value satisfying 0 < t ≤ 100, the degree of relevance between the search query and the documents related to the search query using third sparse feature vectors obtained by setting to 0 those elements that are not included in the top t% of the set of elements of the same dimension of the second sparse feature vectors.
  5.  A learning method executed by a computer, comprising:
      a feature generation procedure of receiving, as input, a plurality of training data each including a search query, a first document related to the search query, and a second document not related to the search query, and generating, using model parameters of a first neural network, a plurality of first feature vectors each representing features of one of the search queries, a plurality of second feature vectors each representing features of one of the first documents, and a plurality of third feature vectors each representing features of one of the second documents;
      a conversion procedure of converting, using model parameters of a second neural network, the plurality of first feature vectors, the plurality of second feature vectors, and the plurality of third feature vectors into a plurality of first sparse feature vectors for learning, a plurality of second sparse feature vectors for learning, and a plurality of third sparse feature vectors for learning, respectively, which are sparsified by adjusting, through normalization and mean shifting, the proportion of elements that take 0 in each dimension; and
      an update procedure of updating the model parameters of the first neural network and the model parameters of the second neural network using the plurality of first sparse feature vectors for learning, the plurality of second sparse feature vectors for learning, and the plurality of third sparse feature vectors for learning.
  6.  A search method executed by a computer, comprising:
      a feature generation procedure of receiving a search query as input and generating, using trained model parameters of a first neural network, a feature vector representing features of the search query;
      a conversion procedure of converting the feature vector into a first sparse feature vector by, using trained model parameters of a second neural network, normalizing and mean-shifting an output vector of a fully connected layer with respect to the feature vector in each dimension and then sparsifying it with a firing function that satisfies a predetermined condition;
      an inverted index utilization procedure of obtaining, using an inverted index created in advance with the indices of the dimensions corresponding to the non-zero elements of the first sparse feature vector as keys, a set of second sparse feature vectors, each being a sparsified representation of the features of a document related to the search query, as the values; and
      a calculation procedure of calculating, with t being a preset value satisfying 0 < t ≤ 100, the degree of relevance between the search query and the documents related to the search query using third sparse feature vectors obtained by setting to 0 those elements that are not included in the top t% of the set of elements of the same dimension of the second sparse feature vectors.
  7.  A program that causes a computer to function as the learning device according to any one of claims 1 to 3 or the search device according to claim 4.
PCT/JP2020/045898 2020-12-09 2020-12-09 Learning device, search device, learning method, search method, and program WO2022123695A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/045898 WO2022123695A1 (en) 2020-12-09 2020-12-09 Learning device, search device, learning method, search method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/045898 WO2022123695A1 (en) 2020-12-09 2020-12-09 Learning device, search device, learning method, search method, and program

Publications (1)

Publication Number Publication Date
WO2022123695A1 true WO2022123695A1 (en) 2022-06-16

Family

ID=81973397

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/045898 WO2022123695A1 (en) 2020-12-09 2020-12-09 Learning device, search device, learning method, search method, and program

Country Status (1)

Country Link
WO (1) WO2022123695A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024042648A1 (en) * 2022-08-24 2024-02-29 日本電信電話株式会社 Training device, training method, and program

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6189002B1 (en) * 1998-12-14 2001-02-13 Dolphin Search Process and system for retrieval of documents using context-relevant semantic profiles
US20120233127A1 (en) * 2011-03-10 2012-09-13 Textwise Llc Method and System for Unified Information Representation and Applications Thereof
CN110928997A (en) * 2019-12-04 2020-03-27 北京文思海辉金信软件有限公司 Intention recognition method and device, electronic equipment and readable storage medium
CN111985228A (en) * 2020-07-28 2020-11-24 招联消费金融有限公司 Text keyword extraction method and device, computer equipment and storage medium


Similar Documents

Publication Publication Date Title
Chen et al. Knowprompt: Knowledge-aware prompt-tuning with synergistic optimization for relation extraction
Bui et al. Neural graph learning: Training neural networks using graphs
Wang et al. Online collective matrix factorization hashing for large-scale cross-media retrieval
Chen et al. Extreme learning machine and its applications in big data processing
Bui et al. Neural graph machines: Learning neural networks using graphs
Sybrandt et al. AGATHA: automatic graph mining and transformer based hypothesis generation approach
Li et al. A deep graph structured clustering network
WO2022123695A1 (en) Learning device, search device, learning method, search method, and program
WO2021215042A1 (en) Learning device, search device, learning method, search method, and program
Pal et al. Parameter-efficient sparse retrievers and rerankers using adapters
Hajjar et al. Unsupervised extractive text summarization using frequency-based sentence clustering
JP7363929B2 (en) Learning device, search device, learning method, search method and program
Zhang et al. Meta-complementing the semantics of short texts in neural topic models
Chen et al. SAEA: self-attentive heterogeneous sequence learning model for entity alignment
WO2024042648A1 (en) Training device, training method, and program
Wróbel et al. Improving text classification with vectors of reduced precision
Yuan et al. SSF: sentence similar function based on Word2vector similar elements
He et al. Bayesian attribute bagging-based extreme learning machine for high-dimensional classification and regression
Zhang et al. Topic Modeling on Document Networks with Dirichlet Optimal Transport Barycenter
Li et al. Stochastic variational inference-based parallel and online supervised topic model for large-scale text processing
Manjunath et al. Encoder-attention-based automatic term recognition (ea-atr)
Gong et al. Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation
Pradana et al. Movie recommendation system using hybrid filtering with word2vec and restricted boltzmann machines
Pourbahman et al. Deep neural ranking model using distributed smoothing
Alemayehu et al. A submodular optimization framework for imbalanced text classification with data augmentation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20965078

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20965078

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP