WO2014020834A1 - Word latent topic estimation device and word latent topic estimation method - Google Patents
Word latent topic estimation device and word latent topic estimation method
- Publication number
- WO2014020834A1 (PCT/JP2013/004242)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- topic
- word
- topics
- estimation
- probability
- Prior art date
- 2012-07-31
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06F—ELECTRIC DIGITAL DATA PROCESSING › G06F40/00—Handling natural language data › G06F40/20—Natural language analysis
- G06F—ELECTRIC DIGITAL DATA PROCESSING › G06F40/00—Handling natural language data › G06F40/258—Heading extraction; Automatic titling; Numbering
- G06F—ELECTRIC DIGITAL DATA PROCESSING › G06F40/00—Handling natural language data › G06F40/40—Processing or translation of natural language
Definitions
- The present invention relates to a word latent topic estimation apparatus and a word latent topic estimation method for estimating latent topics for words in document data.
- In the field of natural language processing, it is required to handle the meaning behind words rather than treating text data as a mere string of symbols. For this purpose, a device that estimates the latent topics of words (hereinafter referred to as a latent topic estimation device) has attracted attention.
- A topic is data that expresses the concept, meaning, or field behind each word.
- A latent topic is a topic that is extracted automatically, based on the assumption that "words with similar topics are likely to co-occur in the same document," rather than a topic that has been manually defined in advance.
- Hereinafter, a latent topic may be simply referred to as a topic.
- Latent topic estimation assumes that there are k latent topics, numbered 0 to (k−1), behind the words included in a document; it is a process of estimating, for each word, a value indicating how strongly the word is related to each of the latent topics.
- LSA (latent semantic analysis)
- PLSA (probabilistic latent semantic analysis)
- LDA (latent Dirichlet allocation)
- LDA is a latent topic estimation method assuming that each document is a mixture of k latent topics.
- LDA assumes a document generation model based on this assumption and, according to the generation model, can estimate a probability distribution expressing the relationship between each word and the latent topics.
- the generation of a document in the LDA is determined by the following two types of parameters.
- α_{t} is a parameter of the Dirichlet distribution that generates the topic t.
- β_{t,v} represents the probability (word topic probability) that the word v is selected from the topic t. Note that _{t,v} indicates that the subscripts t and v are attached below β.
- the LDA generation model is a model that generates words in the following procedure according to these parameters.
- First, a topic mixture ratio θ_{j,t} for a document j is generated according to the Dirichlet distribution, and word generation is repeated, once per word position, according to the mixture ratio.
- Each word is generated by determining one topic t according to the topic mixture ratio θ_{j,t} and then selecting the word v according to the probability β_{t,v}.
- LDA assumes such a generation model and, given document data, can estimate α and β. This estimation is based on the principle of maximum likelihood estimation and is performed by calculating the α_{t} and β_{t,v} that are most likely to reproduce the given set of document data.
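The following is a minimal Python sketch of this generative procedure, shown only for illustration; the number of topics, the vocabulary, the parameter values, and the document length are all assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

k = 4                      # number of latent topics (assumed)
vocab = ["school", "children", "year", "teacher"]   # assumed vocabulary
alpha = np.ones(k)         # alpha_{t}: Dirichlet parameters over topics
# beta_{t,v}: word topic probabilities, one distribution over words per topic
beta = rng.dirichlet(np.ones(len(vocab)), size=k)

def generate_document(doc_len):
    # Draw the topic mixture ratio theta_{j,t} for this document
    theta = rng.dirichlet(alpha)
    words = []
    for _ in range(doc_len):
        t = rng.choice(k, p=theta)            # determine one topic t per word
        v = rng.choice(len(vocab), p=beta[t]) # select the word v from topic t
        words.append(vocab[v])
    return words

print(generate_document(10))
```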
- LDA latent topic estimation
- Non-Patent Document 1 describes a method of estimating α and β sequentially (that is, each time a document is added).
- The latent topic estimation apparatus to which the method described in Non-Patent Document 1 is applied estimates the word topic probability β by repeating the calculation of the following parameters when a document j is given.
- FIG. 9 is an explanatory diagram illustrating an example of the configuration of the latent topic estimation device to which the method described in Non-Patent Document 1 is applied.
- The latent topic estimation apparatus shown in FIG. 9 repeatedly calculates the following parameters in order to estimate β.
- γ_{j,t}^{k} is a parameter (document topic parameter) of the Dirichlet distribution representing the appearance probability of topic t in document j.
- ^{k} indicates that the superscript k is attached.
- φ_{j,i,t}^{k} is the probability (document word topic probability) that the i-th word in document j is assigned to topic t.
- n_{j,t}^{k} is the expected value (number of document topics) of the number of times topic t is assigned in document j.
- n_{j,t,v}^{k} is the expected value (number of word topics) of the number of times word v is assigned to topic t in document j.
- FIG. 9 shows the configuration of the latent topic estimation device, paying particular attention only to the estimation of the word topic probability β.
- The latent topic estimation apparatus shown in FIG. 9 includes: a document data adding unit 501 that registers document data including one or more words input by a user operation or an external program; a topic estimation unit 502 that estimates latent topics for the added document by repeatedly calculating the document word topic probability according to a generation model based on the mixed-topic-distribution assumption; a topic distribution storage unit 504 that stores the number of word topics calculated by the topic estimation unit 502; a data update unit 503 that updates data in the topic distribution storage unit 504 based on the number of word topics calculated by the topic estimation unit 502; and a word topic distribution output unit 505 that, when called by a user operation or an external program, calculates the word topic probability based on the number of word topics in the topic distribution storage unit 504 and outputs the result.
- FIG. 10 is a flowchart showing the topic estimation processing of the latent topic estimation device shown in FIG. 9.
- the latent topic estimation apparatus shown in FIG. 9 starts processing when a document including one or more words is added to the document data adding unit 501.
- the added document is input to the topic estimation unit 502.
- the topic estimation unit 502 sequentially examines words in the document data, and performs probability estimation by repeatedly updating the document word topic probability, the number of document topics, the number of word topics, and the document topic parameter.
- The topic estimation unit 502 first calculates the initial values of the following quantities (step n1).
- n_{j,t}^{old} is the initial value of the number of document topics, and is calculated by Expression 2′.
- n_{j,t,v}^{old} is the initial value of the number of word topics, and is calculated by Expression 2′.
- γ_{j,t}^{k} is the initial value of the document topic parameter, and is calculated by Expression 3.
- β_{t,v}^{k} is the initial value of the word topic probability, and is calculated by Expression 4′.
- φ_{j,i,t}^{old} is the initial value of the document word topic probability, and is assigned randomly.
- The function I(condition) in Expressions 2 and 2′ is a function that returns 1 when the condition is satisfied and 0 when it is not.
- w_ ⁇ j, i ⁇ means the i-th word of document j.
- Next, the topic estimation unit 502 performs, for each word and for each topic t (0 ≤ t < k), a process of updating the values of φ_{j,i,t}^{k}, β_{t,v}^{k}, and γ_{j,t}^{k} (step n2). These update processes are performed by calculating Expression 1, Expression 2, Expression 3, and Expression 4 in order.
- In Expression 1, Ψ(x) represents the digamma function, and exp(x) represents the exponential function.
- A_{t,v} in Expression 4 is stored in the topic distribution storage unit 504. When there is no corresponding value in the topic distribution storage unit 504, such as when the first document is added, A_{t,v} is set to 0.
- When the update process is completed for all words, the topic estimation unit 502, in preparation for the next update process, replaces φ_{j,i,t}^{old}, n_{j,t}^{old}, and n_{j,t,v}^{old} with the values φ_{j,i,t}^{k}, n_{j,t}^{k}, and n_{j,t,v}^{k} calculated in the current topic estimation. Then, the updates by Expressions 1 to 4 are performed again for each word.
- the topic estimation unit 502 makes an end determination (step n3).
- The number of times step n2 has been performed is recorded, and when a predetermined number of iterations of step n2 has been completed (Yes in step n3), the topic estimation unit 502 ends the process.
- Next, the data update unit 503 updates the values in the topic distribution storage unit 504 based on the number of word topics n_{j,t,v} among the values calculated by the topic estimation unit 502.
- the update is performed according to Equation 5.
- the word topic distribution output unit 505 is called by a user operation or an external program.
- The word topic distribution output unit 505 outputs β_{t,v} by Expression 6, based on the values in the topic distribution storage unit 504.
- This method accumulates all documents but does not repeat the estimation process over all of them; when a document is added, estimation is performed only for the added document. As a result, probability estimation can be performed efficiently, and the method is known to run faster than general LDA. However, the speed is still not sufficient. In particular, when the number of topics k is large, processing time proportional to the number of topics is required, which takes a long time. For this problem, using a hierarchical clustering method is conceivable.
- Non-Patent Document 2 describes a hierarchical clustering method.
- With this method, topic assignment to each document can be calculated in time on the order of log(k).
- However, although this method belongs to a similar field, it is a technique for assigning topics to whole documents, and it cannot estimate topic assignment probabilities for individual words.
- Moreover, a single topic is assigned to each data item, so a mixed state of a plurality of topics cannot be expressed.
- As described above, latent topic estimation methods that can handle mixed topics require processing time proportional to the number of topics.
- As a method of processing efficiently even when the number of topics is large, there is hierarchical latent topic estimation.
- Here, a topic tree and hierarchical latent topic estimation are defined.
- the topic tree is data of a W-ary tree having a depth D, in which topics are nodes and semantic inclusion relations between topics are edges.
- Each topic in the topic tree has a unique ID (topic ID) in each layer.
- Each solid circle shown in FIG. 11 represents a topic, and the number described in the circle represents a topic ID.
- This topic tree has two levels: an upper hierarchy of topics 0 to 2 at the first level from the root, and a lower hierarchy of topics 0 to 8 below it.
- the edge between the upper hierarchy and the lower hierarchy means an inclusion relationship.
- the concept of topics 0-2 in the lower hierarchy is included in the concept of topic 0 in the upper hierarchy.
- Similarly, the concept of topics 3-5 in the lower hierarchy is included in the concept of topic 1 in the upper hierarchy.
- Hierarchical latent topic estimation means estimating, for each word, a topic in each hierarchy so that, assuming such a topic tree, there is no contradiction in the semantic inclusion relationships between topics.
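Because topic IDs are assigned per layer in a W-ary tree, the parent-child relation between layers can be computed arithmetically, which the estimation described later relies on. A minimal sketch, assuming the ID scheme of FIG. 11 (W = 3, two levels):

```python
W = 3  # branching factor of the topic tree in FIG. 11 (topics 0-2 above, 0-8 below)

def parent(t):
    # The parent of lower-layer topic t is the quotient of t divided by W
    return t // W

def children(p):
    # c(p): the lower-layer topics whose concepts are included in upper topic p
    return list(range(p * W, (p + 1) * W))

assert [parent(t) for t in range(9)] == [0, 0, 0, 1, 1, 1, 2, 2, 2]
assert children(1) == [3, 4, 5]
```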
- An object of the present invention is to provide a word latent topic estimation apparatus and a word latent topic estimation method that can perform hierarchical processing and can estimate the latent topics of words at high speed while considering a mixed state of topics.
- The word latent topic estimation apparatus according to the present invention includes: a document data adding unit that inputs a document including one or more words; a hierarchy setting unit that sets the number of topics in each hierarchy according to the hierarchical structure of topics, in order to estimate the latent topics of words hierarchically; an upper constraint creation unit that, for a word in the document and based on the result of topic estimation in a given hierarchy, creates an upper constraint indicating the identifiers of topics that may be assigned to the word and the probabilities of the word being assigned to those topics; and a topic estimation unit with upper constraints that, when estimating the probability that each word in the input document is assigned to each topic, refers to the upper constraint and uses the probability assigned to the parent topic in the upper hierarchy as a weight in performing the estimation process for lower topics.
- In the word latent topic estimation method according to the present invention, a document including one or more words is input; the number of topics in each hierarchy is set according to the hierarchical structure of topics, in order to estimate the latent topics of words hierarchically; for a word in the document and based on the result of topic estimation in a given hierarchy, an upper constraint indicating the identifiers of topics that may be assigned to the word and the probabilities of being assigned to those topics is created; and, when the probability that each word is assigned to each topic is estimated, the upper constraint is referred to and the probability assigned to the parent topic in the upper hierarchy is used as a weight in estimating the lower topics.
- Embodiment 1. A first embodiment of the present invention will be described below with reference to the drawings.
- FIG. 1 is a block diagram showing an example of the configuration of the first embodiment of the word latent topic estimation device according to the present invention.
- As shown in FIG. 1, the word latent topic estimation apparatus according to the present invention includes a document data adding unit 101, a hierarchy setting unit 102, a topic estimation unit 103 with upper constraints, a data update unit 104, an upper constraint creation unit 105, a topic distribution storage unit 106, an upper constraint buffer 107, and a word topic distribution output unit 108.
- the document data adding unit 101 registers document data including one or more words input by a user operation or an external program.
- the hierarchy setting unit 102 sets the number of topics k based on the preset setting parameters of width W and depth D, and calls the processing of the topic estimation unit 103 with upper constraints.
- The topic estimation unit 103 with upper constraints receives as inputs the number of topics k set by the hierarchy setting unit 102, the document data passed from the document data adding unit 101, and the upper constraints in the upper constraint buffer 107, and performs topic estimation for the number of topics k.
- the data update unit 104 updates the data in the topic distribution storage unit 106 based on the number of word topics calculated by the topic estimation unit 103 with upper constraints.
- the upper constraint creation unit 105 is called after the processing of the data update unit 104, and creates an upper constraint based on the document word topic probability calculated by the topic estimation unit 103 with upper constraint.
- the upper constraint creation unit 105 registers the created upper constraint in the upper constraint buffer 107 and calls the hierarchy setting unit 102.
- the topic distribution storage unit 106 stores the number of word topics passed from the data update unit 104.
- the topic distribution storage unit 106 holds the number of word topics using the word v, the number of topics k, and the topic t as keys.
- the topic distribution storage unit 106 stores information of the following data structure.
- word : k : topic ID → number of word topics
- This example shows that, in the topic estimation with 4 topics, the number of word topics for the 0th topic of the word “children” is 2.0, and the number of word topics for the 1st topic of the word “children” is 1.0.
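As a sketch, such records could be held in a plain dictionary keyed by (word, k, topic ID); the two entries mirror the "children" example above:

```python
# (word, number of topics k, topic ID) -> number of word topics
topic_distribution = {
    ("children", 4, 0): 2.0,  # 0th topic of "children" in the 4-topic estimation
    ("children", 4, 1): 1.0,  # 1st topic of "children" in the 4-topic estimation
}
```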
- the upper constraint buffer 107 is a buffer that stores the upper constraint created by the upper constraint creation unit 105.
- The upper constraint buffer 107 holds, from the topic estimation of the upper hierarchy performed immediately before, the topic IDs that may be assigned to the i-th word in the document and the document word topic probability φ for each of those topics.
- the upper constraint buffer 107 stores information of the following data structure.
- This example shows that the probability that the 5th word is assigned to topic 0 is 0.3 and the probability that it is assigned to topic 5 is 0.7.
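A sketch of this buffer as a dictionary keyed by the word position i, mirroring the example above:

```python
# position i -> {topic ID that may be assigned: document word topic probability}
upper_constraint_buffer = {
    5: {0: 0.3, 5: 0.7},  # 5th word: topic 0 with prob. 0.3, topic 5 with 0.7
}
```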
- When the word topic distribution output unit 108 is called by a user operation or an external program, it calculates the word topic probability based on the number of word topics in the topic distribution storage unit 106 and outputs the result.
- The document data adding unit 101, the hierarchy setting unit 102, the topic estimation unit 103 with upper constraints, the data update unit 104, the upper constraint creation unit 105, and the word topic distribution output unit 108 are realized by, for example, a CPU or the like included in the word latent topic estimation device.
- The topic distribution storage unit 106 and the upper constraint buffer 107 are realized by, for example, a storage device such as a memory included in the word latent topic estimation device.
- The processing of this embodiment consists broadly of a document data addition flow and an output flow.
- FIG. 2 is a flowchart showing a document data addition flow in the first embodiment. First, the flow of adding document data will be described with reference to FIG.
- the document data addition flow starts when a user operation or an external program inputs document data including one or more words.
- When document data is added, the hierarchy setting unit 102 first sets the initial value 1 to the number of topics k (step u01). Next, the hierarchy setting unit 102 multiplies the number of topics k by W (step u02). Next, the hierarchy setting unit 102 determines whether to end the process based on the value of k (step u03).
- When the process is to be ended (Yes in step u03), the word latent topic estimation device performs a termination process (step u07). In the termination process, the upper constraint buffer 107 is emptied in preparation for the next document addition. Otherwise (No in step u03), the word latent topic estimation device proceeds to step u04, in which the topic estimation process for the number of topics k is performed. In step u04, the topic estimation unit 103 with upper constraints performs the latent topic estimation process for the number of topics k.
- Basically, the topic estimation unit 103 with upper constraints performs the same estimation as the topic estimation unit 502 shown in FIG. 9 (steps n1 to n3 shown in FIG. 10).
- However, in the latent topic estimation of each word, the topic estimation unit 103 with upper constraints performs the estimation process so as to assign the word only to topics that satisfy the upper constraint.
- FIG. 3 is a flowchart showing the topic estimation process with upper constraints in the document data addition flow, that is, the process of step u04 in FIG. 2. Details of the process of step u04 will be described with reference to FIG. 3.
- The topic estimation unit 103 with upper constraints first refers to the upper constraint buffer 107 and acquires the upper constraints for all positions in the document (step u041). If the upper constraint buffer 107 is empty (Yes in step u042), the topic estimation unit 103 with upper constraints shifts to the process of step n1 shown in FIG. 10 and executes the processes up to step n3. When this normal topic estimation process is completed, the topic estimation unit 103 with upper constraints ends the process. Otherwise (No in step u042), the process proceeds to step u043.
- Next, for each of the topic IDs 0 to k−1, the topic estimation unit 103 with upper constraints compares the quotient of the topic ID divided by W with the upper-layer topic IDs included in the upper constraint, and creates an allowed topic list (step u043); a sketch follows this paragraph.
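A minimal sketch of this comparison, assuming (as the worked example later confirms) that a lower topic t is allowed exactly when its quotient t // W appears among the upper-constraint topic IDs:

```python
def allowed_topics(upper_topic_ids, k, W):
    # Lower topic t is allowed if its parent t // W is in the upper constraint
    allowed_parents = set(upper_topic_ids)
    return [t for t in range(k) if t // W in allowed_parents]

# Values from the worked example in the text: k = 16, W = 4
assert allowed_topics([0], 16, 4) == [0, 1, 2, 3]
assert allowed_topics([1, 2], 16, 4) == [4, 5, 6, 7, 8, 9, 10, 11]
```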
- Next, the topic estimation unit 103 with upper constraints calculates the initial values of the probability parameters (step u044). This process is the same as step n1 shown in FIG. 10, except that the initial value of φ_{j,i,t}^{old} is not assigned randomly over all topics: probability is assigned only to the allowed topics, and the other topics are assigned 0.
- the topic estimation unit 103 with upper constraints updates the values of Expression 7, Expression 2, Expression 3, and Expression 4 for each word in the document (Step u045).
- this update process is performed only for topics that satisfy the upper constraints.
- This processing is implemented by using Expression 7, Expression 2, Expression 3, and Expression 4 to sequentially update φ_{j,i,t}^{k}, β_{t,v}^{k}, n_{j,t}^{k}, n_{j,t,v}^{k}, and γ_{j,t}^{k}.
- cons in Expression 7 represents the set of IDs of the allowed topics.
- φ_{j,i,t/W}^{k/W} represents the document word topic probability of the parent topic included in the upper constraint.
- Expression 7 differs from Expression 1 in that it fixes the probability values of topics other than the allowed topics to 0 and uses the document word topic probability φ_{j,i,t/W}^{k/W} of the parent topic as a weight.
- By Expression 7, in the probability estimation for k topics, probabilities can be assigned in consideration of the result of the probability estimation for k/W topics; a sketch of this weighting follows.
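Expression 7 itself appears in the patent figures and is not reproduced here; the Python sketch below shows only the weighting structure just described, with `base` standing in for the Expression 9 factor (taken as uniform in the example) and with renormalization over the allowed topics assumed:

```python
import numpy as np

def update_phi(base, phi_parent, cons, k, W):
    # Sketch of Expression 7: zero outside the allowed set, weight each allowed
    # topic t by the parent's document word topic probability, then renormalize.
    phi = np.zeros(k)
    for t in cons:
        phi[t] = phi_parent[t // W] * base[t]
    total = phi.sum()
    return phi / total if total > 0 else phi

# Position-1 values from the worked example: k = 16, W = 4, base = 1/16
phi_parent = np.array([0.0, 0.25, 0.75, 0.0])  # upper-layer phi^{4}, as regarded
cons = list(range(4, 12))                      # allowed topics {4, ..., 11}
phi = update_phi(np.full(16, 1 / 16), phi_parent, cons, 16, 4)
assert np.isclose(phi[5], 1 / 16) and np.isclose(phi[8], 3 / 16)
```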
- When the update process is completed for all the words in document j, the topic estimation unit 103 with upper constraints, in preparation for the next update process, replaces φ_{j,i,t}^{old}, n_{j,t}^{old}, and n_{j,t,v}^{old} with the values φ_{j,i,t}^{k}, n_{j,t}^{k}, and n_{j,t,v}^{k} calculated in the current topic estimation.
- Next, the topic estimation unit 103 with upper constraints performs the end determination of step u046. This process is the same as step n3 shown in FIG. 10.
- the word latent topic estimation apparatus proceeds to the process of step u05.
- In step u05, the data update unit 104 updates the values in the topic distribution storage unit 106 based on the number of word topics n_{j,t,v}^{k} among the values calculated by the topic estimation unit 103 with upper constraints.
- the update is performed according to Equation 5.
- Next, the upper constraint creation unit 105 creates an upper constraint based on the document word topic probability φ calculated by the topic estimation unit 103 with upper constraints (step u06). This process is performed as follows.
- First, the upper constraint creation unit 105 empties the upper constraint buffer 107.
- the upper constraint creation unit 105 performs the following processing for each word.
- The IDs of topics whose document word topic probability is larger than a threshold TOPIC_MIN are extracted and placed in the allowed list cons(j, i).
- Next, the upper constraint creation unit 105 updates the values of φ_{j,i,t}^{k} for the topics t in the allowed list using Expression 8. Then, using the position i as a key, the upper constraint creation unit 105 adds to the upper constraint buffer 107 the IDs of the topics included in cons(j, i) and the φ_{j,i,t}^{k} values for those topics. A sketch of this step follows.
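A minimal sketch of this constraint-creation step; Expression 8 is assumed here to renormalize the surviving probabilities so that they sum to 1, which matches the worked example later in the text:

```python
TOPIC_MIN = 0.2  # threshold value used in the worked example

def make_upper_constraint(phi_i):
    # Keep the topic IDs whose document word topic probability exceeds TOPIC_MIN
    cons = [t for t, p in enumerate(phi_i) if p > TOPIC_MIN]
    total = sum(phi_i[t] for t in cons)
    # Expression 8 (assumed): renormalize the kept probabilities to sum to 1
    return {t: phi_i[t] / total for t in cons}

# Position-1 example: phi^{4} = (0.01, 0.225, 0.675, 0.09) -> {1: 0.25, 2: 0.75}
print(make_upper_constraint([0.01, 0.225, 0.675, 0.09]))
```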
- the word latent topic estimation device returns to the processing of step u02 and performs the processing of the next layer.
- The output flow is started when the word topic distribution output unit 108 is called by a user operation or an external program. Based on the data stored in the topic distribution storage unit 106, the word topic distribution output unit 108 calculates, for each number of topics k, the word topic probability of each topic t for all words v using Expression 6, and outputs the result.
- In step u04, latent topic estimation with 4 topics is performed.
- In steps u04 and u05, the same iterative process as in the topic estimation unit 502 shown in FIG. 9 is performed.
- In step u06, the upper constraint creation unit 105 creates a constraint for the lower hierarchy based on these φ values. This process will be described.
- Specifically, the upper constraint creation unit 105 updates the values of φ_{555,1,1}^{4} and φ_{555,1,2}^{4} by Expression 8 as follows.
- the upper constraint creation unit 105 adds the next upper constraint to the upper constraint buffer 107.
- In step u041, the topic estimation unit 103 with upper constraints first reads the next data from the upper constraint buffer 107.
- Next, the topic estimation unit 103 with upper constraints creates an allowed topic list for the word at each position. For position 0, since the upper-constraint topic is 0, the topics among 0 to 15 whose quotient by W is 0, namely {0, 1, 2, 3}, are set as the allowed topic list. For position 1, since the upper-constraint topics are {1, 2}, the allowed topic list {4, 5, 6, 7, 8, 9, 10, 11} is created from {4, 5, 6, 7} (quotient 1) and {8, 9, 10, 11} (quotient 2).
- Next, the topic estimation unit 103 with upper constraints calculates the initial values of φ, β, γ, and n (step u044).
- Next, the topic estimation unit 103 with upper constraints performs the update process of φ, β, γ, and n for the topics in the allowed topic list (step u045).
- Here, attention is focused on the calculation of φ_{555,i,t}^{16}.
- Of Expression 7, base_{j,i,t}^{k} given by Expression 9 is calculated as follows.
- The calculation is performed as follows. From the upper constraint 0→0:1, the document word topic probability φ_{555,0,t}^{4} in the upper layer can be regarded as follows.
- The allowed topics are 0-3, and the subsequent calculations need only be performed for topics 0-3. That is, in the calculation of φ_{555,0,t}^{16}, only the following calculation is performed.
- φ_{555,0,1}^{4}, φ_{555,0,2}^{4}, and φ_{555,0,3}^{4} can be regarded as 0.
- φ_{555,0,t}^{16} is calculated as follows.
- Next, the calculation for the word "year" at position 1 is performed. Again, attention is paid to the calculation of φ. Since the upper constraint for position 1 is "1→1:0.25, 2:0.75", the document word topic probability φ_{555,1,t}^{4} in the upper layer can be regarded as follows.
- Thereafter, the hierarchy setting unit 102 updates k to 64 in step u02. As a result, since k > 16, the termination process of step u07 is performed, and this processing flow ends.
- In this way, owing to the upper constraints, topics can be estimated hierarchically, considering the mixture probabilities of a plurality of topics, without performing estimation processing for unnecessary topics.
- For example, with 100 topics, normal latent topic estimation requires estimation over all 100 topics for each word.
- FIG. 4 is a block diagram showing an example of the configuration of the second embodiment of the word latent topic estimation device according to the present invention.
- the word latent topic estimation apparatus of the second embodiment includes an initial value storage unit 201 and an initial value update unit 202 in addition to the configuration of the first embodiment.
- The initial value storage unit 201 stores the initial value of the number of topics k set by the hierarchy setting unit 102. Specifically, the initial value storage unit 201 holds an initial value initK of k. It is assumed that initK is set to W^(D−1) before any document is added.
- The initial value update unit 202 is called from the hierarchy setting unit 102 and updates the initial value of the number of topics k in the initial value storage unit 201, based on the document word topic probability calculated by the topic estimation unit 103 with upper constraints and on the number of documents added so far.
- the initial value update unit 202 is realized by a CPU or the like provided in the word latent topic estimation device.
- the initial value storage unit 201 is realized by a storage device such as a memory provided in the word latent topic estimation device, for example.
- FIG. 5 is a flowchart showing a document data addition flow in the second embodiment.
- the document data addition flow in the second embodiment starts processing when a user operation or an external program inputs document data including one or more words.
- the hierarchy setting unit 102 reads the initial value initK of k in the initial value storage unit 201 and sets it as the initial value of k (step u101).
- the subsequent processing (steps u102 to u105) is the same as the processing of steps u02 to u05 of the first embodiment.
- FIG. 6 is a flowchart showing the process of step u106, that is, the process of assigning the number of word topics to the upper topic by the data updating unit 104.
- the assignment of the number of word topics n_ ⁇ j, t, v ⁇ to the upper topic is performed according to the processing flow of FIG.
- First, the initial value of the parameter p is set to 1 (step u0621).
- Next, the value of p is multiplied by W (step u0622).
- Next, the data update unit 104 compares the current k and p; if k and p are equal (Yes in step u0623), the process is terminated. Otherwise (No in step u0623), the data update unit 104 adds n_{j,t,v} to A_{k/p,t/p,v} for the upper-layer topics, thereby updating the data in the topic distribution storage unit 106 (step u0624).
- Step u0624 is performed for each t. A sketch of this roll-up follows.
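A sketch of this roll-up, using a hypothetical dictionary `A` keyed by (k, topic, word) in line with the A_{k/p,t/p,v} indexing in the text, and assuming k is a power of W so that p eventually equals k:

```python
from collections import defaultdict

def rollup_word_topic_counts(A, n, k, W):
    # n: {(t, v): count} at the current number of topics k.
    # Add each count into every coarser layer A[(k/p, t/p, v)] until p reaches k.
    p = 1
    while True:
        p *= W                     # step u0622
        if p == k:                 # step u0623: stop at the top layer
            break
        for (t, v), count in n.items():
            A[(k // p, t // p, v)] += count   # step u0624, performed for each t
    return A

A = defaultdict(float)
n = {(t, "school"): 1 / 16 for t in range(16)}  # n_{300,t,"school"}^{16}
rollup_word_topic_counts(A, n, k=16, W=4)
assert abs(A[(4, 0, "school")] - 4 / 16) < 1e-9  # children 0-3 roll up to topic 0
```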
- the word latent topic estimation device performs the upper constraint creation process of the subsequent step u107, and proceeds to the end process (step u108).
- The hierarchy setting unit 102 calls the initial value update unit 202 after performing the termination process.
- the initial value update unit 202 updates the initial value initK of the topic number k in the initial value storage unit 201 in preparation for the next and subsequent document addition (step u109).
- Specifically, the amount of calculation can be reduced by the filtering performed in the upper layer, and initK is reduced according to this reduction effect.
- This reduction effect E is calculated as the difference between nCost(k), the calculation amount of normal estimation, and upCost(k), the calculation amount of hierarchical estimation, both defined below.
- nCost (k) represents a calculation amount when the latent topic estimation of the number of topics k is performed by a normal calculation method, and is calculated by Expression 10.
- len_ ⁇ j ⁇ in Expression 10 means the number of words included in the document j.
- That is, when initK is used as it is, the first topic estimation is performed with the number of topics k = initK (and the next layer with initK × W); this topic estimation requires a calculation amount k times the number of words included in document j.
- upCost(k) is a function expressing the amount of calculation when the upper constraint obtained by estimating k/W latent topics one layer above is used. upCost(k) is calculated by Expression 11.
- The first term of Expression 11 means the amount of calculation required to perform the latent topic estimation of the next upper layer, that is, with the number of topics k/W.
- F(k) in the second term means the amount of calculation necessary for estimating k×W latent topics using the upper constraints created by the latent topic estimation with k topics.
- F (k) is calculated by Expression 12.
- In order to calculate F(k/W) in Expression 11, φ_{j,i,t}^{k/W} from the topic estimation one layer above is required. This value is estimated by Expression 13 from the value of φ_{j,i,t}^{k} calculated by the topic estimation unit 103 with upper constraints.
- The function c(p) in the expression means the set of topics that are children of topic p in the topic tree; a sketch follows.
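Checking against the Embodiment 2 example later in the text, Expression 13 evidently sums the lower-layer probabilities over the children c(p); a minimal sketch:

```python
def phi_one_layer_up(phi, W):
    # Expression 13 (as implied by the worked example): the document word topic
    # probability of parent topic p is the sum over its children c(p).
    k = len(phi)
    return [sum(phi[t] for t in range(p * W, (p + 1) * W)) for p in range(k // W)]

# phi_{300,1,t}^{16} from the example: 7/28 for t = 0, 1 and 1/28 otherwise
phi16 = [7 / 28, 7 / 28] + [1 / 28] * 14
phi4 = phi_one_layer_up(phi16, W=4)
assert abs(phi4[0] - 16 / 28) < 1e-9 and abs(phi4[1] - 4 / 28) < 1e-9
```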
- The initial value update unit 202 calculates E and, when E is larger than a threshold E_MIN (for example, 0), updates initK in the initial value storage unit 201 to initK/W. A rough sketch of this decision follows.
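Expressions 10 to 12 appear in the patent figures and are not reproduced here; the following is an assumed sketch of the decision, taking nCost as proportional to k times the document length and approximating F by the number of allowed child topics per word:

```python
def n_cost(k, doc_len):
    # Expression 10 (sketch): normal estimation touches all k topics per word
    return k * doc_len

def f_cost(cons_per_word, W):
    # Expression 12 (sketch): constrained estimation of k*W topics touches only
    # the W children of each allowed upper topic, for each word
    return sum(len(cons) * W for cons in cons_per_word)

def reduction_effect(initK, W, doc_len, cons_per_word):
    # E = nCost(initK) - upCost(initK); upCost adds one coarser layer (initK/W
    # topics estimated normally, then initK topics under those constraints)
    up_cost = n_cost(initK // W, doc_len) + f_cost(cons_per_word, W)
    return n_cost(initK, doc_len) - up_cost

# Hypothetical: a 100-word document, initK = 16, W = 4, two allowed parents/word
E = reduction_effect(16, 4, 100, [[0, 1]] * 100)
if E > 0:  # E_MIN = 0, as in the text
    print("update initK to initK / W")
```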
- In the above description, the reduction effect is estimated in order to update the initial value initK.
- Alternatively, initK may be updated according to the number of documents added so far. For example, initK is updated to initK/W when the number of added documents reaches 10,000 or more. With such a method, the switch between hierarchical topic estimation and normal topic estimation (that is, topic estimation over all topics) can be decided based only on the number of documents.
- In the above description, the reduction effect E is estimated every time a document is added; however, in order to reduce the processing time spent estimating E, a method of performing the estimation once every X added documents, rather than every time, may be used.
- In the above description, whether to update initK is determined based on the reduction effect E for a single document; alternatively, the average value mean(E) of the reduction effects E over a plurality of documents, for example Y documents, may be taken, and initK may be updated when this average value becomes larger than the threshold E_MIN.
- The following φ and n are obtained as a result.
- After the process of step u105 is performed, the data update unit 104 performs the process of step u106.
- Note that topic estimation in the upper hierarchy is not performed in the initial stage of learning.
- Even so, the number of word topics for the parent topics can be calculated by this process, so the word topic probability can be calculated without degrading accuracy when topic estimation of the upper hierarchy is started.
- The initial value update unit 202 first calculates φ_{300,i,t}^{4} from φ_{300,i,t}^{16} using Expression 13.
- φ_{300,i,t}^{16} estimated by the topic estimation unit 103 with upper constraints has the following values.
- The process of step u109 in this case will be described.
- The initial value update unit 202 calculates φ_{300,i,t}^{4} by Expression 13, and the following result is obtained.
- In this embodiment, the reduction effect E is estimated, and the initial value initK of the number of topics k is set according to E. Accordingly, it is possible to switch between normal topic estimation and hierarchical latent topic estimation in accordance with the reduction effect E.
- Specifically, in the initial stage of learning, when the amount of document data is small, normal topic estimation with probability calculation over all topics is performed, and after the reduction effect is confirmed, the process can switch to hierarchical latent topic estimation. Therefore, a decrease in accuracy and an increase in processing time in the initial stage of learning can be avoided.
- According to the present invention, in an information management system or the like that manages text information, latent topics can be handled and characteristic word information can be extracted automatically, at high speed, and without using a dictionary. Therefore, the present invention enables efficient document summarization and document retrieval.
- FIG. 7 is a block diagram showing the minimum configuration of the word latent topic estimation apparatus according to the present invention.
- FIG. 8 is a block diagram showing another minimum configuration of the word latent topic estimation apparatus according to the present invention.
- As shown in FIG. 7, the word latent topic estimation apparatus includes: a document data adding unit 11 (corresponding to the document data adding unit 101 shown in FIG. 1) that inputs a document including one or more words; a hierarchy setting unit 12 (corresponding to the hierarchy setting unit 102 shown in FIG. 1) that sets the number of topics in each hierarchy according to the hierarchical structure of topics, in order to estimate the latent topics of words hierarchically; an upper constraint creation unit 15 (corresponding to the upper constraint creation unit 105 shown in FIG. 1) that creates an upper constraint indicating the identifiers of topics that may be assigned to a word and the probabilities of being assigned to those topics; and a topic estimation unit 13 with upper constraints (corresponding to the topic estimation unit 103 with upper constraints shown in FIG. 1) that, when estimating the probability that each word in the input document is assigned to each topic, refers to the upper constraint and uses the probability assigned to the parent topic in the upper hierarchy as a weight in performing the estimation process for lower topics.
- The upper constraint creation unit 15 may, based on the probability, estimated in a given hierarchy, that each word in the document is assigned to each topic, correct the probability value to 0 when the probability of a word being assigned to a topic is smaller than a preset threshold, and create an upper constraint including the identifiers of the topics whose probability is greater than 0 and the corrected probability values.
- As shown in FIG. 8, the word latent topic estimation apparatus may further include an initial value update unit 22 (corresponding to the initial value update unit 202 shown in FIG. 4) that counts the number of added documents and, when the counted number is smaller than a predetermined threshold, sets the initial value of the number of topics so that topic estimation over all topics, rather than hierarchical topic estimation, is performed.
Abstract
Description
β_{t,v}
φ_{j,i,t}^{k}
n_{j,t}^{k}
n_{j,t,v}^{k}
n_{j,t}^{old} (0<=t<k)
n_{j,t,v}^{old} (0<=t<k)
γ_{j,t}^{k} (0<=t<k)
β_{t,v}^{k} (0<=t<k)
Hereinafter, a first embodiment of the present invention will be described with reference to the drawings.
children:4:1→1.0
α_{t}^{k}=0 (t ∈ topic tree, k ∈ {4,16})
TOPIC_MIN=0.2
children:4:1→50
children:4:2→5
children:4:3→5
φ_{555,0,0}^{4}=0.8
φ_{555,0,1}^{4}=0.1
φ_{555,0,2}^{4}=0.01
φ_{555,0,3}^{4}=0.09
Document word topic probability for "year" at position 1
φ_{555,1,0}^{4}=0.01
φ_{555,1,1}^{4}=0.225
φ_{555,1,2}^{4}=0.675
φ_{555,1,3}^{4}=0.09
φ_{555,1,2}^{4}=0.675
φ_{555,1,2}^{4}=0.675/(0.225+0.675)=0.75
1→1:0.25,2:0.75
φ_{555,0,1}^{4}=0
φ_{555,0,2}^{4}=0
φ_{555,0,3}^{4}=0
Product of φ_{555,0,0}^{4} and base_{555,0,1}^{16} = 1/16
Product of φ_{555,0,0}^{4} and base_{555,0,2}^{16} = 1/16
Product of φ_{555,0,0}^{4} and base_{555,0,3}^{16} = 1/16
φ_{555,0,1}^{16}=1/4
φ_{555,0,2}^{16}=1/4
φ_{555,0,3}^{16}=1/4
φ_{555,0,t}^{16}=0 (4<=t<16)
φ_{555,1,1}^{4}=0.25(=1/4)
φ_{555,1,2}^{4}=0.75(=3/4)
φ_{555,1,3}^{4}=0
Product of φ_{555,1,1}^{4} and base_{555,1,5}^{16} = 1/64
Product of φ_{555,1,1}^{4} and base_{555,1,6}^{16} = 1/64
Product of φ_{555,1,1}^{4} and base_{555,1,7}^{16} = 1/64
Product of φ_{555,1,2}^{4} and base_{555,1,8}^{16} = 3/64
Product of φ_{555,1,2}^{4} and base_{555,1,9}^{16} = 3/64
Product of φ_{555,1,2}^{4} and base_{555,1,10}^{16} = 3/64
Product of φ_{555,1,2}^{4} and base_{555,1,11}^{16} = 3/64
φ_{555,1,5}^{16}=1/16
φ_{555,1,6}^{16}=1/16
φ_{555,1,7}^{16}=1/16
φ_{555,1,8}^{16}=3/16
φ_{555,1,9}^{16}=3/16
φ_{555,1,10}^{16}=3/16
φ_{555,1,11}^{16}=3/16
φ_{555,1,t}^{16}=0 (t<4 or t>12)
Hereinafter, a second embodiment of the present invention will be described with reference to the drawings.
n_{300,t,“school”}^{16}=1/16 (0<=t<16)
n_{300,t,“children”}^{16}=1/16 (0<=t<16)
Add n_{300,1,"school"}^{16} to A_{4,0,"school"}
Add n_{300,2,"school"}^{16} to A_{4,0,"school"}
Add n_{300,3,"school"}^{16} to A_{4,0,"school"}
Add n_{300,4,"school"}^{16} to A_{4,1,"school"}
Add n_{300,5,"school"}^{16} to A_{4,1,"school"}
Add n_{300,6,"school"}^{16} to A_{4,1,"school"}
…
φ_{300,0,15}^{16}=7/28
φ_{300,0,t}^{16}=1/28 (1<=t<15)
φ_{300,1,0}^{16}=7/28
φ_{300,1,1}^{16}=7/28
φ_{300,1,t}^{16}=1/28 (2<=t)
φ_{300,0,1}^{4}=4/28
φ_{300,0,2}^{4}=4/28
φ_{300,0,3}^{4}=10/28
φ_{300,1,0}^{4}=16/28
φ_{300,1,1}^{4}=4/28
φ_{300,1,2}^{4}=4/28
φ_{300,1,3}^{4}=4/28
11, 101 Document data adding unit
12, 102 Hierarchy setting unit
13, 103 Topic estimation unit with upper constraints
15, 105 Upper constraint creation unit
22, 202 Initial value update unit
104, 503 Data update unit
106, 504 Topic distribution storage unit
107 Upper constraint buffer
108, 505 Word topic distribution output unit
201 Initial value storage unit
502 Topic estimation unit
Claims (8)
- 1. A word latent topic estimation device comprising: a document data adding unit that inputs a document including one or more words; a hierarchy setting unit that sets the number of topics in each hierarchy according to the hierarchical structure of topics, in order to estimate the latent topics of words hierarchically; an upper constraint creation unit that, for a word in the document and based on the result of topic estimation in a given hierarchy, creates an upper constraint indicating the identifiers of topics that may be assigned to the word and the probabilities of being assigned to those topics; and a topic estimation unit with upper constraints that, when estimating the probability that each word in the input document is assigned to each topic, refers to the upper constraint and uses the probability assigned to the parent topic in the upper hierarchy as a weight in performing the estimation process for lower topics.
- 2. The word latent topic estimation device according to claim 1, wherein the upper constraint creation unit, based on the probability, estimated in a given hierarchy, that each word in the document is assigned to each topic, corrects the probability value to 0 when the probability of a word being assigned to a topic is smaller than a preset threshold, and creates an upper constraint including the identifiers of the topics whose probability is greater than 0 and the corrected probability values.
- 3. The word latent topic estimation device according to claim 1 or 2, further comprising an initial value update unit that, after the topic estimation unit with upper constraints performs hierarchical topic estimation, calculates the calculation amount when the current initial value of the number of topics is used as it is and the calculation amount when the initial value of the number of topics is reduced, and reduces the initial value of the number of topics when the difference is larger than a predetermined threshold.
- 4. The word latent topic estimation device according to claim 1 or 2, further comprising an initial value update unit that counts the number of added documents and, when the counted number of documents is smaller than a predetermined threshold, sets the initial value of the number of topics so that topic estimation for all topics, rather than hierarchical topic estimation, is performed.
- 5. A word latent topic estimation method comprising: inputting a document including one or more words; setting the number of topics in each hierarchy according to the hierarchical structure of topics, in order to estimate the latent topics of words hierarchically; creating, for a word in the document and based on the result of topic estimation in a given hierarchy, an upper constraint indicating the identifiers of topics that may be assigned to the word and the probabilities of being assigned to those topics; and, when estimating the probability that each word in the input document is assigned to each topic, referring to the upper constraint and using the probability assigned to the parent topic in the upper hierarchy as a weight in performing the estimation process for lower topics.
- 6. The word latent topic estimation method according to claim 5, wherein, based on the probability, estimated in a given hierarchy, that each word in the document is assigned to each topic, the probability value is corrected to 0 when the probability of a word being assigned to a topic is smaller than a preset threshold, and an upper constraint including the identifiers of the topics whose probability is greater than 0 and the corrected probability values is created.
- 7. The word latent topic estimation method according to claim 5 or 6, wherein, after hierarchical topic estimation is performed, the calculation amount when the current initial value of the number of topics is used as it is and the calculation amount when the initial value of the number of topics is reduced are calculated, and the initial value of the number of topics is reduced when the difference is larger than a predetermined threshold.
- 8. The word latent topic estimation method according to claim 5 or 6, wherein the number of added documents is counted and, when the counted number of documents is smaller than a predetermined threshold, the initial value of the number of topics is set so that topic estimation for all topics, rather than hierarchical topic estimation, is performed.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014527964A JPWO2014020834A1 (ja) | 2012-07-31 | 2013-07-09 | Word latent topic estimation device and word latent topic estimation method
US14/417,855 US9519633B2 (en) | 2012-07-31 | 2013-07-09 | Word latent topic estimation device and word latent topic estimation method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012169986 | 2012-07-31 | ||
JP2012-169986 | 2012-07-31 | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014020834A1 (ja) | 2014-02-06 |
Family
ID=50027548
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/004242 WO2014020834A1 (ja) | 2012-07-31 | 2013-07-09 | 単語潜在トピック推定装置および単語潜在トピック推定方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US9519633B2 (en) |
JP (1) | JPWO2014020834A1 (ja) |
WO (1) | WO2014020834A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016095568A (ja) * | 2014-11-12 | 2016-05-26 | KDDI Corporation | Model construction device and program |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9519871B1 (en) * | 2015-12-21 | 2016-12-13 | International Business Machines Corporation | Contextual text adaptation |
CN106919997B (zh) * | 2015-12-28 | 2020-12-22 | Aisino Corporation | LDA-based user consumption prediction method for electronic commerce |
CN107368487B (zh) * | 2016-05-12 | 2020-09-29 | Alibaba Group Holding Ltd. | Dynamic page component layout method, device, and client |
US9715495B1 (en) | 2016-12-15 | 2017-07-25 | Quid, Inc. | Topic-influenced document relationship graphs |
WO2021080033A1 (ko) * | 2019-10-23 | 2021-04-29 | LG Electronics Inc. | Speech analysis method and apparatus |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7130837B2 (en) * | 2002-03-22 | 2006-10-31 | Xerox Corporation | Systems and methods for determining the topic structure of a portion of text |
JP4807881B2 (ja) * | 2006-12-19 | 2011-11-02 | Nippon Telegraph and Telephone Corporation | Latent topic word extraction device, latent topic word extraction method, program, and recording medium |
US20100153318A1 (en) * | 2008-11-19 | 2010-06-17 | Massachusetts Institute Of Technology | Methods and systems for automatically summarizing semantic properties from documents with freeform textual annotations |
JP5440815B2 (ja) * | 2009-06-26 | 2014-03-12 | NEC Corporation | Information analysis device, information analysis method, and program |
US8521662B2 (en) * | 2010-07-01 | 2013-08-27 | Nec Laboratories America, Inc. | System and methods for finding hidden topics of documents and preference ranking documents |
JP5691289B2 (ja) * | 2010-08-11 | 2015-04-01 | Sony Corporation | Information processing device, information processing method, and program |
US10198431B2 (en) * | 2010-09-28 | 2019-02-05 | Siemens Corporation | Information relation generation |
US8630975B1 (en) * | 2010-12-06 | 2014-01-14 | The Research Foundation For The State University Of New York | Knowledge discovery from citation networks |
US8484245B2 (en) * | 2011-02-08 | 2013-07-09 | Xerox Corporation | Large scale unsupervised hierarchical document categorization using ontological guidance |
US8484228B2 (en) * | 2011-03-17 | 2013-07-09 | Indian Institute Of Science | Extraction and grouping of feature words |
US20120296637A1 (en) * | 2011-05-20 | 2012-11-22 | Smiley Edwin Lee | Method and apparatus for calculating topical categorization of electronic documents in a collection |
US8527448B2 (en) * | 2011-12-16 | 2013-09-03 | Huawei Technologies Co., Ltd. | System, method and apparatus for increasing speed of hierarchial latent dirichlet allocation model |
US9251250B2 (en) * | 2012-03-28 | 2016-02-02 | Mitsubishi Electric Research Laboratories, Inc. | Method and apparatus for processing text with variations in vocabulary usage |
US9183193B2 (en) * | 2013-02-12 | 2015-11-10 | Xerox Corporation | Bag-of-repeats representation of documents |
-
2013
- 2013-07-09 JP JP2014527964A patent/JPWO2014020834A1/ja active Pending
- 2013-07-09 WO PCT/JP2013/004242 patent/WO2014020834A1/ja active Application Filing
- 2013-07-09 US US14/417,855 patent/US9519633B2/en active Active
Non-Patent Citations (2)
Title |
---|
HARUKA SHIGEMATSU ET AL.: "Senzai Topic no Hiritsu ni Motozuku Bunsho Yoyaku Shuho no Teian" [Proposal of a document summarization method based on latent topic ratios], Proceedings of the 26th Annual Conference of JSAI (2012) [CD-ROM], 12 June 2012, ISSN 1347-9881, pages 1 to 4 *
KOKI HAYASHI ET AL.: "Probabilistic Topic Models with Category Hierarchy and its Applications", IPSJ SIG Technical Report, Natural Language Processing (NL), 15 February 2011, pages 1 to 8 *
Also Published As
Publication number | Publication date |
---|---|
JPWO2014020834A1 (ja) | 2016-07-21 |
US9519633B2 (en) | 2016-12-13 |
US20150193425A1 (en) | 2015-07-09 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13826270; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2014527964; Country of ref document: JP; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 14417855; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 13826270; Country of ref document: EP; Kind code of ref document: A1 |