WO2020109921A1 - Document search method, document search system, program, and non-transitory computer-readable storage medium
- Publication number
- WO2020109921A1 (PCT/IB2019/059907)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- search
- sentence
- block
- text
- target
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3322—Query formulation using system suggestions
- G06F16/3323—Query formulation using system suggestions using document space presentation or visualization, e.g. category, hierarchy or range presentation and selection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
Definitions
- One aspect of the present invention relates to a document search method, a document search system, a program, and a non-transitory computer-readable storage medium.
- Examples of the technical field of one embodiment of the present invention include a semiconductor device, a display device, a light-emitting device, a power storage device, a storage device, an electronic device, a lighting device, an input device (e.g., a touch sensor), an input/output device (e.g., a touch panel), driving methods thereof, and manufacturing methods thereof.
- Patent Document 1 discloses a similar document search method.
- the similar document may be entirely similar to the target document, or may have extremely high similarity in one part and extremely low similarity in another part.
- In Patent Document 1, the degree of detail is calculated as an index for determining whether a similar document is wholly similar or only partially similar to a target document.
- Documents for which a high degree of similarity to the target document is calculated may include documents that are not actually similar but are scored highly because they have some degree of similarity as a whole.
- Conversely, a document having a part with extremely high similarity may be calculated as having low similarity for the document as a whole.
- the latter document is preferable to the former document.
- The specification referred to when creating a new specification is not limited to one. It is therefore desirable to be able to easily understand not only which specification was used to create a new specification, but also which part of which specification was used as a reference for which part of the new specification. This applies not only to specifications but to documents in general. However, when creating a new document, it is time-consuming and complicated to record in detail which part of which document was referred to.
- An object of one embodiment of the present invention is to provide a document search method capable of searching a document with high accuracy. Another object of one embodiment of the present invention is to provide a document search system that can search for a document with high accuracy. Another object of one embodiment of the present invention is to realize a highly accurate document search, particularly a document search for an intellectual property, with a simple input method.
- One aspect of the present invention is a document search method for searching a specific text block from a plurality of text blocks created by dividing each of a plurality of search target documents.
- a document search method for searching at least one sentence block similar to the search sentence block.
- the first search text block is one of the plurality of search text blocks.
- a second search text block which is another part of the search document, is prepared, and at least a part of the plurality of text blocks is set as a third target, and the second search text block is set as a search condition.
- the second relevance of each of the text blocks included in the third target with respect to the second search text block is calculated by performing a full-text search using, and based on the height of the second relevance.
- a fourth target is determined from the third target, and a second similarity with each of the sentences included in the fourth target is calculated for each sentence included in the second search sentence block, It is preferable to search at least one text block similar to the second search text block using the second similarity.
- the first target and the third target may be the same or different from each other.
- One aspect of the present invention is a document search method for searching a similar text block from a plurality of text blocks created by dividing a plurality of search target documents for each of a plurality of search text blocks. Therefore, by dividing the search document, a plurality of search text blocks are created, and for each of the plurality of search text blocks, at least a part of the plurality of text blocks is used as the first target for the search.
- the method includes a step of calculating the degree of association of each text block included in the first target with the search text block; a step of determining a second target from the first target based on the degree of association; a step of calculating, for each sentence included in the search text block, a similarity with each sentence included in the second target; and a step of searching for at least one text block similar to the search text block using the similarity.
- One aspect of the present invention is a document search method for searching a specific text block from a plurality of text blocks created by dividing a plurality of search target documents, and is a part of a search document.
- Prepare a first search sentence block perform at least a part of a plurality of sentence blocks as a first target, and perform a full-text search using each sentence included in the first search sentence block as a search condition.
- the first relevance of each sentence included in the first target with respect to each sentence included in the first search sentence block is calculated, and for each sentence included in the first search sentence block,
- the second target is determined from the sentences included in the first target based on the first degree of association, and the second target is determined for each sentence included in the first search sentence block.
- the first search text block is one of the plurality of search text blocks.
- a second search text block which is another part of the search document, is prepared, and at least a part of the plurality of text blocks is included in the second search text block as a third target.
- a second degree of relevance of each sentence included in the third target with respect to each sentence included in the second search sentence block is calculated.
- the fourth target is determined from the sentences included in the third target based on the second degree of relevance, and the second search text block is determined.
- the second similarity with each of the sentences included in the fourth target is calculated, and at least a sentence block similar to the second search sentence block is calculated using the second similarity. It is preferable to search for one.
- the first target and the third target may be the same or different from each other.
- One aspect of the present invention is a document search method for searching a similar text block from a plurality of text blocks created by dividing a plurality of search target documents for each of a plurality of search text blocks. Therefore, by dividing the search document, a plurality of search text blocks are created, and for each of the plurality of search text blocks, at least a part of the plurality of text blocks is used as the first target for the search.
- One aspect of the present invention is a document search system having a function of performing any of the above document search methods.
- One aspect of the present invention is a document search system that searches for a specific text block from a plurality of text blocks created by dividing a plurality of search target documents, and has a processing unit.
- the section includes a function of preparing a first search text block, which is one of a plurality of search text blocks created by dividing a search document, and a function of preparing at least a part of the plurality of text blocks.
- a function of calculating a first similarity with each of the sentences included in the second target, and a function of searching at least one sentence block similar to the first search sentence block using the first similarity. Which is a document search system.
- One aspect of the present invention is a program having a function of causing a processor to execute any one of the above document search methods.
- One aspect of the present invention is a non-transitory computer-readable storage medium in which the program is stored.
- The program may be supplied to the computer by various types of transitory computer-readable storage media.
- Examples of transitory computer-readable storage media include electrical signals, optical signals, and electromagnetic waves.
- A transitory computer-readable storage medium can supply the program to the computer via a wired communication path, such as an electric wire or an optical fiber, or via a wireless communication path.
- One aspect of the present invention is a program for searching a specific text block from a plurality of text blocks created by dividing each of a plurality of search target documents, which is created by dividing a search document.
- the program includes a step of calculating, by performing a full-text search using the first search sentence block as a search condition, a first degree of relevance of each of the sentence blocks included in the first target with respect to the first search sentence block; a step of determining a second target from the first target based on the first degree of relevance; a step of calculating a first similarity between each sentence included in the first search sentence block and each sentence included in the second target; and a step of searching for at least one sentence block similar to the first search sentence block using the first similarity.
- One aspect of the present invention is a non-transitory computer-readable storage medium in which the program is stored.
- Examples of the non-transitory computer-readable storage medium include volatile memory such as RAM (Random Access Memory) and non-volatile memory such as ROM (Read Only Memory).
- Other examples include recording media drives such as hard disk drives (HDDs) and solid state drives (SSDs), magneto-optical disks, CD-ROMs, CD-Rs, and the like.
- a document search method capable of searching for similar documents for each block of a document.
- a document search system that can search for similar documents for each block of a document can be provided.
- a document search method that can search a document with high accuracy
- a document search system that can search for a document with high accuracy
- a highly accurate document search particularly a document search relating to intellectual property can be realized with a simple input method.
- FIG. 1 is a flowchart showing an example of a document search method.
- FIG. 2 is a diagram showing an example of processing at a pre-stage for performing a search.
- FIGS. 3A, 3B, and 3C are diagrams showing an example of a document search method.
- FIGS. 4A, 4B, and 4C are diagrams showing an example of a document search method.
- FIGS. 5A and 5B are diagrams showing an example of a document search method.
- FIGS. 6A, 6B, and 6C are diagrams showing an example of a document search method.
- FIGS. 7A, 7B, and 7C are diagrams showing an example of a document search method.
- FIGS. 8A, 8B, and 8C are diagrams showing an example of a document search method.
- FIGS. 9A and 9B are diagrams showing an example of a document search method.
- FIG. 10 is a flowchart showing an example of the document search method.
- FIG. 11 is a flowchart showing an example of the document search method.
- FIG. 12 is a diagram showing an example of a document search method.
- FIG. 13 is a block diagram showing an example of the document search system.
- FIG. 14 is a block diagram showing an example of a document search system.
- One aspect of the present invention is a document search method for searching for a specific text block from among a plurality of text blocks created by dividing a plurality of search target documents.
- a first search text block which is a part of the search document, is prepared.
- the first search sentence block can be created by extracting a part of the search document.
- the first search text block may be one of a plurality of search text blocks created by dividing the search document.
- a plurality of text blocks are created in advance from a plurality of search target documents, and further, a search text block is created from a search document during a search.
- a text block similar to the search text block can be searched. Therefore, as compared with the case where the entire search document is used as the search condition and the case where the search target is the entire document, it becomes easier to understand the correspondence relationship between similar portions.
- A first degree of association with the first search text block is calculated for each of the text blocks included in the first target.
- the text block (first target) to be searched can be narrowed down for each search text block, so that the processing amount can be reduced and the search speed can be increased.
- a second target is determined from the first targets based on the first degree of association.
- the second target is determined from the first targets and the similarity is calculated after narrowing down the target, so that the time required for document search can be shortened.
- The degree of similarity can be calculated based on how closely the literal wording (the character strings) of the sentences matches. Unlike the full-text search, the order of the words in the sentence is considered when calculating the similarity. Therefore, a sentence that shares words with a sentence included in the first search sentence block but arranges them in a different order has a low degree of similarity.
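The difference between an order-insensitive match and an order-sensitive similarity can be illustrated with a small sketch. It is not the method of the publication: `bag_of_words_overlap` and `sequence_similarity` are hypothetical helper names, and Python's `difflib.SequenceMatcher` is used only as a convenient stand-in for an order-aware comparison.

```python
from collections import Counter
from difflib import SequenceMatcher


def bag_of_words_overlap(a, b):
    """Order-insensitive overlap: shared word count / larger word count."""
    ca, cb = Counter(a.split()), Counter(b.split())
    shared = sum((ca & cb).values())
    return shared / max(sum(ca.values()), sum(cb.values()))


def sequence_similarity(a, b):
    """Order-sensitive similarity computed over the word sequences."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


# Illustrative sentences (not taken from the publication).
query = "the transistor includes an oxide semiconductor layer"
shuffled = "layer semiconductor oxide an includes transistor the"

overlap = bag_of_words_overlap(query, shuffled)   # every word is shared
ordered = sequence_similarity(query, shuffled)    # word order differs
print(overlap, ordered)
```

The two sentences contain exactly the same words, so the order-insensitive measure treats them as a full match, while the order-sensitive measure scores them low.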
- the time required for the document search can be shortened.
- the full-text search may be performed by using the sentences included in the first search-use text block as search conditions one by one.
- the first degree of association of each sentence included in the first target with each sentence included in the first search sentence block is calculated.
- the second target is determined from the sentences included in the first target based on the degree of the first degree of association.
- a sentence block contains multiple sentences. Most of the sentences included in the sentence block are not similar to the sentence included in the first search sentence block. Therefore, in order to search a sentence block having a high degree of similarity with high accuracy, it is necessary to calculate the degree of similarity for many sentence blocks, and it may take a long time to calculate the degree of similarity. Further, in order to reduce the time required to calculate the degree of similarity, the number of sentence blocks that are the second target is reduced, so that there is a possibility that a sentence block including a sentence having a high degree of similarity may be dropped.
- It is therefore preferable to narrow down the first target to the second target in sentence units, not in sentence block units. Specifically, it is preferable to search for sentences having a high degree of association for each sentence included in the first search sentence block, and to narrow down the targets for which the similarity is calculated for each sentence.
- Compared with narrowing down the target on a sentence block basis, narrowing down the target on a sentence-by-sentence basis makes it possible to avoid missing sentences (and sentence blocks) with a high degree of similarity while shortening the time required to calculate the degree of similarity.
- FIG. 1 shows a flowchart of the document search method.
- the document search method according to one aspect of the present invention has six steps, steps A1 to A6.
- When a configuration having a plurality of elements (a document, a text block, a sentence, etc.) is described and the matter is common to each element, the distinguishing variables and letters may be omitted from the description.
- For example, when a matter common to the search target document TD1, the search target document TD2, the search target document TDn, etc. is described, it may be referred to as the search target document TD.
- a plurality of search target documents TD are divided to create a plurality of sentence blocks TB.
- a plurality of documents prepared in advance are divided into blocks.
- the input search document is also divided into blocks.
- a text block similar to each block of the search document can be searched.
- FIG. 2 shows an example in which n (n is an integer of 2 or more) search target documents TD are prepared.
- the search target document TD is not particularly limited, and various documents can be used.
- Examples of the search target document TD include documents related to intellectual property.
- Examples of documents relating to intellectual property include the specifications used for patent applications, the scope of claims, and abstracts.
- examples of documents related to intellectual property include publications such as patent documents (open patent publications, patent publications, etc.), utility model publications, design publications, and papers. Not limited to domestic publications, publications issued worldwide may be used as documents relating to intellectual property.
- As the search target document TD, various written works including books, papers, reports, columns, or other texts may be used.
- a medical document or the like may be used as the search target document TD.
- the language of the document is also not particularly limited, and for example, documents in Japanese, English, Chinese, Korean, etc. can be used.
- the search target document TD1 shown in FIG. 2 is divided into x (x is an integer of 2 or more) sentence blocks (from the sentence block TB1(1) to the sentence block TB1(x)).
- search target document TD2 is divided into y (y is an integer of 2 or more) sentence blocks (from the sentence block TB2(1) to the sentence block TB2(y)).
- search target document TDn is divided into z (z is an integer of 2 or more) sentence blocks (from the sentence block TBn(1) to the sentence block TBn(z)).
- a plurality of text blocks may be created by dividing the document into chapters.
- a plurality of text blocks may be created by using all the sentences of the search target document, or a plurality of text blocks may be created by using only a necessary part of the search target document.
- For example, a plurality of text blocks may be created without using the "description of reference numerals".
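The division of a search target document into text blocks can be sketched as follows. This is a minimal illustration under stated assumptions: sections are assumed to be separated by blank lines with the first line of each section acting as its title, and `split_into_blocks`, the dictionary keys, and the sample section titles are all hypothetical. The publication does not prescribe a particular splitting rule beyond, for example, dividing by chapter.

```python
def split_into_blocks(doc_id, text, excluded_titles=()):
    """Split a document into text blocks at blank lines.

    Assumption: the first line of each chunk is its section title.
    Sections whose title is listed in excluded_titles are skipped.
    """
    blocks = []
    for chunk in text.split("\n\n"):
        lines = [ln.strip() for ln in chunk.strip().splitlines() if ln.strip()]
        if not lines:
            continue
        title, body = lines[0], " ".join(lines[1:])
        if title in excluded_titles or not body:
            continue
        blocks.append(
            {"doc": doc_id, "index": len(blocks) + 1, "title": title, "text": body}
        )
    return blocks


# Illustrative document text (not taken from the publication).
sample = (
    "Background\nPrior systems compare whole documents.\n\n"
    "Embodiment 1\nThe document is divided into blocks.\n\n"
    "Description of reference numerals\n110: first target"
)
blocks = split_into_blocks(
    "TD1", sample, excluded_titles=("Description of reference numerals",)
)
for b in blocks:
    print(f'TB1({b["index"]}): {b["title"]}')
```

Excluding a section by title corresponds to creating blocks from only the necessary part of a document.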
- the preprocessing is performed at least once before performing the document search (before performing step A1).
- The preprocessing may be performed multiple times depending on the application. For example, search accuracy and convenience can be improved by performing the preprocessing regularly and adding, updating, or deleting search target documents.
- An index file for use in full-text search can be created using the plurality of text blocks TB.
- full-text search can be performed in a short time.
- the structure of the index file is not particularly limited, and can have information such as a character string, document name, sentence block name, and appearance frequency.
- the index file may include information as to whether or not there is a translated sentence in each language of the search target document TD (or the text block TB).
- conditions such as "there is an English translation” and "there is a Chinese translation” can be specified during the search.
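One possible shape for such an index file is sketched below. The layout is an assumption made for illustration only: `build_index`, `blocks_with_translation`, the field names, and the use of character bigrams as the indexed strings are hypothetical, since the publication leaves the index structure open.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Character n-grams of the lowercased text."""
    t = text.lower()
    return [t[i:i + n] for i in range(len(t) - n + 1)]


def build_index(blocks):
    """Build an inverted index: n-gram -> {block name: appearance frequency}.

    Per-block metadata records the source document and which translations exist.
    """
    postings = defaultdict(dict)
    meta = {}
    for b in blocks:
        meta[b["name"]] = {
            "doc": b["doc"],
            "translations": set(b.get("translations", ())),
        }
        for gram, freq in Counter(char_ngrams(b["text"])).items():
            postings[gram][b["name"]] = freq
    return {"postings": dict(postings), "meta": meta}


def blocks_with_translation(index, lang):
    """Names of blocks for which a translation in `lang` is recorded."""
    return sorted(
        name for name, m in index["meta"].items() if lang in m["translations"]
    )


# Illustrative blocks (not taken from the publication).
sample_blocks = [
    {"name": "TB1(1)", "doc": "TD1", "text": "oxide semiconductor", "translations": ["en"]},
    {"name": "TB2(1)", "doc": "TD2", "text": "oxide film", "translations": ["en", "zh"]},
]
index = build_index(sample_blocks)
print(index["postings"]["ox"])
print(blocks_with_translation(index, "zh"))
```

Keeping the translation flags next to the postings lets a condition such as "there is a Chinese translation" be applied before any relevance is calculated.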
- [Step A1: Creation of Multiple Search Text Blocks STB]
- a plurality of search text blocks STB are created by dividing the search document STD (FIG. 3A).
- The search document STD is divided into w (w is an integer of 2 or more) search text blocks (search text block STB(1) to search text block STB(w)).
- After the input search document STD is divided into a plurality of search text blocks STB, a similar document (text block TB) is searched for each search text block STB.
- the search document STD is not particularly limited, and various documents can be used.
- Examples of the search document STD include documents related to intellectual property before translation. As a result, a similar translated document can be searched from the search target document TD, and the translated sentence can be referred to or cited.
- As the search document STD, various written works including books, papers, reports, columns, or other texts can be used.
- As a result, a similar document can be searched from the search target document TD, and it can be confirmed whether or not there is a suspicion of plagiarism in the search document STD.
- a medical care document can be used as the search document STD.
- [Step A2: Selection of Search Text Block STB(i)]
- a search text block STB(i) (i is an integer of 1 or more and w or less) to be searched is selected from w search text blocks STB.
- the search text block STB may be created by extracting necessary parts from the search document STD in step A1.
- The searches may be performed one at a time (see Example 3 of the document search method), a plurality of searches may be performed in parallel (see Example 4 of the document search method), or the search may combine sequential processing and parallel processing.
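Sequential and parallel processing of the search text blocks can be contrasted with a short sketch. `search_block` is a hypothetical placeholder for the per-block work of steps A3 to A5, and the use of a thread pool is an assumption made for illustration; the publication does not specify how parallelism is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def search_block(block):
    """Placeholder for steps A3 to A5 on one search text block (hypothetical)."""
    return block, len(block.split())


def search_sequential(blocks):
    """Process the search text blocks one at a time."""
    return [search_block(b) for b in blocks]


def search_parallel(blocks, workers=4):
    """Process the search text blocks concurrently; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(search_block, blocks))


# Illustrative search text blocks STB(1) to STB(3).
stb = ["first block of text", "second block", "third block of the search document"]
sequential_results = search_sequential(stb)
parallel_results = search_parallel(stb)
print(sequential_results == parallel_results)
```

Because each search text block is searched independently, both strategies return the same results; only the elapsed time differs.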
- [Step A3: Calculation of Relevance to Search Text Block STB(i)]
- the degree of association with the search text block STB(i) is calculated.
- The full-text search is performed using the search text block STB(i) as a search condition, and the degree of association of each search target text block TB with the search text block STB(i) is calculated.
- The relevance to the search text block STB(i) may be calculated for all the text blocks TB, or only for some of the text blocks TB.
- Each embodiment of the search target documents can be made a search target, and the "background, problem, means, and effect" can be excluded from the search target.
- Each embodiment of the search target documents that have an English translation can be made the search target.
- the sentence block TB whose degree of association is calculated is automatically selected, for example, based on the information included in the index file.
- the text block TB for which the degree of association is calculated may be designated.
- the first example of the document search method shows a case where the search text block STB(i) is used as one search condition for the full text search.
- each sentence included in the search sentence block STB(i) may be used as a search condition for the full-text search (see Example 2 of document search method). That is, the number of search conditions may be the same as the number of sentences included in the search text block STB(i).
- the full-text search method is not particularly limited, and sequential search, index search, etc. can be used.
- the index search is preferable because the search speed does not easily decrease even when there are many text blocks TB to be searched.
- the text block TB to be searched is scanned in advance and an index file that enables high-speed search is prepared.
- N-gram is preferable to morphological analysis because it is more advantageous for exact match search and technical terms, new words, abbreviations, etc. are less likely to cause problems.
- The degree of association can be calculated using, for example, TF-IDF (Term Frequency-Inverse Document Frequency).
- the TF value represents the frequency of appearance of each word in a certain text block
- The IDF value represents the degree to which occurrences of a word are concentrated in a small number of text blocks. The more often a word appears in one sentence block, the higher the TF value of the word in that sentence block.
- The IDF values of words that appear in many text blocks are low, and the IDF values of words that appear only in some text blocks are high.
- the calculation of the degree of association is not limited to the method using TF-IDF.
- full-text search can be performed using Apache Lucene, which is an open source search engine library.
- FIG. 3B shows an example of calculating the degree of association with the search text block STB(1). Also, an example is shown in which the first target 110(1) that is the search target is the first sentence block TB(1) included in each search target document TD.
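The relevance calculation of step A3 can be sketched with a hand-written TF-IDF score. This is an illustration only and does not reproduce Apache Lucene's scoring: the function names, the use of character bigrams as terms, and the particular TF and IDF formulas (term count over block length, and the logarithm of the number of blocks over the number of blocks containing the term) are all assumptions.

```python
import math
from collections import Counter


def bigrams(text):
    """Character bigrams of the lowercased text."""
    t = text.lower()
    return [t[i:i + 2] for i in range(len(t) - 1)]


def tfidf_relevance(query, blocks):
    """Score each block against the query with a simple TF-IDF sum.

    blocks maps a block name to its text. Word and sentence order are ignored.
    """
    n = len(blocks)
    counts = {name: Counter(bigrams(text)) for name, text in blocks.items()}
    # Number of blocks in which each bigram appears.
    df = Counter(g for c in counts.values() for g in c)
    query_grams = set(bigrams(query))
    scores = {}
    for name, c in counts.items():
        total = sum(c.values()) or 1
        scores[name] = sum(
            (c[g] / total) * math.log(n / df[g]) for g in query_grams if g in c
        )
    return scores


# Illustrative first target (block texts are not taken from the publication).
first_target = {
    "TB1(1)": "oxide semiconductor transistor",
    "TB2(1)": "liquid crystal display",
    "TB3(1)": "organic light emitting display",
}
scores = tfidf_relevance("oxide semiconductor", first_target)
best = max(scores, key=scores.get)
print(best, scores)
```

A bigram that occurs in every block contributes nothing to the score, which mirrors the low IDF value of words that appear in many text blocks.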
- [Step A4: Determination of Second Target 120(i) from First Target 110(i)]
- the second target 120(i) is determined from the first target 110(i) based on the degree of association.
- the number of sentence blocks TB included in the second target 120(i) is not particularly limited.
- the second target 120(i) is a target whose similarity is calculated in the next step.
- the time required for the process of calculating the similarity tends to be longer than that of the full-text search.
- the sentence block TB having a high degree of relevance to the search sentence block STB(i) can be grasped.
- FIG. 3C shows an example in which the top 10 text blocks TB having a high degree of association with the search text block STB(1) are used as the second target 120(1).
- the sentence block TB4(1) is ranked first (Rank 1)
- the sentence block TB1(1) is ranked second (Rank 2)
- the sentence block TB9(1) is ranked 10th (Rank 10).
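Selecting the second target from the first target amounts to keeping the highest-ranked blocks. The sketch below assumes relevance scores are already available as a mapping; `select_second_target` is a hypothetical name and the scores are invented for illustration.

```python
import heapq


def select_second_target(relevance, k=10):
    """Return the k blocks with the highest relevance, best first."""
    return heapq.nlargest(k, relevance.items(), key=lambda kv: kv[1])


# Invented relevance values; only the ordering matters here.
relevance = {"TB4(1)": 0.92, "TB1(1)": 0.85, "TB9(1)": 0.31, "TB7(1)": 0.05}
second_target = select_second_target(relevance, k=3)
for rank, (name, score) in enumerate(second_target, start=1):
    print(f"Rank {rank}: {name} ({score})")
```

A smaller `k` shortens the similarity calculation in the next step at the risk of dropping a block that would have scored well on similarity.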
- [Step A5: Calculation of Similarity to Search Text Block STB(i)]
- the degree of similarity to the search text block STB(i) is calculated. Specifically, for each sentence included in the search sentence block STB(i), the degree of similarity with each sentence included in the second target 120(i) is calculated.
- The degree of similarity between sentences is obtained. Specifically, it is preferable to calculate the degree of similarity on the basis of how closely the literal wording (the character strings) of the sentences matches.
- the degree of similarity can be calculated using diff, which is an algorithm for obtaining the difference between documents.
- the degree of similarity between the first sentence STS1 of the search sentence block STB(1) and each sentence included in the second target 120(1) is calculated.
- the similarity between the second sentence STS2 of the search sentence block STB(1) and each sentence included in the second target 120(1) is calculated.
- the degree of similarity between each sentence of the search sentence block STB(1) and each sentence included in the second target 120(1) is calculated.
- FIG. 4C shows an example in which p is an integer of 3 or more.
- the similarity calculation for a plurality of sentences in the search text block STB(1) may be performed in parallel.
- the process shown in FIG. 4A, the process shown in FIG. 4B, and the process shown in FIG. 4C may all be performed in parallel.
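The sentence-by-sentence similarity calculation can be sketched with Python's `difflib.SequenceMatcher`, which is used here as a stand-in for a diff-style comparison and is not claimed to be the algorithm of the publication. `sentence_similarity` and `similarity_table` are hypothetical names, and the sentences are invented.

```python
from difflib import SequenceMatcher


def sentence_similarity(a, b):
    """Character-level similarity in [0, 1]; word order affects the result."""
    return SequenceMatcher(None, a, b).ratio()


def similarity_table(query_sentences, target_sentences):
    """Rows: sentences STS1..STSp of the search block; columns: target sentences."""
    return [
        [sentence_similarity(q, t) for t in target_sentences]
        for q in query_sentences
    ]


# Invented sentences for a search block and one block of the second target.
query_sentences = ["The document is divided into blocks.", "Each block is searched."]
target_sentences = ["The document is divided into blocks.", "Blocks are ranked by score."]

table = similarity_table(query_sentences, target_sentences)
best_per_sentence = [max(row) for row in table]
print(best_per_sentence)
```

Each row is independent of the others, so the rows can be computed in parallel without changing the result.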
- a sentence block TB similar to the search sentence block STB(1) can be obtained by using the calculated similarity.
- For each text block TB, the sum of the similarities of the sentences having the highest similarity to each sentence of the search text block STB(1) is calculated, and the sum is normalized with respect to the sentences of the search text block STB(1).
- the standardized similarity of the sentence block TB with respect to the search sentence block STB(1) can be obtained.
- the sentence having the highest similarity to the first sentence STS1 of the search sentence block STB(1) is the first sentence S1 (the similarity is 1)
- the sentence with the highest similarity to the second sentence STS2 is the second sentence S2 (similarity is 0.9)
- the sentence with the highest similarity to the last sentence STSp is the third sentence S3 (the similarity is 0.5).
- it is preferable to use only values that are equal to or higher than a threshold value among the degrees of similarity between sentences, because the accuracy of the search can be improved.
- a threshold value of 0.8
- the sentence S3 having the highest similarity to the last sentence STSp has a similarity of 0.5, and thus it is not used in the sum of the similarities (it is calculated as 0).
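The standardized similarity described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `standardized_similarity`, the list-of-rows layout of `sim`, the choice p = 3, and the off-diagonal filler values are all invented for the example. Only the threshold 0.8 and the best-match similarities 1, 0.9, and 0.5 come from the text.

```python
def standardized_similarity(sim, threshold=0.8):
    """Standardized similarity of one text block TB to a search text block.

    sim[j][k] is the similarity between search sentence j and sentence k of
    the text block (hypothetical layout). For each search sentence the
    highest similarity is kept, values below the threshold count as 0, and
    the sum is divided by the number of search sentences p.
    """
    if not sim:
        raise ValueError("the search text block must contain at least one sentence")
    total = 0.0
    for row in sim:
        best = max(row, default=0.0)  # most similar sentence in the block
        if best >= threshold:         # below-threshold values are treated as 0
            total += best
    return total / len(sim)


# Rows: STS1, STS2, STSp (p = 3 assumed); columns: sentences S1 to S3.
example = [
    [1.0, 0.2, 0.1],
    [0.3, 0.9, 0.2],
    [0.1, 0.2, 0.5],
]
score = standardized_similarity(example)
print(f"Score: {score:.1%}")  # (1.0 + 0.9 + 0) / 3 as a percentage
```

With the threshold applied, the 0.5 match for the last sentence contributes nothing, so the block scores (1.0 + 0.9) / 3 rather than (1.0 + 0.9 + 0.5) / 3.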
- Step A6 Result output
- FIG. 5B is an example in which sentence blocks TB (Block) are arranged in descending order of standardized similarity.
- the standardized similarity expressed as a percentage is shown as Score.
- in step A3, the order of sentences and words is not considered, and thus the calculated degree of relevance differs from the degree of similarity.
- the ten sentence blocks TB determined as the second target 120(1) in step A4 can be arranged in descending order of similarity to the search sentence block STB(1) (FIG. 5B).
- a similar document (text block TB) is searched for the search text block STB.
- the time required for the document search can be shortened.
- Step A3 Calculation of Relevance to Search Text Block STB(i)
- in step A3 in Example 2 of the document search method, a full-text search is performed using each sentence included in the search text block STB(i) as a search condition.
- the degree of relevance of each sentence included in the search target with respect to each sentence included in the search sentence block STB(i) is calculated.
- the degree of relevance to each sentence included in the search text block STB(i) may be calculated for all of the text blocks TB, or may be calculated for only some of the text blocks TB.
- the processing amount can be reduced and the time required for the document search can be shortened.
- as the full-text search method and the method of calculating the degree of relevance, the same methods as in Example 1 of the document search method can be used.
- each sentence included in the first target 110(1) is searched.
- the sentence included in the first target 110(1) refers to a sentence forming a plurality of sentence blocks TB included in the first target 110(1).
- a full-text search is performed using the second sentence STS2 of the search text block STB(1) as a search condition, so that the degree of relevance of each sentence included in the first target 110(1) to the second sentence STS2 is calculated.
- in this manner, the degree of relevance to each sentence in the search sentence block STB(1) is calculated.
- the degree of relevance is calculated up to the last sentence STSp (p is an integer of 2 or more) of the search sentence block STB(1), so that the degree of relevance of each sentence included in the first target 110(1) to each sentence included in the search sentence block STB(1) is calculated. Note that FIG. 6C shows an example in which p is an integer of 3 or more.
- the full-text search using each sentence of the search text block STB(1) as a search condition may be performed in parallel.
- the process shown in FIG. 6A, the process shown in FIG. 6B, and the process shown in FIG. 6C may all be performed in parallel.
- Step A4 Determine Second Target 120(i) from First Target 110(i)
- the second target 120(i) is determined from the sentences included in the first target 110(i) based on the degree of relevance.
- the number of sentences included in the second target 120(i) is not particularly limited.
- the second target 120(i) is a target whose similarity is calculated in the next step.
- the time required for the process of calculating the similarity tends to be longer than that of the full-text search.
- by sorting the results of the full-text search in step A3 in descending order of relevance, sentences with high relevance to each sentence included in the search text block STB(i) can be identified.
- FIG. 7A shows an example in which the top 300 sentences having a high degree of association with the first sentence STS1 of the search sentence block STB(1) are used as the second target 120(1) (STS1).
- the first sentence TB4(1)_S1 of the sentence block TB4(1) is ranked first (Rank 1)
- the first sentence TB3(1)_S1 of the sentence block TB3(1) is ranked second (Rank 2), and the sixth sentence TB6(1)_S6 of the sentence block TB6(1) is ranked 300th (Rank 300).
- FIG. 7B shows an example in which the top 300 sentences having a high degree of association with the second sentence STS2 of the search sentence block STB(1) are used as the second target 120(1) (STS2).
- the second sentence TB1(1)_S2 of the sentence block TB1(1) is ranked first (Rank 1)
- the second sentence TB3(1)_S2 of the sentence block TB3(1) is ranked second (Rank 2), and the eighth sentence TB62(1)_S8 of the sentence block TB62(1) is ranked 300th (Rank 300).
- the second target 120(1) (STSp) is determined as the top 300 sentences with high relevance to the last sentence STSp of the search sentence block STB(1).
- the ninth sentence TB2(1)_S9 of the sentence block TB2(1) is ranked first (Rank 1)
- the eighth sentence TB6(1)_S8 of the sentence block TB6(1) is ranked second (Rank 2), and the 12th sentence TB7(1)_S12 of the sentence block TB7(1) is ranked 300th (Rank 300).
- the second target 120(1) is determined for each of all the sentences included in the search sentence block STB(1).
- the second target 120(i) is determined from the sentences included in the first target 110(i) based on the degree of relevance.
- Step A5 Calculation of Similarity to Search Text Block STB(i)
- the degree of similarity to the search text block STB(i) is calculated. Specifically, for each sentence included in the search sentence block STB(i), the degree of similarity with each sentence included in the second target 120(i) is calculated.
- the same method as in the first example of the document search method can be used.
- the degree of similarity between the first sentence STS1 of the search sentence block STB(1) and each sentence included in the second target 120(1) (STS1) is calculated.
- the similarity between the second sentence STS2 of the search sentence block STB(1) and each sentence included in the second target 120(1) is calculated.
- the degree of similarity between each sentence of the search sentence block STB(1) and each sentence included in the second target 120(1) is calculated.
- the similarity is calculated up to the last sentence STSp of the search text block STB(1), so that, for all the sentences included in the search text block STB(1), the degree of similarity with each sentence included in the second target 120(1) is calculated.
- the similarity calculation for a plurality of sentences in the search text block STB(1) may be performed in parallel.
- the process shown in FIG. 8A, the process shown in FIG. 8B, and the process shown in FIG. 8C may all be performed in parallel.
- a sentence block TB similar to the search sentence block STB(1) can be obtained by using the calculated similarity.
- for each text block TB, the highest similarities to each sentence of the search text block STB(1) are summed, and the sum is divided by the number of sentences in the search text block STB(1).
- the standardized similarity of the sentence block TB with respect to the search sentence block STB(1) can be obtained.
- the sentence having the highest similarity to the first sentence STS1 of the search sentence block STB(1) is the first sentence S1 (the similarity is 1)
- the sentence having the highest similarity to the second sentence STS2 is the second sentence S2 (the similarity is 0.90).
- the highest similarity to each of the p sentences is added and divided by the number of sentences p to obtain the standardized similarity of the sentence block TB4(1) to the search sentence block STB(1).
- the 26th sentence S26 also has a high similarity (similarity 0.80) to the first sentence STS1 of the search sentence block STB(1), but because this is lower than the similarity of the sentence S1, the similarity value of S26 is not used.
- the sentence having the highest similarity to the first sentence STS1 of the search sentence block STB(1) is the second sentence S2 (similarity is 0.70).
- the sentence having the highest similarity to the second sentence STS2 is the first sentence S1 (similarity is 0.60)
- the sentence having the highest similarity to the last sentence STSp is the third sentence S3 (similarity is 0.60).
- the similarity values of these three sentences are used to calculate the sum of the highest similarities for each of the p sentences.
- the threshold value is 0.8
- the similarity values of these three sentences are less than the threshold value, and therefore are not used when the sum of the similarities is calculated (they are treated as 0).
- Step A6 Result output
- FIG. 9B is an example in which sentence blocks TB are arranged in descending order of standardized similarity.
- the standardized similarity expressed as a percentage is shown as Score.
- in Example 2 of the document search method, the sentences to be the second target 120(i) are determined from the first target 110(i) for each sentence included in the search text block STB(i). Therefore, among the sentences included in a sentence block TB, the similarity can be calculated only for sentences that are highly relevant to a sentence included in the search sentence block STB(i).
- By narrowing down the target on a sentence-by-sentence basis, it is possible to suppress missed sentences (and sentence blocks) with a high degree of similarity and to reduce the time required to calculate the similarity, as compared to narrowing down the target on a sentence block basis. In addition, it is possible to prevent the degree of similarity of sentence blocks TB that are not actually similar from increasing.
- the sentence blocks TB7(1), TB3(1), and TB6(1), which did not reach the top 10 in Example 1 of the document search method, may be ranked in the top 10 (FIG. 9B).
- in Example 2 of the document search method, a high similarity can be calculated for a sentence block that has a portion with extremely high similarity (for example, a completely matching sentence) even though the remaining portion has extremely low similarity.
- in Example 3 of the document search method, a method of sequentially searching for similar text blocks for a plurality of search text blocks STB will be described. Note that Example 3 shows an example in which similar sentence blocks are searched for in all the search sentence blocks STB, but the present invention is not limited to this, and similar sentence blocks may be searched for in only some of the search sentence blocks STB.
- FIG. 10 shows a flowchart of the document search method.
- Step B1 Creation of Multiple Search Text Blocks STB(1) to STB(w)
- a plurality of search text blocks STB are created by dividing the search document STD.
- an example of dividing the search document into w (w is an integer of 2 or more) search text blocks (search text block STB(1) to search text block STB(w)) is shown.
- Step B1 can be performed in the same manner as step A1 shown in FIG. 3A.
- a search text block STB(i) (i is an integer of 1 or more and w or less) to be searched is selected from w search text blocks STB.
- the order in which similar text blocks are searched for among the search text blocks STB is not particularly limited.
- Step B3 Calculation of Relevance to Search Text Block STB(i)
- the degree of association with the search text block STB(i) is calculated.
- the first step B3 can be performed in the same manner as step A3 shown in FIG. 3B.
- Step B4 Determine Second Target 120(i) from First Target 110(i)
- the second target 120(i) is determined from the first target 110(i) based on the degree of association.
- the first step B4 can be performed in the same manner as step A4 shown in FIG. 3C.
- Step B5 Calculation of Similarity to Search Text Block STB(i)
- the degree of similarity to the search text block STB(i) is calculated. Specifically, for each sentence included in the search sentence block STB(i), the degree of similarity with each sentence included in the second target 120(i) is calculated.
- the first step B5 can be performed in the same manner as step A5 shown in FIGS. 4A to 4C and 5A.
- the above processing from step B3 to step B5 is sequentially performed for all the search text blocks STB. If there is a search text block STB for which the degree of similarity has not been calculated, the process returns to step B3 via step B7. When the similarity is calculated for all the search text blocks STB, the process proceeds to step B8.
- 1 is added to i as step B7. That is, the second steps B3 to B5 are performed on the search text block STB(2). In this way, steps B3 to B5 are repeated until the similarity is calculated for the search text block STB(w).
- Step B8 Output result
- FIG. 12 shows an example in which the text blocks TB are arranged in descending order of standardized similarity for each search text block STB. Further, like Score shown in FIG. 5B, a value indicating the high degree of similarity may be output.
- although Example 4 of the document search method shows an example in which similar sentence blocks are searched for in all the search sentence blocks STB, the present invention is not limited to this, and similar sentence blocks may be searched for in only some of the search sentence blocks STB.
- FIG. 11 shows a flowchart of the document search method.
- Step C1 Creation of Search Text Blocks STB
- a plurality of search text blocks STB are created by dividing the search document STD.
- an example of dividing the search document into w (w is an integer of 2 or more) search text blocks (search text block STB(1) to search text block STB(w)) is shown.
- Step C1 can be performed in the same manner as step A1 shown in FIG. 3A.
- steps C2 to C5 can be performed in parallel for two or more search text blocks STB.
- in Example 4 of the document search method, an example of processing w search text blocks STB in parallel is shown.
- a search text block STB(i) (i is an integer of 1 or more and w or less) to be searched is selected from w search text blocks STB.
- the degree of association with the search text block STB(i) is calculated.
- in step C3(1) shown in FIG. 11, the degree of association with the search text block STB(1) is calculated.
- Step C3(1) can be performed in the same manner as step A3 shown in FIG. 3B.
- in step C3(2), performed in parallel with step C3(1), the degree of association with the search text block STB(2) is calculated, and in step C3(w), the degree of association with the search text block STB(w) is calculated.
- Step C4(i) Determine Second Target 120(i) from First Target 110(i)
- the second target 120(i) is determined from the first target 110(i) based on the degree of association.
- in step C4(1) shown in FIG. 11, the second target 120(1) is determined from the first target 110(1) based on the degree of association.
- Step C4(1) can be performed in the same manner as step A4 shown in FIG. 3C.
- in step C4(2), performed in parallel with step C4(1), the second target 120(2) is determined from the first target 110(2) based on the degree of association, and in step C4(w), the second target 120(w) is determined from the first target 110(w) based on the degree of association.
- Step C5 Calculation of Similarity to Search Text Block STB(i)
- the degree of similarity to the search text block STB(i) is calculated. Specifically, for each sentence included in the search sentence block STB(i), the degree of similarity with each sentence included in the second target 120(i) is calculated.
- in step C5(1) shown in FIG. 11, the similarity to the search text block STB(1) is calculated.
- Step C5(1) can be performed in the same manner as step A5 shown in FIGS. 4A to 4C and 5A.
- in step C5(2), performed in parallel with step C5(1), the similarity to the search text block STB(2) is calculated, and in step C5(w), the similarity to the search text block STB(w) is calculated.
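The parallel arrangement of steps C3 to C5 across search text blocks could look roughly like the sketch below. It assumes a hypothetical callable `search_one_block` that runs the whole per-block pipeline and returns its result. The text does not say how the parallelism is realized, so the thread pool is purely an illustrative choice.

```python
from concurrent.futures import ThreadPoolExecutor


def search_blocks_in_parallel(search_blocks, search_one_block, max_workers=None):
    """Run the per-block pipeline (steps C3 to C5) for every block concurrently.

    search_one_block is a hypothetical callable that takes one search text
    block STB(i) and returns its result. Results come back in the same order
    as search_blocks.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search_one_block, search_blocks))


# Toy stand-in for steps C3 to C5: report the number of sentences per block.
blocks = [["s1", "s2"], ["s3"], ["s4", "s5", "s6"]]
results = search_blocks_in_parallel(blocks, len)
```

For CPU-bound similarity calculations in CPython, a process pool may be a better fit than threads. Threads are used here only to keep the sketch short.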
- Step C6 Output Result
- FIG. 12 shows an example in which the text blocks TB are arranged in descending order of standardized similarity for each search text block STB. Note that a value indicating the high degree of similarity may be output as in Score shown in FIG. 5B.
- the description in the search target document that is similar to a specific part of the search document can be searched for accurately. As a result, it becomes easier to understand the correspondence relationship between similar portions than when the entire search document is used as the search condition or when the search target is the entire document.
- the full-text search result is used to narrow down the targets for which the similarity is calculated for the search text block. As a result, the time required for document search can be shortened.
- the document search system can search for a document using the document search method described in the first embodiment. Specifically, a document block prepared in advance can be searched for a document (sentence block) similar to the input search document (search sentence block thereof).
- FIG. 13 shows a block diagram of the document search system 100.
- the constituent elements are classified by function and shown as blocks independent of each other in the block diagram, but it is difficult to completely separate actual constituent elements by function, and one constituent element may be responsible for more than one function. Further, one function may involve a plurality of constituent elements; for example, the processing performed by the processing unit 103 may be executed by different servers depending on the processing.
- the document search system 100 has at least a processing unit 103.
- the document search system 100 shown in FIG. 13 further includes an input unit 101, a transmission path 102, a storage unit 105, a database 107, and an output unit 109.
- the search document STD is supplied to the input unit 101 from outside the document search system 100.
- the search document STD supplied to the input unit 101 is supplied to the processing unit 103, the storage unit 105, or the database 107 via the transmission path 102.
- the transmission path 102 has a function of transmitting various data. Data transmission/reception among the input unit 101, the processing unit 103, the storage unit 105, the database 107, and the output unit 109 can be performed via the transmission path 102. For example, data such as the search document STD, the search text block STB, the search target document TD, and the text block TB is transmitted/received via the transmission path 102.
- the processing unit 103 has a function of performing an operation using data supplied from the input unit 101, the storage unit 105, the database 107, and the like.
- the processing unit 103 can supply the calculation result to the storage unit 105, the database 107, the output unit 109, and the like.
- a transistor including a metal oxide in a channel formation region is preferably used. Since the off-state current of the transistor is extremely low, the data holding period can be secured for a long time by using the transistor as a switch for holding charge (data) flowing into the capacitor functioning as a memory element.
- the processing unit 103 is operated only when necessary, and in other cases, the information of the immediately preceding processing is saved in the storage element. As a result, the processing unit 103 can be turned off. That is, normally-off computing becomes possible, and the power consumption of the document retrieval system can be reduced.
- a transistor including an oxide semiconductor or a metal oxide in a channel formation region is referred to as an Oxide Semiconductor transistor or an OS transistor.
- the channel formation region of the OS transistor preferably contains a metal oxide.
- the metal oxide is a metal oxide in a broad sense. Metal oxides are classified into oxide insulators, oxide conductors (including transparent oxide conductors), oxide semiconductors (also referred to as Oxide Semiconductor or simply OS), and the like. For example, when a metal oxide is used for a semiconductor layer of a transistor, the metal oxide may be referred to as an oxide semiconductor. That is, when the metal oxide has at least one of an amplification action, a rectification action, and a switching action, the metal oxide can be referred to as a metal oxide semiconductor, which is abbreviated as OS.
- the metal oxide included in the channel formation region preferably contains indium (In).
- the carrier mobility (electron mobility) of the OS transistor is high.
- the metal oxide included in the channel formation region is preferably an oxide semiconductor containing the element M.
- the element M is preferably aluminum (Al), gallium (Ga), or tin (Sn).
- Other elements applicable to the element M include boron (B), silicon (Si), titanium (Ti), iron (Fe), nickel (Ni), germanium (Ge), yttrium (Y), zirconium (Zr).
- the element M is, for example, an element having a high binding energy with oxygen.
- it is an element having a binding energy with oxygen higher than that of indium.
- the metal oxide included in the channel formation region preferably contains zinc (Zn). Metal oxide containing zinc may be easily crystallized.
- the metal oxide included in the channel formation region is not limited to the metal oxide containing indium.
- the semiconductor layer may be, for example, a metal oxide that does not contain indium and contains zinc, gallium, or tin, such as zinc tin oxide or gallium tin oxide.
- a transistor including silicon in a channel formation region may be used.
- it is preferable to use a transistor including an oxide semiconductor in a channel formation region and a transistor including silicon in a channel formation region in combination in the processing unit 103.
- the processing unit 103 has, for example, an arithmetic circuit or a central processing unit (CPU: Central Processing Unit).
- the processing unit 103 may include a microprocessor such as a DSP (Digital Signal Processor) and a GPU (Graphics Processing Unit).
- the microprocessor may have a configuration realized by PLD (Programmable Logic Device) such as FPGA (Field Programmable Gate Array) and FPAA (Field Programmable Analog Array).
- the processing unit 103 can perform various data processing and program control by interpreting and executing instructions from various programs by the processor.
- the program that can be executed by the processor is stored in at least one of the memory area of the processor and the storage unit 105.
- the processing unit 103 may have a main memory.
- the main memory has at least one of a volatile memory such as a RAM and a non-volatile memory such as a ROM.
- as the RAM, for example, a DRAM (Dynamic Random Access Memory), an SRAM (Static Random Access Memory), or the like is used, and a memory space is virtually allocated and used as a work space of the processing unit 103.
- the operating system, application programs, program modules, program data, lookup tables, etc. stored in the storage unit 105 are loaded into the RAM for execution. These data, programs, and program modules loaded in the RAM are directly accessed and operated by the processing unit 103, respectively.
- the ROM can store BIOS (Basic Input/Output System) and firmware that do not require rewriting.
- Examples of the ROM include a mask ROM, an OTPROM (One Time Programmable Read Only Memory), and an EPROM (Erasable Programmable Read Only Memory).
- Examples of the EPROM include a UV-EPROM (Ultra-Violet Erasable Programmable Read Only Memory) and an EEPROM (Electrically Erasable Programmable Read Only Memory).
- the storage unit 105 has a function of storing a program executed by the processing unit 103.
- the storage unit 105 may also have a function of storing the calculation result generated by the processing unit 103, the data input to the input unit 101, and the like.
- the storage unit 105 has at least one of a volatile memory and a non-volatile memory.
- the storage unit 105 may include, for example, a volatile memory such as DRAM or SRAM.
- the storage unit 105 includes, for example, a non-volatile memory such as a ReRAM (Resistive Random Access Memory, also referred to as resistance change memory), a PRAM (Phase-change Random Access Memory), an FeRAM (Ferroelectric Random Access Memory), or a flash memory.
- the storage unit 105 may have a recording media drive such as a hard disk drive (Hard Disk Drive: HDD) or a solid state drive (Solid State Drive: SSD).
- the database 107 has at least a function of storing data such as the search target document TD and the sentence block TB. Further, the database 107 may have a function of storing the calculation result generated by the processing unit 103, the data input to the input unit 101, and the like. The storage unit 105 and the database 107 do not have to be separated from each other.
- the document search system may include a storage unit having the functions of both the storage unit 105 and the database 107.
- the memories included in the processing unit 103, the storage unit 105, and the database 107 can each be an example of a non-transitory computer-readable storage medium.
- the output unit 109 has a function of supplying data to the outside of the document search system 100.
- the calculation result in the processing unit 103 can be supplied to the outside.
- FIG. 14 shows a block diagram of the document search system 150.
- the document search system 150 includes a server 151 and a terminal 152 (personal computer or the like).
- the server 151 has a communication unit 161a, a transmission path 162, a processing unit 163a, and a database 167. Although not shown in FIG. 14, the server 151 may further include a storage unit, an input/output unit, and the like.
- the terminal 152 has a communication unit 161b, a transmission path 168, a processing unit 163b, a storage unit 165, and an input/output unit 169. Although not shown in FIG. 14, the terminal 152 may further include a database and the like.
- the user of the document search system 150 inputs the search document STD into the server 151 from the terminal 152.
- the search document STD is transmitted from the communication unit 161b to the communication unit 161a.
- the search document STD received by the communication unit 161a is stored in the database 167 or a storage unit (not shown) via the transmission path 162. Alternatively, the search document STD may be directly supplied from the communication unit 161a to the processing unit 163a.
- the processing unit 163a included in the server 151 has a higher processing capacity than the processing unit 163b included in the terminal 152. Therefore, each of these processes is preferably performed by the processing unit 163a.
- the processing unit 163a generates a search result.
- the search result is stored in the database 167 or a storage unit (not shown) via the transmission path 162.
- the search result may be directly supplied from the processing unit 163a to the communication unit 161a.
- the search result is output from the server 151 to the terminal 152.
- the search result is transmitted from the communication unit 161a to the communication unit 161b.
- data is supplied to the input/output unit 169 from outside the document search system 150.
- the input/output unit 169 has a function of supplying data to the outside of the document search system 150.
- the input unit and the output unit may be separated as in the document search system 100.
- the transmission path 162 and the transmission path 168 have a function of transmitting data.
- Data transmission/reception among the communication unit 161a, the processing unit 163a, and the database 167 can be performed via the transmission path 162.
- Data transmission/reception among the communication unit 161b, the processing unit 163b, the storage unit 165, and the input/output unit 169 can be performed via the transmission path 168.
- the processing unit 163a has a function of performing an operation using the data supplied from the communication unit 161a, the database 167, and the like.
- the processing unit 163b has a function of performing calculation using data supplied from the communication unit 161b, the storage unit 165, the input/output unit 169, and the like.
- the description of the processing unit 103 can be referred to.
- the processing unit 163a preferably has a higher processing capacity than the processing unit 163b.
- the storage unit 165 has a function of storing a program executed by the processing unit 163b. Further, the storage unit 165 has a function of storing the calculation result generated by the processing unit 163b, the data input to the communication unit 161b, the data input to the input/output unit 169, and the like.
- the database 167 has a function of storing the search target document TD and the sentence block TB. Further, the database 167 may have a function of storing the calculation result generated by the processing unit 163a, the data input to the communication unit 161a, and the like. Alternatively, the server 151 may have a storage unit separately from the database 167, and the storage unit may have a function of storing the calculation result generated by the processing unit 163a, the data input to the communication unit 161a, and the like.
- data can be transmitted and received between the server 151 and the terminal 152 by using the communication unit 161a and the communication unit 161b.
- a hub, a router, a modem, or the like can be used as the communication unit 161a and the communication unit 161b.
- Data may be transmitted and received by wire or wirelessly (for example, radio waves, infrared rays, etc.).
Abstract
Description
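FIG. 2 is a diagram showing an example of processing performed before a search.
FIGS. 3A, 3B, and 3C are diagrams showing an example of a document search method.
FIGS. 4A, 4B, and 4C are diagrams showing an example of a document search method.
FIGS. 5A and 5B are diagrams showing an example of a document search method.
FIGS. 6A, 6B, and 6C are diagrams showing an example of a document search method.
FIGS. 7A, 7B, and 7C are diagrams showing an example of a document search method.
FIGS. 8A, 8B, and 8C are diagrams showing an example of a document search method.
FIGS. 9A and 9B are diagrams showing an example of a document search method.
FIG. 10 is a flowchart showing an example of a document search method.
FIG. 11 is a flowchart showing an example of a document search method.
FIG. 12 is a diagram showing an example of a document search method.
FIG. 13 is a block diagram showing an example of a document search system.
FIG. 14 is a block diagram showing an example of a document search system.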
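In this embodiment, a document search method of one embodiment of the present invention is described with reference to FIGS. 1 to 12. Note that the schematic diagrams of data are examples, and the data is not limited to them.
FIG. 1 shows a flowchart of the document search method. As shown in FIG. 1, the document search method of one embodiment of the present invention has six steps, Step A1 to Step A6.
First, processing performed before a search is described with reference to FIG. 2.
First, a plurality of search sentence blocks STB are created by dividing the search document STD (FIG. 3A).
Next, a search sentence block STB(i) (i is an integer greater than or equal to 1 and less than or equal to w) on which the search is to be performed is selected from the w search sentence blocks STB.
Next, the relevance to the search sentence block STB(i) is calculated.
Next, a second target 120(i) is determined from a first target 110(i) on the basis of the level of relevance.
Next, the similarity to the search sentence block STB(i) is calculated. Specifically, for each sentence included in the search sentence block STB(i), the similarity to each sentence included in the second target 120(i) is calculated.
Then, sentence blocks TB with high normalized similarity to the search sentence block STB(i) are output.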
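The flow from relevance calculation through output can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the token-overlap `relevance` function stands in for a real full-text search engine, Jaccard similarity stands in for whatever sentence similarity measure is used, and "normalized similarity" is assumed here to be the mean of the best per-sentence similarities. All function names and the `top_k` parameter are hypothetical.

```python
import re
from collections import Counter


def split_sentences(block: str) -> list[str]:
    """Split a block into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", block) if s.strip()]


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (a stand-in for real tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(search_block: str, target_block: str) -> float:
    """Assumed relevance score: number of overlapping token occurrences."""
    q, t = Counter(tokens(search_block)), Counter(tokens(target_block))
    return float(sum(min(q[w], t[w]) for w in q))


def jaccard(a: str, b: str) -> float:
    """Assumed sentence similarity: Jaccard index over token sets."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def search_similar_blocks(stb: str, first_target: list[str], top_k: int = 2):
    """Return (normalized similarity, block) pairs, best first."""
    # Rank the first target by relevance and keep the top_k blocks
    # as the second target.
    ranked = sorted(first_target, key=lambda tb: relevance(stb, tb), reverse=True)
    second_target = ranked[:top_k]

    q_sents = split_sentences(stb)
    results = []
    for tb in second_target:
        t_sents = split_sentences(tb)
        # For each sentence of the search block, take its best match in tb.
        best = [max((jaccard(q, t) for t in t_sents), default=0.0) for q in q_sents]
        # Assumed normalization: divide by the number of search sentences.
        normalized = sum(best) / len(q_sents) if q_sents else 0.0
        results.append((normalized, tb))
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

Restricting the sentence-by-sentence comparison to the second target keeps the more expensive similarity step off blocks that the cheaper relevance ranking has already scored low.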
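Next, a variation of Step A3 and the subsequent steps is described with reference to FIGS. 6 to 9. Specifically, the case where each sentence included in the search sentence block STB(i) is used as a search condition for the full-text search is described.
In Step A3 of Example 2 of the document search method, a full-text search is performed using each sentence included in the search sentence block STB(i) as a search condition. In this way, the relevance of each sentence included in the search target to each sentence included in the search sentence block STB(i) is calculated.
Next, for each sentence included in the search sentence block STB(i), the second target 120(i) is determined from the sentences included in the first target 110(i) on the basis of the level of relevance.
Next, the similarity to the search sentence block STB(i) is calculated. Specifically, for each sentence included in the search sentence block STB(i), the similarity to each sentence included in the second target 120(i) is calculated.
Then, sentence blocks TB with high normalized similarity to the search sentence block STB(i) are output.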
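A sentence-level variant of this kind might look like the sketch below. It is only an illustration: the shared-token count stands in for a full-text search score, Jaccard similarity stands in for the sentence similarity, and the per-block aggregation (mean of the best similarity per search sentence) is an assumed normalization. The names `search_by_sentence`, `blocks`, and `top_k` are hypothetical.

```python
import re
from collections import defaultdict


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric token set (a stand-in for real tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a: str, b: str) -> int:
    """Assumed relevance score: number of shared distinct tokens."""
    return len(tokens(a) & tokens(b))


def jaccard(a: str, b: str) -> float:
    """Assumed sentence similarity: Jaccard index over token sets."""
    sa, sb = tokens(a), tokens(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def search_by_sentence(search_sents: list[str],
                       blocks: dict[str, list[str]],
                       top_k: int = 2) -> list[tuple[str, float]]:
    """Rank blocks by assumed normalized similarity to the search sentences."""
    # First target: every sentence of every block, tagged with its block id.
    first_target = [(bid, s) for bid, sents in blocks.items() for s in sents]
    n = len(search_sents)
    best = defaultdict(lambda: [0.0] * n)

    for qi, q in enumerate(search_sents):
        # Each search sentence is its own query; its top_k most relevant
        # sentences form the second target for that sentence.
        ranked = sorted(first_target, key=lambda bs: overlap(q, bs[1]), reverse=True)
        for bid, s in ranked[:top_k]:
            best[bid][qi] = max(best[bid][qi], jaccard(q, s))

    # Assumed normalization: average the best match per search sentence.
    scores = {bid: sum(v) / n for bid, v in best.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because candidates are chosen per sentence, a block can surface even when only one of its sentences closely matches one sentence of the search block.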
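Next, a method of sequentially searching for similar sentence blocks for a plurality of search sentence blocks STB is described. Example 3 of the document search method shows a case where similar sentence blocks are searched for with respect to all the search sentence blocks STB; however, the method is not limited to this, and similar sentence blocks may be searched for with respect to only some of the search sentence blocks STB. FIG. 10 shows a flowchart of the document search method.
First, a plurality of search sentence blocks STB are created by dividing the search document STD. Here, an example of division into w search sentence blocks (w is an integer greater than or equal to 2), the search sentence block STB(1) to the search sentence block STB(w), is shown. Step B1 can be performed in the same manner as Step A1 shown in FIG. 3A.
Next, a search sentence block STB(i) (i is an integer greater than or equal to 1 and less than or equal to w) on which the search is to be performed is selected from the w search sentence blocks STB.
Next, the relevance to the search sentence block STB(i) is calculated.
Next, a second target 120(i) is determined from a first target 110(i) on the basis of the level of relevance.
Next, the similarity to the search sentence block STB(i) is calculated. Specifically, for each sentence included in the search sentence block STB(i), the similarity to each sentence included in the second target 120(i) is calculated.
The processing from Step B3 to Step B5 described above is performed on all the search sentence blocks STB in order. When there is a search sentence block STB for which the similarity has not been calculated, the process returns to Step B3 via Step B7. When the similarity has been calculated for all the search sentence blocks STB, the process proceeds to Step B8.
When returning from Step B6 to Step B3, 1 is added to i as Step B7. That is, the second round of Steps B3 to B5 is performed on the search sentence block STB(2). In this manner, Steps B3 to B5 are repeated until the similarity has been calculated for the search sentence block STB(w).
Then, sentence blocks TB with high normalized similarity to each search sentence block STB are output.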
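The sequential loop over i = 1 to w can be sketched as below. The per-block search itself (relevance, second target, similarity) is passed in as a hypothetical callable `search_one`, so this only illustrates the control flow of Steps B3 to B8; it does not reproduce the actual search logic.

```python
from typing import Callable


def search_all_sequential(search_blocks: list[str],
                          search_one: Callable[[str], list[str]]) -> dict[int, list[str]]:
    """Run the per-block search on STB(1)..STB(w) one after another."""
    w = len(search_blocks)
    results: dict[int, list[str]] = {}
    i = 1
    while i <= w:  # Step B6: is there a block not yet processed?
        # Steps B3 to B5 for STB(i), delegated to the supplied callable.
        results[i] = search_one(search_blocks[i - 1])
        i += 1  # Step B7: add 1 to i
    return results  # Step B8: output the hits for each search block
```

The keys of the returned dictionary are the 1-based block indices i, matching the STB(i) notation in the text.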
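Next, a method of searching in parallel for similar sentence blocks for a plurality of search sentence blocks STB is described. Example 4 of the document search method shows a case where similar sentence blocks are searched for with respect to all the search sentence blocks STB; however, the method is not limited to this, and similar sentence blocks may be searched for with respect to only some of the search sentence blocks STB. FIG. 11 shows a flowchart of the document search method.
First, a plurality of search sentence blocks STB are created by dividing the search document STD. Here, an example of division into w search sentence blocks (w is an integer greater than or equal to 2), the search sentence block STB(1) to the search sentence block STB(w), is shown. Step C1 can be performed in the same manner as Step A1 shown in FIG. 3A.
Next, a search sentence block STB(i) (i is an integer greater than or equal to 1 and less than or equal to w) on which the search is to be performed is selected from the w search sentence blocks STB.
Next, the relevance to the search sentence block STB(i) is calculated.
Next, a second target 120(i) is determined from a first target 110(i) on the basis of the level of relevance.
Next, the similarity to the search sentence block STB(i) is calculated. Specifically, for each sentence included in the search sentence block STB(i), the similarity to each sentence included in the second target 120(i) is calculated.
Then, sentence blocks TB with high normalized similarity to each search sentence block STB are output.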
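One possible way to run the per-block searches in parallel is a thread pool, as in the sketch below. The choice of `concurrent.futures.ThreadPoolExecutor` is an assumption made for illustration; the source does not say how the parallelism is realized. `search_one` is again a hypothetical callable that performs the relevance, second-target, and similarity steps for a single search sentence block.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def search_all_parallel(search_blocks: list[str],
                        search_one: Callable[[str], list[str]],
                        max_workers: int = 4) -> dict[int, list[str]]:
    """Run the per-block search on all search blocks concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the i-th result
        # belongs to STB(i) even if the workers finish out of order.
        per_block = list(pool.map(search_one, search_blocks))
    return {i: hits for i, hits in enumerate(per_block, start=1)}
```

For CPU-bound similarity calculations in CPython, a process pool or separate servers may be a better fit than threads; the structure of the call stays the same.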
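In this embodiment, a document search system of one embodiment of the present invention is described with reference to FIG. 13 and FIG. 14.
FIG. 13 shows a block diagram of the document search system 100. Note that in the drawings attached to this specification, components are classified by function and shown as mutually independent blocks in the block diagrams; however, it is difficult to completely separate actual components by function, and one component can be involved in a plurality of functions. One function can also be involved in a plurality of components; for example, processing performed in the processing unit 103 may be executed on different servers depending on the processing.
The search document STD is supplied to the input unit 101 from outside the document search system 100. The search document STD supplied to the input unit 101 is supplied to the processing unit 103, the storage unit 105, or the database 107 through the transmission path 102.
The transmission path 102 has a function of transmitting various kinds of data. Data can be transmitted and received among the input unit 101, the processing unit 103, the storage unit 105, the database 107, and the output unit 109 through the transmission path 102. For example, data such as the search document STD, the search sentence blocks STB, the search target documents TD, and the sentence blocks TB is transmitted and received through the transmission path 102.
The processing unit 103 has a function of performing calculation using data supplied from the input unit 101, the storage unit 105, the database 107, and the like. The processing unit 103 can supply calculation results to the storage unit 105, the database 107, the output unit 109, and the like.
The storage unit 105 has a function of storing a program executed by the processing unit 103. The storage unit 105 may also have a function of storing calculation results generated by the processing unit 103, data input to the input unit 101, and the like.
The database 107 has a function of storing at least data such as the search target documents TD and the sentence blocks TB. The database 107 may also have a function of storing calculation results generated by the processing unit 103, data input to the input unit 101, and the like. Note that the storage unit 105 and the database 107 do not have to be separate from each other. For example, the document search system may have a storage unit that has the functions of both the storage unit 105 and the database 107.
The output unit 109 has a function of supplying data to the outside of the document search system 100. For example, calculation results from the processing unit 103 can be supplied to the outside.
FIG. 14 shows a block diagram of the document search system 150. The document search system 150 includes a server 151 and a terminal 152 (such as a personal computer).
Data is supplied to the input/output unit 169 from outside the document search system 150. The input/output unit 169 has a function of supplying data to the outside of the document search system 150. Note that the input unit and the output unit may be separate, as in the document search system 100.
The transmission path 162 and the transmission path 168 have a function of transmitting data. Data can be transmitted and received among the communication unit 161a, the processing unit 163a, and the database 167 through the transmission path 162. Data can be transmitted and received among the communication unit 161b, the processing unit 163b, the storage unit 165, and the input/output unit 169 through the transmission path 168.
The processing unit 163a has a function of performing calculation using data supplied from the communication unit 161a, the database 167, and the like. The processing unit 163b has a function of performing calculation using data supplied from the communication unit 161b, the storage unit 165, the input/output unit 169, and the like. For the processing unit 163a and the processing unit 163b, the description of the processing unit 103 can be referred to. The processing unit 163a preferably has higher processing capability than the processing unit 163b.
The storage unit 165 has a function of storing a program executed by the processing unit 163b. The storage unit 165 also has a function of storing calculation results generated by the processing unit 163b, data input to the communication unit 161b, data input to the input/output unit 169, and the like.
The database 167 has a function of storing the search target documents TD and the sentence blocks TB. The database 167 may also have a function of storing calculation results generated by the processing unit 163a, data input to the communication unit 161a, and the like. Alternatively, the server 151 may have a storage unit separate from the database 167, and that storage unit may have a function of storing calculation results generated by the processing unit 163a, data input to the communication unit 161a, and the like.
Data can be transmitted and received between the server 151 and the terminal 152 by using the communication unit 161a and the communication unit 161b. A hub, a router, a modem, or the like can be used as the communication unit 161a and the communication unit 161b. Data may be transmitted and received by wire or wirelessly (for example, by radio waves or infrared rays).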
Claims (15)
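- 1. A document search method for searching for a specific sentence block among a plurality of sentence blocks created by dividing each of a plurality of search target documents, the method comprising:
preparing a first search sentence block that is part of a search document;
calculating a first relevance of each sentence block included in a first target to the first search sentence block by performing a full-text search on the first target, which is at least some of the plurality of sentence blocks, using the first search sentence block as a search condition;
determining a second target from the first target on the basis of the level of the first relevance;
calculating, for each sentence included in the first search sentence block, a first similarity to each sentence included in the second target; and
searching for at least one sentence block similar to the first search sentence block by using the first similarity.
- 2. The document search method according to claim 1,
wherein a plurality of search sentence blocks are created by dividing the search document, and
wherein the first search sentence block is one of the plurality of search sentence blocks.
- 3. The document search method according to claim 1, further comprising:
preparing a second search sentence block that is another part of the search document;
calculating a second relevance of each sentence block included in a third target to the second search sentence block by performing a full-text search on the third target, which is at least some of the plurality of sentence blocks, using the second search sentence block as a search condition;
determining a fourth target from the third target on the basis of the level of the second relevance;
calculating, for each sentence included in the second search sentence block, a second similarity to each sentence included in the fourth target; and
searching for at least one sentence block similar to the second search sentence block by using the second similarity.
- 4. The document search method according to claim 3,
wherein the first target and the third target are the same.
- 5. The document search method according to any one of claims 1 to 4,
wherein at least one sentence block similar to the first search sentence block is searched for by using values of the first similarity that are greater than or equal to a threshold.
- 6. A document search method for searching, for each of a plurality of search sentence blocks, for a similar sentence block among a plurality of sentence blocks created by dividing each of a plurality of search target documents, the method comprising:
creating the plurality of search sentence blocks by dividing a search document; and
performing, for each of the plurality of search sentence blocks:
a step of calculating a relevance of each sentence block included in a first target to the search sentence block by performing a full-text search on the first target, which is at least some of the plurality of sentence blocks, using the search sentence block as a search condition;
a step of determining a second target from the first target on the basis of the level of the relevance;
a step of calculating, for each sentence included in the search sentence block, a similarity to each sentence included in the second target; and
a step of searching for at least one sentence block similar to the search sentence block by using the similarity.
- 7. A document search method for searching for a specific sentence block among a plurality of sentence blocks created by dividing each of a plurality of search target documents, the method comprising:
preparing a first search sentence block that is part of a search document;
calculating a first relevance of each sentence included in a first target to each sentence included in the first search sentence block by performing a full-text search on the first target, which is at least some of the plurality of sentence blocks, using each sentence included in the first search sentence block as a search condition;
determining, for each sentence included in the first search sentence block, a second target from the sentences included in the first target on the basis of the level of the first relevance;
calculating, for each sentence included in the first search sentence block, a first similarity to each sentence included in the second target; and
searching for at least one sentence block similar to the first search sentence block by using the first similarity.
- 8. The document search method according to claim 7,
wherein a plurality of search sentence blocks are created by dividing the search document, and
wherein the first search sentence block is one of the plurality of search sentence blocks.
- 9. The document search method according to claim 7, further comprising:
preparing a second search sentence block that is another part of the search document;
calculating a second relevance of each sentence included in a third target to each sentence included in the second search sentence block by performing a full-text search on the third target, which is at least some of the plurality of sentence blocks, using each sentence included in the second search sentence block as a search condition;
determining, for each sentence included in the second search sentence block, a fourth target from the sentences included in the third target on the basis of the level of the second relevance;
calculating, for each sentence included in the second search sentence block, a second similarity to each sentence included in the fourth target; and
searching for at least one sentence block similar to the second search sentence block by using the second similarity.
- 10. The document search method according to claim 9,
wherein the first target and the third target are the same.
- 11. The document search method according to any one of claims 7 to 10,
wherein at least one sentence block similar to the first search sentence block is searched for by using values of the first similarity that are greater than or equal to a threshold.
- 12. A document search method for searching, for each of a plurality of search sentence blocks, for a similar sentence block among a plurality of sentence blocks created by dividing each of a plurality of search target documents, the method comprising:
creating the plurality of search sentence blocks by dividing a search document; and
performing, for each of the plurality of search sentence blocks:
a step of calculating a relevance of each sentence included in a first target to each sentence included in the search sentence block by performing a full-text search on the first target, which is at least some of the plurality of sentence blocks, using each sentence included in the search sentence block as a search condition;
a step of determining, for each sentence included in the search sentence block, a second target from the sentences included in the first target on the basis of the level of the relevance;
a step of calculating, for each sentence included in the search sentence block, a similarity to each sentence included in the second target; and
a step of searching for at least one sentence block similar to the search sentence block by using the similarity.
- 13. A document search system for searching for a specific sentence block among a plurality of sentence blocks created by dividing each of a plurality of search target documents, the system comprising:
a processing unit,
wherein the processing unit has:
a function of preparing a first search sentence block that is one of a plurality of search sentence blocks created by dividing a search document;
a function of calculating a first relevance of each sentence block included in a first target to the first search sentence block by performing a full-text search on the first target, which is at least some of the plurality of sentence blocks, using the first search sentence block as a search condition;
a function of determining a second target from the first target on the basis of the level of the first relevance;
a function of calculating, for each sentence included in the first search sentence block, a first similarity to each sentence included in the second target; and
a function of searching for at least one sentence block similar to the first search sentence block by using the first similarity.
- 14. A program for searching for a specific sentence block among a plurality of sentence blocks created by dividing each of a plurality of search target documents, the program causing a processor to execute:
a step of preparing a first search sentence block that is one of a plurality of search sentence blocks created by dividing a search document;
a step of calculating a first relevance of each sentence block included in a first target to the first search sentence block by performing a full-text search on the first target, which is at least some of the plurality of sentence blocks, using the first search sentence block as a search condition;
a step of determining a second target from the first target on the basis of the level of the first relevance;
a step of calculating, for each sentence included in the first search sentence block, a first similarity to each sentence included in the second target; and
a step of searching for at least one sentence block similar to the first search sentence block by using the first similarity.
- 15. A non-transitory computer-readable storage medium storing the program according to claim 14.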
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020557017A JP7499183B2 (ja) | 2018-11-30 | 2019-11-19 | 翻訳用の文書検索システム |
CN201980076644.XA CN113168415A (zh) | 2018-11-30 | 2019-11-19 | 文件检索方法、文件检索系统、程序以及非暂时性计算机可读存储介质 |
US17/294,930 US20220004570A1 (en) | 2018-11-30 | 2019-11-19 | Document search method, document search system, program, and non-transitory computer readable storage medium |
DE112019005976.9T DE112019005976T5 (de) | 2018-11-30 | 2019-11-19 | Verfahren zur Dokumentensuche, System zur Dokumentensuche, Programm und nicht-transitorisches, von einem Computer lesbares Speichermedium |
KR1020217016842A KR20210095155A (ko) | 2018-11-30 | 2019-11-19 | 문서 검색 방법, 문서 검색 시스템, 프로그램, 및 비일시적 컴퓨터 가독 기억 매체 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018224825 | 2018-11-30 | ||
JP2018-224825 | 2018-11-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020109921A1 (ja) | 2020-06-04 |
Family
ID=70851931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2019/059907 WO2020109921A1 (ja) | 2018-11-30 | 2019-11-19 | 文書検索方法、文書検索システム、プログラム、及び非一時的コンピュータ可読記憶媒体 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20220004570A1 (ja) |
JP (1) | JP7499183B2 (ja) |
KR (1) | KR20210095155A (ja) |
CN (1) | CN113168415A (ja) |
DE (1) | DE112019005976T5 (ja) |
WO (1) | WO2020109921A1 (ja) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021005433A1 (ja) * | 2019-07-05 | 2021-01-14 | 株式会社半導体エネルギー研究所 | 読解支援システム及び読解支援方法 |
JP7476201B2 (ja) | 2019-07-19 | 2024-04-30 | 株式会社半導体エネルギー研究所 | テキスト生成方法およびテキスト生成システム |
KR102540939B1 (ko) * | 2022-10-05 | 2023-06-08 | (주)유알피 | 자연어 검색의 적절도 향상 시스템 및 적절도 향상 방법 |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013098886A1 (ja) * | 2011-12-27 | 2013-07-04 | 三菱電機株式会社 | 検索装置 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4238616B2 (ja) | 2003-03-28 | 2009-03-18 | 株式会社日立製作所 | 類似文書検索方法および類似文書検索装置 |
JP2006092135A (ja) | 2004-09-22 | 2006-04-06 | Fuji Xerox Co Ltd | 関連文書検索用コンピュータプログラムならびに関連文書検索システムおよび方法。 |
JP2012104051A (ja) | 2010-11-12 | 2012-05-31 | Kansai Electric Power Co Inc:The | 文書インデックス作成装置 |
US10430445B2 (en) * | 2014-09-12 | 2019-10-01 | Nuance Communications, Inc. | Text indexing and passage retrieval |
CN107491547B (zh) * | 2017-08-28 | 2020-11-10 | 北京百度网讯科技有限公司 | 基于人工智能的搜索方法和装置 |
-
2019
- 2019-11-19 DE DE112019005976.9T patent/DE112019005976T5/de active Pending
- 2019-11-19 KR KR1020217016842A patent/KR20210095155A/ko unknown
- 2019-11-19 US US17/294,930 patent/US20220004570A1/en active Pending
- 2019-11-19 WO PCT/IB2019/059907 patent/WO2020109921A1/ja active Application Filing
- 2019-11-19 CN CN201980076644.XA patent/CN113168415A/zh active Pending
- 2019-11-19 JP JP2020557017A patent/JP7499183B2/ja active Active
Also Published As
Publication number | Publication date |
---|---|
DE112019005976T5 (de) | 2021-08-19 |
JPWO2020109921A1 (ja) | 2020-06-04 |
KR20210095155A (ko) | 2021-07-30 |
JP7499183B2 (ja) | 2024-06-13 |
CN113168415A (zh) | 2021-07-23 |
US20220004570A1 (en) | 2022-01-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020109921A1 (ja) | 文書検索方法、文書検索システム、プログラム、及び非一時的コンピュータ可読記憶媒体 | |
US11341419B2 (en) | Method of and system for generating a prediction model and determining an accuracy of a prediction model | |
US7689574B2 (en) | Index and method for extending and querying index | |
US20210011956A1 (en) | Information search system, intellectual property information search system, information search method, and intellectual property information search method | |
KR20160145785A (ko) | 빅 데이터 질의 엔진을 위한 플래시 최적화된 열 데이터 배치 및 데이터 액세스 처리 알고리즘 | |
US12019636B2 (en) | Document search system, document search method, program, and non-transitory computer readable storage medium | |
US20220207070A1 (en) | Document search system and document search method | |
US9047363B2 (en) | Text indexing for updateable tokenized text | |
US20200387678A1 (en) | Machine translation method, machine translation system, program, and non-transitory computer-readable storage medium | |
US20210256002A1 (en) | Integrated system for entity deduplication | |
Monjalet et al. | Predicting file lifetimes with machine learning | |
JP7453987B2 (ja) | 文書データ処理方法、及び、文書データ処理システム | |
WO2021005433A1 (ja) | 読解支援システム及び読解支援方法 | |
WO2023073500A1 (ja) | 文書検索結果の出力方法、文書検索システム | |
US20230026321A1 (en) | Document retrieval system | |
US20230350949A1 (en) | Document Retrieval System and Method For Retrieving Document | |
US20230334097A1 (en) | Information Retrieval System And Information Retrieval Method | |
US20240004936A1 (en) | Document search system and method for outputting document search result | |
WO2024110824A1 (ja) | 文書検索支援方法、プログラム、文書検索支援システム | |
Wolff et al. | Self-selection bias of similarity metrics in translation memory evaluation | |
WO2024134406A1 (ja) | 文書検索装置、及び文書検索方法 | |
Dai et al. | Author disambiguation: a nonparametric topic and co-authorship model | |
WO2024023624A1 (ja) | 文書閲覧装置 | |
Tambouratzis et al. | Language-independent hybrid MT: Comparative evaluation of translation quality | |
KR20230091995A (ko) | 독해 지원 시스템 및 독해 지원 방법 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19888395 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2020557017 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20217016842 Country of ref document: KR Kind code of ref document: A |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19888395 Country of ref document: EP Kind code of ref document: A1 |