WO2022090849A1 - Reading comprehension support system and reading comprehension support method - Google Patents
Reading comprehension support system and reading comprehension support method
- Publication number
- WO2022090849A1 (PCT/IB2021/059488)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- designated
- phrases
- words
- document
- phrase
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9038—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/20—Drawing from basic elements, e.g. lines or circles
- G06T11/206—Drawing of charts or graphs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
Definitions
- One aspect of the present invention relates to a document reading comprehension support system and a reading comprehension support method.
- one aspect of the present invention is not limited to the above technical fields.
- the technical fields of one aspect of the present invention include semiconductor devices, display devices, light-emitting devices, power storage devices, storage devices, electronic devices, lighting devices, input devices (for example, touch sensors), and input/output devices (for example, touch panels), as well as their driving methods and their manufacturing methods.
- When reading a document, how it is read depends on the reader's purpose or the type of document. In some cases the entire document is read; in other cases it is sufficient to search the document for the necessary information and read only the relevant part.
- When searching for necessary information in a document, one method is to use a table of contents or an index. For an electronic document, desired information can also be found by keyword search. Further, a method of structurally analyzing a document according to a set of rules has been proposed (Patent Document 1).
- One aspect of the present invention is to provide a document reading comprehension support system or a document reading comprehension support method that accurately presents information necessary for a user.
- One aspect of the present invention is to provide a reading comprehension support system or a reading comprehension support method for supporting a user to understand a document.
- One aspect of the present invention is to provide a document reading comprehension support system or a document reading comprehension support method that is easy for a user to operate.
- One aspect of the present invention is a reading comprehension support system having a reception unit, a processing unit, and an output unit.
- the reception unit has a function of accepting a designated document and a function of accepting a plurality of designated words and phrases.
- the processing unit has a function of creating a first graph showing the structure of the designated document using words and phrases included in the designated document, and a function of searching the first graph using a plurality of designated words and phrases.
- the output unit has a function of outputting a plurality of words and phrases included in the first graph and a function of outputting the search result of the first graph.
- the plurality of designated words and phrases are at least a part of the plurality of words and phrases included in the first graph.
- the output unit preferably outputs at least a second graph showing the shortest path between any two of the plurality of designated words in the first graph. It is preferable that the output unit has a function of outputting a sentence including the designated phrase in a paragraph containing two or more designated phrases in the designated document.
- the shortest path is a route connecting any two of the plurality of designated words and phrases via at least one complementary phrase, and the complementary phrase is preferably a phrase different from the plurality of designated words and phrases.
- the output unit has a function of outputting a sentence including at least one of the designated phrases and the complementary phrases in a paragraph containing at least one of the plurality of designated phrases and at least one of the complementary phrases in the designated document.
- the output unit preferably outputs, as a search result, at least a second graph showing the shortest path between each of the plurality of designated words in the first graph. It is preferable that the output unit has a function of outputting a sentence including the designated phrase in a paragraph containing two or more designated phrases in the designated document.
- the shortest path connecting any two of a plurality of designated words is a route connecting the two designated words via at least one complementary phrase, and the complementary phrase may be a phrase different from the plurality of designated words.
- the output unit has a function of outputting a sentence including at least one of the designated phrases and the complementary phrases in a paragraph containing at least one of the plurality of designated phrases and at least one of the complementary phrases in the designated document.
- the reading comprehension support system of one aspect of the present invention preferably further has a storage unit for storing search results.
- a designated document is accepted, a first graph showing the structure of the designated document is created using words and phrases contained in the designated document, and two or more words and phrases included in the first graph are output.
- This is a reading comprehension support method in which a plurality of designated words and phrases are accepted from among the output words and phrases, the first graph is searched using the plurality of designated words and phrases, and the search results are output.
- the shortest path is a route connecting any two of the plurality of designated words and phrases via at least one complementary phrase
- the complementary phrase is preferably a phrase different from the plurality of designated words and phrases. It is preferable to output a sentence including at least one of the designated phrase and the complementary phrase in the paragraph containing at least one of the plurality of designated phrases and at least one of the complementary phrases in the designated document together with the search result.
- the search result it is preferable to output at least a second graph showing the shortest path between each of the plurality of designated words in the first graph. It is preferable to output the sentence including the designated phrase in the paragraph containing two or more designated phrases in the designated document together with the search result.
- the shortest path connecting any two of a plurality of designated words is a route connecting the two designated words via at least one complementary phrase, and the complementary phrase is preferably a phrase different from the plurality of designated words. It is preferable to output, together with the search result, a sentence including at least one of the designated phrases and the complementary phrases in a paragraph containing at least one of the plurality of designated phrases and at least one of the complementary phrases in the designated document.
- According to one aspect of the present invention, it is possible to provide a document reading comprehension support system or a document reading comprehension support method that accurately presents information necessary for a user.
- FIG. 1 is a diagram showing an example of a reading comprehension support system.
- FIG. 2 is a diagram showing an example of a reading comprehension support method.
- 3A to 3D are diagrams showing an example of a reading comprehension support method.
- 4A to 4E are diagrams showing an example of a reading comprehension support method.
- 5A to 5C are diagrams showing an example of a graph.
- FIG. 6 is a diagram showing an example of output contents.
- FIG. 7 is a diagram showing an example of a graph.
- FIG. 8 is a diagram showing an example of a reading comprehension support system.
- FIG. 9 is a diagram showing an example of a reading comprehension support system.
- In this specification and the like, the word "film" and the word "layer" can be interchanged with each other in some cases or depending on the situation.
- For example, the term "conductive film" can be changed to the term "conductive layer".
- Similarly, the term "insulating film" can be changed to the term "insulating layer".
- In one aspect of the present invention, a designated document is accepted, a first graph showing the structure of the designated document is created using words and phrases contained in the designated document, and two or more words and phrases included in the first graph are output. Then, a plurality of designated words and phrases are accepted from the output words and phrases, the first graph is searched using the plurality of designated words and phrases, and the search result is output.
- the graph can also be referred to as a graph structure.
- In the graph, words and phrases that exist at close positions in a document can be directly connected to each other. For example, if two words or phrases exist in the same sentence, they can be directly connected. Similarly, if two words or phrases exist in the same paragraph, they can be directly connected. Further, when a sentence containing one of two phrases exists in the vicinity of a sentence containing the other (for example, within n sentences before or after, where n is an integer of 1 or more), the two phrases can be directly connected. In this way, a graph showing the structure of a document can be created by connecting words and phrases that are close to each other in the document. Such a graph shows the relevance of the words and phrases in the document.
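As an illustration of the connection rule described above, the following minimal Python sketch links phrases that co-occur in the same sentence. The function name, the miniature document, and the vocabulary are hypothetical, and naive substring matching stands in for the morphological analysis a real system would use:

```python
import re
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_graph(document, vocabulary):
    """Connect phrases that occur in the same sentence; returns an
    undirected graph as an adjacency mapping {phrase: set(neighbors)}."""
    graph = defaultdict(set)
    # Naive sentence split; a real system would use morphological analysis.
    for sentence in re.split(r"[.!?]", document):
        present = [p for p in vocabulary if p in sentence]
        for a, b in combinations(present, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Hypothetical miniature document and vocabulary.
doc = ("The transistor contains a metal oxide. "
       "The metal oxide includes indium. "
       "Indium improves carrier mobility.").lower()
vocab = ["transistor", "metal oxide", "indium", "carrier mobility"]
g = build_cooccurrence_graph(doc, vocab)
# "transistor" and "indium" are not directly connected here; they relate
# only through the intermediate phrase "metal oxide".
```

Extending the same loop over paragraphs, or over windows of n sentences, yields the other connection rules mentioned above.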
- the user of the reading comprehension support system specifies a document to be read as a designated document.
- the user further specifies a plurality of keywords related to the information to be obtained as a designated phrase.
- the reading comprehension support system accepts a designated document, creates a first graph, and then outputs words and phrases contained in the first graph.
- the user of the reading comprehension support system can select keywords from the output words and phrases. This makes keyword selection easy, reduces the influence of differences in user skill, and makes it possible to quickly find the necessary information in the document.
- However, the keywords may be scattered throughout the document, and it may be difficult to understand the relationship between the selected keywords.
- When the index of a book is used to look up the descriptions of a plurality of keywords, the referenced passages may not be connected. Searching and reading may therefore take time, for example when the number of keywords is increased or when the reader must move between a plurality of referenced pages.
- the reading comprehension support system of one aspect of the present invention can output a second graph showing the relevance of a plurality of designated words by searching the first graph using the received plurality of designated words. As a result, the user can easily grasp the relevance of the designated phrase. Further, the reading comprehension support system according to one aspect of the present invention can extract and output a sentence including a plurality of designated words and phrases designated by the user. The user can efficiently obtain the necessary information by reading the extracted sentence.
- the reading comprehension support system of one aspect of the present invention can present the shortest path between each of the plurality of designated words in the first graph. For example, by outputting a second graph showing the shortest path, it is possible to present the user with the relevance of a plurality of designated words.
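A shortest path between two designated phrases in such an adjacency-mapping graph can be found with a standard breadth-first search. In the sketch below (hypothetical function and phrase names), the interior nodes of the returned path correspond to candidate complementary phrases:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency mapping; returns the
    phrases on one shortest path from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical first graph over four phrases.
graph = {
    "transistor": {"metal oxide"},
    "metal oxide": {"transistor", "indium"},
    "indium": {"metal oxide", "carrier mobility"},
    "carrier mobility": {"indium"},
}
path = shortest_path(graph, "transistor", "indium")
# Interior nodes of the path are candidate complementary phrases.
complementary = path[1:-1] if path else []
```

A second graph showing only the nodes and edges on such paths is what would be presented to the user.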
- the shortest path between the first designated phrase and the second designated phrase may include other designated phrases.
- the user can grasp the relevance of a plurality of designated words and deepen the understanding of the document.
- the shortest path may include complementary words that are different from the plurality of designated words.
- By presenting complementary words and phrases that the user did not specify, the system can promote understanding of the contents of the document.
- the user can deepen the understanding of the document by grasping the complementary phrase itself and the relationship between the complementary phrase and the designated phrase.
- the complementary phrase is a phrase included in the designated document (that is, a phrase included in the first graph) and is different from the designated phrase.
- the reading comprehension support system of one aspect of the present invention can output a sentence including a designated phrase in a designated document together with the second graph. At this time, for example, all the sentences including any of the designated words can be output. However, depending on the specified phrase, there are cases where too many sentences are output and it takes time to reach the information that the user wants.
- the reading comprehension support system of one aspect of the present invention extracts and outputs a sentence from a document based on each shortest path.
- For example, it is possible to output a sentence including a designated phrase in a paragraph containing two or more designated phrases in the designated document, or to output a sentence including at least one of the designated phrases and the complementary phrases in a paragraph containing at least one of the plurality of designated phrases and at least one of the complementary phrases.
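The paragraph-based extraction rule described above might be sketched as follows; splitting sentences on ". " and the sample texts are simplifying assumptions, not the document's actual procedure:

```python
def extract_sentences(paragraphs, designated, complementary):
    """From each paragraph containing at least one designated phrase and
    at least one complementary phrase, return every sentence that
    contains any of those phrases."""
    phrases = set(designated) | set(complementary)
    results = []
    for paragraph in paragraphs:
        if not any(p in paragraph for p in designated):
            continue
        if not any(p in paragraph for p in complementary):
            continue
        for sentence in paragraph.split(". "):
            if any(p in sentence for p in phrases):
                results.append(sentence.strip(". "))
    return results

# Hypothetical paragraphs, designated phrase, and complementary phrase.
paras = [
    "The transistor contains a metal oxide. The device is small.",
    "The display uses liquid crystal. It is thin.",
]
hits = extract_sentences(paras, ["transistor"], ["metal oxide"])
```

Only the first paragraph qualifies, and within it only the sentence mentioning one of the phrases is returned, keeping the output focused.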
- As a result, the user can efficiently confirm the sentences necessary for grasping the relevance of the plurality of designated words, and can obtain the necessary information quickly.
- the reading comprehension support system of one aspect of the present invention presents at least the shortest path between any two of the plurality of designated words. That is, it may present the shortest paths between only some of the designated words and phrases, or it may present the shortest paths between all of the designated words and phrases.
- For example, even when two designated words are connected to each other in the graph, their shortest path does not have to be shown. Further, a criterion for judging how highly related two designated words are may be set, and the shortest path of the two designated words may be presented only when the system judges them to be highly related. Specifically, when the shortest path between two designated words passes through a predetermined number of words or fewer, the two designated words can be judged to be highly related. Conversely, when the shortest path passes through more than the predetermined number of words, the two designated words can be judged to be less related.
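One possible form of such a relevance criterion, assuming the shortest path is available as a list of phrases, is a simple threshold on the number of intermediate phrases. The function name and the threshold of 2 are illustrative assumptions:

```python
def are_highly_related(path, max_intermediate=2):
    """Judge two designated phrases as highly related when their
    shortest path passes through at most max_intermediate complementary
    phrases; no path at all means the phrases are unconnected."""
    if path is None:
        return False
    return len(path) - 2 <= max_intermediate

# One intermediate phrase: judged highly related under this criterion.
ok = are_highly_related(["transistor", "metal oxide", "indium"])
```

Paths that fail the test would simply be omitted from the second graph.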
- the reading comprehension support system of one aspect of the present invention can also be used for document review. For example, a designated phrase that is not associated with any other designated phrase may be found; in that case, the system may output it as an isolated phrase. The content of the output graph may also differ from expectations, for example when designated words and phrases that should be related are not connected to each other. In such cases, the document may contain errors or omissions. In this way, by using the reading comprehension support system of one aspect of the present invention, the document can be reviewed efficiently.
- the reading comprehension support system of one aspect of the present invention can also be used to grasp one or both of the relevance and differences of a plurality of documents.
- When a plurality of designated documents are specified, the reading comprehension support system according to one aspect of the present invention can create, for each designated document, a first graph showing its structure using the words and phrases it contains, search each first graph, and output the search results. The user can then easily confirm the relevance and differences of the plurality of documents by comparing the output results.
- the reading comprehension support system may have a function of comparing search results for a plurality of documents and presenting at least one of relevance and difference.
- the reading comprehension support system of one aspect of the present invention can create a graph showing the shortest path between designated words in each document as a search result. Then, by vectorizing the graph and calculating the similarity of each vector, the similarity of a plurality of documents can be evaluated.
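One simple way to vectorize such graphs, offered here as an illustrative assumption rather than the method the document necessarily uses, is a 0/1 "bag of edges" vector over a shared edge index, compared by cosine similarity:

```python
from math import sqrt

def edge_set(graph):
    """Collect the undirected edges of an adjacency-mapping graph."""
    return {frozenset((a, b)) for a, nbrs in graph.items() for b in nbrs}

def edge_vector(graph, index):
    """Represent a graph as a 0/1 vector over a shared edge index."""
    edges = edge_set(graph)
    return [1.0 if e in edges else 0.0 for e in index]

def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Hypothetical shortest-path graphs from two documents.
g1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
g2 = {"a": {"b"}, "b": {"a"}}
index = sorted(edge_set(g1) | edge_set(g2), key=sorted)
sim = cosine_similarity(edge_vector(g1, index), edge_vector(g2, index))
```

Documents whose shortest-path graphs share many edges then score closer to 1, and dissimilar documents closer to 0.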
- Alternatively, each first graph may be output, and designated words and phrases may be accepted for each designated document.
- Alternatively, a designated phrase common to all designated documents may be accepted. If synonyms or near-synonyms of words and phrases contained in one designated document exist in the other designated documents, it is preferable to link these words and phrases. For example, if "insulating film" and "insulating layer" are linked and "insulating film" is selected as the designated phrase, the graph of one designated document may be searched using "insulating film" while the graph of another designated document is searched using "insulating layer".
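The synonym linking described above might be implemented with a small canonicalization table, as in this hypothetical sketch (the SYNONYMS mapping and the function names are illustrative):

```python
# Hypothetical table mapping synonym variants to one canonical form.
SYNONYMS = {"insulating layer": "insulating film"}

def canonical(phrase):
    return SYNONYMS.get(phrase, phrase)

def search_terms_for(document_vocab, designated):
    """Translate each designated phrase into the surface form that
    actually appears in the given document's vocabulary."""
    by_canonical = {canonical(p): p for p in document_vocab}
    return [by_canonical.get(canonical(p), p) for p in designated]

# Document A uses "insulating film"; document B uses "insulating layer".
terms_a = search_terms_for(["insulating film", "transistor"], ["insulating film"])
terms_b = search_terms_for(["insulating layer", "transistor"], ["insulating film"])
```

Each document's graph is then searched with the surface form it actually contains, so one designated phrase covers both variants.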
- FIG. 1 shows a block diagram of the reading comprehension support system 100.
- the reading comprehension support system 100 includes a reception unit 110, a storage unit 120, a processing unit 130, an output unit 140, and a transmission line 150.
- the reading comprehension support system 100 may be provided in an information processing device such as a personal computer used by the user.
- Alternatively, the processing unit of the reading comprehension support system 100 may be provided in a server and accessed from a client PC via a network.
- the reception unit 110 receives the designated document. In addition, the reception unit accepts designated words and phrases. The data supplied to the reception unit 110 is supplied to one or both of the storage unit 120 and the processing unit 130 via the transmission line 150.
- a document is a description of an event in natural language, which is digitized and machine-readable.
- Documents include, but are not limited to, patent application documents, case law, contracts, product manuals, novels, publications, white papers, technical documents, and the like.
- the storage unit 120 has a function of storing a program executed by the processing unit 130. Further, it is preferable that the storage unit 120 has a function of storing the graph generated by the processing unit 130. The graph is preferably associated with the document so that it can be seen from which document the graph was created. Further, the storage unit 120 may have a function of storing the calculation results and inference results generated by the processing unit 130, the data input to the reception unit 110, and the like.
- the storage unit 120 has at least one of a volatile memory and a non-volatile memory.
- volatile memory include DRAM (Dynamic Random Access Memory) and SRAM (Static Random Access Memory).
- Examples of non-volatile memory include ReRAM (Resistive Random Access Memory, also referred to as resistance-change memory), PRAM (Phase-change Random Access Memory), FeRAM (Ferroelectric Random Access Memory), MRAM (Magnetoresistive Random Access Memory, also referred to as magnetoresistive memory), flash memory, and the like.
- the storage unit 120 may have a recording media drive. Examples of the recording media drive include a hard disk drive (Hard Disk Drive: HDD), a solid state drive (Solid State Drive: SSD), and the like.
- the storage unit 120 may have a database having document data.
- the reading comprehension support system 100 may have a function of extracting document data from a database existing outside the system.
- the reading comprehension support system may have a function of retrieving data from a database existing outside the system.
- the reading comprehension support system 100 may have a function of extracting data from both its own database and an external database.
- the database can be configured to include, for example, one or both of text data and image data.
- Instead of the database, one or both of storage and a file server may be used.
- When a file stored on a file server is used, the database preferably holds the path of the file on the file server.
- the database may be an application database.
- the application include a patent application, a utility model registration application, and an application relating to intellectual property such as a design registration application.
- the status of each application is not limited, and it does not matter whether it is published, whether it is pending at the JPO, or whether it is registered.
- the application database can have at least one of a pre-examination application, an under-examination application, and a registered application, and may have all of them.
- the application database preferably has one or both of the specification and claims in a plurality of patent applications.
- the specification and claims are stored, for example, as text data.
- The application database may have at least one of: an application management number (including the applicant's own management number) for identifying the application, an application family management number for identifying the application family, the application number, the publication number, the registration number, drawings, the abstract, the filing date, the priority date, the publication date, the status, a classification (patent classification, utility model classification, etc.), a category, and a keyword. Each of these pieces of information may be used to identify a document when accepting a designated document. Alternatively, each of these pieces of information may be output together with the processing result of the processing unit 130.
- the database has at least the textual data of the document.
- The database may further have at least one of a number that identifies each document, a title, a date (such as the publication date), an author, a publisher, and the like.
- Each of these pieces of information may be used to identify a document when accepting a designated document. Alternatively, each of these pieces of information may be output together with the processing result of the processing unit 130.
- the processing unit 130 has a function of performing processing such as calculation and inference using data supplied from one or both of the reception unit 110 and the storage unit 120. Further, the processing unit 130 has a function of performing processing using various data included in the database. The processing unit 130 can supply processing results such as calculation results and inference results to one or both of the storage unit 120 and the output unit 140.
- the processing unit 130 has a function of performing morphological analysis.
- the processing unit 130 has a function of dividing each sentence included in the document into the smallest unit (also referred to as a token, a morpheme, a word, etc.) having a meaning in the language, and discriminating the part of speech of each token.
- the process of dividing each sentence into the smallest units can also be called lexical analysis.
- The processing unit 130 preferably has a function of performing compound word analysis. In other words, it preferably has a function of performing morphological analysis in consideration of compound words (compound nouns and the like). For example, the processing unit 130 preferably has a function of generating a new token whose part of speech is a compound noun (redefining the tokens) by combining several tokens, in order to group consecutive nouns in one sentence into one. Note that even when the part of speech of a token is a compound noun, it may be described simply as a noun.
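The compound-word analysis described above can be sketched as follows. This is a minimal illustration, assuming tokens are (text, part-of-speech) pairs produced by a morphological analyzer; the token format and function name are hypothetical, not part of the disclosed system:

```python
def merge_compound_nouns(tokens):
    """Merge runs of consecutive noun tokens into single compound-noun
    tokens, mirroring the grouping of consecutive nouns in one sentence."""
    merged = []
    for text, pos in tokens:
        if pos == "noun" and merged and merged[-1][1] == "noun":
            # Extend the current noun run into one compound noun.
            merged[-1] = (merged[-1][0] + text, "noun")
        else:
            merged.append((text, pos))
    return merged
```

For example, the four noun tokens "SANKA", "BUTSU", "HANDOUTAI", and "SOU" would be merged into the single compound noun "SANKABUTSUHANDOUTAISOU".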
- the processing unit 130 has a function of calculating the distance between each token. For example, it is preferable that the processing unit 130 can acquire information that the two tokens are in the same sentence or in the same paragraph. Further, it is preferable that the processing unit 130 can calculate how many paragraphs, sentences, words, or character strings the two tokens are separated from each other.
- the processing unit 130 has a function of acquiring related words of each token.
- Related words include synonyms, near-synonyms, hypernyms, and hyponyms.
- the processing unit 130 has a function of calculating the degree of similarity between the tokens.
- Related words can be acquired using a dictionary such as a concept dictionary.
- the dictionary may be possessed by the reading comprehension support system or may be provided outside the system.
- A concept dictionary is a word list annotated with the classification of each word, its relationships to other words, and the like.
- the concept dictionary may be an existing concept dictionary.
- a concept dictionary specialized in the field of documents may be created.
- Alternatively, words may be vectorized (quantified), one or both of the similarity and the distance between words may be calculated, and the related words of a node may be acquired based on the degree of similarity between the words or the closeness of the distance between them.
- Examples of the method for obtaining the similarity between the two vectors include cosine similarity, covariance, unbiased covariance, and Pearson's product-moment correlation coefficient. Of these, it is particularly preferable to use the cosine similarity.
- Methods for determining the distance between two vectors include the Euclidean distance, the standardized (average) Euclidean distance, the Mahalanobis distance, the Manhattan distance, the Chebyshev distance, and the Minkowski distance.
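As one way of illustrating these measures, the cosine similarity and the Euclidean distance between two word vectors can be computed as below. This is a sketch with plain Python lists standing in for the word vectors; how the vectors themselves are obtained is outside this fragment:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```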
- the processing unit 130 may have a function of calculating the appearance frequency of each token. For example, it is preferable to calculate the TF (Term Frequency) value of each token.
- the TF value can represent the frequency of appearance of each token in the designated document.
- the processing unit 130 may have a function of calculating the importance of each token. For example, it is preferable to calculate the TF-IDF (Term Frequency-Inverse Document Frequency) value of each token.
- The IDF value represents the degree to which a token appears concentrated in only some documents. The IDF value of a token that appears in many documents is small, and the IDF value of a token that appears only in some documents is large. For example, it is preferable to calculate the IDF value of each token using the documents contained in the database. By obtaining the product of the TF value and the IDF value of each token, a score indicating whether the token characterizes the designated document can be calculated.
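A common formulation of these scores (one of several variants; the disclosure does not fix the exact formula) treats a document as a list of tokens:

```python
import math

def tf(token, doc):
    """Term frequency: how often the token appears in the document."""
    return doc.count(token) / len(doc)

def idf(token, corpus):
    """Inverse document frequency: small for tokens found in many
    documents, large for tokens concentrated in a few documents."""
    n_containing = sum(1 for doc in corpus if token in doc)
    return math.log(len(corpus) / n_containing)

def tf_idf(token, doc, corpus):
    """Score of whether the token characterizes the document."""
    return tf(token, doc) * idf(token, corpus)
```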
- the processing unit 130 has a function of creating a graph showing the structure of the document by using the words and phrases contained in the document.
- The graph has nodes (also referred to as vertices) and edges. Each node and each edge can have a label.
- the above token can be used as the label of the node.
- a token whose part of speech is a noun can be used as a label for a node.
- As the edge label, one or both of the distance between the tokens and the related words of each token can be used.
- a directed graph using an edge having an orientation or an undirected graph using an edge having no orientation may be created.
- Nodes are connected by edges.
- the edge between the two nodes may be single or plural.
- straight lines and curves can be used to represent the edges.
- the structure of one document may be represented by a plurality of graphs.
- both directed and undirected graphs may be used to represent the structure of a document.
- An edge preferably connects two nodes so that the relationship between the two nodes in the document can be understood.
- Examples of the conditions for connecting nodes include connecting nodes in the same sentence by an edge, connecting nodes in the same paragraph by an edge, and connecting nodes within a predetermined distance (for example, within a certain number of words or a certain number of characters) by an edge.
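For instance, the first condition (connecting nodes that occur in the same sentence) can be sketched as follows, with each sentence given as a list of the noun phrases used as node labels (a simplification of the actual tokenized input):

```python
from itertools import combinations

def build_edges(sentences):
    """Connect every pair of phrases that co-occur in a sentence with an
    undirected edge, represented as a sorted pair."""
    edges = set()
    for phrases in sentences:
        for a, b in combinations(sorted(set(phrases)), 2):
            edges.add((a, b))
    return edges
```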
- the processing unit 130 has a function of performing parsing. In other words, it is preferable that the processing unit 130 has a function of dividing each sentence included in the document into tokens, determining the part of speech of each token, and determining the dependency of each token. Note that some of the processes included in the syntax analysis can also be referred to as the above-mentioned lexical analysis or morphological analysis. By parsing, the direction of the dependency can be indicated by an arrow in the directed graph.
- the edge may be directed from a node that appears earlier to a node that appears later. Further, the direction of the edge may be determined based on the relationship between the dependencies obtained by parsing, the relationship between the hypernym and the hyponym, the frequency of occurrence, or the importance of the word.
- The graph may be created based on rules derived from the dependency relationships between the tokens. Alternatively, the graph may be created using a model trained by machine learning. For example, a conditional random field (CRF) may be trained to label the nodes and edges based on a list of tokens.
- Alternatively, an RNN (recurrent neural network) or an LSTM (long short-term memory) may be trained so that a list of tokens is input and the labels and orientations of the nodes and edges are output. This makes it possible to output the node and edge orientations from a list of tokens.
- the processing unit 130 has a function of searching for the created graph.
- the processing unit 130 can find the shortest path between each of the plurality of words.
- Examples of the method for finding the shortest path include the Dijkstra method, the Bellman-Ford method, and the Floyd-Warshall method.
- the route with the smallest number of included nodes (phrases) can be the shortest route.
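When every edge counts equally, the route with the fewest nodes can be found by breadth-first search, which is equivalent to the Dijkstra method with unit edge weights. A sketch over an adjacency-list graph:

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Return the path from start to goal with the fewest nodes, or None
    if the two nodes are not connected."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None
```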
- the processing unit 130 has a function of creating a graph showing the shortest path between each of the plurality of designated words.
- the graph created by the processing unit 130 is output by the output unit 140.
- the processing unit 130 has a function of vectorizing a graph which is a search result (for example, a graph showing the shortest path between each of a plurality of designated words).
- Examples of methods for vectorizing a graph include the Weisfeiler-Lehman kernel.
- the processing unit 130 has a function of calculating the similarity of the vectors. This makes it possible to vectorize the graph that is the search result of a plurality of documents and calculate the similarity of the plurality of documents.
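A simplified version of a Weisfeiler-Lehman-style vectorization can be sketched as below: each graph is turned into a histogram of node labels, refined by combining each label with its sorted neighbor labels, and two graphs are compared by the cosine similarity of their histograms. This is an illustrative reduction of the kernel, not the exact procedure used by the system:

```python
from collections import Counter

def wl_features(adj, labels, iterations=1):
    """Histogram of node labels, refined Weisfeiler-Lehman style by
    replacing each label with (label, sorted neighbor labels)."""
    feats = Counter(labels.values())
    for _ in range(iterations):
        labels = {
            n: (labels[n],) + tuple(sorted(labels[m] for m in adj[n]))
            for n in adj
        }
        feats.update(labels.values())
    return feats

def histogram_similarity(f, g):
    """Cosine similarity of two label histograms (Counter objects)."""
    dot = sum(f[k] * g[k] for k in set(f) | set(g))
    norm = (sum(v * v for v in f.values()) ** 0.5 *
            sum(v * v for v in g.values()) ** 0.5)
    return dot / norm
```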
- the similarity of a plurality of documents may be determined with high accuracy by using a graph created by abstracting tokens.
- By abstracting the tokens, the document can be grasped conceptually. Therefore, the similarity can be calculated based on the concepts in the document without being strongly affected by the structure or the expressions of the document.
- The processing unit 130 may create both a graph in which the tokens are not abstracted, for reading comprehension support, and a graph in which the tokens are abstracted, for calculating the similarity.
- the abstraction of tokens means replacing tokens with representative words or hypernyms.
- the acquisition of the representative word and the hypernym may be performed by using a concept dictionary or by machine learning.
- the abstraction of tokens is carried out, for example, by vectorizing the tokens with the morphemes contained in the tokens and classifying them by a classifier.
- a classifier an algorithm such as a decision tree, a support vector machine, a random forest, or a multi-layer perceptron may be used.
- For example, “oxide semiconductor”, “amorphous semiconductor”, “silicon semiconductor”, and “GaAs semiconductor” may be classified as “semiconductor”. In addition, “oxide semiconductor layer”, “oxide semiconductor film”, “amorphous semiconductor layer”, “amorphous semiconductor film”, “silicon semiconductor layer”, “silicon semiconductor film”, “GaAs semiconductor layer”, and “GaAs semiconductor film” may also be classified as “semiconductor”.
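The abstraction of tokens can be sketched with a simple lookup table; in practice the mapping would come from a concept dictionary or a trained classifier, so the table below is purely illustrative:

```python
# Hypothetical hypernym table; a concept dictionary or a classifier would
# supply these classes in a real system.
HYPERNYMS = {
    "oxide semiconductor layer": "semiconductor",
    "amorphous semiconductor film": "semiconductor",
    "GaAs semiconductor layer": "semiconductor",
}

def abstract_tokens(tokens, table=HYPERNYMS):
    """Replace each token with its hypernym when one is known."""
    return [table.get(token, token) for token in tokens]
```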
- the processing unit 130 may have, for example, an arithmetic circuit.
- the processing unit 130 can have, for example, a central processing unit (CPU: Central Processing Unit).
- the processing unit 130 may have a microprocessor such as a DSP (Digital Signal Processor) or a GPU (Graphics Processing Unit).
- The microprocessor may be realized by a PLD (Programmable Logic Device) such as an FPGA (Field Programmable Gate Array) or an FPAA (Field Programmable Analog Array).
- the processing unit 130 may have a main memory.
- the main memory has at least one of a volatile memory such as RAM (Random Access Memory) and a non-volatile memory such as ROM (Read Only Memory).
- As the RAM, for example, a DRAM or an SRAM is used; a memory space is virtually allocated and used as a work space of the processing unit 130.
- the operating system, application program, program module, program data, lookup table, etc. stored in the storage unit 120 are loaded into the RAM for execution. These data, programs, and program modules loaded in the RAM are each directly accessed and operated by the processing unit 130.
- the ROM can store BIOS (Basic Input / Output System), firmware, and the like that do not require rewriting.
- Examples of the ROM include a mask ROM, an OTPROM (One Time Programmable Read Only Memory), an EPROM (Erasable Programmable Read Only Memory), and the like.
- Examples of the EPROM include a UV-EPROM (Ultra-Violet Erasable Programmable Read Only Memory), which enables erasure of stored data by ultraviolet irradiation, an EEPROM (Electrically Erasable Programmable Read Only Memory), and the like.
- the reading comprehension support system uses artificial intelligence (AI: Artificial Intelligence) for at least a part of the processing.
- In particular, an artificial neural network (ANN) is preferably used. The artificial neural network is realized by a circuit (hardware) or a program (software).
- the neural network refers to a general model that imitates the neural network of an organism, determines the connection strength between neurons by learning, and has problem-solving ability.
- the neural network has an input layer, an intermediate layer (hidden layer), and an output layer.
- determining the connection strength (also referred to as a weighting coefficient) between neurons from existing information may be referred to as "learning”.
- the output unit 140 outputs information based on the processing result of the processing unit 130. For example, one or both of the calculation result and the inference result in the processing unit 130 can be supplied to the outside of the reading comprehension support system 100. Further, the output unit 140 can output various data included in the database based on the processing result of the processing unit 130. The output unit 140 can output information to a display, a speaker, or the like used by the user.
- the transmission line 150 has a function of transmitting data. Data can be transmitted / received between the reception unit 110, the storage unit 120, the processing unit 130, and the output unit 140 via the transmission line 150.
- the reading comprehension support method of one aspect of the present invention includes the processes from step S1 to step S6 shown in FIG.
- In step S1, the designated document is accepted.
- the designated document is, for example, a document that the user wants to read.
- the designated document may be singular or plural.
- the user can directly input the text data of the designated document. Further, the image data of one or both of the drawings and the table included in the designated document may be input together with the text data.
- When voice data or image data is input, it is converted into text data before proceeding to step S2.
- When the designated document is a document included in the database or the like, the user can specify the document to be read by inputting information that identifies the document (that is, by searching the database).
- the reading comprehension support system extracts data related to the designated document (specifically, data necessary for subsequent processing) from a database or the like based on the information input by the user.
- Information that identifies the document includes a number that identifies the document, a title, and the like.
- The user may specify a part of the document when he or she wants to read only a part of the designated document (for example, a specific chapter).
- In step S2, a graph showing the structure of the designated document is created using the words and phrases included in the designated document.
- a graph is created for each designated document.
- one or more graphs can be created for one designated document.
- In step S2, each sentence is divided into tokens, the part of speech of each token is determined, and the dependency of each token is further determined.
- In step S2, it is preferable to perform compound word analysis. That is, it is preferable to generate a new token by combining several tokens after the parts of speech of the tokens are determined. For example, consecutive nouns in one sentence can be combined into one to generate a new token whose part of speech is a compound noun.
- Each token is used for the label of a node, and the nodes are connected by edges.
- the conditions for connecting the nodes at the edge can be determined as appropriate.
- Whether two nodes are connected by an edge can be determined based on the distance in the document between the tokens used for the labels of the nodes.
- For example, when two phrases are contained in the same sentence, the two phrases can be directly connected. Similarly, when two phrases are contained in the same paragraph, the two phrases can be directly connected. Further, when the sentence containing one phrase exists in the vicinity of the sentence containing the other phrase (for example, within n sentences before or after, where n is an integer of 1 or more, preferably an integer of 1 or more and 5 or less, more preferably an integer of 3 or more and 5 or less), the two phrases can be directly connected.
- one or both of the appearance frequency and importance of each token may be calculated in order to determine the direction of the edge.
- In step S2, it is preferable to acquire one or both of the token distance information and the token relevance information.
- the acquired token distance information and the token relevance information can be displayed in characters as edge labels when the graph is visualized.
- the color or thickness of the edges may be determined according to the closeness of the distance.
- the color or thickness of the edges may be determined depending on the strength of the relevance.
- For example, as the distance information of two tokens, whether the two tokens are in the same sentence or in the same paragraph, and how many paragraphs, sentences, words, or characters apart they are, can be registered as edge information.
- As information relating to the relevance of two phrases, the fact that one phrase is a related word of the other phrase, the degree of relevance of the two phrases, and the like can be indicated on the edge label.
- Related words include synonyms, near-synonyms, hypernyms, and hyponyms.
- Other tokens (phrases such as noun phrases, verb phrases, and adverb phrases) or sentences indicating the relationship between the two phrases can also be registered as edge information.
- FIGS. 3A to 3D show Japanese and the corresponding Roman alphabet.
- FIG. 3A shows the sentence 300 "the oxide semiconductor layer is above the insulator layer (SANKABUTSUHANDOUTAISOUHAZETSUENTAISOUNOJOUHOUNIARU)".
- In step S2, the sentence 300 is morphologically analyzed: the sentence 300 is divided into a plurality of tokens, and the part of speech of each token is determined.
- sentence 300 is divided into 12 tokens from token 301 to token 312.
- the part of speech is written below each token.
- The character string of the token 301 shown in FIG. 3B is “oxidation (SANKA)”, the character string of the token 302 is “thing (BUTSU)”, the character string of the token 303 is “semiconductor (HANDOUTAI)”, and the character string of the token 304 is “layer (SOU)”.
- the part of speech of tokens 301 to 304 is a noun. Therefore, as shown in FIG. 3C, they are combined into one token 321.
- The character string of the token 321 is “SANKABUTSUHANDOUTAISOU”, and the part of speech is a noun (compound noun).
- the character string of the token 305 shown in FIGS. 3B and 3C is "ha (HA)", and the part of speech is a particle.
- the character string of the token 306 shown in FIG. 3B is "ZETSUEN"
- the character string of the token 307 is “body (TAI)”
- the character string of the token 308 is “layer (SOU)”.
- the part of speech of tokens 306 to 308 are all nouns. Therefore, as shown in FIG. 3C, they are combined into one token 322.
- the character string of the token 322 is an "insulator layer (ZETSUENTAISOU)", and the part of speech is a noun (compound noun).
- The character string of the token 309 shown in FIGS. 3B and 3C is “no (NO)”, and the part of speech is a particle.
- the character string of the token 310 is "JOUHOU”, and the part of speech is a noun.
- the character string of the token 311 is "ni (NI)”, and the part of speech is a particle.
- the character string of the token 312 is "ARU”, and the part of speech is a verb.
- FIG. 3D shows an example in which the sentence 300 is graphed.
- the token 321 and the token 322 whose part of speech is a noun are used for the labels of the node 323 and the node 324, and the token 310 whose part of speech is a noun is used for the label 325 of the edge.
- the edge label 325 may represent at least one of information on the distance between nodes, information on node relevance, and the like, instead of or in addition to the token.
- the arrows shown in FIG. 3D are shown pointing from node 323 to node 324. That is, the start point of the arrow is a token that appears first in the sentence 300, and the end point of the arrow is a token that appears later.
- the method of determining the direction of the arrow is not limited to this, and the above-mentioned example can be referred to. Therefore, in some cases, the start point of the arrow may be the node 324 and the end point of the arrow may be the node 323. However, it is desirable to unify the method of determining the direction of the arrow in the graph.
- the structure of the entire document can be represented by one graph.
- one or both of the node 323 and the node 324 may be further connected to a phrase present in the other sentence via an edge.
- a part of the document may be represented by one graph.
- You may also create a graph for each chapter of the document. That is, a plurality of graphs may be created from one document.
- FIG. 4A shows the sentence 330 “A semiconductor device comprising: an oxide semiconductor layer over an insulator layer.”
- In step S2, it is preferable to perform a document cleaning process.
- the cleaning process removes noise contained in the document.
- the cleaning process includes removing a semicolon, replacing a colon with a comma, and so on.
- By performing the cleaning process, the accuracy of morphological analysis can be improved.
- the semicolon can be deleted and the sentence 330a can be obtained as shown in FIG. 4B.
- the sentence 330a is divided into a plurality of tokens by performing a morphological analysis of the sentence 330a.
- Although the part of speech of each token is not shown in FIG. 4C, the part of speech of each token can be determined by morphological analysis.
- sentence 330a is divided into 12 tokens from token 331 to token 342.
- the character string of the token 331 shown in FIG. 4C is "A”
- the character string of the token 332 is “semiconductor”
- the character string of the token 333 is "device”.
- the part of speech of token 331 is an indefinite article
- the part of speech of token 332 and token 333 are all nouns. Therefore, as shown in FIG. 4D, they are combined into one token 351.
- the character string of the token 351 is "A semiconductor device”
- the part of speech is a noun (compound noun).
- the character string of the token 335 shown in FIG. 4C is "an"
- the character string of the token 336 is "oxide”
- the character string of the token 337 is “semiconductor”
- The character string of the token 338 is “layer”.
- the part of speech of token 335 is an indefinite article
- the part of speech of tokens 336 to 338 is a noun. Therefore, as shown in FIG. 4D, they are combined into one token 352.
- the character string of the token 352 is "an oxide semiconductor layer”
- the part of speech is a noun (compound noun).
- the character string of the token 340 shown in FIG. 4C is "an"
- the character string of the token 341 is “insulator”
- the character string of the token 342 is "layer”.
- the part of speech of token 340 is an indefinite article
- the part of speech of tokens 341 and 342 is a noun. Therefore, as shown in FIG. 4D, they are combined into one token 353.
- The character string of the token 353 is “an insulator layer”, and the part of speech is a noun (compound noun).
- In step S2, the sentence 330 is graphed.
- FIG. 4E shows an example of graphing sentence 330.
- The tokens 351 to 353, whose parts of speech are nouns, are used for the labels of the nodes 354 to 356; the token 334 is used for the label 357 of the edge between the node 354 and the node 355; and the token 339 is used for the label 358 of the edge between the node 355 and the node 356.
- One of the arrows shown in FIG. 4E is shown pointing from node 354 to node 355, and the other arrow is shown pointing from node 355 to node 356. That is, the start point of the arrow is a token that appears first in the sentence 330, and the end point of the arrow is a token that appears later.
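The graphing of sentence 330 can be sketched as follows: noun tokens become nodes, and the tokens between two consecutive nouns become the label of a directed edge from the earlier noun to the later one. The (text, part-of-speech) token format is an assumption for illustration:

```python
def sentence_to_edges(tokens):
    """Turn a tokenized sentence into (source, label, target) edges:
    nouns become nodes; the intervening tokens label the directed edge."""
    edges, prev, between = [], None, []
    for text, pos in tokens:
        if pos == "noun":
            if prev is not None:
                edges.append((prev, " ".join(between), text))
            prev, between = text, []
        else:
            between.append(text)
    return edges
```

For sentence 330 this yields the edges ("A semiconductor device", "comprising", "an oxide semiconductor layer") and ("an oxide semiconductor layer", "over", "an insulator layer"), matching FIG. 4E.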
- The process from the document to the creation of the graph has been described above using a sentence in Japanese and a sentence in English as examples, but the language of the document is not particularly limited. For example, even for a document written in a language such as Chinese, Korean, German, French, Russian, or Hindi, a graph can be created from the document through the same process.
- In step S3, a plurality of words and phrases included in the graph are output.
- the output method is not particularly limited, and for example, a list of words and phrases can be displayed as a list. Further, the graph itself created in step S2 may be displayed. Also, both a graph and a list may be displayed.
- In step S4, a plurality of designated words and phrases are accepted.
- the user selects a plurality of designated words / phrases from the plurality of words / phrases output in step S3.
- Table 1 shows an example in which a plurality of words / phrases are displayed as a list in step S3 and the user specifies the words / phrases in step S4. As shown in Table 1, in the following, a case where two of "layer A" and "layer B" are selected as a plurality of designated words and phrases will be described as an example.
- In step S5, the graph is searched using the plurality of designated words and phrases received in step S4.
- In step S5, the shortest path between each pair of the plurality of designated words in the graph can be calculated.
- FIG. 5A shows an example in which only the parts related to “layer A” and “layer B” are extracted from the graph created in step S2.
- the graph shown in FIG. 5A has nodes 151 to 156.
- “Layer A” is the label of the node 151
- “layer B” is the label of the node 152.
- A node 153 having “layer C” as a label, a node 154 having “word D” as a label, a node 155 having “word E” as a label, and a node 156 having “word F” as a label are included in the routes connecting the node 151 and the node 152.
- In FIGS. 5 to 7, the nodes to which a designated phrase is attached as a label are indicated by diagonal hatching.
- the route with the smallest number of included nodes can be said to be the shortest route. That is, in the graph shown in FIG. 5A, the shortest path connecting the node 151 and the node 152 is a route via the node 153 having "layer C" as a label (the route shown by a thick line in FIG. 5A). In this way, the shortest path between each of the plurality of designated words is calculated.
- In step S6, the result of searching the graph in step S5 is output.
- The shortest path connecting the node 151 and the node 152 in FIG. 5A is shown in FIG. 5B.
- the relationship between “layer A” and “layer B” can be presented.
- This makes it possible to show the user that “layer C” may be strongly related to the information that the user wants to grasp.
- At least one of the labels, orientations, colors, and thicknesses of the edges can be used to further present information about the plurality of designated phrases.
- The undirected graph shown in FIG. 5B may also be presented as a directed graph. Further, a label 159 is given to the edge between the node 151 and the node 153, and a label 160 is given to the edge between the node 153 and the node 152.
- layer A is a hypernym of "layer C”.
- a specific example of “layer A” is a “semiconductor layer”
- a specific example of “layer C” is an "oxide semiconductor layer”.
- The edge information can be used to present to the user information about the phrases attached to the nodes.
- the graphs displayed in step S6 are not limited to one.
- the length of the edge and the position of the node associated therewith can be displayed differently, and are not particularly limited.
- FIG. 6 shows an example of the output content.
- FIG. 6 shows an example in which three designated words, “layer A”, “layer B”, and “device G” are selected.
- Graph 510 shown in FIG. 6 has nodes 151 to 153, node 157, and node 158.
- Layer A is the label of node 151
- layer B is the label of node 152
- device G is the label of node 157.
- the node 153 having "layer C” as a label and the node 158 having "word H” as a label are included in the graph 510.
- The graph 510 shows the shortest path between each pair of the plurality of designated terms. It can be seen that the shortest path between “layer A” and “layer B” is a route connected via the complementary phrase “layer C”, that the shortest path between “layer A” and “device G” is a directly connected route, and that the shortest path between “device G” and “layer B” is a route connected via the complementary phrase “word H”.
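The complementary phrases mentioned above are simply the nodes that lie on a shortest path but were not themselves designated. Given the computed shortest paths, they can be collected as follows (a sketch with hypothetical names):

```python
def complementary_phrases(paths, designated):
    """Collect phrases appearing on the shortest paths between designated
    phrases that are not themselves designated phrases."""
    designated = set(designated)
    found = set()
    for path in paths:
        found.update(phrase for phrase in path if phrase not in designated)
    return found
```

For the three shortest paths of the graph 510, this yields {"layer C", "word H"}.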
- the extracted sentence 520 shown in FIG. 6 is the result of extracting a sentence from the document based on each shortest path.
- Here, a case where the graph 510 is created by directly connecting tokens contained in the same sentence or the same paragraph is described as an example.
- When the sentence extracted as the extracted sentence 520 refers to information such as a figure, a table, a mathematical formula, or a chemical formula, that information may also be output.
- FIG. 7 shows an output example of a graph different from that of FIG.
- FIG. 7 shows an example in which five designated words, “layer A”, “layer B”, “layer C”, “layer D”, and “layer E” are selected.
- the graph shown in FIG. 7 has nodes 161 to 167.
- Layer A is the label of node 161
- layer B is the label of node 162
- layer C is the label of node 163
- layer D is the label of node 164
- “Layer E” is the label of the node 165.
- a node 166 having "word X” as a label and a node 167 having "word Y” as a label are included in the graph.
- FIG. 7 shows the shortest path between each designated phrase.
- the directly connected route is the shortest route for “layer A” and “layer B”.
- the directly connected route is the shortest route for "layer A" and "layer C".
- the shortest path for "layer A” and “layer E” is a route connected via the complementary phrase "word Y”.
- it can be seen that there are two shortest paths between "layer B" and "layer E": one connected via the designated phrase "layer C" and the complementary phrase "word Y", and one connected via the complementary phrases "word X" and "word Y". In this case, the two shortest paths can both be shown, and sentences can be extracted based on each.
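Enumerating every shortest path between two designated phrases, as in the "layer B"/"layer E" case above, can be sketched as follows. The edge list is a made-up fragment loosely following FIG. 7, not the actual graph of the embodiment.

```python
# Sketch: enumerate all shortest paths between two designated phrases by
# first computing BFS distances from the start node, then backtracking from
# the goal along edges that decrease the distance by exactly one.
from collections import defaultdict, deque

# Assumed edges, loosely modeled on the FIG. 7 example.
edges = [
    ("layer B", "layer C"), ("layer C", "word Y"), ("word Y", "layer E"),
    ("layer B", "word X"), ("word X", "word Y"),
]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def all_shortest_paths(start, goal):
    """Return every shortest path from start to goal as a list of paths."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    if goal not in dist:
        return []
    paths = []
    def backtrack(node, suffix):
        if node == start:
            paths.append([start] + suffix)
            return
        for prev in adj[node]:
            if dist.get(prev) == dist[node] - 1:
                backtrack(prev, [node] + suffix)
    backtrack(goal, [])
    return paths

for path in sorted(all_shortest_paths("layer B", "layer E")):
    print(path)
```

On this fragment the search yields two equal-length routes, one through the designated phrase "layer C" and one through the complementary phrases "word X" and "word Y", and sentences could then be extracted based on each.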
- when a plurality of documents are designated, a graph can be created and searched for each document in the same manner as described above, and the search results can be output.
- the user can easily confirm the relevance and differences of a plurality of documents by comparing the output results.
- the similarity of a plurality of documents may be evaluated and presented to the user by vectorizing a graph showing the shortest path between the designated words as a search result and calculating the similarity of each vector.
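One simple way to realize the vectorize-and-compare step is sketched below. Representing each search-result graph as a bag of edges and taking the cosine similarity is an assumption chosen for illustration; the document does not fix a particular vectorization, and the edge sets are made up.

```python
# Sketch: vectorize two search-result graphs as bags of undirected edges and
# compare them with cosine similarity. Edge sets are hypothetical examples.
from math import sqrt

def edge_vector(edges):
    """Count each undirected edge; each distinct edge is one vector dimension."""
    vec = {}
    for u, v in edges:
        key = tuple(sorted((u, v)))
        vec[key] = vec.get(key, 0) + 1
    return vec

def cosine_similarity(a, b):
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Shortest-path graphs obtained from two hypothetical documents.
result_doc1 = [("layer A", "layer C"), ("layer C", "layer B"), ("layer A", "device G")]
result_doc2 = [("layer A", "layer C"), ("layer C", "layer B"), ("layer B", "word H")]

similarity = cosine_similarity(edge_vector(result_doc1), edge_vector(result_doc2))
print(round(similarity, 3))  # two of the three edges are shared
```

A higher score indicates that the shortest-path graphs of the two documents share more structure, which could be presented to the user as a document-similarity measure.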
- with the reading comprehension support system of the present embodiment, it is possible to present a graph showing the relevance of a plurality of designated words and phrases in the document designated by the user, thereby supporting the user's reading comprehension of the document.
- the user can efficiently read the document. This allows the user to quickly find the required information in the document.
- FIG. 8 shows a block diagram of the reading comprehension support system 210.
- the reading comprehension support system 210 includes a server 220 and a terminal 230 (personal computer or the like).
- for the same components as the reading comprehension support system 100 shown in FIG. 1, the description of <Reading Comprehension Support System 1> in the first embodiment can also be referred to.
- the server 220 has a communication unit 171a, a transmission line 172, a storage unit 120, and a processing unit 130. Although not shown in FIG. 8, the server 220 may further include at least one of a reception unit, a database, an output unit, and an input unit.
- the terminal 230 has a communication unit 171b, a transmission line 174, an input unit 115, a storage unit 125, a processing unit 135, and a display unit 145.
- Examples of the terminal 230 include a tablet-type personal computer, a notebook-type personal computer, and various portable information terminals. The terminal 230 may also be a desktop personal computer that does not itself include the display unit 145; in that case, the terminal 230 can be connected to a monitor or the like that functions as the display unit 145.
- the user of the reading comprehension support system 210 inputs information about the designated document to the server 220 from the input unit 115 of the terminal 230.
- the information is transmitted from the communication unit 171b to the communication unit 171a.
- the text data of the designated document is transmitted from the communication unit 171b to the communication unit 171a.
- image data of at least one of drawings, chemical formulas, mathematical formulas, and tables may also be transmitted.
- information for specifying a document is transmitted from the communication unit 171b to the communication unit 171a.
- the information received by the communication unit 171a is stored in the memory of the processing unit 130 or in the storage unit 120 via the transmission line 172. Further, information may be supplied from the communication unit 171a to the processing unit 130 via the reception unit (see the reception unit 110 shown in FIG. 1).
- the various processes described in <Reading Comprehension Support Method> of the first embodiment are performed by the processing unit 130. Since these processes require high processing capacity, they are preferably performed in the processing unit 130 of the server 220. It is preferable that the processing unit 130 have a higher processing capacity than the processing unit 135.
- the processing result of the processing unit 130 is stored in the memory of the processing unit 130 or in the storage unit 120 via the transmission line 172. After that, the processing result is output from the server 220 to the display unit 145 of the terminal 230.
- the processing result is transmitted from the communication unit 171a to the communication unit 171b. Further, various data included in the database may be transmitted from the communication unit 171a to the communication unit 171b based on the processing result of the processing unit 130. Further, the processing result may be supplied from the processing unit 130 to the communication unit 171a via the output unit (output unit 140 shown in FIG. 1).
- Data can be transmitted and received between the server 220 and the terminal 230 by using the communication unit 171a and the communication unit 171b.
- a hub, a router, a modem, or the like can be used as the communication unit 171a and the communication unit 171b.
- either wired or wireless communication (for example, radio waves or infrared rays) may be used for transmitting and receiving data.
- the transmission line 172 and the transmission line 174 have a function of transmitting data. Data can be transmitted and received among the communication unit 171a, the storage unit 120, and the processing unit 130 via the transmission line 172. Data can be transmitted and received among the communication unit 171b, the input unit 115, the storage unit 125, the processing unit 135, and the output unit 140 via the transmission line 174.
- the input unit 115 can be used when the user specifies a document and a phrase.
- the input unit 115 can have a function of operating the terminal 230, and specific examples thereof include a mouse, a keyboard, a touch panel, a microphone, a scanner, and a camera.
- the reading comprehension support system 210 may have a function of converting voice data into text data.
- at least one of the processing unit 130 and the processing unit 135 may have the function.
- the reading comprehension support system 210 may have an optical character recognition (OCR) function. This makes it possible to recognize characters included in the image data and create text data.
- at least one of the processing unit 130 and the processing unit 135 may have the function.
- the storage unit 125 may store one or both of the data related to the designated document and the data supplied from the server 220. Further, the storage unit 125 may have at least a part of the data that the storage unit 120 can have.
- the processing unit 135 has a function of performing an operation or the like using data supplied from the communication unit 171b, the storage unit 125, the input unit 115, and the like.
- the processing unit 135 may have a function of executing at least a part of the processing that can be performed by the processing unit 130.
- the processing unit 130 and the processing unit 135 can each have one or both of a transistor (OS transistor) having a metal oxide in the channel forming region and a transistor (Si transistor) having silicon in the channel forming region.
- a transistor using an oxide semiconductor or a metal oxide in the channel forming region is referred to as an Oxide Semiconductor transistor or an OS transistor.
- the channel forming region of the OS transistor preferably has a metal oxide.
- a metal oxide means an oxide of a metal in a broad sense. Metal oxides are classified into oxide insulators, oxide conductors (including transparent oxide conductors), oxide semiconductors (also referred to as Oxide Semiconductor or simply OS), and the like. For example, when a metal oxide is used for the semiconductor layer of a transistor, the metal oxide may be referred to as an oxide semiconductor. That is, when a metal oxide has at least one of an amplification action, a rectifying action, and a switching action, the metal oxide can be referred to as a metal oxide semiconductor, or OS for short.
- the metal oxide contained in the channel forming region preferably contains indium (In).
- the metal oxide contained in the channel forming region is a metal oxide containing indium, the carrier mobility (electron mobility) of the OS transistor becomes high.
- the metal oxide contained in the channel forming region is preferably an oxide semiconductor containing the element M.
- the element M is preferably at least one of aluminum (Al), gallium (Ga) and tin (Sn).
- Other elements applicable to the element M include boron (B), silicon (Si), titanium (Ti), iron (Fe), nickel (Ni), germanium (Ge), yttrium (Y), and zirconium (Zr).
- the element M is, for example, an element having a high binding energy with oxygen.
- the metal oxide contained in the channel forming region is preferably a metal oxide containing zinc (Zn). Metal oxides containing zinc may be more likely to crystallize.
- the metal oxide contained in the channel forming region is not limited to the metal oxide containing indium.
- the semiconductor layer may be, for example, a metal oxide containing zinc, a metal oxide containing gallium, or a metal oxide containing tin, such as zinc tin oxide or gallium tin oxide.
- the processing unit 130 preferably has an OS transistor. Since the off-state current of the OS transistor is extremely small, a long data retention period can be secured by using the OS transistor as a switch that holds the electric charge (data) flowing into a capacitive element functioning as a storage element. By applying this characteristic to at least one of the register and the cache memory of the processing unit 130, the processing unit 130 can be operated only when necessary; otherwise, the information of the immediately preceding processing is saved in the storage element and the processing unit 130 can be turned off. That is, normally-off computing becomes possible, and the power consumption of the reading comprehension support system can be reduced.
- the display unit 145 has a function of displaying the output result.
- Examples of the display unit 145 include a liquid crystal display device, a light emitting display device, and the like.
- Examples of the light emitting element that can be used in the light emitting display device include an LED (Light Emitting Diode), an OLED (Organic LED), a QLED (Quantum-dot LED), and a semiconductor laser.
- the display unit 145 can also be a display device using a shutter-type or optical-interference-type MEMS (Micro Electro Mechanical Systems) element, or a display device using a display element to which a microcapsule method, an electrophoresis method, an electrowetting method, an electronic powder fluid (registered trademark) method, or the like is applied.
- FIG. 9 shows an image diagram of the reading comprehension support system of the present embodiment.
- the reading comprehension support system shown in FIG. 9 has a server 5100 and a terminal (which can also be said to be an electronic device). Communication between the server 5100 and each terminal can be performed via the Internet line 5110.
- the server 5100 can perform an operation using the data input from the terminal via the Internet line 5110.
- the server 5100 can transmit the result of the calculation to the terminal via the Internet line 5110. This makes it possible to reduce the burden of calculation on the terminal.
- FIG. 9 shows an information terminal 5300, an information terminal 5400, and an information terminal 5500 as terminals.
- the information terminal 5300 is an example of a mobile information terminal such as a smartphone.
- the information terminal 5400 is an example of a tablet terminal. Further, the information terminal 5400 can also be used as a notebook type information terminal by connecting to a housing 5450 having a keyboard.
- the information terminal 5500 is an example of a desktop type information terminal.
- the user can access the server 5100 from the information terminal 5300, the information terminal 5400, the information terminal 5500, and the like. Then, the user can receive the service provided by the administrator of the server 5100 by the communication via the Internet line 5110. Examples of the service include a service using the reading comprehension support method according to one aspect of the present invention. In the service, artificial intelligence may be used on the server 5100.
- 100: Reading comprehension support system, 110: Reception unit, 115: Input unit, 120: Storage unit, 125: Storage unit, 130: Processing unit, 135: Processing unit, 140: Output unit, 145: Display unit, 150: Transmission line, 151: node, 152: node, 153: node, 154: node, 155: node, 156: node, 157: node, 158: node, 159: label, 160: label, 161: node, 162: node, 163: node
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
FIG. 2 is a diagram showing an example of the reading comprehension support method.
FIGS. 3A to 3D are diagrams showing an example of the reading comprehension support method.
FIGS. 4A to 4E are diagrams showing an example of the reading comprehension support method.
FIGS. 5A to 5C are diagrams showing an example of a graph.
FIG. 6 is a diagram showing an example of output content.
FIG. 7 is a diagram showing an example of a graph.
FIG. 8 is a diagram showing an example of the reading comprehension support system.
FIG. 9 is a diagram showing an example of the reading comprehension support system.
In this embodiment, a reading comprehension support system and a reading comprehension support method of one embodiment of the present invention are described with reference to FIGS. 1 to 7.
FIG. 1 shows a block diagram of the reading comprehension support system 100. The reading comprehension support system 100 includes a reception unit 110, a storage unit 120, a processing unit 130, an output unit 140, and a transmission line 150.
The reception unit 110 accepts a designated document. The reception unit also accepts designated phrases. Data supplied to the reception unit 110 is supplied to one or both of the storage unit 120 and the processing unit 130 via the transmission line 150.
The storage unit 120 has a function of storing programs executed by the processing unit 130. The storage unit 120 preferably also has a function of storing graphs generated by the processing unit 130. Each graph is desirably linked to the document it was created from, so that the source document can be identified. The storage unit 120 may further have a function of storing operation results and inference results generated by the processing unit 130, data input to the reception unit 110, and the like.
The processing unit 130 has a function of performing processing such as operations and inference using data supplied from one or both of the reception unit 110 and the storage unit 120. The processing unit 130 also has a function of performing processing using various data contained in a database. The processing unit 130 can supply processing results, such as operation results and inference results, to one or both of the storage unit 120 and the output unit 140.
The output unit 140 outputs information based on the processing results of the processing unit 130. For example, one or both of the operation results and the inference results of the processing unit 130 can be supplied to the outside of the reading comprehension support system 100. The output unit 140 can also output various data contained in a database based on the processing results of the processing unit 130. The output unit 140 can output information to a display, a speaker, or the like used by the user.
The transmission line 150 has a function of transmitting data. Data can be transmitted and received among the reception unit 110, the storage unit 120, the processing unit 130, and the output unit 140 via the transmission line 150.
The reading comprehension support method of one embodiment of the present invention includes the processing of steps S1 to S6 shown in FIG. 2.
In step S1, a designated document is accepted. The designated document is, for example, a document the user wants to read. One or more documents may be designated.
In step S2, a graph representing the structure of the designated document is created using phrases contained in the designated document. When a plurality of designated documents are designated, a graph is created for each designated document. One or more graphs can be created for a single designated document.
In step S3, a plurality of phrases contained in the graph are output.
In step S4, a plurality of designated phrases are accepted.
In step S5, the graph is searched using the plurality of designated phrases accepted in step S4.
In step S6, the result of searching the graph in step S5 is output.
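The flow of steps S1 to S6 can be sketched end to end as follows. The tiny document, the word-level tokenization, and the period-based sentence splitting are all toy assumptions for illustration, not the actual analysis (such as morphological analysis) used by the embodiment.

```python
# Minimal end-to-end sketch of steps S1-S6: accept a document, build a graph
# from co-occurring words, list the graph's words, accept designated words,
# search the graph, and output the result.
from collections import defaultdict, deque
from itertools import combinations

def build_graph(document):                      # step S2
    """Connect words that co-occur in the same sentence."""
    adj = defaultdict(set)
    for sentence in document.split("."):
        words = sentence.split()
        for u, v in combinations(words, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def search(adj, start, goal):                   # step S5 (BFS shortest path)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

document = "alpha contacts beta. beta covers gamma."  # step S1 (toy document)
graph = build_graph(document)
print(sorted(graph))                                   # step S3: candidate words
designated = ["alpha", "gamma"]                        # step S4
print(search(graph, *designated))                      # step S6: search result
```

In this toy run, "alpha" and "gamma" never appear in the same sentence, so the search connects them through the complementary word "beta", which is the essence of the method.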
In this embodiment, a reading comprehension support system of one embodiment of the present invention is described with reference to FIGS. 8 and 9.
FIG. 8 shows a block diagram of the reading comprehension support system 210. The reading comprehension support system 210 includes a server 220 and a terminal 230 (a personal computer or the like). For the same components as the reading comprehension support system 100 shown in FIG. 1, the description of <Reading Comprehension Support System 1> in Embodiment 1 can also be referred to.
Data can be transmitted and received between the server 220 and the terminal 230 using the communication unit 171a and the communication unit 171b. A hub, a router, a modem, or the like can be used as the communication unit 171a and the communication unit 171b. Data may be transmitted and received by wire or wirelessly (for example, by radio waves or infrared rays).
The transmission line 172 and the transmission line 174 have a function of transmitting data. Data can be transmitted and received among the communication unit 171a, the storage unit 120, and the processing unit 130 via the transmission line 172. Data can be transmitted and received among the communication unit 171b, the input unit 115, the storage unit 125, the processing unit 135, and the output unit 140 via the transmission line 174.
The input unit 115 can be used when the user designates documents and phrases. For example, the input unit 115 can have a function of operating the terminal 230; specific examples include a mouse, a keyboard, a touch panel, a microphone, a scanner, and a camera.
The storage unit 125 may store one or both of the data related to the designated document and the data supplied from the server 220. The storage unit 125 may also hold at least part of the data that the storage unit 120 can hold.
The processing unit 135 has a function of performing operations and the like using data supplied from the communication unit 171b, the storage unit 125, the input unit 115, and the like. The processing unit 135 may have a function of executing at least part of the processing that can be performed by the processing unit 130.
The display unit 145 has a function of displaying the output results. Examples of the display unit 145 include a liquid crystal display device and a light-emitting display device. Examples of light-emitting elements that can be used in a light-emitting display device include an LED (Light Emitting Diode), an OLED (Organic LED), a QLED (Quantum-dot LED), and a semiconductor laser. A display device using a shutter-type or optical-interference-type MEMS (Micro Electro Mechanical Systems) element, or a display device using a display element to which a microcapsule method, an electrophoresis method, an electrowetting method, an electronic powder fluid (registered trademark) method, or the like is applied, can also be used as the display unit 145.
Claims (19)
- A reading comprehension support system comprising a reception unit, a processing unit, and an output unit,
wherein the reception unit has a function of accepting a designated document and a function of accepting a plurality of designated phrases,
the processing unit has a function of creating a first graph representing the structure of the designated document using phrases contained in the designated document, and a function of searching the first graph using the plurality of designated phrases,
the output unit has a function of outputting a plurality of phrases contained in the first graph and a function of outputting a search result of the first graph, and
the plurality of designated phrases are at least part of the plurality of phrases contained in the first graph.
- The reading comprehension support system according to claim 1, wherein the output unit outputs, as the search result, at least a second graph showing the shortest path between any two of the plurality of designated phrases in the first graph.
- The reading comprehension support system according to claim 2, wherein the output unit has a function of outputting sentences containing the designated phrases in paragraphs of the designated document that contain two or more of the plurality of designated phrases.
- The reading comprehension support system according to claim 2 or 3, wherein the shortest path is a path connecting any two of the plurality of designated phrases via at least one complementary phrase, and the complementary phrase is a phrase different from the plurality of designated phrases.
- The reading comprehension support system according to claim 4, wherein the output unit has a function of outputting sentences containing at least one of the designated phrases and the complementary phrases, in paragraphs of the designated document that contain at least one of the plurality of designated phrases and at least one complementary phrase.
- The reading comprehension support system according to claim 1, wherein the output unit outputs, as the search result, at least a second graph showing the shortest paths between each of the plurality of designated phrases in the first graph.
- The reading comprehension support system according to claim 6, wherein the output unit has a function of outputting sentences containing the designated phrases in paragraphs of the designated document that contain two or more of the plurality of designated phrases.
- The reading comprehension support system according to claim 6 or 7, wherein the shortest path connecting any two of the plurality of designated phrases is a path connecting the two designated phrases via at least one complementary phrase, and the complementary phrase is a phrase different from the plurality of designated phrases.
- The reading comprehension support system according to claim 8, wherein the output unit has a function of outputting sentences containing at least one of the designated phrases and the complementary phrases, in paragraphs of the designated document that contain at least one of the plurality of designated phrases and at least one complementary phrase.
- The reading comprehension support system according to any one of claims 1 to 9, comprising a storage unit that stores the search result.
- A reading comprehension support method comprising: accepting a designated document; creating a first graph representing the structure of the designated document using phrases contained in the designated document; outputting two or more phrases contained in the first graph; accepting a plurality of designated phrases from among the output phrases; and searching the first graph using the plurality of designated phrases and outputting a search result.
- The reading comprehension support method according to claim 11, in which, as the search result, at least a second graph showing the shortest path between any two of the plurality of designated phrases in the first graph is output.
- The reading comprehension support method according to claim 12, in which, together with the search result, sentences containing the designated phrases in paragraphs of the designated document that contain two or more of the plurality of designated phrases are output.
- The reading comprehension support method according to claim 12 or 13, in which the shortest path is a path connecting any two of the plurality of designated phrases via at least one complementary phrase, and the complementary phrase is a phrase different from the plurality of designated phrases.
- The reading comprehension support method according to claim 14, in which, together with the search result, sentences containing at least one of the designated phrases and the complementary phrases, in paragraphs of the designated document that contain at least one of the plurality of designated phrases and at least one complementary phrase, are output.
- The reading comprehension support method according to claim 11, in which, as the search result, at least a second graph showing the shortest paths between each of the plurality of designated phrases in the first graph is output.
- The reading comprehension support method according to claim 16, in which, together with the search result, sentences containing the designated phrases in paragraphs of the designated document that contain two or more of the plurality of designated phrases are output.
- The reading comprehension support method according to claim 16 or 17, in which the shortest path connecting any two of the plurality of designated phrases is a path connecting the two designated phrases via at least one complementary phrase, and the complementary phrase is a phrase different from the plurality of designated phrases.
- The reading comprehension support method according to claim 18, in which, together with the search result, sentences containing at least one of the designated phrases and the complementary phrases, in paragraphs of the designated document that contain at least one of the plurality of designated phrases and at least one complementary phrase, are output.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180073009.3A CN116457773A (zh) | 2020-10-30 | 2021-10-15 | 阅读支援系统及阅读支援方法 |
US18/031,392 US20240012979A1 (en) | 2020-10-30 | 2021-10-15 | Reading comprehension support system and reading comprehension support method |
JP2022558370A JPWO2022090849A1 (ja) | 2020-10-30 | 2021-10-15 | |
KR1020237017434A KR20230091995A (ko) | 2020-10-30 | 2021-10-15 | 독해 지원 시스템 및 독해 지원 방법 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020182488 | 2020-10-30 | ||
JP2020-182488 | 2020-10-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022090849A1 true WO2022090849A1 (ja) | 2022-05-05 |
Family
ID=81383374
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2021/059488 WO2022090849A1 (ja) | 2020-10-30 | 2021-10-15 | 読解支援システム及び読解支援方法 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20240012979A1 (ja) |
JP (1) | JPWO2022090849A1 (ja) |
KR (1) | KR20230091995A (ja) |
CN (1) | CN116457773A (ja) |
WO (1) | WO2022090849A1 (ja) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH1115830A (ja) * | 1997-06-20 | 1999-01-22 | Fuji Xerox Co Ltd | 文短縮装置及び文短縮プログラムを記録した媒体 |
JP2004348555A (ja) * | 2003-05-23 | 2004-12-09 | Nippon Telegr & Teleph Corp <Ntt> | 文書分析方法及び装置及び文書分析プログラム及び文書分析プログラムを格納した記憶媒体 |
JP2017187898A (ja) * | 2016-04-04 | 2017-10-12 | 株式会社東芝 | 情報処理装置、情報処理方法およびプログラム |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11184837A (ja) * | 1997-12-11 | 1999-07-09 | Internatl Business Mach Corp <Ibm> | 最短経路探索システム |
AU2003201799A1 (en) * | 2002-01-16 | 2003-07-30 | Elucidon Ab | Information data retrieval, where the data is organized in terms, documents and document corpora |
US7774198B2 (en) * | 2006-10-06 | 2010-08-10 | Xerox Corporation | Navigation system for text |
US20090024385A1 (en) * | 2007-07-16 | 2009-01-22 | Semgine, Gmbh | Semantic parser |
US8676565B2 (en) * | 2010-03-26 | 2014-03-18 | Virtuoz Sa | Semantic clustering and conversational agents |
US8566273B2 (en) * | 2010-12-15 | 2013-10-22 | Siemens Aktiengesellschaft | Method, system, and computer program for information retrieval in semantic networks |
JP6232736B2 (ja) | 2013-05-08 | 2017-11-22 | 株式会社リコー | 文書読解支援装置、文書読解支援システム、文書読解支援方法およびプログラム |
RU2639655C1 (ru) * | 2016-09-22 | 2017-12-21 | Общество с ограниченной ответственностью "Аби Продакшн" | Система для создания документов на основе анализа текста на естественном языке |
US10936796B2 (en) * | 2019-05-01 | 2021-03-02 | International Business Machines Corporation | Enhanced text summarizer |
-
2021
- 2021-10-15 US US18/031,392 patent/US20240012979A1/en active Pending
- 2021-10-15 JP JP2022558370A patent/JPWO2022090849A1/ja active Pending
- 2021-10-15 CN CN202180073009.3A patent/CN116457773A/zh active Pending
- 2021-10-15 WO PCT/IB2021/059488 patent/WO2022090849A1/ja active Application Filing
- 2021-10-15 KR KR1020237017434A patent/KR20230091995A/ko unknown
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH1115830A (ja) * | 1997-06-20 | 1999-01-22 | Fuji Xerox Co Ltd | 文短縮装置及び文短縮プログラムを記録した媒体 |
JP2004348555A (ja) * | 2003-05-23 | 2004-12-09 | Nippon Telegr & Teleph Corp <Ntt> | 文書分析方法及び装置及び文書分析プログラム及び文書分析プログラムを格納した記憶媒体 |
JP2017187898A (ja) * | 2016-04-04 | 2017-10-12 | 株式会社東芝 | 情報処理装置、情報処理方法およびプログラム |
Also Published As
Publication number | Publication date |
---|---|
US20240012979A1 (en) | 2024-01-11 |
KR20230091995A (ko) | 2023-06-23 |
CN116457773A (zh) | 2023-07-18 |
JPWO2022090849A1 (ja) | 2022-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Pham et al. | End-to-end recurrent neural network models for vietnamese named entity recognition: Word-level vs. character-level | |
Helwe et al. | Arabic named entity recognition via deep co-learning | |
Duyen et al. | An empirical study on sentiment analysis for Vietnamese | |
Syed et al. | Lexicon based sentiment analysis of Urdu text using SentiUnits | |
Rezaeian et al. | Persian text classification using naive bayes algorithms and support vector machine algorithm | |
Biswas et al. | Scope of sentiment analysis on news articles regarding stock market and GDP in struggling economic condition | |
Ekbal et al. | Simultaneous feature and parameter selection using multiobjective optimization: application to named entity recognition | |
Sitender et al. | Sanskrit to universal networking language EnConverter system based on deep learning and context-free grammar | |
Uslu et al. | Computing Classifier-based Embeddings with the Help of text2ddc | |
Barteld et al. | Token-based spelling variant detection in Middle Low German texts | |
Alsayadi et al. | Integrating semantic features for enhancing arabic named entity recognition | |
Foroozan Yazdani et al. | NgramPOS: a bigram-based linguistic and statistical feature process model for unstructured text classification | |
Mahmoud et al. | Hybrid Attention-based Approach for Arabic Paraphrase Detection | |
WO2022090849A1 (ja) | 読解支援システム及び読解支援方法 | |
Ciaramita et al. | Dependency parsing with second-order feature maps and annotated semantic information | |
Pertsas et al. | Ontology-driven information extraction from research publications | |
Borin et al. | Language technology for digital linguistics: Turning the linguistic survey of India into a rich source of linguistic information | |
WO2022074505A1 (ja) | 情報検索システム、及び、情報検索方法 | |
WO2024084365A1 (ja) | 文書検索方法、文書検索システム | |
Bölücü et al. | Joint PoS tagging and stemming for agglutinative languages | |
WO2021140406A1 (ja) | 文書検索システム、文書を検索する方法 | |
Wang et al. | Learning word hierarchical representations with neural networks for document modeling | |
Patil et al. | Exploring various emotion-shades for Marathi Sentiment Analysis | |
Pande et al. | Named Entity Recognition for Nepali Using BERT Based Models | |
WO2024110824A1 (ja) | 文書検索支援方法、プログラム、文書検索支援システム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21885452 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022558370 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202180073009.3 Country of ref document: CN |
|
ENP | Entry into the national phase |
Ref document number: 20237017434 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21885452 Country of ref document: EP Kind code of ref document: A1 |