WO2018221119A1 - Search material information storage device - Google Patents

Publication number
WO2018221119A1
Authority
WO
WIPO (PCT)
Prior art keywords
search term
search
keyword
extracted
term
Application number
PCT/JP2018/017599
Other languages
English (en)
Japanese (ja)
Inventor
潔 関根
Original Assignee
株式会社インタラクティブソリューションズ
Application filed by 株式会社インタラクティブソリューションズ
Priority to CN201880035902.5A (CN110678858B)
Priority to CA3062842A (CA3062842C)
Priority to JP2019522051A (JP6646184B2)
Priority to US16/618,092 (US10824657B2)
Publication of WO2018221119A1
Priority to US17/035,627 (US20210042339A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/93Document management systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3322Query formulation using system suggestions
    • G06F16/3323Query formulation using system suggestions using document space presentation or visualization, e.g. category, hierarchy or range presentation and selection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3325Reformulation based on results of preceding query
    • G06F16/3326Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages
    • G06F16/3328Reformulation based on results of preceding query using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages using graphical result space presentation or visualisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3334Selection or weighting of terms from queries, including natural language queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3349Reuse of stored results of previous queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0482Interaction with lists of selectable items, e.g. menus
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Definitions

  • The present invention relates to a search material information storage device. More specifically, the present invention relates to a search material information storage device that proposes search terms related to each page of presentation material so that each page can be searched effectively, and that can store information related to each page and the search terms related to that page in association with each other.
  • Japanese Unexamined Patent Application Publication No. 2019-16355 discloses a search information management device, a search information management method, and a search information management program.
  • search terms for searching are often associated with various materials. Users can search for appropriate materials by using search terms.
  • the search terms attached to each material are not necessarily suitable for the search, so it is desirable to propose search terms suitable for the search so that the user's intention can be taken into consideration.
  • the present invention aims to provide a system that can appropriately propose search term candidates for each page of a document.
  • A further object of the present invention is to provide a search material information storage device capable of storing information relating to each page and search terms relating to each page in association with each other so that each page of the material can be searched effectively.
  • The present invention is based on the finding that search term candidates suitable for each page of a document can be displayed by extracting the terms included in that page as keywords, extracting the topic words related to the keywords, and displaying the highly evaluated topic words.
  • the present invention relates to a retrieval material information storage device.
  • This device is a computer processing device that includes term extraction means 3, keyword storage means 5, keyword extraction means 7, topic word storage means 9, topic word extraction means 11, search term candidate extraction means 13, search term candidate display means 17, search term input means 19, and material search information storage means 21.
  • Each means is a means by a computer, and is achieved by cooperation of hardware and software.
  • the term extraction means 3 is means for extracting terms in the material that are terms included in a certain page of the material.
  • the keyword storage means 5 is means for storing a term that becomes a keyword related to the term in the material.
  • the keyword extraction means 7 is a means for extracting a plurality of keywords that are related to the terms in the material from the keyword storage means 5 using the terms in the material extracted by the term extraction means 3.
  • the topic word storage means 9 is a means for storing the topic words related to the keyword.
  • the topic word extraction unit 11 is a unit for extracting a topic word related to the keyword from the topic word storage unit 9 using a plurality of keywords extracted by the keyword extraction unit 7.
  • the search term candidate extraction means 13 is a means for extracting search term candidates on a page of the material from the topic words extracted by the topic word extraction means 11 and a plurality of keywords extracted by the keyword extraction means 7.
  • the search term candidate display means 17 is a means for causing the display unit 15 to display the search term candidates extracted by the search term candidate extraction means 13.
  • The search term input means 19 is a means for receiving an input indicating that one of the search term candidates displayed on the display unit 15 is to be used as a search term.
  • The material search information storage means 21 is a means for storing the search term input through the search term input means 19 in association with information related to the corresponding page of the material.
  • The above search material information storage device may further have category word storage means 25 and category word extraction means 27.
  • The category word storage means 25 is a means for storing category words related to topic words.
  • The category word extraction means 27 is a means for extracting a category word related to a topic word from the category word storage means 25 using the topic word extracted by the topic word extraction means 11. The search term candidate display means 17 of the search material information storage device then further treats the category word extracted by the category word extraction means 27 as one of the search term candidates.
  • In the above search material information storage device, the keyword storage means 5 may store a plurality of keywords and the score of each keyword in association with each other, and the keyword extraction means 7 may extract the score of each keyword together with the plurality of keywords.
  • In the above search material information storage device, the topic word storage means 9 may store the topic words and the score of each topic word in association with each other. The topic word extraction means 11 may select a predetermined number (one or two or more) of high-scoring keywords from the plurality of keywords extracted by the keyword extraction means 7 as topic word influential candidates, and extract the topic words related to those candidates from the topic word storage means 9.
  • The search term candidate extraction means 13 may extract a predetermined number (one or two or more) of high-scoring keywords from the plurality of keywords extracted by the keyword extraction means 7 as search term candidates, and may use the keyword scores and the topic word scores to extract a predetermined number (one or two or more) of search term candidates from the topic words extracted by the topic word extraction means 11.
  • In the above search material information storage device, the search term candidate display means 17 may cause the display unit 15 to display, as search term candidates, the predetermined number (one or two or more) of keywords extracted as search term candidates and the predetermined number (one or two or more) of topic words extracted as search term candidates.
  • It may also display, as preliminary search term candidates, those of the plurality of keywords extracted by the keyword extraction means 7 that were not extracted as search term candidates and those of the topic words extracted by the topic word extraction means 11 that were not extracted as search term candidates.
  • When the search term input means 19 receives an input indicating that a preliminary search term candidate is to be used as a search term, that preliminary candidate is set as a search term. Items displayed as search term candidates may be used as search terms unless an input indicating that they are not search terms is received.
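  • A minimal Python sketch of the selection rule described above, offered as an illustration rather than as the implementation of the device. The names `decide_search_terms` and `PageSearchTermStore`, the page reference tuple (document identifier, page number), and the sample terms are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PageSearchTermStore:
    """Hypothetical stand-in for the material search information storage means 21."""
    terms_by_page: dict = field(default_factory=dict)

    def store(self, page_ref, search_terms):
        # Associate the confirmed search terms with the page reference.
        self.terms_by_page[page_ref] = sorted(search_terms)


def decide_search_terms(candidates, preliminary, rejected, promoted):
    """Candidates are kept unless rejected; preliminary candidates
    become search terms only when the user explicitly promotes them."""
    kept = {term for term in candidates if term not in rejected}
    added = {term for term in preliminary if term in promoted}
    return kept | added


store = PageSearchTermStore()
terms = decide_search_terms(
    candidates=["obesity", "obesity gene"],
    preliminary=["obesity experimental animal", "diabetes"],
    rejected={"obesity gene"},   # user marked this candidate as not a search term
    promoted={"diabetes"},       # user selected this preliminary candidate
)
store.store(("deck-001", 3), terms)
print(store.terms_by_page)
```

  • The default-accept rule for candidates and the opt-in rule for preliminary candidates mirror the two kinds of input that the search term input means 19 is described as receiving.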
  • In summary, the device includes: term extraction means 3 for extracting terms in the material, which are terms included in a certain page of the material; keyword storage means 5 for storing terms that become keywords related to the terms in the material; keyword extraction means 7 for extracting, from the keyword storage means 5, a plurality of keywords related to the terms in the material using the terms extracted by the term extraction means 3; topic word storage means 9 for storing topic words related to the keywords; topic word extraction means 11 for extracting, from the topic word storage means 9, topic words related to the keywords using the plurality of keywords extracted by the keyword extraction means 7; search term candidate extraction means 13 for extracting search term candidates for that page of the material from the topic words extracted by the topic word extraction means 11 and the plurality of keywords extracted by the keyword extraction means 7; search term candidate display means 17 for displaying the search term candidates extracted by the search term candidate extraction means 13 on the display unit 15; and search term input means 19 for receiving an input indicating which of the search term candidates displayed on the display unit 15 is a search term.
  • the present invention can provide a system that can appropriately propose search term candidates for each page of a document.
  • INDUSTRIAL APPLICABILITY: The present invention can provide a search material information storage device that can store information relating to each page and search terms related to each page in association with each other so that each page of the material can be searched effectively.
  • FIG. 1 is a block diagram for explaining a search material information storage device according to the present invention.
  • FIG. 2 is a block diagram showing the basic configuration of the computer.
  • FIG. 3 is a conceptual diagram showing an example system of the present invention.
  • FIG. 4 is an example of a certain page of the presentation material.
  • FIG. 5 is a conceptual diagram showing a storage example of the keyword storage means.
  • FIG. 6 is a conceptual diagram showing a storage example of the topic word storage means.
  • FIG. 7 is a conceptual diagram showing a storage example of the category word storage means.
  • FIG. 8 is a conceptual diagram showing the extracted category words, topic words, keywords, and terms in the material.
  • FIG. 9 is an example of a display screen.
  • FIG. 10 is a flowchart for explaining an example of use of the retrieval material information storage device of the present invention.
  • FIG. 11 is a conceptual diagram for explaining an example of use of the retrieval material information storage device of the present invention.
  • FIG. 1 is a block diagram for explaining a retrieval material information storage device of the present invention.
  • This device is a computer processing device.
  • the computer may be any one of a portable terminal, a desktop personal computer, and a server, or a combination of two or more. These are usually connected so that information can be exchanged over the Internet (intranet) or the like.
  • the functions may be shared by using a plurality of computers, such as giving some functions to one of the computers.
  • FIG. 2 is a block diagram showing the basic configuration of the computer.
  • the computer has an input unit 31, an output unit 33, a control unit 35, a calculation unit 37, and a storage unit 39, and each element is connected by a bus 41 or the like to exchange information.
  • A control program may be stored in the storage unit, along with various types of information.
  • The control unit reads the control program stored in the storage unit.
  • The control unit then reads the information stored in the storage unit as appropriate and passes it to the calculation unit.
  • The calculation unit performs arithmetic processing using the received information and stores the result in the storage unit.
  • The control unit reads out the calculation result stored in the storage unit and outputs it from the output unit. In this way, various processes are executed. Each means executes these processes.
  • FIG. 3 is a conceptual diagram showing an example system of the present invention.
  • The system of the present invention (a system including the apparatus of the present invention) may include a portable terminal 45 connected to the Internet or intranet 43 and a server 47 connected to the Internet or intranet 43. Of course, a single computer or portable terminal may function as the apparatus of the present invention, or a plurality of servers may exist.
  • The search material information storage device 1 stores information for reading out each page of the presentation material (for example, an identification number of the presentation material and a page number) in association with one or more search terms related to that page, so that the user can easily retrieve the desired information.
  • the search material information storage device 1 may include any terminal device and a storage unit (storage device) of a computer (or server).
  • the retrieval material information storage device may include a database and database management software.
  • The pages of the presentation material may be ranked or scored for each search term. For example, consider a case where a plurality of pages are stored in association with the search term "diabetes". In this case, the storage unit may also store information such as which page ranks highest and which ranks next highest for the search term "diabetes", or which page has the highest score and which has the next highest score.
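  • As a rough illustration of per-term page ranking, the following Python sketch keeps a score for each page stored under a search term and returns the highest-scoring pages first. The structure `pages_by_term`, the functions `register` and `ranked_pages`, and all scores and page references are assumptions made for the example.

```python
# search term -> list of (score, page reference); page reference is
# a hypothetical (document identifier, page number) tuple
pages_by_term = {}


def register(term, page_ref, score):
    """Store a page under a search term together with its score."""
    pages_by_term.setdefault(term, []).append((score, page_ref))


def ranked_pages(term, top_n=2):
    """Return up to top_n page references for the term, highest score first."""
    entries = pages_by_term.get(term, [])
    ranked = sorted(entries, key=lambda entry: entry[0], reverse=True)
    return [page_ref for _, page_ref in ranked[:top_n]]


register("diabetes", ("deck-001", 3), 0.9)
register("diabetes", ("deck-002", 7), 0.4)
register("diabetes", ("deck-001", 5), 0.7)
print(ranked_pages("diabetes"))
```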
  • The search material information storage device 1 includes term extraction means 3, keyword storage means 5, keyword extraction means 7, topic word storage means 9, topic word extraction means 11, search term candidate extraction means 13, search term candidate display means 17, search term input means 19, and material search information storage means 21.
  • Each means is a means by a computer, and each process is achieved by cooperation of hardware and software.
  • the term extraction means 3 is a means for extracting terms in the material that are terms contained in a certain page of the material.
  • Examples of materials are so-called presentation materials.
  • The format of the presentation material is not particularly limited.
  • Examples of presentation software include Microsoft (registered trademark) PowerPoint (registered trademark), Kingsoft (registered trademark) Kingsoft Office (registered trademark), Apache (registered trademark) OpenOffice Impress (registered trademark), Keynote (registered trademark), Lotus Freelance (registered trademark), Illustrator (registered trademark), PDF (registered trademark), and Prezi (registered trademark).
  • Examples of materials are those created with any of these presentation software packages.
  • the presentation software is software that can display the contents of each page on a display unit such as a screen.
  • FIG. 4 shows an example of a certain page of the presentation material.
  • the presentation material includes a plurality of texts input by the creator.
  • the user can visually recognize a plurality of characters.
  • the computer stores information such as text input by the user and input information related to the text (character size, character color, presence / absence of animation) together with the text.
  • In a preferable example, the term extraction means 3 assigns an evaluation (score) to the text according to the input information related to the text (character size, character color, presence or absence of animation) when extracting the text. For example, larger characters more often indicate the content of the presentation material, so they are given a higher score.
  • The term extraction means 3 may store an evaluation (score) for the effects applied to the text, read it out as a text-related score when extracting a term, and combine it with other scores by addition or multiplication when calculating the scores described later.
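  • The formatting-based evaluation can be sketched in Python as follows. This is only one possible scheme: the `TextRun` class, the weight tables, the base font size, and the choice of multiplication are assumptions, since the description states only that size, color, and animation contribute to the score.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """A piece of page text with the formatting stored alongside it."""
    text: str
    font_size: float
    color: str = "black"
    animated: bool = False


# Hypothetical weights; the description does not give concrete values.
COLOR_WEIGHT = {"black": 1.0, "red": 1.5}
ANIMATION_WEIGHT = 1.2


def text_score(run, base_size=18.0):
    """Larger, emphasized, or animated text receives a higher score."""
    score = run.font_size / base_size
    score *= COLOR_WEIGHT.get(run.color, 1.0)
    if run.animated:
        score *= ANIMATION_WEIGHT
    return score


title = TextRun("Obesity and the ob gene", font_size=36)
body = TextRun("ob/ob mouse results", font_size=18, color="red", animated=True)
print(text_score(title), text_score(body))
```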
  • the term extraction means 3 itself is known.
  • the presentation material has a plurality of text information.
  • the presentation material is stored in, for example, a server (or in a computer) storage unit.
  • the term extraction means 3 reads each page of the stored presentation material and reads the text included in each page. Then, the term extraction means 3 analyzes the part of speech of the read text.
  • a part of speech database exists in the storage unit, and various terms and parts of speech are stored.
  • scores as search terms for various terms may be stored together depending on the application.
  • The term extraction means 3 may extract the terms (especially nouns) included in the text and then extract one or more terms in the material using their frequency and the term scores stored in the storage unit.
  • For example, suppose the term extraction means 3 extracts the terms A, B, and C from a certain page, the term C appears twice, the terms A and B each appear once, and the score stored in the storage unit for the term A is lower than that for the term B.
  • In this case, the term extraction means 3 may extract the terms C and B as terms in the material. The extracted terms in the material (terms C and B) are then stored in the storage unit in association with information from which the page can be read out, so that the terms C and B can be read out together with the page.
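  • The A, B, C example can be reproduced with a short Python sketch. The stored term scores in `TERM_SCORES` and the rule of multiplying frequency by stored score are assumptions chosen for illustration; the description says only that frequency and stored scores are both used. Part-of-speech analysis is omitted, and the input is assumed to be the nouns already found on the page.

```python
from collections import Counter

# Hypothetical term scores held in the storage unit.
TERM_SCORES = {"A": 1.0, "B": 3.0, "C": 2.0}


def extract_terms(nouns, top_n=2):
    """Rank nouns by frequency times stored score and keep the top_n."""
    frequency = Counter(nouns)
    combined = {
        term: count * TERM_SCORES.get(term, 1.0)
        for term, count in frequency.items()
    }
    return sorted(combined, key=combined.get, reverse=True)[:top_n]


page_nouns = ["A", "C", "B", "C"]  # C appears twice, A and B once each
print(extract_terms(page_nouns))   # prints ['C', 'B']
```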
  • Another example of the term extracting means 3 is to identify a portion where the largest font is used in a page of a presentation. A predetermined coefficient is given to the term in the material included in the portion where the largest font is used.
  • The coefficient (first coefficient: $a_1$) only needs to be stored in the storage unit.
  • The term extraction means 3 stores the first coefficient in the storage unit together with the term in the material included in the portion where the largest font is used. Further, the term extraction means 3 may store a coefficient corresponding to the font size (second coefficient: $a_2$) together with the term in the material in the storage unit.
  • the keyword storage means 5 is a means for storing a term that becomes a keyword related to the term in the material.
  • the keyword storage unit 5 may be realized by a storage unit and an element (for example, a control program) for reading information from the storage unit.
  • A keyword is a term that makes each page easy to search: rather than using every term in the material as a search term, related terms are consolidated into a keyword. As a result, fewer search terms are stored in association with each page, and the search can be performed quickly.
  • the terms in the material may be keywords.
  • the keyword can be said to be the first conversion word related to the term in the material.
  • a keyword may be a term selected from a plurality of types of terms in a document and suitable for use in a search.
  • Terms in the material are terms included in the presentation. For this reason, the terms in the document may not necessarily match the search terms or may not be suitable as search terms.
  • For example, suppose the term "ob gene" or "ob/ob mouse" is included in the presentation. These terms are associated with "obesity gene" (as well as "obesity" and "obesity experimental animal").
  • The keyword storage means 5 stores the keyword "obesity gene" (as well as "obesity" and "obesity experimental animal") in association with the terms in the material "ob gene" and "ob/ob mouse".
  • In this way, the search terms stored in association with each page are unified terms. For this reason, when a search is performed, a related page can be read out quickly.
  • FIG. 5 is a conceptual diagram showing a storage example of the keyword storage means.
  • The keyword storage means stores one or more keywords in association with each of a plurality of terms in the material, and stores a score (denoted $b_1$) in association with each keyword. This score is preferably input in advance so that it is higher for terms better suited to searching.
  • the keyword extraction means 7 is a means for extracting a plurality of keywords that are related to the terms in the material from the keyword storage means 5 using the terms in the material extracted by the term extraction means 3.
  • The keyword storage means 5 stores terms that become keywords in association with the terms in the material. For this reason, the keyword extraction means 7 can read out the terms that serve as keywords related to a term in the material from the keyword storage means 5.
  • Usually, a plurality of terms are extracted from a page, and each term in the material usually has a plurality of related keywords (each of which may be assigned a score). For this reason, a plurality of keywords are usually extracted for a certain page.
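  • A minimal Python sketch of the keyword lookup, assuming the keyword storage means can be modeled as a dictionary from terms in the material to (keyword, score) pairs. The table `KEYWORD_STORE`, the score values, and the choice to keep the highest score when a keyword is reached from several terms are all illustrative assumptions.

```python
# term in the material -> list of (keyword, score b1); values are hypothetical
KEYWORD_STORE = {
    "ob gene": [("obesity gene", 0.9), ("obesity", 0.6)],
    "ob/ob mouse": [
        ("obesity experimental animal", 0.8),
        ("obesity gene", 0.7),
        ("obesity", 0.5),
    ],
}


def extract_keywords(terms_in_material):
    """Collect every keyword related to the given terms, with its best score."""
    best_score = {}
    for term in terms_in_material:
        for keyword, b1 in KEYWORD_STORE.get(term, []):
            best_score[keyword] = max(b1, best_score.get(keyword, 0.0))
    return best_score


keywords = extract_keywords(["ob gene", "ob/ob mouse"])
print(keywords)
```

  • Several terms in the material map onto the same small set of keywords, which is how the unified vocabulary described above arises.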
  • the keyword extraction means 7 may evaluate the score of each keyword using the coefficient of the term in the material and the keyword score stored in the storage unit.
  • An example of the keyword score is $a_1 \times a_2 \times b_1$.
  • The control unit reads out the control program and also reads out each coefficient and score stored in the storage unit. It then only needs to cause the calculation unit to compute $a_1 \times a_2 \times b_1$ and store the result in the storage unit.
  • The storage unit may also store a coefficient for the appearance frequency of the term in the material (denoted $a_{21}$) and an addition coefficient applied when a specific keyword is extracted from a plurality of types of terms in the material (denoted $a_{22}$).
  • In that case, the keyword score may be obtained by calculating $a_1 \times a_2 \times a_{21} \times a_{22} \times b_1$ and stored in the storage unit.
  • A high coefficient may also be given to an emphasis color used on a certain page. In this case, the device only needs means for analyzing the color of a term on the page and a storage unit that stores a coefficient for each color, so that the coefficient for the analyzed color can be read out from the storage unit.
  • In this way, coefficients and scores are stored for various elements, read out, and multiplied or added to obtain the scores. Strong candidates can then be found by storing and comparing the scores of the terms.
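  • The product-form keyword score can be written as a small Python function. This sketch assumes the coefficients are combined purely by multiplication (the description also allows addition); the function name `keyword_score` and the numeric values are hypothetical.

```python
from math import prod


def keyword_score(b1, a1=1.0, a2=1.0, extra_coefficients=()):
    """Multiply the stored keyword score b1 by the term coefficients.

    a1: coefficient for appearing in the largest-font portion
    a2: coefficient corresponding to the font size
    extra_coefficients: further factors such as a frequency or color coefficient
    """
    return prod((a1, a2, *extra_coefficients, b1))


basic = keyword_score(0.9, a1=1.5, a2=1.2)
extended = keyword_score(0.9, a1=1.5, a2=1.2, extra_coefficients=(2.0, 1.1))
print(basic, extended)
```

  • Passing the optional factors as a tuple keeps the function usable whether or not frequency and color coefficients are stored for a given term.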
  • the topic word storage means 9 is a means for storing the topic words related to the keyword.
  • the topic word storage unit 9 may be realized by a storage unit and an element (for example, a control program) for reading information from the storage unit.
  • the topic word storage means may store the topic word “obesity” in association with keywords of obesity genes, obesity, and obesity experimental animals.
  • the topic word may be a term in which a plurality of keywords are further unified or a generalized term. By using topic words, the search can be performed more quickly. Examples of topic words are disease names, drug names, active ingredient names, and pharmaceutical company names. That is, the topic word can be said to be the second conversion word related to the term in the material.
  • A topic word may be a term, suitable for use in a search, that is assigned to a plurality of types of keywords. Further, the topic word may relate to a message.
  • the topic word extraction unit 11 is a unit for extracting a topic word related to the keyword from the topic word storage unit 9 using a plurality of keywords extracted by the keyword extraction unit 7.
  • the topic word storage means 9 stores topic words related to the keyword. Therefore, the topic word extraction unit 11 extracts a topic word related to the keyword from the topic word storage unit 9 using the plurality of keywords extracted by the keyword extraction unit 7.
  • FIG. 6 is a conceptual diagram showing a storage example of the topic word storage means.
  • the topic word storage means stores one or a plurality of topic words in association with each of a plurality of keywords, and stores a score associated with each topic word. It is preferable that this score is input in advance so as to be higher for a term suitable for a search.
  • the search term candidate extraction means 13 is a means for extracting search term candidates on a page of the material from the topic words extracted by the topic word extraction means 11 and a plurality of keywords extracted by the keyword extraction means 7.
  • One or more topic words related to a certain page are stored in the storage unit.
  • A plurality of keywords related to that page are also stored. For example, the control program may perform control such that all the topic words become search term candidates and several of the keywords (for example, four, in consideration of the size of the display unit) become search term candidates.
  • In that case, the search term candidate extraction means 13 sets all the read topic words as search term candidates and sets four of the keywords as search term candidates.
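  • The "all topic words plus four keywords" policy can be sketched in Python as below. Choosing the four keywords by highest score and removing duplicates are assumptions; the function name `extract_candidates` and the sample data (including "leptin" and "adipose tissue") are hypothetical.

```python
def extract_candidates(topic_words, keyword_scores, max_keywords=4):
    """Return every topic word followed by the highest-scoring keywords."""
    top_keywords = sorted(
        keyword_scores, key=keyword_scores.get, reverse=True
    )[:max_keywords]
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys([*topic_words, *top_keywords]))


candidates = extract_candidates(
    topic_words=["obesity"],
    keyword_scores={
        "obesity gene": 0.9,
        "obesity": 0.6,
        "obesity experimental animal": 0.8,
        "leptin": 0.7,
        "adipose tissue": 0.3,
    },
)
print(candidates)
```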
  • The keyword storage means 5 may store a plurality of keywords and the scores of the keywords in association with each other, and the keyword extraction means 7 may extract the scores of the keywords together with the keywords.
  • In this case, keywords with high scores are extracted as search term candidates.
  • The topic word storage means 9 may store the topic words and the scores of the respective topic words in association with each other. The topic word extraction means 11 may then treat a predetermined number (one or two or more) of high-scoring keywords among the plurality of keywords extracted by the keyword extraction means 7 as topic word influential candidates, and extract the topic words related to those candidates from the topic word storage means 9.
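  • A possible Python sketch of topic word extraction through influential candidates: only the highest-scoring keywords are used to look up topic words. The table `TOPIC_STORE`, the function `extract_topic_words`, and the sample scores are assumptions for illustration.

```python
# keyword -> related topic words; contents are hypothetical
TOPIC_STORE = {
    "obesity gene": ["obesity"],
    "obesity experimental animal": ["obesity"],
    "insulin": ["diabetes"],
}


def extract_topic_words(keyword_scores, n_influential=2):
    """Look up topic words using only the n highest-scoring keywords.

    Returns a mapping from each topic word to the keywords it came from.
    """
    influential = sorted(
        keyword_scores, key=keyword_scores.get, reverse=True
    )[:n_influential]
    topics = {}
    for keyword in influential:
        for topic in TOPIC_STORE.get(keyword, []):
            topics.setdefault(topic, []).append(keyword)
    return topics


scores = {"obesity gene": 0.9, "insulin": 0.3, "obesity experimental animal": 0.8}
print(extract_topic_words(scores))
```

  • Keeping the originating keywords with each topic word makes it possible to combine keyword and topic word scores later.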
  • The above search material information storage device may further have category word storage means 25 and category word extraction means 27.
  • The category word storage means 25 is a means for storing category words related to topic words.
  • the category word extraction means 27 is a means for extracting a category word related to the topic word from the category word storage means 25 using the topic word extracted by the topic word extraction means 11.
  • the category word can be said to be the third conversion word related to the term in the document.
  • The category word may be a term selected from a plurality of types of topic words as suitable for use in a category search. For example, a category word may indicate the subjects who are considered to be interested in the material.
  • FIG. 7 is a conceptual diagram showing a storage example of the category word storage means.
  • the category word storage means stores one or more category words in association with each of a plurality of topic words, and stores a score in association with each category word. It is preferable that this score is input in advance so as to be higher for a term suitable for a search.
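  • One way to picture this storage layout is a mapping from each topic word to its scored category words. The sketch below assumes that layout; the name `extract_category_words`, the sample data, and the choice to keep the highest score when the same category word is reached from several topic words are assumptions, not details stated in the description.

```python
# Hypothetical category word storage: topic word -> [(category word, score), ...]
category_store = {
    "diabetes": [("endocrinologist", 0.9), ("general practitioner", 0.6)],
    "metabolism": [("endocrinologist", 0.7)],
}


def extract_category_words(topic_words, store):
    """Collect category words for the given topic words, best score first."""
    best = {}
    for topic in topic_words:
        for word, score in store.get(topic, []):
            # Assumption: keep the highest score seen for a repeated word.
            best[word] = max(score, best.get(word, 0.0))
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


print(extract_category_words(["diabetes", "metabolism"], category_store))
```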
  • FIG. 8 is a conceptual diagram showing the extracted category words, topic words, keywords, and terms in the material.
  • The search term candidate extraction unit 13 may extract a predetermined number (one or two or more) of the highest-scoring keywords among the plurality of keywords extracted by the keyword extraction unit 7 as search term candidates. Further, the search term candidate extraction unit 13 may extract a predetermined number (one or two or more) of search term candidates from the topic words extracted by the topic word extraction unit 11, using the keyword scores and the topic word scores.
  • the topic word storage means 9 stores the topic words and the scores of the respective topic words in association with each other.
  • The keyword storage means 5 stores a plurality of keywords and the score of each keyword in association with each other. Every topic word has one or more keywords from which it originates. That is, topic words are read out using keywords, so a topic word is always associated with one or more keywords.
  • The search term candidate extraction unit 13 reads the score of a topic word from the topic word storage unit 9, and also reads, from the keyword storage unit 5, the score of each keyword from which that topic word was extracted. Then, for example, when there are a plurality of keywords for a certain topic word, the search term candidate extraction means 13 causes the calculation unit to sum the scores of those keywords and to multiply the topic word score by the keyword score (or by the summed keyword score). The aggregated score for the topic word obtained in this way is stored in the storage unit.
  • The search term candidate extraction unit 13 reads the aggregated scores of a plurality of topic words, causes the calculation unit to compare them, and extracts a predetermined number (one or more) of topic words. In this way, even when the number of topic words to be extracted is fixed, the search term candidate extraction means 13 can extract that predetermined number of topic words.
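  • The aggregation described above (topic word score multiplied by the sum of the scores of its originating keywords, followed by a top-N selection) can be sketched as below. The function names, the `topic_sources` mapping, and the numeric values are illustrative assumptions.

```python
def aggregate_topic_scores(topic_scores, keyword_scores, topic_sources):
    """Multiply each topic word score by the summed score of its source keywords."""
    return {
        topic: topic_scores[topic] * sum(keyword_scores[k] for k in topic_sources[topic])
        for topic in topic_scores
    }


def top_topics(aggregated, n):
    """Return the n topic words with the highest aggregated score."""
    return sorted(aggregated, key=lambda t: (-aggregated[t], t))[:n]


# Hypothetical stored scores and keyword-to-topic provenance.
topic_scores = {"diabetes": 2.0, "nutrition": 3.0, "metabolism": 1.0}
keyword_scores = {"insulin": 3.0, "glucose": 2.0, "diet": 1.0}
topic_sources = {
    "diabetes": ["insulin", "glucose"],
    "nutrition": ["diet"],
    "metabolism": ["glucose"],
}

aggregated = aggregate_topic_scores(topic_scores, keyword_scores, topic_sources)
print(aggregated)
print(top_topics(aggregated, 2))
```

Here "diabetes" outranks "nutrition" despite its lower topic word score, because it is supported by two keywords.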
  • the search term candidate display means 17 is a means for causing the display unit 15 to display the search term candidates extracted by the search term candidate extraction means 13.
  • The search term candidate display means 17 may cause the display unit 15 to display, as search term candidates, the predetermined number (one or two or more) of keywords and the predetermined number (one or two or more) of topic words that were extracted as search term candidates.
  • It may also display, as preliminary search term candidates, those keywords extracted by the keyword extraction means 7 and those topic words extracted by the topic word extraction means 11 that were not extracted as search term candidates.
  • When the search term input means 19 receives an input indicating that a preliminary search term candidate is to be used as a search term, that preliminary candidate is set as a search term. A term displayed as a search term candidate may be used as a search term unless an input indicating that it is not a search term is received.
  • the material search information storage means 21 is a means for storing the search term input by the search term input means 19 and information related to a page with the material in association with each other.
  • the apparatus of the present invention may further display content type candidates according to the type of presentation material, and store the content type in association with each page of the presentation (or the presentation itself).
  • The apparatus of the present invention reads the format of the presentation material (PowerPoint (registered trademark), PDF (registered trademark), Word (registered trademark), etc.) stored in the storage unit.
  • The apparatus of the present invention reads the text included in the material of the read format.
  • the apparatus of the present invention includes a content analysis term database that stores content analysis terms.
  • The apparatus of the present invention analyzes the content type using the terms stored in the content analysis term database. For example, if the material is a PDF (registered trademark) and the text “attached document” appears relatively early in it, “attached document” is extracted as a candidate for the content type of the material. Then “attached document” is displayed on the display unit as the content type, and when approval is input by the user, “attached document” is stored as the content type in association with the material.
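  • A rough sketch of this content type check follows. The description only says that the term appears "relatively" early, so the 200-character window, the case-insensitive match, and the names `guess_content_type` and `analysis_terms` are assumptions chosen for illustration.

```python
def guess_content_type(text, analysis_terms, window=200):
    """Return the analysis term that appears earliest in the first `window` characters."""
    head = text[:window].lower()
    # Record (position, term) for every content analysis term found in the head.
    hits = [(head.find(term.lower()), term) for term in analysis_terms if term.lower() in head]
    return min(hits)[1] if hits else None


# Hypothetical content analysis term database.
analysis_terms = ["interview form", "attached document"]

print(guess_content_type("Attached document: Product X, revised edition", analysis_terms))
print(guess_content_type("Quarterly sales overview", analysis_terms))
```

The returned value is only a candidate; in the described flow it is shown to the user and stored once approved.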
  • Fig. 9 shows an example of the display screen.
  • a page with presentation material is displayed in the upper half of the display screen.
  • Each search term candidate is displayed together with icons (check boxes) for adopting or not adopting it.
  • The search term candidates are arranged from the left in the order of category words, topic words, and keywords. Terms in the material may also be displayed on the display unit.
  • the adoption check box is marked.
  • the device 1 that has received the input from the computer stores the page associated with the presentation in association with the approved search terms (and the score of each search term) in the storage unit.
  • the search term input means 19 is a means for receiving an input indicating that it is a search term among the search term candidates displayed on the display unit 15.
  • the input by the check box functions as the search term input means 19.
  • When the user rejects a search term candidate that is in the adopted state, the user, for example, enters a mark in the check box for not adopting it.
  • the device 1 that has received the non-adopted input from the check box sets the instructed search term candidate to the non-adopted state.
  • the search term candidate is not adopted.
  • the apparatus 1 may store search term candidates that have been rejected as a search term related to the above page after lowering the score (for example, by halving the score).
  • For terms that the search term candidate extraction means 13 does not extract as search term candidates, the check boxes for not adopting are marked (or neither check box is marked).
  • To adopt such a term, a mark is entered in its adoption check box.
  • the device 1 that has received the input of adoption from the check box adopts the designated search term candidate.
  • search term candidates are adopted. That is, the search term is stored in association with the page as a search term for a certain page. At this time, since the search term is selected by the user, the search term may be stored in a state where the score is added or multiplied.
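  • The effect of the check boxes on the stored scores can be sketched as follows. Halving a rejected term's score comes from the example given above; the boost factor of 2.0 for adopted terms, the decision labels, and the function name are assumptions.

```python
def apply_user_decisions(candidates, decisions, boost=2.0, penalty=0.5):
    """Adjust candidate scores according to adopt / reject inputs from the check boxes."""
    stored = {}
    for term, score in candidates.items():
        decision = decisions.get(term)
        if decision == "adopt":
            stored[term] = score * boost  # user-selected: raise the score (assumed factor)
        elif decision == "reject":
            stored[term] = score * penalty  # rejected: keep the term with a halved score
        else:
            stored[term] = score  # no input: keep the candidate as it is
    return stored


candidates = {"diabetes": 4.0, "insulin": 2.0, "diet": 1.0}
decisions = {"diabetes": "adopt", "diet": "reject"}
print(apply_user_decisions(candidates, decisions))
```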
  • FIG. 10 is a flowchart for explaining an example of use of the retrieval material information storage device of the present invention. That is, this figure is a diagram for explaining a retrieval material information storage method using the retrieval material information storage device.
  • S means a step (process).
  • the user's terminal or computer stores the presentation material in the storage unit (or the storage unit of the server).
  • The device 1 extracts, for each page of the presentation material, the terms in the material, that is, the terms included in that page (S102). At this time, the device 1 may give a score to each term in the material. For example, points may be registered in advance for terms that appear frequently or that are accompanied by bold type, colored characters, animations, and the like, and a score may be assigned to each term in the material using the registered point information.
  • The device 1 may have a dictionary of terms in the material that stores various terms in association with their scores, and may read out the score of each extracted term from that dictionary.
  • The score of a term in the material may be obtained by combining (for example, by addition or multiplication) the score of that term in the dictionary with the score from the added points. In this case, if the number of terms in the material is set in advance, the terms with the highest scores may be used as the terms in the material.
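  • A minimal sketch of this term scoring, assuming the dictionary score, an occurrence count, and style bonus points are simply added. The point values in `BONUS`, the token representation, and the function name `score_terms` are hypothetical.

```python
from collections import Counter

# Hypothetical pre-registered points for emphasis styles.
BONUS = {"bold": 2, "colored": 1, "animated": 1}


def score_terms(tokens, dictionary, max_terms=3):
    """Score page terms as dictionary score + occurrence count + style points."""
    counts = Counter(term for term, _ in tokens)
    bonus = Counter()
    for term, styles in tokens:
        bonus[term] += sum(BONUS.get(style, 0) for style in styles)
    scores = {t: dictionary.get(t, 0) + counts[t] + bonus[t] for t in counts}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [(t, scores[t]) for t in ranked[:max_terms]]


# Each token is (term, set of styles applied to that occurrence on the page).
tokens = [("insulin", {"bold"}), ("insulin", set()), ("dose", set()), ("chart", {"colored"})]
dictionary = {"insulin": 5, "dose": 3}
print(score_terms(tokens, dictionary, max_terms=2))
```

Multiplication could be substituted for addition, as the description allows either.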
  • the apparatus 1 extracts a plurality of keywords that are related to the term in the material from the storage unit using the extracted one or more terms in the material (S103).
  • The storage unit records the terms that are keywords related to each term in the material. For this reason, the apparatus 1 can extract the keywords related to a term in the material from the storage unit.
  • a score as a search term may be given to each keyword.
  • A score reflecting how often a keyword occurs may also be registered: the score corresponding to the number of times the keyword is duplicated may be read out and added to, or multiplied with, the keyword's score. In this way, a plurality of keywords (and the score of each keyword) are obtained.
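  • Step S103 with the duplicate bonus might look like the sketch below. It assumes an additive bonus of one point per extra occurrence; the store layout, the names `extract_keywords` and `dup_bonus`, and the sample data are illustrative only.

```python
from collections import Counter


def extract_keywords(terms, keyword_store, base_scores, dup_bonus=1):
    """Look up keywords for each page term and reward keywords reached more than once."""
    hits = Counter(kw for term in terms for kw in keyword_store.get(term, []))
    # Assumption: add dup_bonus for every occurrence beyond the first.
    return {kw: base_scores.get(kw, 0) + dup_bonus * (n - 1) for kw, n in hits.items()}


# Hypothetical storage: term in the material -> related keywords, keyword -> score.
keyword_store = {"insulin pen": ["insulin", "device"], "injection": ["insulin"]}
base_scores = {"insulin": 5, "device": 2}
print(extract_keywords(["insulin pen", "injection"], keyword_store, base_scores))
```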
  • the apparatus 1 may extract a category word related to the topic word from the storage unit using the extracted topic word (S105). This step is an optional step.
  • The device 1 extracts search term candidates for a page of the material from the topic words and the plurality of keywords (and the category words) (S106).
  • The device 1 stores in advance control commands for extracting search term candidates and, in accordance with those control commands, extracts the search term candidates for a page of the material from the topic words and the plurality of keywords (and the category words).
  • An example of a control command is one that extracts, as search term candidates, the four highest-scoring keywords and the two highest-scoring topic words (and all the category words). In this way, search term candidates for a page of the presentation material are extracted automatically.
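  • The example control command can be expressed as a small configuration that gives a limit per word type. The dictionary `CONTROL`, the use of `None` to mean "all", and the function names are assumptions for this sketch.

```python
# Hypothetical control command: how many candidates to take per word type (None = all).
CONTROL = {"keywords": 4, "topic_words": 2, "category_words": None}


def top(scored, limit):
    """Return words ordered by descending score, truncated to `limit` if given."""
    ranked = sorted(scored, key=lambda w: (-scored[w], w))
    return ranked if limit is None else ranked[:limit]


def extract_candidates(scored_by_kind, control=CONTROL):
    """Apply the control command to the scored words of each kind."""
    return {kind: top(scored_by_kind.get(kind, {}), limit) for kind, limit in control.items()}


scored = {
    "keywords": {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1},
    "topic_words": {"t1": 3, "t2": 2, "t3": 1},
    "category_words": {"c1": 1, "c2": 2},
}
print(extract_candidates(scored))
```

Keeping the limits in data rather than in code makes it straightforward to change them, for example to fit the display size.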
  • the storage unit may store extracted search term candidates as search terms for a certain page.
  • the apparatus 1 may display the extracted search term candidate on the display unit (S107).
  • The target page of the presentation material (reduced in size) may be displayed on the display unit together with the topic words and the plurality of keywords (and the category words) that are not search term candidates. In this case, the user can select search terms.
  • The terminal receives an input related to the approval, and the search term candidates extracted by the device 1 are stored as they are in the storage unit as the search terms related to the page of the presentation material (S111).
  • The search term candidates that reflect these corrections are stored in the storage unit as search terms related to the page (S121).
  • The terminal receives an input related to the approval, and the corrected search term candidates are stored in the storage unit as search terms related to the page of the presentation material (S122).
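  • The two storage paths (approval as-is, or approval after correction) can be sketched with a dictionary standing in for the storage unit. The page key format, the `corrections` structure with `add` and `remove` lists, and the function name are assumptions.

```python
def finalize_search_terms(store, page_key, candidates, corrections=None):
    """Store the approved search terms for a page, applying user corrections if any."""
    terms = list(candidates)
    if corrections:
        removed = corrections.get("remove", [])
        terms = [t for t in terms if t not in removed]
        terms += [t for t in corrections.get("add", []) if t not in terms]
    store[page_key] = terms  # search terms are kept in association with the page
    return terms


store = {}
# Approval without changes (corresponds to S111).
print(finalize_search_terms(store, ("deck.pptx", 3), ["diabetes", "insulin"]))
# Approval after corrections (corresponds to S121 and S122).
print(finalize_search_terms(
    store, ("deck.pptx", 4), ["diabetes", "insulin"],
    corrections={"remove": ["insulin"], "add": ["glucose"]},
))
```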
  • Term extraction means 3 for extracting a term in the material, which is a term included in a page of the material;
  • Keyword storage means 5 for storing terms that are keywords related to the terms in the material;
  • Keyword extraction means 7 for extracting, from the keyword storage means 5, a plurality of keywords related to the terms in the material, using the terms in the material extracted by the term extraction means 3;
  • Topic word storage means 9 for storing topic words related to the keywords;
  • Topic word extraction means 11 for extracting, from the topic word storage means 9, a topic word related to the keywords, using the plurality of keywords extracted by the keyword extraction means 7;
  • Search term candidate extraction means 13 for extracting search term candidates for a page of the material from the topic word extracted by the topic word extraction means 11 and the plurality of keywords extracted by the keyword extraction means 7;
  • Search term candidate display means 17 for displaying the search term candidates extracted by the search term candidate extraction means 13 on the display unit 15;
  • the search term input means 19 which receives input indicating that it is a search term among the search term candidates displayed on the
  • FIG. 11 is a conceptual diagram (block diagram) for explaining an example of use of the retrieval material information storage device of the present invention.
  • the basic database (DB) includes a content DB, a customer DB, a log DB, and a DB that stores other information.
  • These databases are connected to an engine called an interactive pro framework through an interface.
  • This engine can exchange information with various terminals (for example, a PC tablet, a mobile terminal, and a mobile phone) via an application programming interface (API).
  • the engine can exchange information with control programs and applications in the client, HTML data, moving image data, power point data, PDF data, document data, and database management software.
  • This engine is synchronized with the server (cloud) so that information can be exchanged.
  • information can be exchanged with various databases and software including BI (business intelligence), CRM (customer relationship management), and DWH (data warehouse) via the server.
  • the present invention can be used in the information providing industry.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The object of the present invention is to provide a system capable of appropriately proposing search term candidates for each page of a document. To this end, the invention relates to a search document information storage device comprising: term extraction means (3); keyword storage means (5); keyword extraction means (7); topic word storage means (9); topic word extraction means (11); search term candidate extraction means (13); search term candidate display means (17); search term input means (19); and material search information storage means (21).
PCT/JP2018/017599 2017-06-01 2018-05-07 Dispositif de stockage d'informations de document de recherche WO2018221119A1 (fr)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201880035902.5A CN110678858B (zh) 2017-06-01 2018-05-07 检索用资料信息存储装置
CA3062842A CA3062842C (fr) 2017-06-01 2018-05-07 Dispositif de stockage d'informations de document de recherche
JP2019522051A JP6646184B2 (ja) 2017-06-01 2018-05-07 検索用資料情報記憶装置
US16/618,092 US10824657B2 (en) 2017-06-01 2018-05-07 Search document information storage device
US17/035,627 US20210042339A1 (en) 2017-06-01 2020-09-28 Search Document Information Storage Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2017109339 2017-06-01
JP2017-109339 2017-06-01

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US16/618,092 A-371-Of-International US10824657B2 (en) 2017-06-01 2018-05-07 Search document information storage device
US17/035,627 Continuation US20210042339A1 (en) 2017-06-01 2020-09-28 Search Document Information Storage Device

Publications (1)

Publication Number Publication Date
WO2018221119A1 true WO2018221119A1 (fr) 2018-12-06

Family

ID=64455791

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/017599 WO2018221119A1 (fr) 2017-06-01 2018-05-07 Dispositif de stockage d'informations de document de recherche

Country Status (6)

Country Link
US (2) US10824657B2 (fr)
JP (4) JP6646184B2 (fr)
CN (2) CN110678858B (fr)
CA (1) CA3062842C (fr)
SG (1) SG10202111510VA (fr)
WO (1) WO2018221119A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334178A (zh) * 2019-03-28 2019-10-15 平安科技(深圳)有限公司 数据检索方法、装置、设备及可读存储介质
WO2020153111A1 (fr) * 2019-01-25 2020-07-30 株式会社インタラクティブソリューションズ Système d'aide à la présentation
JP2021012700A (ja) * 2020-08-06 2021-02-04 株式会社インタラクティブソリューションズ プレゼンテーション支援システム

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6771251B1 (ja) 2020-04-24 2020-10-21 株式会社インタラクティブソリューションズ 音声解析システム
CN113449073B (zh) * 2021-06-21 2022-05-31 福州米鱼信息科技有限公司 一种关键词的选取方法及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08305726A (ja) * 1995-04-28 1996-11-22 Fuji Xerox Co Ltd 情報検索装置
JPH08314947A (ja) * 1995-05-22 1996-11-29 Mainichi Shinbunsha:Kk キーワード自動抽出装置
JPH08314974A (ja) * 1995-05-22 1996-11-29 Mainichi Shinbunsha:Kk キーワード自動抽出装置および文書検索装置
JP2011242844A (ja) * 2010-05-14 2011-12-01 Ricoh Co Ltd キーワード抽出装置、キーワード抽出方法、キーワード抽出プログラムおよびキーワード抽出システム

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0634207B2 (ja) * 1987-07-24 1994-05-02 日本電気株式会社 話題予測装置
JPH03122768A (ja) * 1989-10-05 1991-05-24 Ricoh Co Ltd 索引付け支援装置
JP3122768B2 (ja) 1993-12-24 2001-01-09 キヤノン株式会社 静電荷像現像用現像剤
JP5223284B2 (ja) * 2006-11-10 2013-06-26 株式会社リコー 情報検索装置、方法およびプログラム
JP4342575B2 (ja) * 2007-06-25 2009-10-14 株式会社東芝 キーワード提示のための装置、方法、及びプログラム
US8010545B2 (en) * 2008-08-28 2011-08-30 Palo Alto Research Center Incorporated System and method for providing a topic-directed search
US8775918B2 (en) * 2008-10-07 2014-07-08 Visual Software Systems Ltd. System and method for automatic improvement of electronic presentations
US9047283B1 (en) * 2010-01-29 2015-06-02 Guangsheng Zhang Automated topic discovery in documents and content categorization
US9449080B1 (en) * 2010-05-18 2016-09-20 Guangsheng Zhang System, methods, and user interface for information searching, tagging, organization, and display
CN102024027B (zh) * 2010-11-17 2013-03-20 北京健康在线网络技术有限公司 一种医学数据库的建立方法
CN102087669B (zh) * 2011-03-11 2013-01-02 北京汇智卓成科技有限公司 基于语义关联的智能搜索引擎系统
CN102567464B (zh) * 2011-11-29 2015-08-05 西安交通大学 基于扩展主题图的知识资源组织方法
CN103198066A (zh) * 2012-01-06 2013-07-10 腾讯科技(深圳)有限公司 一种基于词表的信息搜索方法及搜索系统
CN103870461B (zh) * 2012-12-10 2019-09-10 腾讯科技(深圳)有限公司 主题推荐方法、装置和服务器
CN103544267B (zh) * 2013-10-16 2017-05-03 北京奇虎科技有限公司 一种基于搜索建议词进行搜索的方法以及装置
CN103699625B (zh) * 2013-12-20 2017-05-10 北京百度网讯科技有限公司 基于关键词进行检索的方法及装置
CN103886034B (zh) * 2014-03-05 2019-03-19 北京百度网讯科技有限公司 一种建立索引及匹配用户的查询输入信息的方法和设备
GB201405875D0 (en) * 2014-04-01 2014-05-14 Kainos Evolve Ltd Computer-implemented system and method for indexing electronic documents
CN104978347A (zh) * 2014-04-11 2015-10-14 中国中医科学院中医临床基础医学研究所 中文生物医学文献数据库中敏感关键词的数据挖掘方法和系统
JP6584756B2 (ja) * 2014-07-15 2019-10-02 Nttテクノクロス株式会社 関連トピック表示制御装置、関連トピック表示制御方法、及びプログラム
US20180007100A1 (en) * 2016-06-30 2018-01-04 Microsoft Technology Licensing, Llc Candidate participant recommendation
CN106776714A (zh) * 2016-11-21 2017-05-31 辽宁工程技术大学 检索方法、装置和系统
CN107369113A (zh) 2017-07-10 2017-11-21 胜俣和彦 试题出题及考试结果判定方法、系统及考试运营系统

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020153111A1 (fr) * 2019-01-25 2020-07-30 株式会社インタラクティブソリューションズ Système d'aide à la présentation
JP2020119399A (ja) * 2019-01-25 2020-08-06 株式会社インタラクティブソリューションズ プレゼンテーション支援システム
CN111902831A (zh) * 2019-01-25 2020-11-06 互动解决方案公司 演示支援系统
CN110334178A (zh) * 2019-03-28 2019-10-15 平安科技(深圳)有限公司 数据检索方法、装置、设备及可读存储介质
CN110334178B (zh) * 2019-03-28 2023-06-20 平安科技(深圳)有限公司 数据检索方法、装置、设备及可读存储介质
JP2021012700A (ja) * 2020-08-06 2021-02-04 株式会社インタラクティブソリューションズ プレゼンテーション支援システム

Also Published As

Publication number Publication date
JP2021073590A (ja) 2021-05-13
CN113407671A (zh) 2021-09-17
CA3062842C (fr) 2022-03-08
JP6646184B2 (ja) 2020-02-14
SG10202111510VA (en) 2021-12-30
JP7313069B2 (ja) 2023-07-24
JP6836294B2 (ja) 2021-02-24
JPWO2018221119A1 (ja) 2020-01-09
CN110678858B (zh) 2021-07-09
CA3062842A1 (fr) 2019-11-29
US10824657B2 (en) 2020-11-03
US20210042339A1 (en) 2021-02-11
JP2020119590A (ja) 2020-08-06
CN110678858A (zh) 2020-01-10
US20200125594A1 (en) 2020-04-23
JP2020074144A (ja) 2020-05-14
JP6691642B1 (ja) 2020-04-28

Similar Documents

Publication Publication Date Title
JP6783483B2 (ja) 表示装置
WO2018221119A1 (fr) Dispositif de stockage d'informations de document de recherche
US11403715B2 (en) Method and system for providing domain-specific and dynamic type ahead suggestions for search query terms
US11868411B1 (en) Techniques for compiling and presenting query results
US10157171B2 (en) Annotation assisting apparatus and computer program therefor
US11042591B2 (en) Analytical search engine
US20180181544A1 (en) Systems for Automatically Extracting Job Skills from an Electronic Document
Kehl et al. Natural language processing and futures studies
US11734517B1 (en) Systems and methods for measuring automatability of report generation using a natural language generation system
US20150186363A1 (en) Search-Powered Language Usage Checks
JP6710360B1 (ja) 登録済質問文判定方法、コンピュータプログラム及び情報処理装置
WO2014168961A1 (fr) Génération d'analyse de données à l'aide d'un modèle de domaine
Iwashokun et al. Structural vetting of academic proposals
Mealand Hellenistic Greek and the New Testament: A stylometric perspective
Alghazal Talent Acquisition Process Optimization Using Machine Learning in Resumes’ Ranking and Matching to Job Descriptions
JP4405187B2 (ja) 辞書評価プログラム及びシステム並びに方法
JP2020052593A (ja) 分析装置及び分析プログラム
JP2023177430A (ja) 情報処理装置、情報処理システム、情報処理方法、及びプログラム
JP2021002278A (ja) 情報処理装置、制御方法、及びプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18808926

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019522051

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18808926

Country of ref document: EP

Kind code of ref document: A1