CN109948073B - Content retrieval method, terminal, server, electronic device, and storage medium - Google Patents

Publication number
CN109948073B
Authority
CN
China
Prior art keywords
content
page
retrieval
entities
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710872842.XA
Other languages
Chinese (zh)
Other versions
CN109948073A (en
Inventor
金刚铭
叶骏
徐羽
范跃伟
胡博
李未
周疏影
王剑
钭伟雨
刘秀芳
吕雪
何枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710872842.XA (CN109948073B)
Priority to PCT/CN2018/107273 (WO2019057191A1)
Publication of CN109948073A
Application granted
Publication of CN109948073B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a content retrieval method comprising the following steps: receiving a page content retrieval trigger instruction; acquiring the page address of the page content according to the page content retrieval trigger instruction; generating a content entity knowledge graph corresponding to the page content based on the page address; and displaying the content entity knowledge graph so that the user can perform a keyword content retrieval operation. The invention also provides a content retrieval terminal and a content retrieval server. The content retrieval method, terminal, and server of the invention generate the content entity knowledge graph from the page content, and the user can perform content retrieval through the keywords in that graph, which widens the range of application scenarios for content retrieval and improves retrieval efficiency.

Description

Content retrieval method, terminal, server, electronic device, and storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a content retrieval method, a terminal, a server, an electronic device, and a storage medium.
Background
With the development of science and technology, people rely more and more on the Internet and can obtain all kinds of information through it at any time. When a user wants to learn about some content, the user can enter a keyword corresponding to that content into a search engine, and the search engine presents an introduction to the content entities related to the keyword on its results page, for example in the form of a knowledge graph that helps the user understand the content.
However, this approach requires the user to enter a content keyword. If the user cannot enter the keyword (for example, because the input method is inconvenient to use) or does not know the keyword (for example, the user wants to look up an actor in a certain movie), the search engine cannot provide a good content retrieval service. The user may then give up searching for the content or spend more time finding the keyword, so existing content retrieval methods and devices have a narrow range of application scenarios and low content retrieval efficiency.
Disclosure of Invention
The embodiments of the invention provide a content retrieval method, a content retrieval device, and a computer-readable storage medium that support a wider range of content retrieval application scenarios and higher content retrieval efficiency, thereby solving the technical problem that existing content retrieval methods and devices have a narrow range of application scenarios and low retrieval efficiency.
The embodiment of the invention provides a content retrieval method, which comprises the following steps:
receiving a page content retrieval triggering instruction;
acquiring a page address of the page content according to the page content retrieval triggering instruction;
generating a content entity knowledge graph corresponding to the page content based on the page address; and
displaying the content entity knowledge graph so that the user can perform a keyword content retrieval operation.
The embodiment of the invention also provides a content retrieval method, which comprises the following steps:
receiving a page address of the page content from the retrieval terminal;
extracting page contents according to the page address;
extracting content entities from the page content by using a page crawler;
creating a content entity knowledge graph according to the extracted content entities and the relevance among the content entities; and
sending the content entity knowledge graph to the retrieval terminal for display so that the user can perform a keyword content retrieval operation.
The embodiment of the invention also provides a content retrieval terminal, which comprises:
the trigger instruction receiving module is used for receiving the page content retrieval trigger instruction;
the page address acquisition module is used for acquiring the page address of the page content according to the page content retrieval triggering instruction;
the knowledge graph generation module is used for generating a content entity knowledge graph corresponding to the page content based on the page address; and
the map display module is used for displaying the content entity knowledge graph so that the user can perform a keyword content retrieval operation.
The embodiment of the invention also provides a content retrieval server, which comprises:
the page address receiving module is used for receiving the page address of the page content from the retrieval terminal;
the page content extraction module is used for extracting page content according to the page address;
the content entity extraction module is used for extracting the content entity from the page content by using a page crawler;
the knowledge graph creation module is used for creating the content entity knowledge graph according to the extracted content entities and the relevance between the content entities; and
the knowledge graph sending module is used for sending the content entity knowledge graph to the retrieval terminal for display so that the user can perform a keyword content retrieval operation.
Embodiments of the present invention also provide a computer-readable storage medium having stored therein processor-executable instructions that are loaded by one or more processors to perform the content retrieval method described above.
The embodiment of the invention also provides electronic equipment, which comprises a processor and a memory, wherein the memory is provided with a computer program, and the processor is used for executing the content retrieval method by calling the computer program.
Compared with the prior art, the content retrieval method, terminal, server, electronic device, and storage medium of the invention generate the corresponding content entity knowledge graph from the page content, and the user can perform content retrieval through the keywords in the content entity knowledge graph. This widens the range of application scenarios for content retrieval and improves retrieval efficiency, solving the technical problem that existing content retrieval methods and devices have a narrow range of application scenarios and low retrieval efficiency.
Drawings
FIG. 1 is a flowchart of a first embodiment of the content retrieval method of the present invention;
FIG. 2 is a flowchart of a second embodiment of the content retrieval method of the present invention;
FIG. 3 is a flowchart of a background server generating the content entity knowledge graph of page content in the second embodiment of the content retrieval method of the present invention;
FIG. 4 is a flowchart of a third embodiment of the content retrieval method of the present invention;
FIG. 5 is a schematic structural diagram of a first embodiment of the content retrieval terminal of the present invention;
FIG. 6 is a schematic structural diagram of a second embodiment of the content retrieval terminal of the present invention;
FIG. 7 is a schematic structural diagram of a background server corresponding to the second embodiment of the content retrieval terminal of the present invention;
FIG. 8 is a schematic structural diagram of the page content extraction module of the background server corresponding to the second embodiment of the content retrieval terminal of the present invention;
FIG. 9 is a schematic structural diagram of an embodiment of the content retrieval server of the present invention;
FIG. 10 is a schematic structural diagram of the page content extraction module of the embodiment of the content retrieval server of the present invention;
FIG. 11 is a timing diagram of a content retrieval process according to an embodiment of the present invention;
FIG. 12a is a schematic diagram of page content in a specific embodiment of the content retrieval method, content retrieval terminal, and content retrieval server of the present invention;
FIGS. 12b and 12c are schematic diagrams of content entity knowledge graphs in specific embodiments of the content retrieval method, content retrieval terminal, and content retrieval server of the present invention;
FIG. 13 is a schematic diagram of the working environment of the electronic device in which the content retrieval terminal and the content retrieval server of the present invention are located.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements throughout, the principles of the present invention are illustrated in an appropriate computing environment. The following description is based on illustrative embodiments of the invention and should not be taken as limiting other embodiments of the invention not described in detail herein.
In the description that follows, embodiments of the invention are described with reference to steps and symbolic representations of operations performed by one or more computers, unless otherwise indicated. These steps and operations, which are at times referred to as being computer-executed, include the manipulation by the computer's processing unit of electronic signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the format of the data. However, although the principles of the invention are described in the foregoing terms, this is not meant to be limiting, and those skilled in the art will appreciate that the various steps and operations described below may also be implemented in hardware.
The content retrieval method, terminal, and server of the invention can be deployed in any electronic device to perform content retrieval on page content provided by a user, with a wide range of application scenarios and high retrieval efficiency. Such electronic devices include, but are not limited to, wearable devices, head-mounted devices, medical and health platforms, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), and media players), multiprocessor systems, consumer electronics, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. The content retrieval terminal is preferably a mobile terminal, and the content retrieval server is preferably a content retrieval background server.
Referring to FIG. 1, FIG. 1 is a flowchart of a first embodiment of the content retrieval method of the present invention. The content retrieval method of this embodiment can be implemented using the content retrieval terminal described above and includes:
step S101, receiving a page content retrieval trigger instruction;
step S102, acquiring the page address of the page content according to the page content retrieval trigger instruction;
step S103, generating a content entity knowledge graph corresponding to the page content based on the page address;
step S104, displaying the content entity knowledge graph so that the user can perform a keyword content retrieval operation.
The specific flow of each step of the content retrieval method of the present embodiment is described in detail below.
In step S101, the content retrieval terminal receives a page content retrieval trigger instruction, that is, an instruction that triggers sending the page content selected by the user to a background server for content retrieval. The user may generate the instruction in various ways, for example by clicking a search key at a set position on the page, or by performing a touch operation on the current page content, such as a pull-down operation or a zoom operation.
In step S102, the content retrieval terminal acquires, according to the page content retrieval trigger instruction received in step S101, the page address of the page content that the terminal is currently displaying.
In step S103, the content retrieval terminal generates a content entity knowledge graph corresponding to the page content based on the page address acquired in step S102. Specifically, the content retrieval terminal may send the page address to a corresponding background server; the background server obtains the page content at that address, obtains the page content keywords of the page content, and generates the content entity knowledge graph of the page content from those keywords. Alternatively, the content retrieval terminal may itself generate the content entity knowledge graph corresponding to the page content from the page address.
The content entity knowledge graph here refers to a visual way to describe the interrelationship between a plurality of content entities in the page content. The page content can be graphically described through the content entity knowledge graph of the page content, so that a user can better acquire keywords of the page content and relevance among the keywords.
In step S104, the content retrieval terminal receives the content entity knowledge graph from the background server, and displays the content entity knowledge graph on the screen of the content retrieval terminal, so that the user can perform the keyword content retrieval operation by selecting keywords on the content entity knowledge graph.
Thus, the page content retrieval process of the content retrieval method of the present embodiment is completed.
According to the content retrieval method, the corresponding content entity knowledge graph is generated through the page content, and the user can conduct content retrieval operation through the keywords in the content entity knowledge graph, so that the user does not need to actively input the keywords, and even can conduct retrieval operation on a plurality of keywords in the page content at one time, the application scene range of content retrieval is enlarged, and the retrieval efficiency of content retrieval is improved.
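The terminal-side flow of steps S101 to S104 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the class names `KnowledgeGraph` and `RetrievalTerminal`, the `generate_graph` callable that stands in for the background server, and the sample data are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class KnowledgeGraph:
    # Hypothetical simplified shape: entity name -> names of related entities.
    relations: Dict[str, List[str]] = field(default_factory=dict)

    def keywords(self) -> List[str]:
        """Keywords the user could select on the displayed graph."""
        return sorted(self.relations)


class RetrievalTerminal:
    def __init__(self, current_page_address: str,
                 generate_graph: Callable[[str], KnowledgeGraph]) -> None:
        self.current_page_address = current_page_address
        # Stand-in for the background server (or for local generation).
        self.generate_graph = generate_graph
        self.displayed: Optional[KnowledgeGraph] = None

    def on_trigger(self) -> List[str]:
        # S101: a page content retrieval trigger instruction was received.
        # S102: take the address of the page currently being displayed.
        address = self.current_page_address
        # S103: generate the content entity knowledge graph from the address.
        graph = self.generate_graph(address)
        # S104: "display" the graph; here we only remember it.
        self.displayed = graph
        return graph.keywords()


def fake_server(address: str) -> KnowledgeGraph:
    # Hypothetical canned response used only for this sketch.
    return KnowledgeGraph({"Movie A": ["Actor B"], "Actor B": ["Movie A"]})


terminal = RetrievalTerminal("https://example.com/news/1", fake_server)
selectable_keywords = terminal.on_trigger()
```

Passing the graph generator in as a callable keeps the sketch neutral about whether the graph is built by the background server or by the terminal itself, both of which the description allows.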
Referring to FIG. 2, FIG. 2 is a flowchart of a second embodiment of the content retrieval method of the present invention. The content retrieval method of this embodiment can be implemented using the content retrieval terminal described above and includes:
step S201, receiving a page content retrieval list from a background server, and presenting a page content retrieval trigger prompt according to the page content retrieval list;
step S202, generating a page content retrieval trigger instruction according to a touch operation of the user on the page content display interface;
step S203, obtaining the page address of the page content according to the page content retrieval triggering instruction;
step S204, generating a content entity knowledge graph corresponding to the page content based on the page address;
step S205, the content entity knowledge graph is displayed so that the user can perform keyword content retrieval operation.
The specific flow of each step of the content retrieval method of the present embodiment is described in detail below.
In step S201, not all page content can be subjected to the page content retrieval operation; for example, the page crawler cannot extract page content from some pages. The content retrieval terminal therefore receives from the background server a page content retrieval list indicating the pages for which the page content retrieval operation is available.
The page content retrieval list may be a whitelist of pages, for example one in which the page content under website a is marked as retrievable; a blacklist of pages, for example one in which the page content under website b is marked as not retrievable; a combined blacklist and whitelist of pages; or a list of blacklisted and whitelisted page types, for example a whitelist website type under which all pages with suffix c are retrievable and a blacklist website type under which all pages with suffix d are not retrievable.
The content retrieval terminal then presents a page content retrieval trigger prompt on the page the user is currently browsing, according to the page content retrieval list, so that the user can issue a page content retrieval trigger instruction in response to the prompt. If the page content retrieval operation is available for the page the user is currently browsing, the prompt is shown at a preset position on the page, for example "retrievable" at the upper right corner of the page; if the operation is not available, "not retrievable" is shown at the upper right corner of the page. The display mode of the page content retrieval trigger prompt can of course be modified as required.
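The list check behind the trigger prompt can be sketched as follows. The description does not specify the list format, so everything here is an assumption: the `PageRetrievalList` class, the reading of "suffix" as a host-name suffix, the rule that blacklist entries win over whitelist entries, and the default of "not retrievable" for unlisted pages.

```python
from dataclasses import dataclass, field
from typing import Set
from urllib.parse import urlsplit


@dataclass
class PageRetrievalList:
    """Hypothetical shape of the list sent by the background server."""
    whitelist_hosts: Set[str] = field(default_factory=set)
    blacklist_hosts: Set[str] = field(default_factory=set)
    whitelist_suffixes: Set[str] = field(default_factory=set)
    blacklist_suffixes: Set[str] = field(default_factory=set)

    def is_retrievable(self, page_address: str) -> bool:
        host = (urlsplit(page_address).hostname or "").lower()
        # Assumption: blacklist entries take precedence over whitelist entries.
        if host in self.blacklist_hosts:
            return False
        if any(host.endswith(suffix) for suffix in self.blacklist_suffixes):
            return False
        if host in self.whitelist_hosts:
            return True
        # Assumption: pages that match no entry are treated as not retrievable.
        return any(host.endswith(suffix) for suffix in self.whitelist_suffixes)


def trigger_prompt(retrieval_list: PageRetrievalList, page_address: str) -> str:
    """Text shown at the preset position of the page being browsed."""
    if retrieval_list.is_retrievable(page_address):
        return "retrievable"
    return "not retrievable"


retrieval_list = PageRetrievalList(
    whitelist_hosts={"a.example"},
    blacklist_hosts={"b.example"},
    whitelist_suffixes={".news.example"},
    blacklist_suffixes={".ads.example"},
)
prompt_for_a = trigger_prompt(retrieval_list, "https://a.example/page")
```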
In step S202, if the page content retrieval operation is available for the page the user is currently browsing, the content retrieval terminal may receive a touch operation of the user on the page display interface and generate the page content retrieval trigger instruction from it, for example when the user clicks a search button at a set position on the current page, or performs a pull-down operation or a zoom operation on the current page. The page content retrieval trigger instruction is an instruction that triggers sending the page content selected by the user to the background server for content retrieval. The touch operation must be preset: the content retrieval terminal generates a page content retrieval trigger instruction when it detects that the user has performed the preset touch operation and that the page content retrieval operation is available for the page the user is currently browsing.
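The condition for generating the trigger instruction can be sketched as a small guard function. The gesture identifiers in `PRESET_GESTURES`, the `TriggerInstruction` type, and the `make_trigger` function are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the preset touch operations named in the text.
PRESET_GESTURES = {"tap_search_button", "pull_down", "zoom"}


@dataclass(frozen=True)
class TriggerInstruction:
    page_address: str
    gesture: str


def make_trigger(gesture: str, page_address: str,
                 retrievable: bool) -> Optional[TriggerInstruction]:
    """Return a trigger instruction only when both conditions hold:
    the gesture is one of the preset touch operations, and the page
    currently being browsed supports page content retrieval."""
    if gesture in PRESET_GESTURES and retrievable:
        return TriggerInstruction(page_address, gesture)
    return None


instruction = make_trigger("pull_down", "https://a.example/page", retrievable=True)
```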
In step S203, the content retrieval terminal acquires the page address of the page content currently being displayed by the content retrieval terminal according to the page content retrieval trigger instruction generated in step S202.
In step S204, the content retrieval terminal generates a content entity knowledge graph corresponding to the page content based on the page address acquired in step S203. Specifically, the content retrieval terminal sends the page address to a corresponding background server so that the background server can generate the content entity knowledge graph of the page content from the page address. Referring to FIG. 3, FIG. 3 is a flowchart of the background server generating the content entity knowledge graph of the page content in the second embodiment of the content retrieval method of the present invention. Step S204 includes:
in step S301, the background server extracts the page content according to the acquired page address.
Specifically, the background server may normalize the acquired page address so that it can recognize that page addresses expressed with different domain names refer to the same page.
The background server then determines whether the server's local memory stores the page content corresponding to the normalized page address. If it does, the background server extracts the page content directly from the local memory, which avoids the slow speed of extracting page content in real time and improves page content extraction performance. If it does not, the background server extracts the page content directly from the page address.
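Address normalization followed by a local lookup can be sketched as follows. The normalization rules are not given in the description, so the ones used here are assumptions: a hypothetical `HOST_ALIASES` table mapping alternative domain names to one canonical host, dropping the port, the fragment, and any trailing slash. Storing freshly fetched content for later requests is also an assumption, and `fetch` is a caller-supplied stand-in for real page retrieval.

```python
from typing import Callable, Dict, List
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table: different domain names that serve the same pages.
HOST_ALIASES: Dict[str, str] = {
    "m.example.com": "example.com",
    "www.example.com": "example.com",
}


def normalize_address(address: str) -> str:
    parts = urlsplit(address.strip())
    host = (parts.hostname or "").lower()   # hostname drops any port
    host = HOST_ALIASES.get(host, host)
    path = parts.path.rstrip("/") or "/"
    # Keep the query string, drop the fragment.
    return urlunsplit((parts.scheme or "http", host, path, parts.query, ""))


class PageContentStore:
    """Local store consulted before fetching page content in real time."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._local: Dict[str, str] = {}

    def get(self, address: str) -> str:
        key = normalize_address(address)
        if key not in self._local:
            # Not stored locally: extract directly from the page address.
            # Assumption: the result is kept for later requests.
            self._local[key] = self._fetch(key)
        return self._local[key]


fetched: List[str] = []


def fake_fetch(url: str) -> str:
    fetched.append(url)
    return "<html>page</html>"


store = PageContentStore(fake_fetch)
first = store.get("http://m.example.com/News/1/")
second = store.get("http://www.example.com/News/1#top")
```

In this usage, the two requested addresses differ in domain name, trailing slash, and fragment, yet both normalize to the same key, so the second request is served from the local store.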
In step S302, the background server uses a page crawler to extract content entities from the page content. The title, subtitle, author, and body content of the page may be extracted. Text processing operations such as word segmentation, named entity recognition (NER), and term frequency-inverse document frequency (TF-IDF) weighting are then performed on the title and the body content, abstracting the page content into a plurality of content entities. These content entities effectively reflect the overall content of the page.
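The TF-IDF part of this step can be sketched in plain Python. This is an illustration under stated assumptions, not the method of the patent: the regular-expression `tokenize` function is a stand-in for real word segmentation (Chinese text would need a proper segmenter), named entity recognition is not shown, the smoothed IDF formula is one common variant, and the reference corpus is made up.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Stand-in for word segmentation; only handles ASCII words and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(page_tokens: List[str], corpus: List[Set[str]]) -> Dict[str, float]:
    """Score each term of the page by term frequency times a smoothed
    inverse document frequency computed over a reference corpus."""
    counts = Counter(page_tokens)
    total = len(page_tokens)
    n_docs = len(corpus)
    scores: Dict[str, float] = {}
    for term, count in counts.items():
        doc_freq = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[term] = (count / total) * idf
    return scores


def top_terms(page_text: str, corpus_texts: List[str], k: int = 3) -> List[str]:
    """Candidate content entities: the k highest-scoring terms."""
    corpus = [set(tokenize(text)) for text in corpus_texts]
    scores = tf_idf(tokenize(page_text), corpus)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


page = "the singer released the album and the singer toured"
reference_corpus = ["the weather is nice", "the market fell", "the team won the match"]
best = top_terms(page, reference_corpus, k=1)
```

Although "the" is the most frequent word on the sample page, it appears in every reference document, so its IDF is low and "singer" ranks first.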
In step S303, the background server uses the content entities as search terms, extracts specific data for the content entities from the background database through search engine technology, and obtains the relevance between the content entities. That is, it obtains the entity attributes of the content entities (entity name, entity category, entity information, and so on) and the entity relationships between related content entities (for example, singer, performer, or spouse relationships between persons).
For example, if the content entity is Liu Moumou, the background server uses Liu Moumou as the search term and extracts specific data for the content entity from the background database through search engine technology, such as Liu Moumou being an actor and a singer, and Liu Moumou's debut date, representative works, and so on. It can also extract relationships between Liu Moumou and other content entities, such as Liu Moumou's relationship to a singer, or Liu Moumou and Zhang Moumou having appeared together in a movie. In this way, entity relationships between content entities can be established.
The entity relationships here may be, for example, a relationship map of the characters in a television drama, or a relationship map of the actors in real life. The name of the television drama and the names of the actors are entity attributes of the content entities, while the spouse, father-son, and actor relationships among the characters in the drama and the television drama itself are entity relationships of the content entities.
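One way to hold entity attributes and entity relationships together is a small graph structure, sketched below. The `Entity` and `KnowledgeGraph` classes, the `build_graph` function, and the in-memory dictionaries standing in for the background database are all assumptions for illustration; the attribute values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    name: str
    category: str
    info: Dict[str, str] = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    entities: Dict[str, Entity] = field(default_factory=dict)
    # Each relation is (subject name, relation label, object name).
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def neighbors(self, name: str) -> List[Tuple[str, str]]:
        """Relations touching an entity, as (relation label, other entity)."""
        found = []
        for subject, label, obj in self.relations:
            if subject == name:
                found.append((label, obj))
            elif obj == name:
                found.append((label, subject))
        return found


def build_graph(entity_names: List[str],
                db_entities: Dict[str, Entity],
                db_relations: List[Tuple[str, str, str]]) -> KnowledgeGraph:
    graph = KnowledgeGraph()
    # Look up entity attributes, using each content entity as the search term.
    for name in entity_names:
        if name in db_entities:
            graph.entities[name] = db_entities[name]
    # Keep only relations whose two ends were both found on the page.
    for subject, label, obj in db_relations:
        if subject in graph.entities and obj in graph.entities:
            graph.relations.append((subject, label, obj))
    return graph


# Hypothetical stand-in for the background database.
DB_ENTITIES = {
    "Liu Moumou": Entity("Liu Moumou", "person", {"occupation": "actor, singer"}),
    "Zhang Moumou": Entity("Zhang Moumou", "person", {"occupation": "actor"}),
}
DB_RELATIONS = [("Liu Moumou", "co-starred with", "Zhang Moumou")]

graph = build_graph(["Liu Moumou", "Zhang Moumou", "Unknown"], DB_ENTITIES, DB_RELATIONS)
```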
Thus, the background server can create the content entity knowledge graph from the content entities and the relevance between them. The content entity knowledge graph is a visual way of describing the interrelationships among the content entities in the page content. It describes the page content graphically, so that the user can more easily grasp the keywords of the page content and the relevance among them. The content entity knowledge graph can represent the connections among different content entities through a multi-level hierarchy, with the important content entities placed at the highest level of the hierarchy to better show the entity attributes and entity relationships of the content entities.
In step S304, because the page content may contain too many content entities, a content entity knowledge graph with only a few levels cannot show the relevance among all of them. The background server therefore reads the user portrait of the user of the content retrieval terminal. The user portrait may be preset in the background server or in the content retrieval terminal, and refers to the user's interest values for different content entities, derived from behaviors such as content browsing, content searching, and content purchasing. For example, some users are more interested in movies and others are more interested in songs.
The background server can then adjust the priority of the content entities in the content entity knowledge graph obtained in step S303 according to the preset user portrait. The content entity knowledge graph preferentially displays the content entities the user is most interested in, places content entities of lower interest at the second or third level of the graph, and directly deletes content entities judged to be of no interest to the user.
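The priority adjustment can be sketched as a mapping from interest values to display levels. The description gives no numeric rule, so the thresholds, the representation of the user portrait as a category-to-interest dictionary, and the `assign_levels` function are all assumptions.

```python
from typing import Dict, List

# Hypothetical thresholds: minimum interest value needed for each level.
LEVEL_THRESHOLDS = [(1, 0.7), (2, 0.4), (3, 0.1)]


def assign_levels(entity_categories: Dict[str, str],
                  user_portrait: Dict[str, float]) -> Dict[int, List[str]]:
    """Place each entity at a display level by the user's interest in its
    category; entities below the lowest threshold are removed entirely."""
    levels: Dict[int, List[str]] = {level: [] for level, _ in LEVEL_THRESHOLDS}
    for name, category in entity_categories.items():
        interest = user_portrait.get(category, 0.0)
        for level, minimum in LEVEL_THRESHOLDS:
            if interest >= minimum:
                levels[level].append(name)
                break
        # No threshold met: the entity is dropped from the graph.
    return levels


portrait = {"movie": 0.9, "song": 0.5, "book": 0.15, "sports": 0.0}
entities = {"Movie A": "movie", "Song B": "song", "Book C": "book", "Team D": "sports"}
levels = assign_levels(entities, portrait)
```

In this example the movie entity lands at the first level, the song and book entities at the second and third levels, and the sports entity is removed because the portrait records no interest in that category.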
Thus, the process of generating the entity knowledge graph of the page content by the background server is completed.
In step S205, the content retrieval terminal receives the priority-adjusted content entity knowledge graph from the background server and displays it on the screen of the content retrieval terminal, so that the user can perform a keyword content retrieval operation by selecting keywords on the content entity knowledge graph, or directly generate a new content entity knowledge graph from the selected keywords.
Thus, the page content retrieval process of the content retrieval method of the present embodiment is completed.
On the basis of the first embodiment, the content retrieval method of this embodiment uses the page content retrieval list and the page content retrieval trigger prompt to filter out pages that cannot be retrieved, which further improves the efficiency of page content retrieval. The page content retrieval trigger instruction is generated from the user's touch operation on the page content display interface, which increases the variety of ways the instruction can be triggered. The page retrieval process can be carried out on the background server, with the content retrieval terminal only displaying the content entity knowledge graph, which improves the performance of the content retrieval terminal.
Referring to FIG. 4, FIG. 4 is a flowchart of a third embodiment of the content retrieval method of the present invention. The content retrieval method of this embodiment can be implemented using the content retrieval server described above and includes:
step S401, receiving the page address of the page content from a retrieval terminal;
step S402, extracting the page content according to the page address;
step S403, extracting content entities from the page content using a page crawler;
step S404, creating a content entity knowledge graph according to the extracted content entities and the relevance between the content entities;
step S405, adjusting the priority of the content entities in the content entity knowledge graph based on a preset user portrait;
step S406, sending the content entity knowledge graph to the retrieval terminal for display so that the user can perform a keyword content retrieval operation.
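Steps S401 to S406 above can be sketched as a single server-side handler that chains the stages. The function name `handle_page_address`, the shape of the graph as a plain dictionary, and every stub passed in below are hypothetical; each stage is injected as a callable so the sketch makes no claim about how any stage is actually implemented.

```python
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]  # hypothetical shape: entity -> related entities


def handle_page_address(page_address: str,
                        fetch_page: Callable[[str], str],
                        extract_entities: Callable[[str], List[str]],
                        build_graph: Callable[[List[str]], Graph],
                        adjust_priority: Callable[[Graph], Graph],
                        send_to_terminal: Callable[[Graph], None]) -> Graph:
    # S401: page_address has been received from the retrieval terminal.
    page_content = fetch_page(page_address)      # S402
    entities = extract_entities(page_content)    # S403
    graph = build_graph(entities)                # S404
    graph = adjust_priority(graph)               # S405
    send_to_terminal(graph)                      # S406
    return graph


sent: List[Graph] = []

result = handle_page_address(
    "https://example.com/news/1",
    fetch_page=lambda address: "Movie A stars Actor B",
    extract_entities=lambda content: ["Movie A", "Actor B"],
    build_graph=lambda names: {name: [n for n in names if n != name] for name in names},
    adjust_priority=lambda graph: {k: v for k, v in graph.items() if k != "Actor B"},
    send_to_terminal=sent.append,
)
```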
The specific flow of each step of the content retrieval method of the present embodiment is described in detail below.
In step S401, the content retrieval server receives the page address of the page content from the retrieval terminal, that is, the page address of the page content currently being displayed by the retrieval terminal.
In step S402, the content retrieval server extracts page content based on the page address acquired in step S401.
Specifically, the content retrieval server may perform normalization operation on the obtained page address, so that the content retrieval server may better identify the same page address represented by different domain names.
And then the content retrieval server judges whether the local memory of the server stores the page content corresponding to the page address after the normalization operation. If the local memory of the server stores the page content corresponding to the page address after normalization operation, the background server can directly extract the page content from the local memory of the server, so that the problem of low extraction speed of the real-time page content can be better avoided, and the extraction performance of the page content is improved. If the local memory of the server does not store the page content corresponding to the page address after normalization operation, the background server directly extracts the page content from the page address.
In step S403, the content retrieval server performs content entity extraction on the page content using the page crawler. The title, subtitle, author, and specific content of the page content may be extracted. Text processing operations such as word segmentation, named entity recognition (NER), and term frequency-inverse document frequency (TF-IDF) weighting are then performed on the title and the specific content, abstracting the page content into a plurality of content entities. These content entities can effectively reflect the overall content of the page.
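To illustrate the TF-IDF part of this step, the sketch below ranks the terms of one page against a small set of reference documents. It is only an illustration under assumptions: `tokenize` is a simple stand-in for real word segmentation, the names `top_terms` and `reference_docs` are hypothetical, and named entity recognition is omitted entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Stand-in for real word segmentation: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def top_terms(page_text, reference_docs, k=3):
    """Rank the terms of page_text by TF-IDF against reference_docs."""
    docs = [set(tokenize(d)) for d in reference_docs]
    page = tokenize(page_text)
    tf = Counter(page)
    n = len(docs) + 1  # the page itself counts as one document
    scores = {}
    for term, count in tf.items():
        df = 1 + sum(1 for d in docs if term in d)  # documents containing the term
        scores[term] = (count / len(page)) * math.log(n / df)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Terms that appear in every document (such as "the") receive a weight of zero, while terms frequent on the page but rare elsewhere rise to the top and become candidates for content entities.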
In step S404, the content retrieval server uses the content entities as search terms, extracts specific data of the content entities from the background database through search engine technology, and obtains the relevance between the content entities. That is, it obtains the entity attributes of the content entities (entity name, entity category, entity information, etc.) and the entity relationships between related content entities (e.g. singer, performer, or spousal relationships between persons).
For example, if the content entity is Liu Moumou (a placeholder personal name), the server uses Liu Moumou as the search term and extracts specific data of the content entity from the background database through search engine technology, such as the fact that Liu Moumou is an actor and a singer, and Liu Moumou's debut date and representative works. It can also extract the relationships between Liu Moumou and other content entities, such as Liu Moumou and another singer, or Liu Moumou and Zhang Moumou having appeared in a film together. In this way, entity relationships between content entities can be established.
The entity relationship here may be, for example, a character relationship map of the roles in a television drama, or a relationship map of the actors in real life. The name of the television drama and the names of the actors are entity attributes of the content entities; the spousal and father-son relationships between characters in the drama, and the cast relationship between an actor and the drama, are entity relationships of the content entities.
The content retrieval server can then create a knowledge graph of the content entities based on the content entities and the associations between the content entities. The content entity knowledge graph here refers to a visual way to describe the interrelationship between a plurality of content entities in the page content. The page content can be graphically described through the content entity knowledge graph of the page content, so that a user can better acquire keywords of the page content and relevance among the keywords. The content entity knowledge graph can represent the interconnection among different content entities through a plurality of hierarchical structures, and the important content entities are placed at the highest level of the hierarchical structures so as to better show the entity attributes and entity relations of the content entities.
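One possible in-memory shape for such a content entity knowledge graph is sketched below. The class and field names (`ContentEntity`, `KnowledgeGraph`, `relate`, `neighbors`) are hypothetical and chosen only for illustration; the source does not prescribe a data structure, and the hierarchical display levels are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class ContentEntity:
    name: str                                  # entity name
    category: str                              # entity category
    info: dict = field(default_factory=dict)   # other entity information


@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)    # name -> ContentEntity
    relations: list = field(default_factory=list)   # (source, relation, target)

    def add_entity(self, entity: ContentEntity) -> None:
        self.entities[entity.name] = entity

    def relate(self, source: str, relation: str, target: str) -> None:
        if source not in self.entities or target not in self.entities:
            raise KeyError("both entities must be added before relating them")
        self.relations.append((source, relation, target))

    def neighbors(self, name: str) -> list:
        """Return (relation, other entity) pairs touching the named entity."""
        out = []
        for s, r, t in self.relations:
            if s == name:
                out.append((r, t))
            elif t == name:
                out.append((r, s))
        return out
```

Entity attributes live on the nodes and entity relationships on the edges, so a display layer could start from an important entity and walk `neighbors` to lay out the lower levels.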
In step S405, the page content may contain too many content entities for a content entity knowledge graph with only a few levels to present the relevance among all of them. The content retrieval server therefore reads the user portrait of the user of the content retrieval terminal. The user portrait may be preset in the server or in the content retrieval terminal, and refers to the user's interest values for different content entities, obtained from user behavior such as content browsing, content searching, and content purchasing. For example, some users are more interested in movies, while others are more interested in songs.
The content retrieval server can thus adjust the priorities of the content entities in the content entity knowledge graph obtained in step S404 according to the preset user portrait. The content entity knowledge graph displays the content entities the user is most interested in first, places content entities of lower interest in the second or third level of the graph, and directly deletes content entities judged to be of no interest to the user.
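The priority adjustment of step S405 can be sketched as a mapping from interest values to display levels. The function name `adjust_priority`, the per-category portrait, and the threshold values are assumptions for illustration; the source only states that more interesting entities are shown first, less interesting ones go to the second or third level, and uninteresting ones are deleted.

```python
def adjust_priority(entities, portrait, thresholds=(0.7, 0.4, 0.1)):
    """Assign each entity to a display level based on user interest.

    entities: {entity name: entity category}
    portrait: {entity category: interest value in [0, 1]}
    thresholds: hypothetical lower bounds for level 1, 2, 3 (descending).
    Entities whose interest is below every threshold are dropped.
    """
    levels = {i + 1: [] for i in range(len(thresholds))}
    for name, category in entities.items():
        interest = portrait.get(category, 0.0)
        for level, bound in enumerate(thresholds, start=1):
            if interest >= bound:
                levels[level].append(name)
                break
    return levels
```

For a user whose portrait strongly favors movies, a film entity lands in level 1, while an entity from a category absent from the portrait is removed from the graph altogether.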
In step S406, the content retrieval server sends the priority-adjusted content entity knowledge graph to the retrieval terminal for display, so that the user of the retrieval terminal can perform a keyword content retrieval operation by selecting keywords on the content entity knowledge graph, or directly generate a new content entity knowledge graph from the selected keywords.
Thus, the page content retrieval process of the content retrieval method of the present embodiment is completed.
The content retrieval method of the present embodiment generates a corresponding content entity knowledge graph from the page content, and the user can perform content retrieval through the keywords in the content entity knowledge graph. The user therefore does not need to type keywords, and can even retrieve on several keywords of the page content at once, which widens the range of application scenarios for content retrieval and improves retrieval efficiency.
Moreover, the page retrieval process can be carried out on a background server, with the content retrieval terminal only displaying the content entity knowledge graph, which effectively improves the performance of the content retrieval terminal.
The present invention also provides a content retrieval terminal. Referring to fig. 5, fig. 5 is a schematic structural diagram of a first embodiment of the content retrieval terminal of the present invention. The content retrieval terminal of the present embodiment can be implemented using the first embodiment of the content retrieval method described above. The content retrieval terminal 50 of the present embodiment includes a trigger instruction receiving module 51, a page address acquisition module 52, a knowledge graph generation module 53, and a map display module 54.
The trigger instruction receiving module 51 is configured to receive a page content retrieval trigger instruction; the page address acquisition module 52 is configured to acquire a page address of the page content according to the page content retrieval trigger instruction; the knowledge graph generation module 53 is configured to generate a content entity knowledge graph corresponding to the page content based on the page address; the map display module 54 is configured to receive and display a content entity knowledge map for a user to perform a keyword content retrieval operation.
When the content retrieval terminal 50 of the present embodiment is used, the trigger instruction receiving module 51 first receives a page content retrieval trigger instruction, that is, an instruction that triggers sending the page content selected by the user to the background server for content retrieval. The user may generate the page content retrieval trigger instruction in various ways, for example by clicking a search key at a set position on the page, or by performing a touch operation on the current page content, such as a pull-down operation or a zoom operation.
The page address acquisition module 52 then acquires the page address of the page content currently being displayed by the content retrieval terminal according to the page content retrieval trigger instruction acquired by the trigger instruction receiving module 51.
The knowledge graph generation module 53 then generates a content entity knowledge graph corresponding to the page content based on the page address acquired by the page address acquisition module 52. Specifically, the knowledge graph generation module 53 sends the page address to the corresponding background server, so that the background server can acquire the page content at the page address, obtain the page content keywords of that page content, and generate the content entity knowledge graph of the page content from those keywords. Of course, the knowledge graph generation module 53 may also itself generate the content entity knowledge graph corresponding to the page content according to the page address.
The content entity knowledge graph here refers to a visual way to describe the interrelationship between a plurality of content entities in the page content. The page content can be graphically described through the content entity knowledge graph of the page content, so that a user can better acquire keywords of the page content and relevance among the keywords.
Finally, the map display module 54 receives the content entity knowledge map from the background server, and displays the content entity knowledge map on the screen of the content retrieval terminal, so that the user can perform keyword content retrieval operation by selecting keywords on the content entity knowledge map.
This completes the page content retrieval process of the content retrieval terminal 50 of the present embodiment.
The content retrieval terminal of the present embodiment generates a corresponding content entity knowledge graph from the page content, and the user can perform content retrieval through the keywords in the content entity knowledge graph. The user therefore does not need to type keywords, and can retrieve on several keywords of the page content at once, which widens the range of application scenarios for content retrieval and improves retrieval efficiency.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a content retrieval terminal according to a second embodiment of the present invention. The content retrieval terminal of the present embodiment can be implemented using the second embodiment of the content retrieval method described above, and the content retrieval terminal 60 of the present embodiment includes a retrieval trigger prompt module 61, a trigger instruction receiving module 62, a page address acquisition module 63, a knowledge graph generation module 64, and a graph display module 65.
The retrieval trigger prompt module 61 is configured to receive the page content retrieval list from the background server and to present a page content retrieval trigger prompt according to the content of the list, so that the user issues a page content retrieval trigger instruction according to the prompt. The trigger instruction receiving module 62 is configured to generate a page content retrieval trigger instruction according to a touch operation performed by the user on the page content display interface. The page address acquisition module 63 is configured to acquire a page address of the page content according to the page content retrieval trigger instruction; the knowledge graph generation module 64 is configured to generate a content entity knowledge graph corresponding to the page content based on the page address; the map display module 65 is configured to display the content entity knowledge graph for the user to perform a keyword content retrieval operation.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a background server corresponding to a second embodiment of the content retrieval terminal of the present invention. The background server 70 includes a page content extraction module 71, a content entity extraction module 72, a knowledge-graph creation module 73, and a knowledge-graph priority adjustment module 74.
The page content extraction module 71 is configured to extract page content according to the page address; the content entity extraction module 72 is configured to perform content entity extraction on the page content using the page crawler; the knowledge graph creation module 73 is configured to create a knowledge graph of content entities according to the extracted content entities and the correlation between the content entities. The knowledge-graph priority adjustment module 74 is configured to perform content-entity priority adjustment on the content-entity knowledge graph based on the preset user representation.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a page content extraction module of a background server according to a second embodiment of the content retrieval terminal of the present invention. The page content extraction module 71 includes a page address normalization unit 81, a page content storage judgment unit 82, a first page content extraction unit 83, and a second page content extraction unit 84.
The page address normalization unit 81 is configured to normalize a page address; the page content storage judging unit 82 is configured to judge whether the local memory of the server stores page content corresponding to the page address after the normalization operation; the first page content extracting unit 83 is configured to extract the page content from the server local memory if the page content corresponding to the page address after the normalization operation is stored; the second page content extracting unit 84 is configured to extract the page content from the page address if the page content corresponding to the page address after the normalization operation is not stored.
When the content retrieval terminal 60 of the present preferred embodiment is used, not all page content can be subjected to the page content retrieval operation; for example, the page crawler cannot extract the content of some pages. The retrieval trigger prompt module 61 therefore receives a page content retrieval list from the background server 70, indicating which pages support the page content retrieval operation.
The page content retrieval list may be a whitelist of pages, for example one that marks the pages under website a as available for page content retrieval. It may be a blacklist of pages, for example one that marks the pages under website b as unavailable for page content retrieval. It may also combine both, or list whitelisted and blacklisted page types, for example a whitelisted website type covering all pages with suffix c, for which page content retrieval is available, and a blacklisted website type covering all pages with suffix d, for which it is not.
The retrieval trigger prompt module 61 then presents a page content retrieval trigger prompt on the currently browsed page according to the content of the page content retrieval list, so that the user issues a page content retrieval trigger instruction according to the prompt. If the page the user is currently browsing supports the page content retrieval operation, the prompt is shown at a preset position of the page, for example "retrievable" is marked at the upper right corner of the page; if the page does not support the operation, "not retrievable" is marked at the upper right corner of the page. Of course, the way the page content retrieval trigger prompt is displayed can be modified as required.
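The list check and the resulting prompt can be sketched as below. The function names, the interpretation of "suffix" as a host-name suffix, the rule that the blacklist takes precedence, and the default of treating unlisted pages as not retrievable are all assumptions for illustration; the source does not specify them.

```python
from urllib.parse import urlsplit


def is_retrievable(address, whitelist_hosts=(), blacklist_hosts=(),
                   whitelist_suffixes=(), blacklist_suffixes=()):
    """Decide whether a page supports page content retrieval."""
    host = (urlsplit(address).hostname or "").lower()
    # Assumption: a blacklist match always wins over a whitelist match.
    if host in blacklist_hosts or host.endswith(tuple(blacklist_suffixes)):
        return False
    if host in whitelist_hosts or host.endswith(tuple(whitelist_suffixes)):
        return True
    return False  # assumption: unlisted pages are treated as not retrievable


def trigger_prompt(address, **lists):
    """Text a terminal might show at the preset position of the page."""
    return "retrievable" if is_retrievable(address, **lists) else "not retrievable"
```

A terminal could call `trigger_prompt` each time the browsed page changes and render the returned label at the upper right corner of the page.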
Then, if the page the user is currently browsing supports the page content retrieval operation, the trigger instruction receiving module 62 may receive the user's touch operation on the page display interface and generate the page content retrieval trigger instruction. The touch operation may be, for example, clicking a search button at a set position of the currently browsed page, or performing a pull-down operation or a zoom operation on the currently browsed page. The page content retrieval trigger instruction here refers to an instruction that triggers sending the page content selected by the user to the background server for content retrieval. The touch operation needs to be preset: the content retrieval terminal generates a page content retrieval trigger instruction only when it detects both that the user has performed the preset touch operation and that the currently browsed page supports the page content retrieval operation.
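The two conditions for generating the trigger instruction can be expressed in a few lines. The gesture names in `PRESET_GESTURES` and the dictionary shape of the instruction are hypothetical; they only illustrate that both the preset touch operation and the retrievability of the page must hold.

```python
# Hypothetical identifiers for the preset touch operations.
PRESET_GESTURES = {"tap_search_button", "pull_down", "zoom"}


def make_trigger_instruction(gesture, page_address, retrievable):
    """Return a trigger instruction, or None when no trigger should fire."""
    if gesture in PRESET_GESTURES and retrievable:
        return {"type": "page_content_retrieval", "page_address": page_address}
    return None
```

A gesture outside the preset set, or a preset gesture on a page that does not support retrieval, yields no instruction, so nothing is sent to the background server.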
The page address acquisition module 63 then acquires the page address of the page content currently being displayed by the content retrieval terminal according to the page content retrieval trigger instruction generated by the trigger instruction receiving module 62.
The knowledge graph generation module 64 then generates a content entity knowledge graph corresponding to the page content based on the page address acquired by the page address acquisition module 63. Specifically, the knowledge graph generation module 64 sends the page address to the corresponding background server 70, so that the background server 70 can generate the content entity knowledge graph of the page content according to the page address. The specific process is as follows:
The page content extraction module 71 of the background server 70 extracts page content based on the acquired page address.
Specifically, the page address normalization unit 81 of the page content extraction module 71 may first normalize the acquired page address, so that different domain names that point to the same page are recognized by the background server as the same page address.
The page content storage judging unit 82 of the page content extraction module 71 then determines whether the server local memory stores page content corresponding to the normalized page address. If it does, the first page content extraction unit 83 of the page content extraction module 71 extracts the page content directly from the server local memory, which avoids the slowness of extracting the page content in real time and improves extraction performance. If it does not, the second page content extraction unit 84 of the page content extraction module 71 extracts the page content directly from the page address.
The content entity extraction module 72 of the background server 70 then uses the page crawler to perform content entity extraction on the page content. The title, subtitle, author, and specific content of the page content may be extracted. Text processing operations such as word segmentation, named entity recognition (NER), and term frequency-inverse document frequency (TF-IDF) weighting are then performed on the title and the specific content, abstracting the page content into a plurality of content entities. These content entities can effectively reflect the overall content of the page.
The knowledge graph creation module 73 of the background server 70 then uses the content entities as search terms, extracts specific data of the content entities from the background database through search engine technology, and obtains the relevance between the content entities. That is, it obtains the entity attributes of the content entities (entity name, entity category, entity information, etc.) and the entity relationships between related content entities (e.g. singer, performer, or spousal relationships between persons).
For example, if the content entity is Liu Moumou (a placeholder personal name), the background server uses Liu Moumou as the search term and extracts specific data of the content entity from the background database through search engine technology, such as the fact that Liu Moumou is an actor and a singer, and Liu Moumou's debut date and representative works. It can also extract the relationships between Liu Moumou and other content entities, such as Liu Moumou and another singer, or Liu Moumou and Zhang Moumou having appeared in a film together. In this way, entity relationships between content entities can be established.
The entity relationship here may be, for example, a character relationship map of the roles in a television drama, or a relationship map of the actors in real life. The name of the television drama and the names of the actors are entity attributes of the content entities; the spousal and father-son relationships between characters in the drama, and the cast relationship between an actor and the drama, are entity relationships of the content entities.
The knowledge-graph creation module 73 may thus create a knowledge graph of the content entities based on the content entities and the associations between the content entities. The content entity knowledge graph here refers to a visual way to describe the interrelationship between a plurality of content entities in the page content. The page content can be graphically described through the content entity knowledge graph of the page content, so that a user can better acquire keywords of the page content and relevance among the keywords. The content entity knowledge graph can represent the interconnection among different content entities through a plurality of hierarchical structures, and the important content entities are placed at the highest level of the hierarchical structures so as to better show the entity attributes and entity relations of the content entities.
The page content may contain too many content entities for a content entity knowledge graph with only a few levels to present the relevance among all of them. Finally, therefore, the knowledge graph priority adjustment module 74 of the background server 70 reads the user portrait of the user of the content retrieval terminal. The user portrait may be preset in the background server or in the content retrieval terminal, and refers to the user's interest values for different content entities, obtained from user behavior such as content browsing, content searching, and content purchasing. For example, some users are more interested in movies, while others are more interested in songs.
In this way, the knowledge graph priority adjustment module 74 can adjust the priorities of the content entities in the content entity knowledge graph obtained by the knowledge graph creation module 73 according to the preset user portrait. The content entity knowledge graph displays the content entities the user is most interested in first, places content entities of lower interest in the second or third level of the graph, and directly deletes content entities judged to be of no interest to the user.
This completes the process of the background server 70 generating the entity knowledge-graph of the page content.
The map display module 65 then receives the priority-adjusted content entity knowledge graph from the background server 70 and displays it on the screen of the content retrieval terminal 60. The user may perform a keyword content retrieval operation by selecting keywords on the content entity knowledge graph, or directly generate a new content entity knowledge graph from the selected keywords.
This completes the page content retrieval process of the content retrieval terminal 60 of the present embodiment.
On the basis of the first embodiment, the content retrieval terminal of the present embodiment uses the page content retrieval list and the page content retrieval trigger prompt to filter out pages that cannot be retrieved, which further improves the efficiency of page content retrieval. Generating the page content retrieval trigger instruction from the user's touch operation on the page content display interface increases the variety of ways the instruction can be issued. The page retrieval process can be carried out on the background server, with the content retrieval terminal only displaying the content entity knowledge graph, which improves the performance of the content retrieval terminal.
The present invention also provides a content retrieval server. Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of the content retrieval server of the present invention. The content retrieval server of the present embodiment can be implemented using the third embodiment of the content retrieval method described above. The content retrieval server 90 of the present embodiment includes a page address receiving module 91, a page content extraction module 92, a content entity extraction module 93, a knowledge graph creation module 94, a knowledge graph priority adjustment module 95, and a knowledge graph sending module 96.
The page address receiving module 91 is configured to receive a page address of the page content from the search terminal; the page content extracting module 92 is configured to extract page content according to the page address; the content entity extraction module 93 is configured to perform content entity extraction on the page content by using the page crawler; the knowledge graph creation module 94 is configured to create a knowledge graph of the content entities according to the extracted content entities and the correlation between the content entities; the knowledge graph priority adjustment module 95 is configured to adjust the priority of the content entity for the content entity knowledge graph based on the preset user representation; the knowledge graph sending module 96 is configured to send the content entity knowledge graph to the search terminal for display, so that the user performs the keyword content search operation.
Referring to fig. 10, fig. 10 is a schematic diagram illustrating a structure of a page content extraction module of an embodiment of a content retrieval server according to the present invention. The page content extraction module 92 includes a page address normalization unit 101, a page content storage judgment unit 102, a first page content extraction unit 103, and a second page content extraction unit 104.
The page address normalization unit 101 is configured to normalize a page address; the page content storage judging unit 102 is configured to judge whether the local memory of the server stores page content corresponding to the page address after the normalization operation; the first page content extracting unit 103 is configured to extract the page content from the server local memory if the page content corresponding to the page address after the normalization operation is stored; the second page content extracting unit 104 is configured to extract the page content from the page address if the page content corresponding to the page address after the normalization operation is not stored.
When the content retrieval server 90 of the present embodiment is used, first, the page address receiving module 91 receives the page address of the page content from the retrieval terminal, that is, the page address of the page content currently being displayed by the retrieval terminal.
The page content extraction module 92 then extracts page content from the page address acquired by the page address reception module 91.
Specifically, the page address normalization unit 101 of the page content extraction module 92 may first normalize the acquired page address, so that different domain names that point to the same page are recognized by the content retrieval server as the same page address.
The page content storage judging unit 102 of the page content extraction module 92 then determines whether the server local memory stores page content corresponding to the normalized page address. If it does, the first page content extraction unit 103 of the page content extraction module 92 extracts the page content directly from the server local memory, which avoids the slowness of extracting the page content in real time and improves extraction performance. If it does not, the second page content extraction unit 104 of the page content extraction module 92 extracts the page content directly from the page address.
The content entity extraction module 93 then uses the page crawler to perform content entity extraction on the page content. The title, subtitle, author, and specific content of the page content may be extracted. Text processing operations such as word segmentation, named entity recognition (NER), and term frequency-inverse document frequency (TF-IDF) weighting are then performed on the title and the specific content, abstracting the page content into a plurality of content entities. These content entities can effectively reflect the overall content of the page.
The knowledge graph creation module 94 then uses the content entities as search terms, extracts specific data of the content entities from the background database through search engine technology, and obtains the relevance between the content entities. That is, it obtains the entity attributes of the content entities (entity name, entity category, entity information, etc.) and the entity relationships between related content entities (e.g. singer, performer, or spousal relationships between persons).
For example, if the content entity is Liu Moumou (a placeholder personal name), the server uses Liu Moumou as the search term and extracts specific data of the content entity from the background database through search engine technology, such as the fact that Liu Moumou is an actor and a singer, and Liu Moumou's debut date and representative works. It can also extract the relationships between Liu Moumou and other content entities, such as Liu Moumou and another singer, or Liu Moumou and Zhang Moumou having appeared in a film together. In this way, entity relationships between content entities can be established.
The entity relationship here may be, for example, a character relationship map of the roles in a television drama, or a relationship map of the actors in real life. The name of the television drama and the names of the actors are entity attributes of the content entities; the spousal and father-son relationships between characters in the drama, and the cast relationship between an actor and the drama, are entity relationships of the content entities.
The knowledge-graph creation module 94 may then create a knowledge graph of the content entities based on the content entities and the associations between the content entities. The content entity knowledge graph here refers to a visual way to describe the interrelationship between a plurality of content entities in the page content. The page content can be graphically described through the content entity knowledge graph of the page content, so that a user can better acquire keywords of the page content and relevance among the keywords. The content entity knowledge graph can represent the interconnection among different content entities through a plurality of hierarchical structures, and the important content entities are placed at the highest level of the hierarchical structures so as to better show the entity attributes and entity relations of the content entities.
Because the page content may contain a large number of content entities, a content entity knowledge graph with only a few levels cannot convey the relevance among all of them. The knowledge graph priority adjustment module 95 therefore reads the user portrait of the user of the content retrieval terminal. The user portrait may be preset in the content retrieval server or in the content retrieval terminal, and refers to the user's interest values for different content entities, obtained from user behaviors such as content browsing, content searching and content purchasing. For example, some users are more interested in movies, while others are more interested in songs.
In this way, the knowledge graph priority adjustment module 95 can adjust the priority of the content entities in the content entity knowledge graph obtained by the knowledge graph creation module 94 according to the preset user portrait. The content entity knowledge graph then displays the content entities that interest the user most with the highest priority, places content entities of lower interest at the second or third level of the content entity knowledge graph, and directly deletes content entities judged to be of no interest to the user.
Finally, the knowledge graph sending module 96 sends the priority-adjusted content entity knowledge graph to the retrieval terminal for display, so that the user of the content retrieval terminal can perform a keyword content retrieval operation by selecting a keyword on the content entity knowledge graph, or directly generate a new content entity knowledge graph from the selected keyword.
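The priority adjustment can be sketched as follows, under stated assumptions. The function name `adjust_priority`, the interest scale of 0 to 1, and the three threshold values are hypothetical; the text only says that the most interesting entities come first, less interesting ones go to the second or third level, and uninteresting ones are deleted.

```python
def adjust_priority(entities, user_portrait,
                    first=0.7, second=0.4, drop=0.1):
    """Assign content entities to levels based on a user portrait.

    entities: iterable of entity names in the knowledge graph.
    user_portrait: dict mapping entity name -> interest value in [0, 1]
        (assumed scale). Entities missing from the portrait count as 0.
    Returns three levels; level 0 is displayed with highest priority.
    Entities with interest below `drop` are removed entirely.
    """
    levels = [[], [], []]
    for name in entities:
        interest = user_portrait.get(name, 0.0)
        if interest < drop:
            continue                      # not interesting: delete
        if interest >= first:
            levels[0].append(name)        # most interesting: top level
        elif interest >= second:
            levels[1].append(name)        # second level
        else:
            levels[2].append(name)        # third level
    # Within each level, show higher-interest entities first.
    for level in levels:
        level.sort(key=lambda n: user_portrait.get(n, 0.0), reverse=True)
    return levels


# Hypothetical portrait: this user prefers movies to songs.
portrait = {"Movie A": 0.9, "Song B": 0.5, "Book C": 0.2, "Game D": 0.05}
adjusted = adjust_priority(["Song B", "Game D", "Movie A", "Book C"], portrait)
```

Treating an entity that is absent from the portrait as uninteresting is a design choice of this sketch; a real system might instead give unknown entities a neutral default.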
This completes the page content retrieval process of the content retrieval server 90 of the present embodiment.
The content retrieval server in the embodiment generates the corresponding content entity knowledge graph through the page content, and the user can perform content retrieval operation through the keywords in the content entity knowledge graph, so that the user does not need to actively input the keywords, and can perform retrieval operation on a plurality of keywords in the page content at one time, thereby expanding the application scene range of content retrieval and improving the retrieval efficiency of content retrieval.
Moreover, because the page retrieval process is carried out in the content retrieval server and the content retrieval terminal only displays the content entity knowledge graph, the performance of the content retrieval terminal can be effectively improved.
The following describes the operation of the content retrieval method, the content retrieval terminal, and the content retrieval server of the present invention by a specific embodiment. Referring to fig. 11, fig. 11 is a timing chart of a content search process according to an embodiment of the content search method, the content search terminal and the content search server of the present invention. In this embodiment, the content retrieval terminal is a mobile terminal of a user, and the content retrieval server is a background server of a browser application. The content retrieval process of this embodiment includes:
In step S1101, when the mobile terminal user sees page content of interest and a page content retrieval trigger prompt is provided on that page content, the user may issue a page content retrieval trigger instruction by performing a pull-down operation on the page content.
In step S1102, the mobile terminal obtains the current page address of the browser application according to the page content retrieval trigger instruction, and sends the page address to the background server of the browser application.
In step S1103, after performing the normalization operation on the received page address, the background server obtains the corresponding page content through the local cache or directly through the page address.
In step S1104, the background server uses a page crawler to extract content from the page content, such as the title, subtitle, author, and specific content. It then performs text processing operations such as word segmentation, named entity recognition (NER), and term frequency-inverse document frequency (TF-IDF) weighting on the title and the specific content, abstracting the page content into a plurality of content entities.
As shown in fig. 12a, the page is an advertisement page for the TV drama "Small A Pass", and content entities such as "Small A Pass", "Small A" and "Zhao Moumou" can be extracted from the page content.
In step S1105, the background server uses the content entities as search terms, extracts specific data about the content entities from the background database through search engine technology, and creates a content entity knowledge graph corresponding to the page content based on the relevance between the content entities, as shown in fig. 12b and fig. 12c.
In step S1106, the background server determines the user's degree of interest in the content entities of the content entity knowledge graph according to the user portrait formed from the mobile terminal user's previous page browsing records, and adjusts the position and priority of the content entities in the content entity knowledge graph accordingly. If the user is highly interested in the TV drama "Small A Pass", the content entity knowledge graph shown in fig. 12b is generated; if the user is highly interested in the actor "Zhao Moumou", the content entity knowledge graph shown in fig. 12c is generated.
In step S1107, the background server sends the adjusted content entity knowledge graph to the mobile terminal for display, and the mobile terminal user can perform a keyword content retrieval operation by selecting any keyword on the content entity knowledge graph. For example, the user can switch to the content entity knowledge graph of fig. 12c by clicking the content entity "Zhao Moumou" in fig. 12b.
This completes the page content retrieval process of the content retrieval method, the content retrieval terminal, and the content retrieval server of this embodiment.
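Steps S1103 and S1104 can be sketched as follows. This is only an illustration under assumptions: the function names `normalize_page_address`, `get_page_content` and `tfidf_keywords` are hypothetical, `fetch` stands in for whatever downloads a page, the normalization rules shown (lower-casing, sorting query parameters, dropping the fragment) are one plausible choice that the text does not specify, and the smoothed IDF formula is a common variant rather than the one the embodiment necessarily uses. Word segmentation and named entity recognition are not shown; the input is assumed to be already tokenized.

```python
import math
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_page_address(url):
    """Map equivalent page addresses to one canonical form (assumed rules)."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query)))  # stable parameter order
    path = parts.path or "/"
    # Lower-case scheme and host, and drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, query, ""))


def get_page_content(url, cache, fetch):
    """Return page content from the local cache, fetching it only on a miss."""
    key = normalize_page_address(url)
    if key not in cache:
        cache[key] = fetch(key)
    return cache[key]


def tfidf_keywords(tokens, corpus, top_k=3):
    """Rank the tokens of one page by TF-IDF against a background corpus.

    tokens: list of words from the page (already segmented).
    corpus: list of token lists, one per background document.
    """
    tf = Counter(tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        scores[term] = (count / len(tokens)) * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Hypothetical usage with an in-memory cache and a stub fetcher.
fetch_calls = []

def stub_fetch(address):
    fetch_calls.append(address)
    return "page body for " + address

cache = {}
first = get_page_content("HTTP://Example.com/news?id=7&a=1#top", cache, stub_fetch)
second = get_page_content("http://example.com/news?a=1&id=7", cache, stub_fetch)
keywords = tfidf_keywords(
    ["drama", "actor", "drama", "the"],
    [["the", "news"], ["the", "actor"], ["the", "song"]],
    top_k=2,
)
```

Because both addresses normalize to the same key, the second call is served from the cache and the stub fetcher runs only once.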
According to the content retrieval method, content retrieval terminal, content retrieval server and electronic device of the present invention, the corresponding content entity knowledge graph is generated from the page content, and the user can perform content retrieval operations through the keywords in the content entity knowledge graph. This widens the range of application scenarios for content retrieval and improves retrieval efficiency, solving the technical problem that existing content retrieval methods and devices have a narrow range of application scenarios and low retrieval efficiency.
The terms "component," "module," "system," "interface," "process," and the like as used herein are generally intended to refer to a computer-related entity: hardware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components can reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
Fig. 13 and the following discussion provide a brief, general description of the operating environment of an electronic device in which a content retrieval terminal and content retrieval server embodying the present invention are implemented. The operating environment of fig. 13 is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the operating environment. Example electronic devices 1312 include, but are not limited to, wearable devices, head-mounted devices, medical health platforms, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, Personal Digital Assistants (PDAs), media players, and the like), multiprocessor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Although not required, embodiments are described in the general context of "computer-readable instructions" being executed by one or more electronic devices. Computer readable instructions may be distributed via a computer readable medium (discussed below). Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, etc., that perform particular tasks or implement particular abstract data types. Typically, the functionality of the computer readable instructions may be combined or distributed as desired in various environments.
Fig. 13 illustrates an example of an electronic device 1312 that includes one or more embodiments of the content retrieval terminal and content retrieval server of the present invention. In one configuration, the electronic device 1312 includes at least one processing unit 1316 and memory 1318. Depending on the exact configuration and type of electronic device, memory 1318 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.) or some combination of the two. This configuration is illustrated in fig. 13 by dashed line 1314.
In other embodiments, electronic device 1312 may include additional features and/or functionality. For example, device 1312 may also include additional storage (e.g., removable and/or non-removable) including, but not limited to, magnetic storage, optical storage, and the like. Such additional storage is illustrated in fig. 13 by storage 1320. In one embodiment, computer readable instructions for implementing one or more embodiments provided herein may be in storage 1320. Storage 1320 may also store other computer readable instructions for implementing an operating system, application programs, and the like. Computer readable instructions may be loaded in memory 1318 and executed by processing unit 1316, for example.
The term "computer readable media" as used herein includes computer storage media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions or other data. Memory 1318 and storage 1320 are examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by electronic device 1312. Any such computer storage media may be part of electronic device 1312.
Electronic device 1312 may also include communication connection 1326 that allows electronic device 1312 to communicate with other devices. Communication connection 1326 may include, but is not limited to, a modem, a Network Interface Card (NIC), an integrated network interface, a radio frequency transmitter/receiver, an infrared port, a USB connection, or other interface for connecting electronic device 1312 to other electronic devices. Communication connection 1326 may include a wired connection or a wireless connection. Communication connection 1326 may transmit and/or receive communication media.
The term "computer readable media" may include communication media. Communication media typically embodies computer readable instructions or other data in a "modulated data signal" such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" may include such signals: one or more of the signal characteristics are set or changed in such a manner as to encode information into the signal.
Electronic device 1312 can include input device(s) 1324 such as keyboard, mouse, pen, voice input device, touch input device, infrared camera, video input device, and/or any other input device. Output device(s) 1322 such as one or more displays, speakers, printers, and/or any other output device may also be included in device 1312. The input device 1324 and the output device 1322 may be connected to the electronic device 1312 via a wired connection, a wireless connection, or any combination thereof. In one embodiment, an input device or an output device from another electronic device may be used as input device 1324 or output device 1322 for electronic device 1312.
The components of electronic device 1312 may be connected by various interconnects (e.g., buses). Such interconnects may include a Peripheral Component Interconnect (PCI), such as PCI Express, Universal Serial Bus (USB), FireWire (IEEE 1394), an optical bus structure, and the like. In another embodiment, components of electronic device 1312 may be interconnected by a network. For example, memory 1318 may be comprised of multiple physical memory units located in different physical locations interconnected by a network.
Those skilled in the art will appreciate that storage devices for storing computer readable instructions may be distributed across a network. For example, an electronic device 1330 accessible via a network 1328 may store computer readable instructions for implementing one or more embodiments of the present invention. The electronic device 1312 may access the electronic device 1330 and download a part or all of the computer readable instructions for execution. Alternatively, the electronic device 1312 may download pieces of the computer readable instructions, as needed, or some instructions may be executed at the electronic device 1312 and some at the electronic device 1330.
Various operations of the embodiments are provided herein. In one embodiment, the one or more operations may constitute computer-readable instructions stored on one or more computer-readable media that, when executed by an electronic device, will cause the electronic device to perform the operations. The order in which some or all of the operations are described should not be construed as to imply that these operations are necessarily order dependent. Those skilled in the art having the benefit of this description will appreciate alternative orderings. Moreover, it should be understood that not all operations need be present in every embodiment provided herein.
Moreover, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. The present disclosure includes all such modifications and alterations and is limited only by the scope of the following claims. In particular regard to the various functions performed by the above described components (e.g., elements, resources, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., that is functionally equivalent), even though not structurally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary implementations of the disclosure. Furthermore, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for a given or particular application. Moreover, to the extent that the terms "includes," "has," "contains," or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising."
The functional units in the embodiment of the invention can be integrated in one processing module, or each unit can exist alone physically, or two or more units are integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product. The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like. Each of the devices or systems described above may perform the method in the corresponding method embodiment.
In summary, although the embodiments of the present invention have been described above, the numbers before the embodiments are used for convenience of description, and the order of the embodiments of the present invention is not limited. Moreover, the above-mentioned embodiments are not intended to limit the present invention, and those skilled in the art can make various modifications and variations without departing from the spirit and scope of the present invention, so the scope of the present invention is defined by the claims.

Claims (10)

1. A content retrieval method, comprising:
receiving a page content retrieval list from a background server, and carrying out page content retrieval triggering prompt according to the content of the page content retrieval list so that a user sends a page content retrieval triggering instruction according to the page content retrieval triggering prompt, wherein the page content retrieval list comprises pages of which page content supports page content retrieval;
receiving the page content retrieval triggering instruction;
acquiring a page address of the page content according to the page content retrieval triggering instruction;
sending the page address to the background server;
receiving a content entity knowledge graph subjected to priority adjustment from the background server, and displaying the content entity knowledge graph so as to facilitate keyword content retrieval operation of a user;
wherein the background server is used for extracting the page content according to the page address; extracting at least one of a title, a subtitle, an author and specific content in the page content by using a page crawler; performing word segmentation, named entity recognition and term frequency-inverse document frequency (TF-IDF) text processing operations on at least one of the title, the subtitle, the author and the specific content, and abstracting the page content into a plurality of content entities; creating a content entity knowledge graph corresponding to the page content according to the extracted content entities and the relevance among the content entities; reading user portraits of content retrieval terminal users, and adjusting content priority of the content entity knowledge graph based on the user portraits, wherein the user portraits refer to interest values of users on different content entities, which are obtained through user behaviors; the content entity knowledge graph represents the interconnection among different content entities through a plurality of hierarchical structures, the more important the content entities are, the higher the hierarchical structure of the content entities is, the relevance among the content entities is obtained through specific data extracted from a background database by a search engine technology by taking the content entities as search words, and the relevance among the content entities comprises entity attributes of the content entities and entity relations among related content entities; and
in response to a selection operation on a keyword on the content entity knowledge graph, performing a keyword content retrieval operation, or generating a new content entity knowledge graph according to the selected keyword.
2. The content retrieval method according to claim 1, wherein the step of receiving the page content retrieval trigger instruction is:
and generating the page content retrieval triggering instruction according to touch operation of a user on a page content display interface.
3. A content retrieval method, comprising:
receiving a page address of the page content from the retrieval terminal;
extracting the page content according to the page address;
extracting at least one of a title, a subtitle, an author and specific content in the page content by using a page crawler; performing word segmentation, named entity recognition and term frequency-inverse document frequency (TF-IDF) text processing operations on at least one of the title, the subtitle, the author and the specific content, and abstracting the page content into a plurality of content entities;
creating a content entity knowledge graph corresponding to the page content according to the extracted content entities and the relevance among the content entities; the content entity knowledge graph represents the interconnection among different content entities through a plurality of hierarchical structures, and the more important the content entities are, the higher the hierarchical structure of the content entities is; the relevance among the content entities is obtained by taking the content entities as search words and extracting specific data from a background database through a search engine technology, wherein the relevance of the content entities comprises entity attributes of the content entities and entity relations among related content entities;
based on a preset user portrait, carrying out content entity priority adjustment on the content entity knowledge graph, wherein the user portrait refers to interest values of users on different content entities, which are obtained through user behaviors; and
the content entity knowledge graph with the priority adjusted is sent to the retrieval terminal for display, so that a user can perform keyword content retrieval operation;
the retrieval terminal is used for receiving a page content retrieval list from a background server, and carrying out page content retrieval triggering prompt according to the content of the page content retrieval list so that a user can send out a page content retrieval triggering instruction according to the page content retrieval triggering prompt, wherein the page content retrieval list comprises pages of which page content supports page content retrieval; receiving the page content retrieval triggering instruction; acquiring a page address of the page content according to the page content retrieval triggering instruction;
the search terminal is also used for responding to the selection operation of the keywords on the content entity knowledge graph and carrying out the keyword content search operation or generating a new content entity knowledge graph according to the selected keywords.
4. A content retrieval method according to claim 3, wherein the step of extracting the page content from the page address comprises:
normalizing the page address;
judging whether a local memory of the server stores page contents corresponding to the page addresses after normalization operation;
if the page content corresponding to the page address after the normalization operation is stored, extracting the page content from the server local memory; and
if the page content corresponding to the page address after the normalization operation is not stored, extracting the page content from the page address.
5. A content retrieval terminal, comprising:
the retrieval triggering prompt module is used for receiving a page content retrieval list from a background server and carrying out page content retrieval triggering prompt according to the content of the page content retrieval list so that a user can send out a page content retrieval triggering instruction according to the page content retrieval triggering prompt, wherein the page content retrieval list comprises pages of which page content supports page content retrieval;
the trigger instruction receiving module is used for receiving the page content retrieval trigger instruction;
the page address acquisition module is used for acquiring the page address of the page content according to the page content retrieval triggering instruction;
the knowledge graph generation module is used for sending the page address to the background server; and
the map display module is used for receiving the content entity knowledge map subjected to priority adjustment from the background server and displaying the content entity knowledge map so as to facilitate the keyword content retrieval operation of a user;
wherein the background server is used for extracting the page content according to the page address; extracting at least one of a title, a subtitle, an author and specific content in the page content by using a page crawler; performing word segmentation, named entity recognition and term frequency-inverse document frequency (TF-IDF) text processing operations on at least one of the title, the subtitle, the author and the specific content, and abstracting the page content into a plurality of content entities; creating a content entity knowledge graph corresponding to the page content according to the extracted content entities and the relevance among the content entities; reading user portraits of content retrieval terminal users, and adjusting content priority of the content entity knowledge graph based on the user portraits, wherein the user portraits refer to interest values of users on different content entities, which are obtained through user behaviors; the content entity knowledge graph represents the interconnection among different content entities through a plurality of hierarchical structures, the more important the content entities are, the higher the hierarchical structure of the content entities is, the relevance among the content entities is obtained through specific data extracted from a background database by a search engine technology by taking the content entities as search words, and the relevance among the content entities comprises entity attributes of the content entities and entity relations among related content entities; and
a module for performing the following step: in response to a selection operation on a keyword on the content entity knowledge graph, performing a keyword content retrieval operation, or generating a new content entity knowledge graph according to the selected keyword.
6. The content retrieval terminal according to claim 5, wherein the trigger instruction receiving module is further configured to generate the page content retrieval trigger instruction according to a touch operation performed by a user on a page content display interface.
7. A content retrieval server, comprising:
the page address receiving module is used for receiving the page address of the page content from the retrieval terminal;
the page content extraction module is used for extracting the page content according to the page address;
the content entity extraction module is used for extracting at least one of a title, a subtitle, an author and specific content in the page content by using a page crawler; performing word segmentation, named entity recognition and term frequency-inverse document frequency (TF-IDF) text processing operations on at least one of the title, the subtitle, the author and the specific content, and abstracting the page content into a plurality of content entities;
the knowledge graph creation module is used for creating a knowledge graph of the content entity corresponding to the page content according to the extracted content entity and the relevance between the content entities; the content entity knowledge graph represents the interconnection among different content entities through a plurality of hierarchical structures, and the more important the content entities are, the higher the hierarchical structure of the content entities is; the relevance among the content entities is obtained by taking the content entities as search words and extracting specific data from a background database through a search engine technology, wherein the relevance of the content entities comprises entity attributes of the content entities and entity relations among related content entities;
the knowledge graph priority adjustment module is used for adjusting the priority of the content entity for the knowledge graph of the content entity based on a preset user portrait, wherein the user portrait refers to the interest value of the user on different content entities, which is obtained through the user behavior; and
the knowledge graph sending module is used for sending the content entity knowledge graph with the priority adjusted to the retrieval terminal for display so that a user can perform keyword content retrieval operation;
the retrieval terminal is used for receiving a page content retrieval list from a background server, and carrying out page content retrieval triggering prompt according to the content of the page content retrieval list so that a user can send out a page content retrieval triggering instruction according to the page content retrieval triggering prompt, wherein the page content retrieval list comprises pages of which page content supports page content retrieval; receiving the page content retrieval triggering instruction; acquiring a page address of the page content according to the page content retrieval triggering instruction;
the search terminal is also used for responding to the selection operation of the keywords on the content entity knowledge graph and carrying out the keyword content search operation or generating a new content entity knowledge graph according to the selected keywords.
8. The content retrieval server of claim 7, wherein the page content extraction module comprises:
the page address normalization unit is used for performing normalization operation on the page address;
the page content storage judging unit is used for judging whether the page content corresponding to the page address after the normalization operation is stored in the local memory of the server or not;
a first page content extracting unit, configured to extract, if page content corresponding to the normalized page address is stored, the page content from the server local memory;
and the second page content extraction unit is used for extracting the page content from the page address if the page content corresponding to the page address after the normalization operation is not stored.
9. A storage medium having stored therein processor-executable instructions that are loaded by one or more processors to perform the content retrieval method of any of claims 1 to 4.
10. An electronic device comprising a processor and a memory, said memory having a computer program, characterized in that the processor is adapted to perform the content retrieval method according to any of claims 1 to 4 by invoking said computer program.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710872842.XA CN109948073B (en) 2017-09-25 2017-09-25 Content retrieval method, terminal, server, electronic device, and storage medium
PCT/CN2018/107273 WO2019057191A1 (en) 2017-09-25 2018-09-25 Content retrieval method, terminal and server, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710872842.XA CN109948073B (en) 2017-09-25 2017-09-25 Content retrieval method, terminal, server, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN109948073A CN109948073A (en) 2019-06-28
CN109948073B CN109948073B (en) 2023-05-23

Family

ID=65809522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710872842.XA Active CN109948073B (en) 2017-09-25 2017-09-25 Content retrieval method, terminal, server, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN109948073B (en)
WO (1) WO2019057191A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134796B (en) * 2019-04-19 2023-06-02 平安科技(深圳)有限公司 Knowledge graph-based clinical trial retrieval method, device, computer equipment and storage medium
CN111309872B (en) * 2020-03-26 2023-08-08 北京百度网讯科技有限公司 Search processing method, device and equipment
CN111522967B (en) * 2020-04-27 2023-09-15 北京百度网讯科技有限公司 Knowledge graph construction method, device, equipment and storage medium
CN111931928B (en) * 2020-07-16 2022-12-27 成都井之丽科技有限公司 Scene graph generation method, device and equipment
CN112182239A (en) * 2020-09-22 2021-01-05 中国建设银行股份有限公司 Information retrieval method and device
CN113722434A (en) * 2021-08-30 2021-11-30 平安科技(深圳)有限公司 Text data processing method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016176099A1 (en) * 2015-04-28 2016-11-03 Alibaba Group Holding Limited Information search navigation method and apparatus
CN106156244A (en) * 2015-04-28 2016-11-23 阿里巴巴集团控股有限公司 Information search navigation method and apparatus
CN106817271A (en) * 2015-11-30 2017-06-09 阿里巴巴集团控股有限公司 The forming method and device of flow collection of illustrative plates

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577595B (en) * 2013-11-15 2017-09-22 北京奇虎科技有限公司 Keyword method for pushing and device based on current browse webpage
CN104102713B (en) * 2014-07-16 2018-01-19 百度在线网络技术(北京)有限公司 Recommendation results show method and apparatus
CN104598613B (en) * 2015-01-30 2017-11-03 百度在线网络技术(北京)有限公司 A kind of conceptual relation construction method and apparatus for vertical field
CN105302881A (en) * 2015-10-14 2016-02-03 上海大学 Literature search system-oriented search prompt word generation method
CN106294596A (en) * 2016-07-29 2017-01-04 北京小米移动软件有限公司 The method and device of information search
CN107169010A (en) * 2017-03-31 2017-09-15 北京奇艺世纪科技有限公司 A kind of determination method and device of recommendation search keyword

Also Published As

Publication number Publication date
CN109948073A (en) 2019-06-28
WO2019057191A1 (en) 2019-03-28

Similar Documents

Publication Publication Date Title
CN109948073B (en) Content retrieval method, terminal, server, electronic device, and storage medium
US10353947B2 (en) Relevancy evaluation for image search results
US11687600B2 (en) Ranking search results based upon content creation trends
US11238127B2 (en) Electronic device and method for using captured image in electronic device
US9977835B2 (en) Queryless search based on context
US8762868B2 (en) Integrating user interfaces from one application into another
US8661041B2 (en) Apparatus and method for semantic-based search and semantic metadata providing server and method of operating the same
US20090299990A1 (en) Method, apparatus and computer program product for providing correlations between information from heterogenous sources
TWI705337B (en) Information search and navigation method and device
US20120036153A1 (en) Mobile system, search system and search result providing method for mobile search
KR20170091142A (en) Web content tagging and filtering
US20100114854A1 (en) Map-based websites searching method and apparatus therefor
JP6404351B2 (en) Method, apparatus, and system for communicating and presenting merchandise information
KR20090111827A (en) Method and apparatus for voice searching in a mobile communication device
US11048736B2 (en) Filtering search results using smart tags
KR20160075126A (en) Method of providing content and electronic apparatus thereof
JP2021530070A (en) Methods for sharing personal information, devices, terminal equipment and storage media
JP2010518514A (en) System and method for displaying and navigating content on an electronic device
CN107515870B (en) Searching method and device and searching device
JP4894253B2 (en) Metadata generating apparatus and metadata generating method
CN110309324B (en) Searching method and related device
CN105653674B (en) File management method and system of intelligent terminal
KR102519159B1 (en) Electronic apparatus and control method thereof
US20160150038A1 (en) Efficiently Discovering and Surfacing Content Attributes
CN107885862B (en) Image display method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant