CN111309872A - Search processing method, device and equipment - Google Patents

Publication number
CN111309872A
Authority
CN
China
Prior art keywords
entity
searched
text
candidate
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010223795.8A
Other languages
Chinese (zh)
Other versions
CN111309872B (en)
Inventor
林泽南
卢佳俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010223795.8A
Publication of CN111309872A
Application granted
Publication of CN111309872B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The application discloses a search processing method, apparatus and device, and relates to the technical field of artificial intelligence, in particular to knowledge graphs. The disclosed technical solution comprises: acquiring a text to be searched, and determining at least one candidate entity according to the text to be searched; for each candidate entity, acquiring entity information corresponding to the candidate entity from a knowledge graph, and labeling the keywords in the text to be searched according to that entity information to obtain a labeling result corresponding to the candidate entity; and determining a target entity corresponding to the text to be searched according to the labeling results corresponding to the at least one candidate entity. Because the entity information in the knowledge graph is used to label the keywords in the text to be searched, the semantics and intent of the text can be understood more thoroughly, which improves the accuracy of the search result.

Description

Search processing method, device and equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a search processing method, apparatus, and device.
Background
When a user uses a product having a search function, such as a search engine, the text to be searched that the user inputs may include an entity name and entity information. The entity name indicates the entity to be searched, and the entity information further describes or limits that entity. An accurate search result can be displayed to the user only if the entity to be searched is accurately identified from the text to be searched.
In the prior art, a preset template is matched against the text to be searched in order to determine the entity to be searched. Entities determined in this way have low accuracy, so the search result is not accurate enough.
Disclosure of Invention
The application provides a search processing method, a search processing device and search processing equipment, which are used for improving the accuracy of a search result.
In a first aspect, an embodiment of the present application provides a search processing method, including:
acquiring a text to be searched; determining at least one candidate entity according to the text to be searched;
for each candidate entity, acquiring entity information corresponding to the candidate entity from the knowledge graph, and labeling the keywords in the text to be searched according to the entity information corresponding to the candidate entity to obtain a labeling result corresponding to the candidate entity; and determining a target entity corresponding to the text to be searched according to the labeling result corresponding to the at least one candidate entity.
In this scheme, the entity information in the knowledge graph is used to label the keywords in the text to be searched, so that a corresponding knowledge attribute can be labeled for each keyword. Each keyword can therefore be understood accurately, and the semantics and intent of the text to be searched can be understood thoroughly. Based on this understanding, the entity to be searched can be identified accurately, which improves the accuracy of the search result. In addition, the search processing method of this embodiment can be applied to texts to be searched of any structure and any length, and is therefore highly general.
In a possible implementation manner, the entity information corresponding to a candidate entity is used to indicate an attribute value corresponding to at least one knowledge attribute of the candidate entity; and the labeling result is used for indicating the knowledge attribute corresponding to each keyword in the text to be searched.
In a possible implementation manner, the labeling the keyword in the text to be searched according to the entity information corresponding to the candidate entity to obtain a labeling result corresponding to the candidate entity includes: segmenting and matching the keywords in the text to be searched according to the attribute value corresponding to at least one knowledge attribute of the candidate entity, and determining the knowledge attribute corresponding to each keyword in the text to be searched; and generating a labeling result corresponding to the candidate entity according to the knowledge attribute corresponding to each keyword in the text to be searched.
In a possible implementation, the at least one knowledge attribute each corresponds to a priority; the method for segmenting and matching the keywords in the text to be searched according to the attribute value corresponding to at least one knowledge attribute of the candidate entity to determine the knowledge attribute corresponding to each keyword in the text to be searched comprises the following steps: and segmenting and matching the keywords in the text to be searched according to the attribute value corresponding to at least one knowledge attribute of the candidate entity and the priority, and determining the knowledge attribute corresponding to each keyword in the text to be searched.
In the implementation mode, by means of the priority among the knowledge attributes, the problem of dislocation of the text to be searched in the segmentation process can be solved, and the accuracy of keyword labeling is improved.
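The effect of the priority can be illustrated with a minimal Python sketch. The function name, the numeric priorities, the "genre: drama" attribute value and the rule that a smaller number means a higher priority are all assumptions made for illustration; the application does not specify them.

```python
def label_with_priority(text, attributes):
    """Label spans of `text` with knowledge attributes, highest priority first.

    `attributes` is a list of (priority, attribute_name, attribute_value)
    tuples. A smaller number means a higher priority (an assumption of this
    sketch, not something stated in the application).
    """
    claimed = [False] * len(text)  # marks characters already labeled
    spans = []                     # (start, end, attribute_name)
    for _priority, name, value in sorted(attributes):
        if not value:
            continue
        start = text.find(value)
        while start != -1:
            end = start + len(value)
            # Only claim the span if no higher-priority attribute took it.
            if not any(claimed[start:end]):
                claimed[start:end] = [True] * (end - start)
                spans.append((start, end, name))
            start = text.find(value, start + 1)
    # Every remaining run of non-space characters becomes an "unknown" keyword.
    i = 0
    while i < len(text):
        if claimed[i] or text[i].isspace():
            i += 1
            continue
        j = i
        while j < len(text) and not claimed[j] and not text[j].isspace():
            j += 1
        spans.append((i, j, "unknown"))
        i = j
    return [(text[s:e], name) for s, e, name in sorted(spans)]


TEXT = "color set TV drama minired 2001"

# "field" outranks the hypothetical "genre" attribute, so "TV drama" stays whole.
FIELD_FIRST = [
    (1, "entity name", "color set"),
    (2, "field", "TV drama"),
    (2, "actor", "minired"),
    (3, "year", "2001"),
    (4, "genre", "drama"),
]

# If "genre" were matched first, "drama" would be cut out of "TV drama".
GENRE_FIRST = [(1, "genre", "drama")] + FIELD_FIRST[:4]

print(label_with_priority(TEXT, FIELD_FIRST))
print(label_with_priority(TEXT, GENRE_FIRST))
```

With the second ordering, "TV" is left over as an unknown keyword, which is the kind of dislocation that the priority is meant to avoid.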
In a possible implementation manner, after generating the labeling result corresponding to the candidate entity, the method further includes: if the labeling result indicates that the knowledge attribute corresponding to a first keyword in the text to be searched is unknown, matching the first keyword according to a preset regular-expression matching rule, and generating the knowledge attribute corresponding to the first keyword according to the matching result.
In this implementation, when the knowledge attribute corresponding to the first keyword is labeled as unknown, a second round of matching is performed on the first keyword using the preset regular-expression matching rule, which helps ensure that every keyword in the text to be searched is labeled completely and accurately.
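A minimal sketch of this second round of matching is given below. The rule list and the attribute names assigned by each rule are hypothetical; the application only states that a preset rule is applied to keywords labeled as unknown.

```python
import re

# Hypothetical fallback rules: (compiled pattern, knowledge attribute).
FALLBACK_RULES = [
    (re.compile(r"episode\s*\d+", re.IGNORECASE), "episode number"),
    (re.compile(r"(19|20)\d{2}"), "year"),
    (re.compile(r"\d{3,4}p", re.IGNORECASE), "viewing intent"),
]


def relabel_unknown(labels, rules=FALLBACK_RULES):
    """Re-label keywords whose knowledge attribute is "unknown".

    `labels` is a list of (keyword, attribute) pairs. A keyword keeps the
    "unknown" label when no rule matches the whole keyword.
    """
    result = []
    for keyword, attribute in labels:
        if attribute == "unknown":
            for pattern, fallback_attribute in rules:
                if pattern.fullmatch(keyword):
                    attribute = fallback_attribute
                    break
        result.append((keyword, attribute))
    return result


labels = [("minired", "actor"), ("episode 18", "unknown"), ("1080p", "unknown")]
print(relabel_unknown(labels))
```

Keywords that were already labeled from the knowledge graph are left untouched, so the rules only fill gaps.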
In a possible implementation manner, the determining, according to the labeling result corresponding to each of the at least one candidate entity, a target entity corresponding to the text to be searched includes: according to the labeling result corresponding to each candidate entity, determining the matching degree of the entity information corresponding to the candidate entity and the text to be searched; and determining the candidate entity corresponding to the highest matching degree as the target entity.
In a possible implementation manner, each of the at least one knowledge attribute corresponds to a weight coefficient; the determining the matching degree between the entity information corresponding to each candidate entity and the text to be searched according to the labeling result corresponding to each candidate entity comprises: and determining the matching degree of the entity information corresponding to each candidate entity and the text to be searched according to the labeling result corresponding to each candidate entity and the weight coefficient corresponding to at least one knowledge attribute of each candidate entity.
In the implementation mode, through the evaluation process of the marking result, the understanding accuracy of the text to be searched can be improved, and therefore the accuracy of the identified entity to be searched is ensured.
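The evaluation can be sketched as follows. The scoring formula (the sum of the weight coefficients of all keywords whose knowledge attribute is known), the weight values and the sample labeling results are assumptions chosen for illustration; the application does not fix a particular formula here.

```python
def matching_degree(labels, weights, default_weight=1.0):
    """Sum the weight coefficients of the keywords that received a known attribute."""
    return sum(
        weights.get(attribute, default_weight)
        for _keyword, attribute in labels
        if attribute != "unknown"
    )


def pick_target_entity(labeling_results, weights):
    """Return the candidate entity whose labeling result scores highest."""
    return max(
        labeling_results,
        key=lambda entity_id: matching_degree(labeling_results[entity_id], weights),
    )


# Hypothetical weight coefficients per knowledge attribute.
WEIGHTS = {"entity name": 1.0, "field": 0.5, "actor": 2.0, "year": 1.5}

# Hypothetical labeling results for three candidate entities.
LABELING_RESULTS = {
    "ID1": [("color set", "entity name"), ("TV drama", "field"),
            ("minired", "actor"), ("2001", "year"), ("episode 18", "unknown")],
    "ID2": [("color set", "entity name"), ("TV drama", "field"),
            ("minired", "unknown"), ("2001", "unknown"), ("episode 18", "unknown")],
    "ID3": [("color set", "entity name"), ("TV drama", "unknown"),
            ("minired", "unknown"), ("2001", "unknown"), ("episode 18", "unknown")],
}

print(pick_target_entity(LABELING_RESULTS, WEIGHTS))
```

Under these assumed weights, the candidate whose entity information explains the most keywords is selected as the target entity.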
In a possible implementation manner, the determining at least one candidate entity according to the text to be searched includes: according to the entity name dictionary, matching the text to be searched, and determining at least one candidate entity name from the text to be searched; the entity name dictionary comprises at least one entity name and at least one entity corresponding to each entity name; and determining the entity corresponding to each candidate entity name as the at least one candidate entity.
In the implementation mode, the entity name dictionary is used for matching the text to be searched, so that at least one candidate entity is determined, and the comprehensiveness and accuracy of the determined candidate entity are guaranteed.
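As a minimal sketch, the entity name dictionary can be modeled as a mapping from an entity name to the entities that share it. The identifiers (including "P1" for the actor) and the use of plain substring matching are assumptions of this sketch.

```python
# Hypothetical entity name dictionary: entity name -> entity identifiers.
ENTITY_NAME_DICT = {
    "color set": ["ID1", "ID2", "ID3"],
    "minired": ["P1"],
    "another title": ["ID9"],
}


def find_candidate_entities(text, name_dict):
    """Return the entities whose name occurs in the text to be searched."""
    candidates = []
    for name, entity_ids in name_dict.items():
        if name in text:  # substring matching, an assumption of this sketch
            for entity_id in entity_ids:
                if entity_id not in candidates:
                    candidates.append(entity_id)
    return candidates


query = "color set TV drama minired 2001 episode 18"
print(find_candidate_entities(query, ENTITY_NAME_DICT))
```

Because one name can map to several entities, a single matched name may contribute several candidate entities.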
In a possible implementation manner, before determining at least one candidate entity according to the text to be searched, the method further includes: determining, according to the entity name dictionary, that the text to be searched comprises a first-category keyword and a second-category keyword; wherein a first-category keyword is a keyword that matches any entity name in the entity name dictionary, and a second-category keyword is a keyword that does not match any entity name in the entity name dictionary.
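This check can be sketched as below. Splitting the remaining text on whitespace and the helper names are assumptions of the sketch; the application does not describe how the second category of keywords is segmented at this stage.

```python
def split_keyword_categories(text, entity_names):
    """Split a text into entity-name keywords and all remaining keywords."""
    # Longer names are removed first so that a short name cannot break a long one.
    first_category = [
        name for name in sorted(entity_names, key=len, reverse=True) if name in text
    ]
    remainder = text
    for name in first_category:
        remainder = remainder.replace(name, " ")
    second_category = remainder.split()
    return first_category, second_category


def has_both_categories(text, entity_names):
    """True when the text holds an entity name and at least one other keyword."""
    first_category, second_category = split_keyword_categories(text, entity_names)
    return bool(first_category) and bool(second_category)


names = ["color set", "another title"]
print(split_keyword_categories("color set TV drama minired 2001 episode 18", names))
```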
In a possible implementation manner, the knowledge graph is used for indicating an entity name and entity information corresponding to at least one entity; the method further comprises the following steps: and generating the entity name dictionary according to the knowledge graph.
In one possible implementation, the generating the entity name dictionary according to the knowledge graph includes: adding the entity names corresponding to the entities in the knowledge graph into the entity name dictionary; mining the entity names corresponding to the entities in the knowledge graph, and adding the mined entity names into the entity name dictionary; wherein the mining comprises at least one of: alias mining, transformation mining, abbreviation mining, and error correction mining.
In the implementation mode, through the mining process, the number of entity names in the entity name dictionary is greatly enriched. Furthermore, when the entity name dictionary is used for matching the text to be searched and determining the candidate entity, the comprehensiveness and accuracy of the determined candidate entity are ensured.
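The dictionary generation can be sketched as follows. The two "miners" are table-driven stand-ins with invented entries; real alias or error correction mining would draw on other data sources that the application does not detail here.

```python
# Hypothetical lookup tables standing in for real mining processes.
ALIAS_TABLE = {"color set": ["the color set"]}
ERROR_CORRECTION_TABLE = {"color set": ["colour set"]}


def alias_miner(name):
    """Return known aliases of an entity name (hypothetical table lookup)."""
    return ALIAS_TABLE.get(name, [])


def error_correction_miner(name):
    """Return common misspellings of an entity name (hypothetical table lookup)."""
    return ERROR_CORRECTION_TABLE.get(name, [])


def build_entity_name_dict(entity_names, miners):
    """Build a mapping from every original or mined name to its entity identifiers.

    `entity_names` maps an entity identifier to the name stored in the
    knowledge graph.
    """
    name_dict = {}
    for entity_id, name in entity_names.items():
        names = {name}
        for miner in miners:
            names.update(miner(name))
        for variant in names:
            name_dict.setdefault(variant, []).append(entity_id)
    return name_dict


ENTITY_NAMES = {"ID1": "color set", "ID2": "color set", "ID3": "color set"}
NAME_DICT = build_entity_name_dict(ENTITY_NAMES, [alias_miner, error_correction_miner])
print(NAME_DICT)
```

Each mined name points back to the same entities as the original name, so a query containing a variant still reaches the right candidates.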
In a second aspect, an embodiment of the present application provides a search processing apparatus, including:
the acquisition module is used for acquiring a text to be searched; the selection module is used for determining at least one candidate entity according to the text to be searched; the labeling module is used for acquiring, for each candidate entity, entity information corresponding to the candidate entity from the knowledge graph, and labeling the keywords in the text to be searched according to the entity information corresponding to the candidate entity to obtain a labeling result corresponding to the candidate entity; and the determining module is used for determining the target entity corresponding to the text to be searched according to the labeling result corresponding to the at least one candidate entity.
In a possible implementation manner, the entity information corresponding to a candidate entity is used to indicate an attribute value corresponding to at least one knowledge attribute of the candidate entity; and the labeling result is used for indicating the knowledge attribute corresponding to each keyword in the text to be searched.
In a possible implementation manner, the labeling module is specifically configured to: segmenting and matching the keywords in the text to be searched according to the attribute value corresponding to at least one knowledge attribute of the candidate entity, and determining the knowledge attribute corresponding to each keyword in the text to be searched; and generating a labeling result corresponding to the candidate entity according to the knowledge attribute corresponding to each keyword in the text to be searched.
In a possible implementation, the at least one knowledge attribute each corresponds to a priority; the labeling module is specifically configured to: and segmenting and matching the keywords in the text to be searched according to the attribute value corresponding to at least one knowledge attribute of the candidate entity and the priority, and determining the knowledge attribute corresponding to each keyword in the text to be searched.
In a possible implementation manner, the labeling module is further configured to: if the labeling result indicates that the knowledge attribute corresponding to a first keyword in the text to be searched is unknown, match the first keyword according to a preset regular-expression matching rule, and generate the knowledge attribute corresponding to the first keyword according to the matching result.
In a possible implementation manner, the determining module is specifically configured to: according to the labeling result corresponding to each candidate entity, determining the matching degree of the entity information corresponding to the candidate entity and the text to be searched; and determining the candidate entity corresponding to the highest matching degree as the target entity.
In a possible implementation manner, each of the at least one knowledge attribute corresponds to a weight coefficient; the determining module is specifically configured to: and determining the matching degree of the entity information corresponding to each candidate entity and the text to be searched according to the labeling result corresponding to each candidate entity and the weight coefficient corresponding to at least one knowledge attribute of each candidate entity.
In a possible implementation manner, the selection module is specifically configured to: according to the entity name dictionary, matching the text to be searched, and determining at least one candidate entity name from the text to be searched; the entity name dictionary comprises at least one entity name and at least one entity corresponding to each entity name; and determining the entity corresponding to each candidate entity name as the at least one candidate entity.
In a possible implementation manner, the selection module is further configured to: determine, according to the entity name dictionary, that the text to be searched comprises a first-category keyword and a second-category keyword; wherein a first-category keyword is a keyword that matches any entity name in the entity name dictionary, and a second-category keyword is a keyword that does not match any entity name in the entity name dictionary.
In a possible implementation manner, the knowledge graph is used for indicating an entity name and entity information corresponding to at least one entity; the device further comprises: and the generating module is used for generating the entity name dictionary according to the knowledge graph.
In a possible implementation manner, the generating module is specifically configured to: add the entity names corresponding to the entities in the knowledge graph into the entity name dictionary; mine the entity names corresponding to the entities in the knowledge graph, and add the mined entity names into the entity name dictionary; wherein the mining comprises at least one of: alias mining, transformation mining, abbreviation mining, and error correction mining.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions enabling the at least one processor to perform the method of any implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any implementation of the first aspect.
According to the search processing method, apparatus and device provided by the embodiments of the present application, the entity information in the knowledge graph is used to label the keywords in the text to be searched, so that a corresponding knowledge attribute can be labeled for each keyword. Each keyword can therefore be understood accurately, and the semantics and intent of the text to be searched can be understood thoroughly. Based on this understanding, the entity to be searched can be identified accurately, which improves the accuracy of the search result. In addition, the search processing method of this embodiment can be applied to texts to be searched of any structure and any length, and is therefore highly general.
Other effects of the above optional implementations are described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
Fig. 1 is a schematic diagram of a network architecture suitable for use in the embodiments of the present application;
Fig. 2 is a schematic diagram of a possible search interaction process according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a search processing method according to an embodiment of the present application;
Figs. 4A to 4C are schematic diagrams of a knowledge graph provided by an embodiment of the present application;
Fig. 5 is a schematic flowchart of a search processing method according to another embodiment of the present application;
Fig. 6 is a schematic diagram of a search result interface of a terminal device according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a search processing apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a search processing apparatus according to another embodiment of the present application;
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a network architecture suitable for use in the embodiments of the present application. As shown in Fig. 1, the network architecture includes at least one terminal device 11 and at least one server 12. The terminal device 11 provides the user with a search entry. The search entry may be a search engine installed in the terminal device 11, or may be another application having a search function. The terminal device 11 may also be referred to as a terminal, user equipment (UE), an access terminal, a subscriber unit, a mobile device, a user terminal, a wireless communication device, or a user agent. The terminal device may be a personal digital assistant (PDA), a smart television, a handheld device having a wireless communication function (e.g., a smartphone or a tablet computer), a computing device (e.g., a personal computer (PC)), an in-vehicle device, a wearable device, and the like.
The server 12 has storage, analysis and retrieval functions. The server may be a centralized server or a distributed server. The server may also be a cloud server. The server 12 may be provided with a database in which a large amount of entity-related information is stored in advance.
Fig. 2 is a schematic diagram of a possible search interaction process according to an embodiment of the present application. As shown in fig. 2, the user can input text to be searched in a search entry provided in the terminal device 11. The text to be searched may include one or more keywords. After detecting that the user operates the "search" button, the terminal device 11 sends the text to be searched to the server 12 for search processing. The server 12 identifies and analyzes the text to be searched to determine the entity to be searched, further retrieves the relevant information of the entity to be searched from the database, and returns the retrieval result to the terminal device 11. The terminal device 11 displays the search result in the interface.
In the search process, an accurate search result can be displayed to the user only if the entity that the user wants to search for is accurately identified from the text to be searched. In the related art, a preset template is matched against the text to be searched in order to determine the entity to be searched. Entities determined in this way have low accuracy, so the search result is not accurate enough. In addition, this approach usually requires the text to be searched to meet a preset form, which is restrictive and hard to extend.
The embodiment of the application provides a search processing method, which can accurately identify an entity to be searched by accurately understanding each keyword in a text to be searched by using a knowledge graph, so that the accuracy of a search result is improved.
In some scenarios, the method of the embodiments of the present application may be applied to a server as shown in fig. 1. In other scenarios, when the terminal device has relatively strong computing power and storage power, the method of the embodiment of the present application may also be executed by the terminal device as in fig. 1.
The technical solution of the present application is described in detail below with reference to several specific embodiments. Several of the following embodiments may be combined with each other and the description of the same or similar content may not be repeated in some embodiments.
Fig. 3 is a schematic flowchart of a search processing method according to an embodiment of the present application. As shown in fig. 3, the method of the present embodiment includes:
S301: Acquire a text to be searched.
The text to be searched means text input by the user for the purpose of search. The text to be searched may be input by the user in a text form, may also be input in a voice form, and of course, may also be input in other forms, which is not limited in this embodiment. When the method of the embodiment is executed by the server, the terminal device receives the text to be searched input by the user, and then sends the text to be searched to the server.
The text to be searched may include one or more keywords. Optionally, at least one keyword in the text to be searched indicates the name of the entity to be searched, and the remaining keywords further define or describe other information about the entity to be searched.
For example, the text to be searched may be "color set drama minired lead actor 2019 episode 20". Here it is assumed that "color set" is the name of a TV drama and "minired" is the name of its lead actor. In this text, the keyword "color set" indicates the name of the entity to be searched, and the other keywords indicate information that further describes or defines that entity.
S302: Determine at least one candidate entity according to the text to be searched.
In the embodiments of the present application, entities refer to things that exist objectively and can be distinguished from each other. An entity may be a specific person, thing, or an abstract concept or connection. For example, the entity may be a television show, a piece of music, a person, a place, etc.
In this embodiment, at least one candidate entity may be determined by performing preliminary identification on the text to be searched. Wherein, the candidate entity refers to an entity that the user may want to search.
For example, when the text to be searched is "color set TV drama minired 2001 episode 18", the determined candidate entities may include the TV drama "color set" and the actor "minired". Of course, in some examples, when several TV dramas are all named "color set", all of them may be candidate entities. In other examples, when there is also a movie named "color set", that movie may be a candidate entity as well.
It should be noted that S302 may be implemented in various ways. For example, the text to be searched can be processed by a named entity recognition tool to obtain at least one candidate entity. S302 may also be implemented in other manners; for details, refer to the description of the subsequent embodiments, which is not repeated here.
S303: For each candidate entity, acquire entity information corresponding to the candidate entity from the knowledge graph, and label the keywords in the text to be searched according to the entity information corresponding to the candidate entity to obtain a labeling result corresponding to the candidate entity.
A knowledge graph is essentially a semantic network. A knowledge graph may include nodes and edges that connect the nodes, where the nodes represent entities or concepts and the edges represent the semantic relationships between them.
In this embodiment, the knowledge graph indicates the entity name and the entity information corresponding to at least one entity. The entity information is information that describes an entity. Optionally, the entity information of an entity indicates the attribute value corresponding to at least one knowledge attribute of the entity. Figs. 4A to 4C are schematic diagrams of a knowledge graph provided in an embodiment of the present application. Taking a knowledge graph in the film and television field as an example, Figs. 4A to 4C each illustrate the information of one entity in the knowledge graph. As shown in Figs. 4A to 4C, each entity corresponds to a TV series or a movie. The knowledge attributes of each entity may include: director, drama, actor, character, year, version, region, season information, field, genre, play site (site A, site B, etc.), album number, and viewing intent (free, undeleted, high definition, 1080p, full version, etc.).
For convenience of the subsequent examples, this embodiment assumes that there are two TV dramas named "color set" and one movie named "color set". Figs. 4A to 4C illustrate the entity information corresponding to these three entities (two TV dramas and one movie). As shown in Figs. 4A to 4C, the entity names corresponding to entity IDs 1 to 3 are all "color set". The entity information of entity ID1 includes: the lead actor is minired, the director is Zhang One, the genre is urban drama, the year is 2001, the number of episodes is 20, and the play site is site A. The entity information of entity ID2 includes: the lead actor is small green, the director is Zhang Two, the genre is ancient drama, the year is 2010, the number of episodes is 30, and the play site is site B. The entity information of entity ID3 includes: the lead actor is little blue, the director is Zhang Three, the genre is science fiction, the year is 2020, and the play site is site C.
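For illustration only, the three example entities can be written down as a small in-memory structure. The dictionary layout, the key names and the helper functions are assumptions of this sketch, and only a few of the knowledge attributes are shown; a real knowledge graph would normally be held in a graph store.

```python
# Simplified stand-in for the knowledge graph of the example (subset of attributes).
KNOWLEDGE_GRAPH = {
    "ID1": {"entity name": "color set", "actor": "minired",
            "year": "2001", "episodes": "20", "play site": "site A"},
    "ID2": {"entity name": "color set", "actor": "small green",
            "year": "2010", "episodes": "30", "play site": "site B"},
    "ID3": {"entity name": "color set", "actor": "little blue",
            "year": "2020", "play site": "site C"},
}


def get_entity_info(entity_id, graph=KNOWLEDGE_GRAPH):
    """Return the knowledge attributes and attribute values of one entity."""
    return {
        attribute: value
        for attribute, value in graph[entity_id].items()
        if attribute != "entity name"
    }


def entities_named(name, graph=KNOWLEDGE_GRAPH):
    """Return the identifiers of all entities that carry the given name."""
    return [eid for eid, info in graph.items() if info["entity name"] == name]


print(entities_named("color set"))
print(get_entity_info("ID1"))
```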
After determining the candidate entities in this embodiment, for each candidate entity, entity information corresponding to the candidate entity (i.e., attribute values of each knowledge attribute of the candidate entity) may be acquired from the knowledge graph shown in fig. 4A to 4C. Further, the entity information corresponding to the candidate entity can be utilized to label the keywords in the text to be searched, so as to obtain a labeling result corresponding to the candidate entity. And the labeling result indicates the knowledge attribute corresponding to each keyword in the text to be searched.
In a possible implementation manner, the keywords in the text to be searched may be segmented and matched according to the attribute value corresponding to at least one knowledge attribute of the candidate entity, so as to determine the knowledge attribute corresponding to each keyword in the text to be searched. And generating a labeling result corresponding to the candidate entity according to the knowledge attribute corresponding to each keyword in the text to be searched.
Illustratively, taking entity ID1 in the knowledge graph as an example, when the keywords in the text to be searched "color set tv drama minired 2001 set 18" are labeled with the entity information corresponding to entity ID1, the attribute value of each knowledge attribute of entity ID1 is matched against the text to be searched. For example, if the attribute value "tv drama" of the domain of entity ID1 is successfully matched with the keyword "tv drama" in the text to be searched, the knowledge attribute corresponding to the keyword "tv drama" is set to "domain". If the attribute value "minired" of the actor of entity ID1 is successfully matched with the keyword "minired" in the text to be searched, the knowledge attribute corresponding to the keyword "minired" is set to "actor". If the attribute value "2001" of the year of entity ID1 is successfully matched with the keyword "2001" in the text to be searched, the knowledge attribute corresponding to the keyword "2001" is set to "year". If none of the attribute values of the knowledge attributes of entity ID1 matches the keyword "set 18" in the text to be searched, the knowledge attribute corresponding to the keyword "set 18" is set to "unknown". The labeling result shown in table 1 is thus obtained.
TABLE 1
Keyword | Knowledge attribute
color set | Entity name
tv drama | Domain
minired | Actor
2001 | Year
set 18 | Unknown
In the matching process, various existing text segmentation and matching technologies may be adopted, which is not specifically limited in this embodiment. For example, a multi-pattern matching tree algorithm may be used, which labels the knowledge attributes of all keywords in the text to be searched in a single pass, thereby improving the labeling efficiency.
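The labeling that produces table 1 can be sketched as below. This is not the multi-pattern matching tree mentioned above: it is a simpler greedy longest-match over whitespace-separated tokens, chosen only to keep the example short. The tokenization, the function name `label_query`, and the attribute names are assumptions.

```python
def label_query(query, entity):
    """Label each keyword of `query` with a knowledge attribute of `entity`.

    `entity` maps knowledge attribute -> attribute value. Greedy longest match
    over whitespace tokens is an assumption made for this sketch.
    """
    value_to_attr = {value: attr for attr, value in entity.items()}
    tokens = query.split()
    result, i = [], 0
    while i < len(tokens):
        # Try the longest token span starting at position i first.
        for j in range(len(tokens), i, -1):
            span = " ".join(tokens[i:j])
            if span in value_to_attr:
                result.append((span, value_to_attr[span]))
                i = j
                break
        else:
            # Nothing matches here: extend or start an "unknown" keyword.
            if result and result[-1][1] == "unknown":
                result[-1] = (result[-1][0] + " " + tokens[i], "unknown")
            else:
                result.append((tokens[i], "unknown"))
            i += 1
    return result


ENTITY_ID1 = {"entity name": "color set", "domain": "tv drama", "actor": "minired",
              "year": "2001", "episode number": "20", "playing site": "site A"}
labels = label_query("color set tv drama minired 2001 set 18", ENTITY_ID1)
```

With the example entity ID1, `labels` reproduces the rows of table 1, including the unmatched keyword "set 18".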
In a possible implementation manner, after the labeling result is obtained by the above matching process, the method may further include: if the labeling result indicates that the knowledge attribute corresponding to a first keyword in the text to be searched is unknown, matching the first keyword according to a preset regular-expression matching rule, and generating the knowledge attribute corresponding to the first keyword according to the matching result. The first keyword may be any keyword in the text to be searched.
For example, in the labeling result shown in table 1, the knowledge attribute corresponding to the keyword "set 18" is "unknown", which indicates that the keyword was not successfully matched. Therefore, the keyword "set 18" may be further matched against the attribute value "20 episodes" of the episode number of entity ID1 by using a preset regular-expression matching rule. For example, the number 18 is extracted from the keyword "set 18" and compared with the number 20 in the attribute value "20 episodes". Since 18 is less than 20, the keyword is considered to indicate a particular episode, and therefore the knowledge attribute of the keyword "set 18" may be set to "episode number".
When the knowledge attribute corresponding to the first keyword is labeled as unknown, the first keyword is matched a second time by using the preset regular-expression matching rule, which improves the completeness and accuracy of the labeling of the keywords in the text to be searched.
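A minimal sketch of this second matching pass follows. The pattern `set (\d+)`, the function name, and the inclusive upper bound are assumptions; the application only states that a preset rule extracts the number and compares it with the episode count.

```python
import re

# Hypothetical preset rule for keywords such as "set 18".
EPISODE_PATTERN = re.compile(r"^set (\d+)$")


def relabel_unknown(labels, episode_count):
    """Re-label "unknown" keywords that name an episode within the episode count."""
    relabeled = []
    for keyword, attr in labels:
        if attr == "unknown":
            match = EPISODE_PATTERN.match(keyword)
            # Assumption: any episode from 1 up to the episode count is valid.
            if match and 1 <= int(match.group(1)) <= episode_count:
                attr = "episode number"
        relabeled.append((keyword, attr))
    return relabeled


second_pass = relabel_unknown([("2001", "year"), ("set 18", "unknown")], 20)
```

Keywords that the rule does not cover, or whose number exceeds the episode count, stay "unknown".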
Further, some of the knowledge attributes of each entity in the knowledge graph are used to describe the viewing intention, such as playing site and episode number. When the text to be searched input by the user includes "site A set 18", the viewing intention of the user may be considered to be "view episode 18 through site A". Therefore, in the labeling result shown in table 1, when the knowledge attribute corresponding to a keyword is playing site or episode number, the knowledge attribute of the keyword may be further labeled as "viewing intention".
The following examples are given. Assume that the candidate entities determined in S302 are entity ID1, entity ID2, and entity ID3 in the knowledge graph shown in fig. 4A to 4C. For the entity ID1, the entity information corresponding to the entity ID1 is used to label the text to be searched, "color set tv drama minired 2001 set 18", and the obtained labeling result is shown in table 2. For the entity ID2, the entity information corresponding to the entity ID2 is used to label the text to be searched, "color set tv drama minired 2001 set 18", and the obtained labeling result is shown in table 3. For the entity ID3, the entity information corresponding to the entity ID3 is used to label the text to be searched, "color set tv drama minired 2001 set 18", and the obtained labeling result is shown in table 4.
TABLE 2
Keyword | Knowledge attribute
color set | Entity name
tv drama | Domain
minired | Actor
2001 | Year
set 18 | Episode number (viewing intention)
TABLE 3
Keyword | Knowledge attribute
color set | Entity name
tv drama | Domain
minired 2001 | Unknown
set 18 | Episode number (viewing intention)
TABLE 4
Keyword | Knowledge attribute
color set | Entity name
tv drama minired 2001 set 18 | Unknown
The structure and the length of the text to be searched are not limited in the labeling process, and even if the text to be searched is abnormally complex, the keyword labeling can be accurately performed on the text to be searched, so that the semantics of the text to be searched can be accurately understood.
Optionally, in S303 of this embodiment, since the labeling of the keywords of the text to be searched with the entity information of one candidate entity is independent of the labeling for every other candidate entity, the candidate entities may be processed in parallel by multiple threads. Therefore, when the number of candidate entities is large (for example, greater than or equal to 50), multi-thread parallel computation can improve the labeling efficiency, improve the online computing performance, and reduce the search delay.
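The per-candidate independence can be sketched with a thread pool as below. The per-token labeling function is a deliberately simplified stand-in, and the names `label_with_entity` and `label_all` are hypothetical. Note that in standard CPython, threads do not speed up CPU-bound pure-Python work, so a process pool may be the better fit there.

```python
from concurrent.futures import ThreadPoolExecutor


def label_with_entity(query, entity_id, entity):
    """Simplified per-token labeling against one candidate entity."""
    value_to_attr = {value: attr for attr, value in entity.items()}
    labels = [(tok, value_to_attr.get(tok, "unknown")) for tok in query.split()]
    return entity_id, labels


def label_all(query, candidates, max_workers=8):
    """Label the query once per candidate entity, one task per candidate."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(label_with_entity, query, eid, entity)
                   for eid, entity in candidates.items()]
        return dict(future.result() for future in futures)


CANDIDATES = {
    "ID1": {"actor": "minired", "year": "2001"},
    "ID2": {"actor": "small green", "year": "2010"},
}
results = label_all("minired 2001", CANDIDATES)
```

Each task reads only the shared query and its own candidate, so no locking is needed.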
S304: determining a target entity corresponding to the text to be searched according to the labeling result corresponding to each of the at least one candidate entity.
Illustratively, according to the labeling result corresponding to each candidate entity, the matching degree between the entity information corresponding to the candidate entity and the text to be searched is determined, and the candidate entity with the highest matching degree is determined as the target entity. For example, according to the labeling results shown in tables 2 to 4, only the entity information of entity ID1 completely segments the text to be searched and labels every keyword with a knowledge attribute, so that the meaning and intention of the text to be searched can be fully understood by using entity ID1. The labeling results of the other entity IDs contain keywords labeled as unknown. As can be seen, the matching degree between the entity information corresponding to entity ID1 and the text to be searched is the highest, so entity ID1 is taken as the target entity corresponding to the text to be searched. That is, the entity the user wants to search for is the TV drama corresponding to entity ID1.
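One simple way to turn the labeling results of tables 2 to 4 into a matching degree is sketched below. The metric used here (fraction of keywords with a known attribute) is an assumption for illustration; this embodiment does not fix a particular formula at this point.

```python
def matching_degree(labels):
    """Fraction of keywords whose knowledge attribute is not "unknown"."""
    if not labels:
        return 0.0
    known = sum(1 for _, attr in labels if attr != "unknown")
    return known / len(labels)


def select_target(results):
    """Return the candidate entity ID with the highest matching degree."""
    return max(results, key=lambda eid: matching_degree(results[eid]))


RESULTS = {
    "ID1": [("color set", "entity name"), ("tv drama", "domain"), ("minired", "actor"),
            ("2001", "year"), ("set 18", "episode number")],
    "ID2": [("color set", "entity name"), ("tv drama", "domain"),
            ("minired 2001", "unknown"), ("set 18", "episode number")],
    "ID3": [("color set", "entity name"), ("tv drama minired 2001 set 18", "unknown")],
}
target = select_target(RESULTS)
```

Under this metric entity ID1 scores 1.0, entity ID2 scores 0.75, and entity ID3 scores 0.5, so entity ID1 is selected.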
The search processing method provided by the embodiment comprises the following steps: acquiring a text to be searched, and determining at least one candidate entity according to the text to be searched; for each candidate entity, acquiring entity information corresponding to the candidate entity from the knowledge graph, and labeling the keywords in the text to be searched according to the entity information corresponding to the candidate entity to obtain a labeling result corresponding to the candidate entity; and determining a target entity corresponding to the text to be searched according to the labeling result corresponding to each of the at least one candidate entity. In the process, the entity information in the knowledge graph is used for marking the key words in the text to be searched, and the knowledge attributes corresponding to the key words can be marked for each key word in the text to be searched, so that each key word in the text to be searched can be accurately understood, and the semantics and the intention of the text to be searched are thoroughly understood. Furthermore, according to the understanding result of the text to be searched, the entity to be searched can be accurately identified, and therefore the accuracy of the searching result is improved. In addition, the search processing method of the embodiment can be used for texts to be searched with any structures and any lengths, and has high universality.
Fig. 5 is a schematic flowchart of a search processing method according to another embodiment of the present application. On the basis of the embodiment shown in fig. 3, the embodiment further refines the technical solution of the present application. As shown in fig. 5, the method of this embodiment may include:
S501: acquiring a text to be searched.
The specific implementation of S501 in this embodiment is similar to S301, and is not described herein again.
S502: determining, according to the entity name dictionary, that the text to be searched includes keywords of a first category and keywords of a second category.
The entity name dictionary comprises at least one entity name and at least one entity corresponding to each entity name. Illustratively, Table 5 is an example of one possible entity name dictionary.
TABLE 5
Entity name | Entity ID
color set | Entity ID1, entity ID2, entity ID3
minired | Entity ID4
small green | Entity ID5
Optionally, the entity name dictionary may be generated from the knowledge graph. For a possible generation process of the entity name dictionary, reference may be made to the detailed description of the subsequent embodiments, which is not repeated here.
In this embodiment, the first category of keywords are keywords that match any entity name in the entity name dictionary (in other words, keywords indicating the name of an entity), and the second category of keywords are keywords that do not match any entity name in the entity name dictionary (in other words, keywords describing or qualifying related information of an entity).
For example, the text to be searched may be matched with the entity name dictionary, and if the text to be searched is directly equal to a certain entity name in the entity name dictionary, it is indicated that only the first type of keywords are included in the text to be searched (i.e., only the keywords indicating the entity name are included in the text to be searched). If the text to be searched is not directly equal to any entity name in the entity name dictionary, the text to be searched includes the first class keywords and the second class keywords.
It should be understood that when it is determined that only the first category keyword is included in the text to be searched, the entity ID corresponding to the first category keyword may be directly used as the entity to be searched, which is similar to the prior art. When it is determined that the text to be searched includes the first category of keywords and the second category of keywords, the subsequent S503 to S506 may be continuously performed, and the keywords in the text to be searched are labeled by using the knowledge graph, so that the entity to be searched is determined after the semantics of the text to be searched are accurately understood.
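The check in S502 can be sketched as an exact lookup against the entity name dictionary of table 5. The function name `route_query` and the returned tags are hypothetical.

```python
ENTITY_NAME_DICT = {
    "color set": ["ID1", "ID2", "ID3"],
    "minired": ["ID4"],
    "small green": ["ID5"],
}


def route_query(query, name_dict):
    """Decide whether the query is a bare entity name or needs labeling."""
    text = query.strip()
    if text in name_dict:
        # Only first-category keywords: use the matching entities directly.
        return "entity_name_only", name_dict[text]
    # Otherwise continue with S503 to S506 (candidate selection and labeling).
    return "needs_labeling", []


bare = route_query("minired", ENTITY_NAME_DICT)
mixed = route_query("color set tv drama minired 2001 set 18", ENTITY_NAME_DICT)
```

Only the second query proceeds to the knowledge-graph labeling steps.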
S503: matching the text to be searched according to the entity name dictionary, determining at least one candidate entity name from the text to be searched, and determining the entities corresponding to each candidate entity name in the entity name dictionary as the at least one candidate entity.
S503 in this embodiment shows a possible implementation of S302 in the above embodiment, in which the at least one candidate entity is determined by matching the text to be searched against the entity name dictionary. The matching process may use an existing matching algorithm.
In one example, a multi-pattern matching tree algorithm may be adopted: the entity name dictionary is loaded into the multi-pattern matching tree, multi-pattern matching is performed on the text to be searched, and all candidate entity names included in the text to be searched are found in a single pass. Further, the entity name dictionary shown in table 5 is looked up to obtain the entity IDs corresponding to the candidate entity names, and these entity IDs are used as the candidate entities. For example, assuming that the text to be searched is "color set tv drama minired 2001 set 18", the candidate entity names obtained by the above matching process include "color set" and "minired". Further, by referring to table 5, entity ID1 to entity ID4 are taken as the candidate entities.
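The application does not name a specific multi-pattern algorithm. The sketch below uses the Aho-Corasick automaton, a well-known technique of this kind, as an assumed stand-in; the function names are hypothetical, and matching is done per character, so names may also match inside longer words.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick transition tables, failure links, and outputs."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_names(text, automaton):
    """Return every dictionary name occurring in `text`, in one left-to-right pass."""
    goto, fail, out = automaton
    state, found = 0, []
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for name in out[state]:
            if name not in found:
                found.append(name)
    return found


def candidate_entities(query, name_dict):
    """Map the entity names found in the query to candidate entity IDs."""
    ids = []
    for name in find_names(query, build_automaton(name_dict)):
        for eid in name_dict[name]:
            if eid not in ids:
                ids.append(eid)
    return ids


NAME_DICT = {"color set": ["ID1", "ID2", "ID3"], "minired": ["ID4"], "small green": ["ID5"]}
candidates = candidate_entities("color set tv drama minired 2001 set 18", NAME_DICT)
```

For the example query, the names "color set" and "minired" are found, giving entity ID1 to entity ID4 as candidates.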
The entity name dictionary is utilized to carry out matching processing on the text to be searched, so that at least one candidate entity is determined, and the comprehensiveness and accuracy of the determined candidate entity are guaranteed.
S504: for each candidate entity, acquiring entity information corresponding to the candidate entity from the knowledge graph, and labeling the keywords in the text to be searched according to the entity information corresponding to the candidate entity to obtain a labeling result corresponding to the candidate entity.
In this embodiment, the specific implementation of S504 is similar to S303 in the embodiment shown in fig. 3, and is not described in detail here.
On the basis of the embodiment shown in fig. 3, another possible implementation of S504 is given below. In this embodiment, priority information may be set for a plurality of knowledge attributes involved in the knowledge-graph. Each priority may correspond to one or more knowledge attributes. When one priority level corresponds to a plurality of knowledge attributes, the priority levels of the knowledge attributes are the same.
For example, one possible priority information is shown in table 6. In Table 6, the priority is sequentially lowered in the order of 1 to 7.
TABLE 6
Priority | Knowledge attributes
Priority 1 | Entity name
Priority 2 | Domain
Priority 3 | Season
Priority 4 | Version, year, episode number
Priority 5 | Director, screenwriter, actor, character
Priority 6 | Country, type
Priority 7 | Playing site, viewing intention
Further, in S504, when the entity information of the candidate entity is used to label the keywords of the text to be searched, the priorities of the knowledge attributes shown in table 6 may be used as the basis for segmenting and matching the keywords, that is, the keywords in the text to be searched are segmented and matched according to the attribute value and the priority corresponding to at least one knowledge attribute of the candidate entity, so as to determine the knowledge attributes corresponding to the keywords in the text to be searched.
This is illustrated below with an example. Suppose the text to be searched consists of the title of a TV series ("where stars") followed by an actor name and a character name, and that a span of characters overlapping the actor name and the character name can also be read as the word "free". The same text can then be segmented in two ways. When the priority of the knowledge attributes is "actor = role > viewing intention", the segmentation and matching identify the actor name with the knowledge attribute "actor" and the character name with the knowledge attribute "role", and the keyword "free" is not recognized as a viewing intention. When the priority of the knowledge attributes is "viewing intention > actor", the segmentation and matching identify the keyword "free" with the knowledge attribute "viewing intention", and the overlapping characters are not recognized as an actor name.
Therefore, by means of the priorities among the knowledge attributes, misaligned segmentation of the text to be searched can be avoided, which improves the accuracy of keyword labeling.
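The effect of the priorities on overlapping matches can be sketched as follows. The placeholder strings ("show", "AB", "CD", "BC") stand in for the title, actor name, character name, and the overlapping word "free"; they, the function name, and the greedy selection rule are assumptions for illustration.

```python
def segment_by_priority(text, attr_values, priority):
    """Pick non-overlapping matches, preferring higher-priority attributes.

    attr_values: list of (knowledge attribute, attribute value) pairs.
    priority: knowledge attribute -> rank, where 1 is the highest priority.
    """
    spans = []
    for attr, value in attr_values:
        start = text.find(value)
        while start != -1:
            spans.append((priority[attr], start, start + len(value), attr))
            start = text.find(value, start + 1)
    chosen = []
    # Visit spans from highest priority (lowest rank) to lowest.
    for _rank, start, end, attr in sorted(spans):
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, attr))
    return [(text[s:e], attr) for s, e, attr in sorted(chosen)]


VALUES = [("entity name", "show"), ("actor", "AB"), ("role", "CD"),
          ("viewing intention", "BC")]
actor_first = segment_by_priority(
    "showABCD", VALUES,
    {"entity name": 1, "actor": 5, "role": 5, "viewing intention": 7})
intent_first = segment_by_priority(
    "showABCD", VALUES,
    {"entity name": 1, "viewing intention": 4, "actor": 5, "role": 5})
```

With the actor and role ranked above the viewing intention, "AB" and "CD" are kept and "BC" is dropped; with the ranks reversed, only "BC" survives.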
S505: determining a target entity corresponding to the text to be searched according to the labeling result corresponding to each of the at least one candidate entity.
The specific implementation of S505 in this embodiment is similar to S304 in the embodiment shown in fig. 3, and is not described in detail here.
On the basis of the embodiment shown in fig. 3, another possible implementation of S505 is given below. In this embodiment, a weighting factor may be set for a plurality of knowledge attributes involved in the knowledge-graph. Wherein, the weighting coefficients corresponding to different knowledge attributes may be different. Or, the weight coefficients corresponding to some knowledge attributes are the same, and the weight coefficients corresponding to some knowledge attributes are different.
In an example, the matching degree between the entity information corresponding to each candidate entity and the text to be searched can be determined according to the labeling result corresponding to each candidate entity and the weighting coefficient corresponding to each of the at least one knowledge attribute of the candidate entity. And then, determining the candidate entity corresponding to the highest matching degree as the target entity.
Optionally, the weight coefficient corresponding to the knowledge attribute is positively correlated with the priority corresponding to the knowledge attribute, that is, the higher the priority corresponding to a certain knowledge attribute is, the higher the weight coefficient corresponding to the knowledge attribute is; the lower the priority corresponding to a certain knowledge attribute, the lower the weight coefficient corresponding to that knowledge attribute. Thus, for the labeling result of a certain candidate entity, weighted summation can be performed according to the knowledge attribute corresponding to each keyword identified in the labeling result and the weight coefficient thereof, so as to obtain the matching degree between the entity information corresponding to the candidate entity and the text to be searched.
Through the evaluation process of the marking result, the accuracy of understanding the text to be searched can be improved, and therefore the accuracy of the identified entity to be searched is ensured.
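The weighted summation can be sketched as below. The numeric weights are hypothetical: they are obtained by reversing the priority order of table 6 (priority 1 maps to weight 7, priority 7 to weight 1), and "unknown" keywords contribute nothing.

```python
# Hypothetical weights, positively correlated with the priorities of table 6.
WEIGHTS = {
    "entity name": 7, "domain": 6, "season": 5,
    "version": 4, "year": 4, "episode number": 4,
    "director": 3, "screenwriter": 3, "actor": 3, "character": 3,
    "country": 2, "type": 2,
    "playing site": 1, "viewing intention": 1,
}


def weighted_matching_degree(labels, weights=WEIGHTS):
    """Sum the weight of the knowledge attribute assigned to each keyword."""
    return sum(weights.get(attr, 0) for _, attr in labels)


TABLE_2 = [("color set", "entity name"), ("tv drama", "domain"), ("minired", "actor"),
           ("2001", "year"), ("set 18", "episode number")]
TABLE_3 = [("color set", "entity name"), ("tv drama", "domain"),
           ("minired 2001", "unknown"), ("set 18", "episode number")]
TABLE_4 = [("color set", "entity name"), ("tv drama minired 2001 set 18", "unknown")]
scores = {"ID1": weighted_matching_degree(TABLE_2),
          "ID2": weighted_matching_degree(TABLE_3),
          "ID3": weighted_matching_degree(TABLE_4)}
```

Under these assumed weights the scores are 24, 17, and 7, so entity ID1 again has the highest matching degree.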
S506: determining the entity card to be displayed and the display information according to the labeling result corresponding to the target entity.
The embodiment can also determine the entity card to be displayed and the display information according to the labeling result corresponding to the target entity, so that the terminal device can display the entity card.
Optionally, the viewing intention indicated by the labeling result may also be considered when determining the display information, for example: a particular episode range, a specified free version, a specified site, a specified uncut version, a specified 1080p version, and so on. Illustratively, for the text to be searched "color set tv drama minired 2001 set 18 online viewing site A", the following keywords are labeled as viewing intentions in the obtained labeling result:
online viewing: viewing (viewing intention)
site A: playing site (viewing intention)
set 18: episode number (viewing intention)
Therefore, when it is determined that site A includes the playing resource of the entity card, the entity card is determined to be displayed. In addition, the display information indicates that "site A" is sorted first and that only episode 18 is retained in the episode list.
Fig. 6 is a schematic diagram of a search result interface of a terminal device according to an embodiment of the present application. Assuming that the text to be searched input by the user is "color set tv drama minired 2001 set 18 online viewing site A", the terminal device presents a search result interface as shown in fig. 6. In fig. 6, the entity card corresponding to entity ID1 in the knowledge graph is shown, "site A" is sorted first, and only episode 18 is retained in the episode list. Therefore, the search requirement of the user is met directly, and the user experience is improved.
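How the viewing intentions could shape the display information is sketched below. The structure of the returned dictionary, the function name, and the digit extraction are assumptions; the application only states that the requested site is sorted first and the requested episode is retained.

```python
def build_display_info(labels, available_sites, episode_count):
    """Derive site order and episode list from viewing-intention keywords."""
    info = {"sites": list(available_sites),
            "episodes": list(range(1, episode_count + 1))}
    for keyword, attr in labels:
        if attr == "playing site" and keyword in info["sites"]:
            # Move the requested site to the front of the list.
            info["sites"].remove(keyword)
            info["sites"].insert(0, keyword)
        elif attr == "episode number":
            digits = "".join(ch for ch in keyword if ch.isdigit())
            if digits:
                # Keep only the requested episode.
                info["episodes"] = [int(digits)]
    return info


display = build_display_info(
    [("site A", "playing site"), ("set 18", "episode number")],
    available_sites=["site B", "site A"], episode_count=20)
```

For the example query, `display` lists "site A" first and keeps only episode 18.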
The following describes a generation process of the entity name dictionary in the embodiment shown in fig. 5, with reference to a specific example. The generation process of the entity name dictionary of the embodiment may be performed on-line or off-line.
In one example, generating the entity name dictionary from the knowledge graph may include: adding the entity name corresponding to each entity in the knowledge graph into the entity name dictionary. For example, in the knowledge graph shown in fig. 4A to 4C, the entity names of entity ID1 to entity ID3 are all "color set"; therefore, the entity name "color set" is added to the entity name dictionary and associated with entity ID1, entity ID2, and entity ID3 in the entity name dictionary. In addition, assuming that the knowledge graph further includes entity ID4 and entity ID5, whose entity names are "minired" and "small green" respectively, the entity names "minired" and "small green" may be added to the entity name dictionary, with "minired" associated with entity ID4 and "small green" associated with entity ID5. Thus, the entity name dictionary shown in table 5 is obtained.
Further, in order to enrich the data in the entity name dictionary, the process of generating the entity name dictionary according to the knowledge graph may further include: mining the entity names corresponding to the entities in the knowledge graph, and adding the mined entity names into the entity name dictionary. The mining includes at least one of: alias mining, transformation mining, abbreviation mining, and error correction mining.
Alias mining refers to replacing entity names in the knowledge graph with their aliases to obtain new entity names. For example, assuming that the actor "minired" also has the alias "red", the entity name "red" may also be added to the entity name dictionary and associated with entity ID4 in the entity name dictionary.
Transformation mining refers to applying a transformation to entity names in the knowledge graph to obtain new entity names. For example, assuming that the knowledge graph also includes an entity ID6 whose entity name is "color set 2", the entity name can be transformed into "color set two", "color set II", "color set season two", etc. The transformed entity names may then be added to the entity name dictionary, and the associations between these transformed entity names and entity ID6 may be established in the entity name dictionary. It should be noted that the above example uses a numeral conversion; in practical applications, the entity name may be transformed in various ways, which is not limited in this embodiment.
Abbreviation mining refers to replacing entity names in the knowledge graph with their abbreviations to obtain new entity names. For example, assuming that the TV series "color set" is also called "color" for short, the entity name "color" may also be added to the entity name dictionary and associated with entity ID1 and entity ID2 in the entity name dictionary.
Error correction mining refers to correcting errors in entity names in the knowledge graph to obtain new entity names. For example, assuming that the knowledge graph further includes an entity ID7 whose recorded entity name contains an error and should read "rainbow bridge", the corrected entity name "rainbow bridge" may also be added to the entity name dictionary and associated with entity ID7 in the entity name dictionary.
It can be appreciated that through the mining process described above, the number of entity names in the entity name dictionary will be greatly enriched. Furthermore, when the entity name dictionary is used for matching the text to be searched and determining the candidate entity, the comprehensiveness and accuracy of the determined candidate entity are ensured.
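The dictionary generation with alias, abbreviation, and transformation mining can be sketched as follows. The numeral table, the `extra_names` parameter (mined aliases and abbreviations keyed by entity ID), and the function names are hypothetical; how such variants are actually mined is not specified here.

```python
# Hypothetical numeral variants used for transformation mining.
NUMERALS = {"2": ["two", "II"]}


def transform(name):
    """Return variants of a name whose last token is a known numeral."""
    head, _, tail = name.rpartition(" ")
    if not head:
        return []
    return [f"{head} {alt}" for alt in NUMERALS.get(tail, [])]


def build_entity_name_dict(graph, extra_names=None):
    """graph: entity ID -> entity name; extra_names: entity ID -> mined names."""
    name_dict = {}

    def add(name, entity_id):
        ids = name_dict.setdefault(name, [])
        if entity_id not in ids:
            ids.append(entity_id)

    for entity_id, name in graph.items():
        add(name, entity_id)
        for variant in transform(name):
            add(variant, entity_id)
        for mined in (extra_names or {}).get(entity_id, []):
            add(mined, entity_id)
    return name_dict


GRAPH = {"ID1": "color set", "ID2": "color set", "ID3": "color set",
         "ID4": "minired", "ID5": "small green", "ID6": "color set 2"}
MINED = {"ID4": ["red"], "ID1": ["color"], "ID2": ["color"]}
name_dict = build_entity_name_dict(GRAPH, MINED)
```

The result contains the rows of table 5 plus the mined names "red", "color", "color set two", and "color set II".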
Fig. 7 is a schematic structural diagram of a search processing apparatus according to an embodiment of the present application. The apparatus of the present embodiment may be in the form of software and/or hardware. As shown in fig. 7, the search processing apparatus 800 of the present embodiment may include: an obtaining module 801, a selecting module 802, a labeling module 803, and a determining module 804, wherein:
the obtaining module 801 is configured to obtain a text to be searched; the selecting module 802 is configured to determine at least one candidate entity according to the text to be searched; the labeling module 803 is configured to, for each candidate entity, obtain entity information corresponding to the candidate entity from the knowledge graph, and label the keywords in the text to be searched according to the entity information corresponding to the candidate entity to obtain a labeling result corresponding to the candidate entity; and the determining module 804 is configured to determine, according to the labeling result corresponding to each of the at least one candidate entity, a target entity corresponding to the text to be searched.
In a possible implementation manner, the entity information corresponding to a candidate entity is used to indicate an attribute value corresponding to at least one knowledge attribute of the candidate entity; and the labeling result is used for indicating the knowledge attribute corresponding to each keyword in the text to be searched.
In a possible implementation manner, the labeling module 803 is specifically configured to: segmenting and matching the keywords in the text to be searched according to the attribute value corresponding to at least one knowledge attribute of the candidate entity, and determining the knowledge attribute corresponding to each keyword in the text to be searched; and generating a labeling result corresponding to the candidate entity according to the knowledge attribute corresponding to each keyword in the text to be searched.
In a possible implementation, the at least one knowledge attribute each corresponds to a priority; the labeling module 803 is specifically configured to: and segmenting and matching the keywords in the text to be searched according to the attribute value corresponding to at least one knowledge attribute of the candidate entity and the priority, and determining the knowledge attribute corresponding to each keyword in the text to be searched.
In a possible implementation manner, the labeling module 803 is further configured to: if the labeling result indicates that the knowledge attribute corresponding to a first keyword in the text to be searched is unknown, match the first keyword according to a preset regular-expression matching rule, and generate the knowledge attribute corresponding to the first keyword according to the matching result.
In a possible implementation manner, the determining module 804 is specifically configured to: according to the labeling result corresponding to each candidate entity, determining the matching degree of the entity information corresponding to the candidate entity and the text to be searched; and determining the candidate entity corresponding to the highest matching degree as the target entity.
In a possible implementation manner, each of the at least one knowledge attribute corresponds to a weight coefficient; the determining module 804 is specifically configured to: and determining the matching degree of the entity information corresponding to each candidate entity and the text to be searched according to the labeling result corresponding to each candidate entity and the weight coefficient corresponding to at least one knowledge attribute of each candidate entity.
In a possible implementation manner, the selecting module 802 is specifically configured to: according to the entity name dictionary, matching the text to be searched, and determining at least one candidate entity name from the text to be searched; the entity name dictionary comprises at least one entity name and at least one entity corresponding to each entity name; and determining the entity corresponding to each candidate entity name as the at least one candidate entity.
In a possible implementation manner, the selecting module 802 is further configured to: determine, according to the entity name dictionary, that the text to be searched includes keywords of a first category and keywords of a second category; wherein the first category of keywords are keywords that match any entity name in the entity name dictionary, and the second category of keywords are keywords that do not match any entity name in the entity name dictionary.
Fig. 8 is a schematic structural diagram of a search processing apparatus according to another embodiment of the present application. In a possible implementation manner, the knowledge graph is used to indicate an entity name and entity information corresponding to at least one entity. As shown in fig. 8, the apparatus of this embodiment may further include: a generating module 805, configured to generate the entity name dictionary according to the knowledge graph.
In a possible implementation manner, the generating module 805 is specifically configured to: add the entity names corresponding to the entities in the knowledge graph into the entity name dictionary; and mine the entity names corresponding to the entities in the knowledge graph, and add the mined entity names into the entity name dictionary; wherein the mining includes at least one of: alias mining, transformation mining, abbreviation mining, and error correction mining.
The search processing apparatus provided in this embodiment may be configured to execute the technical solution in any of the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 9 is a block diagram of an electronic device for the search processing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 9, the electronic device includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 9, one processor 701 is taken as an example.
The memory 702 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the search processing method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the search processing method provided by the present application.
The memory 702, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the search processing method in the embodiment of the present application (for example, the obtaining module 801, the selecting module 802, the labeling module 803, the determining module 804, and the generating module 805 shown in fig. 8). By running the non-transitory software programs, instructions, and modules stored in the memory 702, the processor 701 executes the various functional applications and data processing of the server or the terminal device, i.e., implements the search processing method in the above-described method embodiments.
The memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of the electronic device, and the like. Further, the memory 702 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 702 may optionally include memory located remotely from the processor 701, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or other means, and fig. 9 illustrates an example of a connection by a bus.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device. Examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 704 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A search processing method, comprising:
acquiring a text to be searched;
determining at least one candidate entity according to the text to be searched;
for each candidate entity, acquiring entity information corresponding to the candidate entity from the knowledge graph, and labeling the keywords in the text to be searched according to the entity information corresponding to the candidate entity to obtain a labeling result corresponding to the candidate entity;
and determining a target entity corresponding to the text to be searched according to the labeling result corresponding to the at least one candidate entity.
2. The method of claim 1, wherein the entity information corresponding to a candidate entity is used to indicate an attribute value corresponding to at least one knowledge attribute of the candidate entity; and the labeling result is used for indicating the knowledge attribute corresponding to each keyword in the text to be searched.
3. The method according to claim 2, wherein the labeling the keyword in the text to be searched according to the entity information corresponding to the candidate entity to obtain the labeling result corresponding to the candidate entity comprises:
segmenting and matching the keywords in the text to be searched according to the attribute value corresponding to at least one knowledge attribute of the candidate entity, and determining the knowledge attribute corresponding to each keyword in the text to be searched;
and generating a labeling result corresponding to the candidate entity according to the knowledge attribute corresponding to each keyword in the text to be searched.
4. The method of claim 3, wherein each of the at least one knowledge attribute corresponds to a priority; and the segmenting and matching the keywords in the text to be searched according to the attribute value corresponding to at least one knowledge attribute of the candidate entity, and determining the knowledge attribute corresponding to each keyword in the text to be searched, comprises:
and segmenting and matching the keywords in the text to be searched according to the attribute value corresponding to at least one knowledge attribute of the candidate entity and the priority, and determining the knowledge attribute corresponding to each keyword in the text to be searched.
5. The method of claim 3, wherein after generating the labeling result corresponding to the candidate entity, the method further comprises:
and if the labeling result indicates that the knowledge attribute corresponding to the first keyword in the text to be searched is unknown, matching the first keyword according to a preset regular matching rule, and generating the knowledge attribute corresponding to the first keyword according to the matching result.
6. The method according to any one of claims 2 to 5, wherein the determining a target entity corresponding to the text to be searched according to the labeling result corresponding to each of the at least one candidate entity comprises:
according to the labeling result corresponding to each candidate entity, determining the matching degree of the entity information corresponding to the candidate entity and the text to be searched;
and determining the candidate entity corresponding to the highest matching degree as the target entity.
7. The method of claim 6, wherein the at least one knowledge attribute each corresponds to a weight coefficient; the determining the matching degree between the entity information corresponding to each candidate entity and the text to be searched according to the labeling result corresponding to each candidate entity comprises:
and determining the matching degree of the entity information corresponding to each candidate entity and the text to be searched according to the labeling result corresponding to each candidate entity and the weight coefficient corresponding to at least one knowledge attribute of each candidate entity.
8. The method according to any one of claims 1 to 5, wherein the determining at least one candidate entity according to the text to be searched comprises:
according to the entity name dictionary, matching the text to be searched, and determining at least one candidate entity name from the text to be searched; the entity name dictionary comprises at least one entity name and at least one entity corresponding to each entity name;
and determining the entity corresponding to each candidate entity name as the at least one candidate entity.
9. The method of claim 8, wherein before the determining at least one candidate entity according to the text to be searched, the method further comprises:
determining that the text to be searched comprises a first class keyword and a second class keyword according to the entity name dictionary; wherein the first class keyword is a keyword that matches any entity name in the entity name dictionary, and the second class keyword is a keyword that does not match any entity name in the entity name dictionary.
10. The method of claim 8, wherein the knowledge-graph is used to indicate entity names and entity information corresponding to at least one entity; the method further comprises the following steps:
and generating the entity name dictionary according to the knowledge graph.
11. The method of claim 10, wherein the generating the entity name dictionary from the knowledge-graph comprises:
adding entity names corresponding to the entities in the knowledge graph into the entity name dictionary;
mining entity names corresponding to all entities in the knowledge graph, and adding the mined entity names into the entity name dictionary; wherein the mining comprises at least one of: alias mining, transformation mining, abbreviation mining, and error correction mining.
12. A search processing apparatus, characterized by comprising:
the acquisition module is used for acquiring a text to be searched;
the selection module is used for determining at least one candidate entity according to the text to be searched;
the labeling module is used for acquiring, for each candidate entity, entity information corresponding to the candidate entity from the knowledge graph, and labeling the keywords in the text to be searched according to the entity information corresponding to the candidate entity to obtain a labeling result corresponding to the candidate entity;
and the determining module is used for determining the target entity corresponding to the text to be searched according to the labeling result corresponding to the at least one candidate entity.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 11.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 11.
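The flow recited in claims 1 to 8 and 10 can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration rather than the claimed implementation: the toy knowledge graph, the attribute names, the priority order, the weight values, the regular-expression rule, and the helper names (`build_entity_name_dict`, `candidate_entities`, `label_text`, `search`) are all assumptions. In particular, the matching degree is assumed here to be the sum of the weight coefficients of the attributes whose values were found in the text; the claims do not fix a formula.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Hypothetical toy knowledge graph: entity id -> {knowledge attribute: attribute value}.
KNOWLEDGE_GRAPH = {
    "movie_1": {"name": "journey west", "director": "zhang", "year": "1986"},
    "movie_2": {"name": "journey west", "director": "li", "year": "2013"},
}
PRIORITY = ["name", "director", "year"]                  # assumed attribute priority (claim 4)
WEIGHTS = {"name": 1.0, "director": 0.5, "year": 0.5}    # assumed weight coefficients (claim 7)
REGEX_RULES = [(re.compile(r"(19|20)\d{2}"), "year")]    # assumed fallback rule (claim 5)


class Label(NamedTuple):
    keyword: str
    attribute: str     # knowledge attribute, or "unknown"
    from_graph: bool   # True if the keyword equals an attribute value of the entity


def build_entity_name_dict(graph: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Entity name dictionary: entity name -> entities sharing that name (claims 8, 10)."""
    names: Dict[str, List[str]] = {}
    for entity_id, info in graph.items():
        names.setdefault(info["name"], []).append(entity_id)
    return names


ENTITY_NAME_DICT = build_entity_name_dict(KNOWLEDGE_GRAPH)


def candidate_entities(text: str) -> List[str]:
    """Collect every entity whose name occurs in the text to be searched."""
    found: List[str] = []
    for name, entity_ids in ENTITY_NAME_DICT.items():
        if name in text:
            found.extend(entity_ids)
    return found


def regex_label(word: str) -> str:
    """Fallback labeling for a keyword that matched no attribute value."""
    for pattern, attribute in REGEX_RULES:
        if pattern.fullmatch(word):
            return attribute
    return "unknown"


def label_text(text: str, entity_info: Dict[str, str], priority: List[str]) -> List[Label]:
    """Segment the text by the entity's attribute values, highest priority first."""
    segments = [(text, None)]  # (span, attribute); attribute None means not yet labeled
    for attribute in priority:
        value = entity_info.get(attribute)
        if not value:
            continue
        next_segments = []
        for span, label in segments:
            if label is None and value in span:
                before, _, after = span.partition(value)
                next_segments += [(before, None), (value, attribute), (after, None)]
            else:
                next_segments.append((span, label))
        segments = next_segments
    result: List[Label] = []
    for span, label in segments:
        if label is not None:
            result.append(Label(span, label, True))
        else:
            # Leftover text: split on whitespace and try the regular-expression rules.
            result.extend(Label(w, regex_label(w), False) for w in span.split())
    return result


def search(text: str) -> Optional[str]:
    """Return the candidate entity with the highest matching degree, if any."""
    scores = {}
    for entity_id in candidate_entities(text):
        labels = label_text(text, KNOWLEDGE_GRAPH[entity_id], PRIORITY)
        scores[entity_id] = sum(WEIGHTS.get(l.attribute, 0.0) for l in labels if l.from_graph)
    return max(scores, key=scores.get) if scores else None
```

With this toy data, `search("journey west 2013 li")` selects `movie_2`: both candidates match on the name, but only the second also matches the director and year values. The substring matching used here is deliberately naive (a short value such as "li" would also match inside a longer word), so a practical system would need proper word segmentation.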
CN202010223795.8A 2020-03-26 2020-03-26 Search processing method, device and equipment Active CN111309872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010223795.8A CN111309872B (en) 2020-03-26 2020-03-26 Search processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN111309872A (en) 2020-06-19
CN111309872B (en) 2023-08-08

Family

ID=71157330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010223795.8A Active CN111309872B (en) 2020-03-26 2020-03-26 Search processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN111309872B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112905884A (en) * 2021-02-10 2021-06-04 北京百度网讯科技有限公司 Method, apparatus, medium, and program product for generating sequence annotation model
CN113139033A (en) * 2021-05-13 2021-07-20 平安国际智慧城市科技股份有限公司 Text processing method, device, equipment and storage medium
CN114741550A (en) * 2022-06-09 2022-07-12 腾讯科技(深圳)有限公司 Image searching method and device, electronic equipment and computer readable storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060242113A1 (en) * 2005-04-20 2006-10-26 Kumar Anand Cybernetic search with knowledge maps
US20180366013A1 (en) * 2014-08-28 2018-12-20 Ideaphora India Private Limited System and method for providing an interactive visual learning environment for creation, presentation, sharing, organizing and analysis of knowledge on subject matter
US20190018849A1 (en) * 2017-07-14 2019-01-17 Guangzhou Shenma Mobile Information Technology Co., Ltd. Information query method and apparatus
CN109388793A (en) * 2017-08-03 2019-02-26 阿里巴巴集团控股有限公司 Entity mask method, intension recognizing method and corresponding intrument, computer storage medium
WO2019057191A1 (en) * 2017-09-25 2019-03-28 腾讯科技(深圳)有限公司 Content retrieval method, terminal and server, electronic device and storage medium
CN109977233A (en) * 2019-03-15 2019-07-05 北京金山数字娱乐科技有限公司 A kind of idiom knowledge map construction method and device
CN109992689A (en) * 2019-03-26 2019-07-09 华为技术有限公司 Searching method, terminal and medium
CN110245259A (en) * 2019-05-21 2019-09-17 北京百度网讯科技有限公司 The video of knowledge based map labels method and device, computer-readable medium
CN110516047A (en) * 2019-09-02 2019-11-29 湖南工业大学 The search method and searching system of knowledge mapping based on packaging field
CN110569367A (en) * 2019-09-10 2019-12-13 苏州大学 Knowledge graph-based space keyword query method, device and equipment
CN110659366A (en) * 2019-09-24 2020-01-07 Oppo广东移动通信有限公司 Semantic analysis method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
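Guan Jian; Wang Jingbin; Bian Qianhong: "Multi-keyword streaming parallel retrieval algorithm based on an urban safety knowledge graph" (基于城市安全知识图谱的多关键词流式并行检索算法), Computer Science (计算机科学), no. 002 *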

Also Published As

Publication number Publication date
CN111309872B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
US11714816B2 (en) Information search method and apparatus, device and storage medium
US11581021B2 (en) Method and apparatus for locating video playing node, device and storage medium
CN110955764B (en) Scene knowledge graph generation method, man-machine conversation method and related equipment
CN112650907B (en) Search word recommendation method, target model training method, device and equipment
CN111125435B (en) Video tag determination method and device and computer equipment
US11508153B2 (en) Method for generating tag of video, electronic device, and storage medium
CN111309872B (en) Search processing method, device and equipment
CN111782977A (en) Interest point processing method, device, equipment and computer readable storage medium
CN111680189B (en) Movie and television play content retrieval method and device
CN111241427B (en) Method, device, equipment and computer storage medium for query automatic completion
CN111538815B (en) Text query method, device, equipment and storage medium
CN111090991A (en) Scene error correction method and device, electronic equipment and storage medium
CN110795593A (en) Voice packet recommendation method and device, electronic equipment and storage medium
CN111814077A (en) Information point query method, device, equipment and medium
CN111563198B (en) Material recall method, device, equipment and storage medium
CN112380847A (en) Interest point processing method and device, electronic equipment and storage medium
CN110532404B (en) Source multimedia determining method, device, equipment and storage medium
CN111666372A (en) Method and device for analyzing query term query, electronic equipment and readable storage medium
CN111309200B (en) Method, device, equipment and storage medium for determining extended reading content
CN112015845A (en) Method, device and equipment for map retrieval test and storage medium
CN111984876A (en) Interest point processing method, device, equipment and computer readable storage medium
CN111666417A (en) Method and device for generating synonyms, electronic equipment and readable storage medium
CN111625706B (en) Information retrieval method, device, equipment and storage medium
CN111881255B (en) Synonymous text acquisition method and device, electronic equipment and storage medium
CN110889020B (en) Site resource mining method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant