WO2019214679A1 - Entity search method, related device, and computer storage medium - Google Patents

Entity search method, related device, and computer storage medium

Info

Publication number
WO2019214679A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
entity
candidate
query information
candidate entities
Prior art date
Application number
PCT/CN2019/086197
Other languages
English (en)
French (fr)
Inventor
徐传飞
常毅
夏命榛
陈跃国
马登豪
张凯文
Original Assignee
华为技术有限公司
伊利诺伊大学董事会
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司, 伊利诺伊大学董事会
Publication of WO2019214679A1
Priority to US17/093,210 (US11636143B2)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/355: Class or cluster creation or modification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00: Computing arrangements based on specific mathematical models

Definitions

  • the present invention relates to the field of Internet technologies, and in particular, to an entity search method, related devices, and a computer storage medium.
  • early search engines were mainly text-based.
  • the keyword-based matching search method lacks a deep understanding of the meaning of words, so the matching degree is low; in addition, the feedback result is displayed in text form, so the user needs to find the answer in the returned text, and the user experience is poor.
  • an entity search method which aims to find an entity (result answer) of a user query and present it to the user.
  • the current entity search method mainly adopts a keyword-based matching search method. Because a keyword match succeeds only when the query keyword and the entity keyword are identical, most synonyms or near-synonyms cannot be matched; that is, words with the same or similar meaning but different wording fail to match. In the actual search process, different users express the same thing in different ways, which is also called a conceptual gap, and this results in a lower matching rate or a lower search accuracy.
  • the embodiments of the present invention disclose an entity search method, a related device, and a computer storage medium, which can solve the problems of low matching rate or low accuracy in the result search due to differences in expressions in the prior art.
  • an embodiment of the present invention provides an entity search method, where the method includes:
  • the terminal device determines a first category word and a second category word included in the query information, where the first category word is a word indicating a type of the query result in the query information, and the second category word is a word in the query information other than the first category word;
  • the first entity library includes information of each of the w candidate entities, where the information of a candidate entity includes a third classification word and a fourth classification word, the third classification word and the first classification word belong to the same category, and the fourth classification word and the second classification word belong to the same category;
  • the correlation degree is used to indicate the correlation between a classification word in the query information and a classification word in the candidate entity, both w and s are positive integers, and s is less than or equal to 4;
  • the first category word includes a core word corresponding to a type of the query result expressed in the query information
  • the second category word includes modifiers in the query information other than the first category word and stop words.
  • the s correlations include a first correlation, which is used to indicate a correlation between a first classified word in the query information and a third classified word in the candidate entity,
  • Determining, according to the first entity library and the first classification word and the second classification word included in the query information, the s correlations corresponding to each of the w candidate entities includes:
  • the s correlations include a second correlation, used to indicate a correlation between a first classified word in the query information and a fourth classified word in the candidate entity,
  • the processed first classified word is obtained by performing context association processing on the first classified word in the query information according to the first pre-stored document, and the processed fourth classified word is based on the first pre-existing
  • the context association processing is: extracting, in the first pre-stored document, the first classification word or the fourth classification
  • the correlation a or the correlation b is determined according to a correlation smoothing algorithm, and the correlation smoothing algorithm is used to mitigate the deviation, in the first pre-stored document, of the first classified word in the query information or the fourth classified word in the candidate entity.
  • the s correlations include a third correlation, which is used to indicate a correlation between a second classified word in the query information and a third classified word in the candidate entity,
  • the processed second classified word is obtained by performing context association processing on the second classified word in the query information according to the second pre-stored document, and the processed third classified word is according to the second pre-stored word.
  • the context association processing is: extracting, in the second pre-stored document, the k words preceding and/or the l words following the second classified word or the third classified word, where k and l are both positive integers.
  • the correlation c or the correlation d is determined according to a correlation smoothing algorithm, and the correlation smoothing algorithm is used to mitigate the deviation, in the second pre-stored document, of the second classified word in the query information or the third classified word in the candidate entity.
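The context association processing can be illustrated with a minimal sketch. It assumes the pre-stored document is already tokenized and reads "the first k words and/or the last l words" as the k tokens before and the l tokens after each occurrence of the classified word; the function name `context_window` and the sample sentence are hypothetical and not taken from the application.

```python
def context_window(tokens, target, k=2, l=2):
    """Collect up to k tokens before and l tokens after each occurrence of target.

    Assumption: the context of a classified word is a simple token window;
    the application does not specify the exact extraction rule.
    """
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == target:
            before = tokens[max(0, i - k):i]   # at most k preceding tokens
            after = tokens[i + 1:i + 1 + l]    # at most l following tokens
            contexts.append(before + after)
    return contexts


# Hypothetical pre-stored document, already split into words.
doc = "the film was shot in a small town near the coast".split()
print(context_window(doc, "town", k=2, l=1))
```

The extracted context words could then stand in for the "processed" classified word when a correlation is computed; how they are weighted or smoothed is left open here.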
  • the s correlations include a fourth correlation, used to indicate a correlation between a second category word in the query information and a fourth category word in the candidate entity,
  • the expanded second classified word is obtained by expanding the second classified word in the query information by using the attribute word, and the expanded fourth classified word is obtained by expanding the fourth classified word of the candidate entity by using the attribute word.
  • the determining, according to the s correlations of the w candidate entities, the information of the target entity corresponding to the query information includes:
  • an embodiment of the present invention provides a terminal device, including a functional unit corresponding to the method described in the foregoing first aspect.
  • an embodiment of the present invention provides a terminal device, including a memory and a processor coupled to the memory, the memory is configured to store an instruction, the processor is configured to execute the instruction, and the The first camera and the second camera are in communication; wherein the processor executes the instructions to perform the method described in the first aspect above.
  • the terminal device further includes a display coupled to the processor, the display for displaying information (search results) of the target entity under control of the processor.
  • the terminal device further includes a communication interface, the communication interface is in communication with the processor, and the communication interface is used to communicate with other devices (such as a network device) under the control of the processor.
  • a computer readable storage medium storing program code for an entity search.
  • the program code includes instructions for performing the method described in the first aspect above.
  • FIG. 1 is a schematic diagram of a network framework according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart diagram of an entity search method according to an embodiment of the present invention.
  • FIGS. 3A-3B are schematic structural diagrams of two first entity libraries according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of another terminal device according to an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a network framework according to an embodiment of the present invention.
  • the network framework 100 includes an entity search component 12 and an application service component 14.
  • optionally, a document search component 16 can also be included. Specifically:
  • the entity search component 12 includes an entity library 120 and a matcher 122.
  • the entity library 120 includes information of one or more entities, and the information of the entity is description information for describing the entity, such as a name, an identifier, and attribute information of the entity.
  • relationship information and the like between any two entities may also be included, which will be specifically described below.
  • the matcher 122 includes one or more matchers. The matcher can be used to calculate the degree of correlation (ie, the degree of matching) between the query information input by the user and the information of the entity in the entity library. Optionally, the calculated relevance may be fed back to the application service component 14 to facilitate determining, according to the correlation, whether the information of the entity is the search result (result answer) corresponding to the query information.
  • the matcher 122 in the present application may be designed to include three matchers, namely: a first matcher, a second matcher, and a third matcher.
  • the first matcher is configured to calculate a correlation between a first classified word (such as a core word) included in the query information and a third classified word (such as a core word) included in the information of the entity in the entity library (ie, suitability).
  • the second matcher is configured to calculate a correlation between a second classified word (such as a modifier) included in the query information and a third classified word (such as a core word) included in the information of the entity in the entity library. Optionally, it can also be used to calculate the correlation between the first classification word (such as the core word) included in the query information and the fourth classification word (such as a modifier) included in the information of the entity in the entity library.
  • the third matcher is configured to calculate a correlation between the second classified word (such as a modifier) included in the query information and the fourth classified word (such as a modifier) included in the information of the entity in the entity library.
  • the first category word in the query information and the third category word in the information of the entity belong to the same type/category, for example, the core words corresponding to the type for expressing the query result.
  • the second category word in the query information and the fourth category word in the information of the entity belong to the same type/category, for example, all of the modifiers corresponding to the type used to modify the query result.
  • the classification words in the query information (specifically, the first classification word and the second classification word), the classification words in the information of the entity (specifically, the third classification word and the fourth classification word), and the respective functions of the above three matchers (i.e., how to calculate the corresponding correlations) are described in detail below.
  • the functions of the three matchers can also be integrated into one matcher, or split across more matchers; for example, the two functions of the second matcher described above can be split into two matchers. This is not described in detail or limited herein.
  • the document search component 16 includes a document library 160 and a search service 162.
  • the document library 160 includes one or more documents including search results corresponding to the information to be queried.
  • the search service 162 is configured to query a corresponding search result from the document library according to the query information input by the user, and feed back the search result to the application service component 14.
  • the search service 162 can be used to calculate the relevance (matching degree) between the query information input by the user and each document in the document library, and feed back the relevance to the application service component 14, so that the component 14 can determine, according to the relevance, whether the document is a search result corresponding to the query information.
  • the application service component 14 is configured to display the search result (result answer) corresponding to the query information, which is convenient for the user to view.
  • the application service component 14 can include a display module 140.
  • a sorting module 142 and a feedback model module 144 may also be included.
  • the display module 140 is configured to display a search result corresponding to the query information to a user.
  • the sorting module 142 is configured to sort the received correlations, for example in descending order of correlation, so that the display module determines the search results to be displayed according to the order produced by the sorting module.
  • the sorting module 142 may also filter the received correlations, for example by filtering out correlations below a certain threshold (e.g., 40%).
  • the entity search component 12 may feed back to the application service component 14 (specifically, the sorting module 142) the correlation between the query information and the information of each entity in the entity library. Accordingly, the sorting module can sort the correlations that exceed a certain threshold in descending order. Further, the display module 140 may display to the user, in the order given by the sorting module, the information of the entities corresponding to those correlations (i.e., the search results), so that the user can view the search results corresponding to the query information more intuitively and effectively.
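The filter-then-sort behaviour of the sorting module can be sketched as follows. This is only an illustration under stated assumptions: correlations are taken to be floats in [0, 1], the 40% threshold is the example value from the text, and the function name `rank_candidates` and the entity names are hypothetical.

```python
def rank_candidates(correlation_by_entity, threshold=0.4):
    """Drop entities whose correlation is below threshold, then sort descending.

    Assumption: each entity has a single correlation score in [0, 1].
    """
    kept = [
        (entity, score)
        for entity, score in correlation_by_entity.items()
        if score >= threshold
    ]
    # Highest correlation first, as in the descending-order example.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores fed back by the entity search component.
scores = {"entity_a": 0.9, "entity_b": 0.3, "entity_c": 0.55}
for entity, score in rank_candidates(scores):
    print(entity, score)
```

The display module would then show the surviving entities in the returned order.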
  • the feedback model module 144 can be used to collect feedback information of the user, and the feedback information can be used to re-filter or re-sort the candidate results (specifically, information of candidate entities or documents) found by the entity search component or the document search component, and so on. For example, candidate results with poor user feedback or a low user click rate (viewing rate) are removed.
  • the feedback information may be information that the user feeds back in the form of a document, such as whether the search result corresponding to the query information is suitable or whether the related problem is solved, or may be click information of the user, for example, whether the user clicks to view the search result corresponding to the query information.
  • the network framework proposed by the present application may provide the foregoing functional services by creating a web service (REST) or an application programming interface (API).
  • the network framework can be deployed to a corresponding terminal device.
  • the terminal device includes, but is not limited to, a device with a network communication function such as a user equipment (UE), a server, a mobile phone, a tablet personal computer, a personal digital assistant (PDA), a mobile internet device (MID), or a wearable device.
  • FIG. 2 is a schematic flowchart diagram of an entity search method according to an embodiment of the present invention.
  • the entity search method shown in FIG. 2 includes the following implementation steps:
  • Step S102: The terminal device determines a first category word and a second category word included in the query information, where the first category word is a word that expresses a type of the query result in the query information, and the second category word is a word other than the first classification word.
  • the terminal device may determine the binary structure of the query included in the query information.
  • the query binary structure is composed of two (category) classification words.
  • the query binary structure includes a first classified word and a second classified word; the first classified word may be a word used to describe the type of the query result in the query information input by the user, for example, a core word or key word.
  • the second classified word may be a word in the query information other than the words used to express the type of the query result, for example, a modifier for modifying/defining the first classified word.
  • the number of words included in each of the first classification word and the second classification word is not limited herein, such as one or more.
  • the present application hereinafter describes the related content by taking the first classified word as a core word and the second classified word as a modified word as an example.
  • For example, the query information input by the user is "Where is the scene of the A scene in a TV series?" It can be seen that the first category word in the query information is "where", and the second category words are "a TV drama", "A scene story", and "framing".
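The query binary structure can be pictured as a small record holding core words and modifiers. The sketch below is a simplified stand-in: it assumes the query is already tokenized and that a vocabulary of core words and a stop-word list are available. The names `BinaryStructure`, `split_query`, `core_vocab`, and `stop_words` are hypothetical; the application itself only says that an open source tool performs the extraction.

```python
from dataclasses import dataclass


@dataclass
class BinaryStructure:
    core_words: list   # first classification word(s): express the result type
    modifiers: list    # second classification word(s): everything else kept


def split_query(tokens, core_vocab, stop_words):
    """Split tokens into core words and modifiers, dropping stop words.

    Assumption: core words are recognized by lookup in core_vocab; a real
    system would use a parser or tagger instead.
    """
    core = [t for t in tokens if t in core_vocab]
    mods = [t for t in tokens if t not in core_vocab and t not in stop_words]
    return BinaryStructure(core_words=core, modifiers=mods)


# Hypothetical vocabulary and stop-word list.
structure = split_query(["works", "by", "charles"], {"works"}, {"by"})
print(structure)
```

The same record shape could hold the entity binary structure (third and fourth classification words) of a candidate entity.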
  • Step S104 The terminal device determines a third category word and a fourth category word respectively included in each of the w candidate entities in the first entity library, where the third category word and the first category word belong to the same category.
  • the fourth classification word and the second classification word belong to the same category, and w is a positive integer.
  • the first entity library includes information of each of the w candidate entities, and w is a positive integer.
  • the information of the candidate entity is description information for describing the candidate entity, such as a name, an identifier, an attribute, and the like of the candidate entity. For example, if the entity is "Yao Ming", the information of the entity may include attribute information such as height, weight, date of birth, and location of the household registration of Yao Ming.
  • the first entity library may further include relationship information, where the relationship information is used to describe a relationship between any two candidate entities, for example, the first candidate entity is a parent node of the second candidate entity. Or a relationship such as a child node.
  • the information of the candidate entity includes an entity binary structure of the candidate entity, and the entity binary structure is composed of two (category) classification words.
  • the entity binary structure includes a third classified word and a fourth classified word.
  • the third classified word is a word for expressing a type of the search result in the information of the candidate entity
  • the fourth classified word is a word in the information of the candidate entity other than the third classified word, that is, a word other than the words used to describe the type of the search result. In other words, the third category word corresponds to the first category word, and both belong to the same type/category; the fourth category word corresponds to the second category word, and both belong to the same type/category.
  • Step S106 The terminal device determines, according to the first classified word and the second classified word in the query information, and the third classified word and the fourth classified word included in each of the w candidate entities, the w candidate The respective s correlations of the entities, wherein the correlation is used to indicate a correlation between the classification words in the query information and the classification words in the candidate entities, and s is a positive integer less than or equal to 4.
  • the terminal device may calculate the s correlations between the information of each candidate entity and the query information according to the third classified word and the fourth classified word included in the entity binary structure of the candidate entity and the first classified word and the second classified word included in the query binary structure of the query information, where s is a positive integer.
  • the correlation is used to indicate a correlation between a first target classification word in the query information and a second target classification word in the candidate entity.
  • the first target classification word is the first classification word or the second classification word in the query information
  • the second target classification word is the third classification word or the fourth classification word in the candidate entity.
  • the first classification word and the third classification word are core words, the second classification word and the fourth classification word are modified words, and s is a positive integer less than or equal to 4.
  • the s correlations include any one or more of a first correlation to a fourth correlation.
  • the first correlation is used to indicate a correlation between a core word in the query information and a core word in the candidate entity.
  • the second relevance is used to indicate a degree of correlation between the core word in the query information and the modifier in the candidate entity.
  • the third correlation is used to indicate a degree of correlation between the modifier in the query information and the core word in the candidate entity.
  • the fourth relevance is used to indicate a degree of correlation between the modifier in the query information and the modifier in the candidate entity. How to calculate the above four correlations will be explained in detail below.
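The pairing behind the four correlations can be made concrete with a short sketch. It only shows which word sets are compared for each correlation; the similarity function is a plain Jaccard overlap chosen as a placeholder and is not the measure used in the application. The names `jaccard` and `four_correlations` and the sample data are hypothetical.

```python
def jaccard(a, b):
    """Placeholder similarity between two word sets (NOT the patent's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def four_correlations(query, entity, sim=jaccard):
    """Pair query core words/modifiers with entity core words/modifiers."""
    return {
        "first": sim(query["core"], entity["core"]),             # core vs core
        "second": sim(query["core"], entity["modifiers"]),       # core vs modifier
        "third": sim(query["modifiers"], entity["core"]),        # modifier vs core
        "fourth": sim(query["modifiers"], entity["modifiers"]),  # modifier vs modifier
    }


# Hypothetical binary structures for a query and one candidate entity.
query = {"core": {"works"}, "modifiers": {"charles"}}
entity = {"core": {"works"}, "modifiers": {"scottish", "people"}}
print(four_correlations(query, entity))
```

Because s may be less than 4, an implementation could compute only a subset of these entries.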
  • Step S108 The terminal device determines, according to the s correlations of the w candidate entities, the information of the target entity corresponding to the query information, where the target entity is an entity of the w candidate entities.
  • the terminal device may calculate a target relevance corresponding to the candidate entity according to the s correlations corresponding to each candidate entity.
  • the target relevance of each of the w candidate entities can be calculated.
  • candidate entities whose target relevance exceeds a certain threshold (for example, 80%) are selected from the w candidate entities corresponding to the w target correlations as the target entity.
  • the information of the target entity is further used as a search result corresponding to the query information.
  • the information of the target entity may also be displayed to the user for viewing, and the like.
  • the number of the target entities is not limited in this application, and may be one or more. How the target correlation is calculated will be described in detail below.
  • In step S102, the terminal device acquires query information input by the user.
  • the query information may be pre-processed by using an open source tool to obtain a query binary structure included in the query information. That is, the first classified word and the second classified word included in the query information are extracted by an open source tool.
  • the first classification word and the second classification word refer to related descriptions in the foregoing embodiments, and details are not described herein again.
  • the pre-processing includes a binary structure recognition process (ie, recognition processing of the first classified word and the second classified word).
  • the pre-processing may further include, but is not limited to, any one or a combination of the following: word segmentation, stop-word removal, semantic expansion, and the like. The pre-processing is not detailed here.
  • the terminal device may acquire the first entity library.
  • the first entity library may be a predefined database on the user side or the system side, and the database includes information of one or more candidate entities, which is not limited in this application.
  • the database includes an entity of a movie type, an entity of a clothing type, and an entity of another domain or type.
  • the first entity library is associated with a first category word in the query information, that is, the first entity library is determined according to a first category word in the query information.
  • the terminal device may determine, according to the first category word in the query information, the first entity library corresponding to it; the type (i.e., classification) of the candidate entities included in the first entity library is the same as the type expressed by the first classification word in the query information. For example, if the first category word is "where", which indicates an address, all candidate entities included in the first entity library are entities representing an address, or other entities associated with an entity representing an address.
  • the following describes the related content by taking the information of the first entity library including w candidate entities as an example. w is a positive integer.
  • the terminal device may preprocess the information of each candidate entity in the first entity library by using an open source tool, thereby obtaining the third category word and the fourth category word of each candidate entity included in the first entity library. For details about the pre-processing, the third classification word, and the fourth classification word, refer to related descriptions in the foregoing embodiments; details are not described herein again.
  • the terminal device may calculate/determine the s correlations between the information of each candidate entity and the query information according to the third classified word and the fourth classified word included in the information of the candidate entity and the first classified word and the second classified word included in the query information, where s is a positive integer less than or equal to 4.
  • the terminal device may determine the w candidates according to the third category word and the fourth category word respectively included in the w candidate entities and the first category word and the second category word included in the query information.
  • a candidate entity includes a third classification word and a fourth classification word, the first classification word and the third classification word being a core word, the second classification word and the fourth classification word.
  • The following describes implementations of S106 (determining the s correlations corresponding to the candidate entity according to the third classification word and the fourth classification word included in the information of the candidate entity and the first classification word and the second classification word included in the query information).
  • the s correlations include a first correlation, denoted p(h_t | h_q).
  • the application may first generalize the information of the candidate entities in the first entity library, and then match the third classification word (i.e., the core word) of the candidate entities in the generalized first entity library against the first classification word (core word) in the query information, to improve the success rate or accuracy of the search matching.
  • the terminal device may first classify (categorize) the information of the w candidate entities included in the first entity library to obtain w′ processed candidate entities in the first entity library.
  • the classification process merges candidate entities whose third classification words express the same or similar meaning into one processed candidate entity, that is, it merges candidate entities with the same or similar core words into one processed candidate entity.
  • the first correlation is then calculated by using the third classification word (core word) of the w′ processed candidate entities in the first entity library and the first classification word (core word) in the query information.
  • the first correlation p(h_t | h_q) can be calculated by the following formula (1).
  • q is the query information and t is the information of the candidate entity (here also refers to the information of the processed candidate entity).
  • h_i denotes the core word of i (i.e., the first classification word or the third classification word).
  • M_i denotes a modifier of i (i.e., a second classification word or a fourth classification word).
  • i = q or t.
  • represents the number of child nodes of h_t.
  • represents the number of all subsequent nodes (including child nodes and descendant nodes) of h_t.
  • the parent node, the child node, the grandparent node, and the descendant node are all relationship information between any two candidate entities in the first entity library, which will be described in detail below by way of an example.
  • a first entity library of a building type is shown in Figure 3A.
  • the first entity library includes three layers of node information, and each layer includes one or more nodes (ie, information of one or more candidate entities).
  • the first layer contains information about two candidate entities: works by Scottish people and Scottish architecture.
  • the second level includes information about a candidate entity, a structure designed by Scottish architects.
  • the third layer contains information on four candidate entities: Robert Roward Anderson buildings, William Fowler railway stations, Charles Rennie Mackintosh buildings, and Joseph Mitchell buildings.
  • the figure also indicates relationship information between entities.
  • a node in the upper layer (i.e., the information of a candidate entity) is the parent node of the node in the next layer to which it is connected, and the node in the next layer is a child node of the node above.
  • the works by Scottish people in the first layer is the parent of the structures by Scottish architects in the second layer.
  • a node above the parent node may be referred to as a grandparent node of the next layer node
  • a node below the child node may be referred to as a descendant node of the upper layer node.
  • the works by Scottish people in the first layer are the grandfather nodes of Robert Roward Anderson buildings in the third layer.
  • The terminal device may use the first matcher (a head-head matcher) to extract from the first entity library the related nodes associated with the core word in the query information; assume here that the related nodes are all the nodes in FIG. 3A.
  • A random walk method may be used to extract the related nodes and the relationships between them (shown in the figure as lines between nodes), and nodes with the same core word are then merged (that is, candidate entities with the same core word are merged) to obtain a generalized first entity library.
  • The generalized first entity library is shown in FIG. 3B.
  • As shown in FIG. 3B, the generalized first entity library also includes three layers of node information.
  • The first layer includes the processed candidate entities (that is, the core words of the candidate entities): works and architecture.
  • The second layer includes the processed candidate entity: buildings.
  • The third layer includes the processed candidate entities: buildings and stations.
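The merge step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the core word of each entity is assigned by hand, the parent/child edges are inferred from the three-layer description of FIG. 3A, the second-layer core word is taken as `structures` from the entity name, and the random walk extraction is omitted. The names `HEAD`, `EDGES`, `generalize`, and `descendants` are hypothetical.

```python
from collections import defaultdict

# Candidate entities of FIG. 3A mapped to hand-assigned core (head) words.
# "structures" for the second layer is an assumption taken from the entity name.
HEAD = {
    "works by Scottish people": "works",
    "Scottish architecture": "architecture",
    "structures by Scottish architects": "structures",
    "Robert Roward Anderson buildings": "buildings",
    "William Fowler railway stations": "stations",
    "Charles Rennie Mackintosh buildings": "buildings",
    "Joseph Mitchell buildings": "buildings",
}

# Parent -> child edges; the exact edge set is an assumption based on the
# three-layer description of FIG. 3A.
EDGES = [
    ("works by Scottish people", "structures by Scottish architects"),
    ("Scottish architecture", "structures by Scottish architects"),
    ("structures by Scottish architects", "Robert Roward Anderson buildings"),
    ("structures by Scottish architects", "William Fowler railway stations"),
    ("structures by Scottish architects", "Charles Rennie Mackintosh buildings"),
    ("structures by Scottish architects", "Joseph Mitchell buildings"),
]


def generalize(head, edges):
    """Merge nodes that share a core word; return parent -> children over core words."""
    children = defaultdict(set)
    for parent, child in edges:
        hp, hc = head[parent], head[child]
        if hp != hc:  # skip self-loops that merging could create
            children[hp].add(hc)
    return dict(children)


def descendants(children, node):
    """Return every core word reachable from `node` (children and deeper levels)."""
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


generalized = generalize(HEAD, EDGES)
```

With these inputs, `descendants(generalized, "works")` contains `buildings`, which matches the relationship used in the worked example of the first relevance.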
  • After the generalization, the first relevance may be calculated from the core words of the processed candidate entities in the first entity library and the core word in the query information.
  • Specifically, the first relevance p(h_t | h_q) may be calculated by using formula (1) above.
  • For example, the query information entered by the user is Works by Charles.
  • The core word in the query information is Works, and the modifier is Charles.
  • Taking the first entity library shown in FIG. 3A, the candidate entity Joseph Mitchell buildings is selected from FIG. 3A, and the first relevance between the core word in the query information and this candidate entity is calculated.
  • The core word in the information of the candidate entity is buildings, and the modifiers are Joseph and Mitchell.
  • The information of the candidate entities in the first entity library of FIG. 3A is generalized as described above, giving the processed candidate entities (specifically, their core words) in the generalized first entity library of FIG. 3B.
  • The core word of the candidate entity corresponding to Joseph Mitchell buildings in FIG. 3B is buildings.
  • The first relevance between the core word in the query information and the core word in the candidate entity is then calculated using formula (1). From formula (1) and FIG. 3B, the core word buildings (h_t) of the candidate entity is a descendant node of the core word Works (h_q) in the query information, and the first relevance between them is p(h_t | h_q) = p(buildings | Works) = 1.
  • The s relevances include a second relevance p(M_t | h_q), which indicates the relevance between the first classification word in the query information and the fourth classification word in the candidate entity, that is, between the core word in the query information and the modifiers in the candidate entity.
  • In one implementation, the terminal device may perform context association on the fourth classification word (modifier) in the information of the candidate entity according to a first pre-stored document, to obtain the processed fourth classification word (processed modifier) of the candidate entity.
  • The relevance a is then calculated from the first classification word (core word) in the query information and the processed fourth classification word (processed modifier) in the candidate entity.
  • Optionally, the relevance a can be used as the second relevance.
  • Context association here means extracting, from the first pre-stored document, the i words before and/or the j words after the fourth classification word (modifier) of the candidate entity, to obtain the processed fourth classification word in the information of the candidate entity.
  • i and j are positive integers set on the user side or the system side; they may be the same or different, which is not limited in this application.
  • The first pre-stored document may be a document stored in the terminal device in advance on the user side or the system side, or a document acquired from the server side.
  • The document may be a description document related to the query information, or a document describing the candidate entity or the first entity library to which the candidate entity belongs; this is not detailed or limited in this application. The number of first pre-stored documents is likewise not limited.
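The context association described above (taking the i words before and the j words after a given word) might look like the sketch below. It assumes the pre-stored document has already been split into a list of tokens and that every occurrence of the word contributes context; the function name `context_words` and the toy tokens are hypothetical.

```python
def context_words(tokens, target, i, j):
    """Collect up to i words before and j words after every occurrence of target."""
    ctx = []
    for pos, tok in enumerate(tokens):
        if tok == target:
            ctx.extend(tokens[max(0, pos - i):pos])   # up to i preceding words
            ctx.extend(tokens[pos + 1:pos + 1 + j])   # up to j following words
    return ctx


# Toy token list (placeholder words, not taken from any real document).
toy_tokens = ["w1", "w2", "Mitchell", "w3", "w4", "w5"]
toy_context = context_words(toy_tokens, "Mitchell", 1, 2)
```

The returned list keeps duplicates, so a word that appears near several occurrences is counted more than once; whether duplicates should be kept is a design choice the text does not settle.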
  • Optionally, a relevance smoothing algorithm may be used to calculate the relevance a (or the second relevance) between the first classification word (core word) in the query information and the processed fourth classification word (processed modifier) in the candidate entity.
  • Specifically, the relevance a, written p_a(M_t | h_q), can be obtained by formula (2):
  • $$p_a(M_t \mid h_q) = \prod_{m \in M_t} p(m \mid h_q), \qquad p(m \mid h_q) = (1-\lambda)\,\frac{n(h_q, m)}{\sum_{w \in ctx(m)} n(w, m)} + \lambda\, p(h_q \mid D) \tag{2}$$
  • λ is the probability smoothing factor used in the relevance smoothing algorithm (the original symbol is lost in this text; λ stands in for it).
  • m is one modifier in the candidate entity (a fourth classification word), and M_t is the set of all modifiers in the candidate entity.
  • D is the first pre-stored document.
  • ∏ denotes a product.
  • n(h_q, m) is the number of co-occurrences of h_q and the modifier m.
  • w is any word in ctx(m).
  • ctx(m) is the processed modifier, or the set of processed modifiers, obtained by context association on the modifier m.
  • h_i denotes the core word of i (first or third classification word).
  • M_i denotes the modifiers of i (second or fourth classification word).
  • i is q or t, where q represents the query information and t represents the information of the candidate entity.
  • For example, take the query information Works by Charles entered by the user and the example of FIG. 3A.
  • The information of the candidate entity is Joseph Mitchell buildings. Assume that the modifier of the candidate entity (a fourth classification word in the candidate entity) is Mitchell; the second relevance between the core word in the query information (Works, the first classification word) and the modifier in the candidate entity (Mitchell) is to be calculated.
  • A relevance smoothing algorithm (also referred to as a probability smoothing algorithm) may be used to calculate the second relevance between the core word in the query information and the modifier in the entity binary structure of the candidate entity, as shown in formula (2) above.
  • Σ_w n(w, m) is the total number of times the words in the processed modifier co-occur with the modifier m, which is 3 in this example. The co-occurrence probability of the modifier m and the core word of the query information in the first pre-stored document is therefore n(h_q, m) / Σ_w n(w, m) = 2/3.
  • p(h_q | D) is the probability that h_q appears in the first pre-stored document; in this example the probability that Works appears is 2/3.
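The smoothed relevance in this worked example can be checked numerically with a small sketch. It assumes the smoothing is a linear interpolation of the form (1 − λ) · n(h_q, m) / Σ_w n(w, m) + λ · p(h_q | D), as suggested by the fragments of the formulas in this text, and that the per-modifier values are multiplied over all modifiers. The names `smoothed`, `relevance_over_modifiers`, and `lam`, as well as the value 0.5 chosen for the smoothing factor, are hypothetical.

```python
from math import prod


def smoothed(n_hm, n_ctx_total, p_h_doc, lam):
    """(1 - lam) * n(h, m) / sum_w n(w, m) + lam * p(h | D)."""
    ratio = n_hm / n_ctx_total if n_ctx_total else 0.0  # guard against an empty context
    return (1.0 - lam) * ratio + lam * p_h_doc


def relevance_over_modifiers(per_modifier_values):
    """Multiply the per-modifier values p(m | h) over all modifiers m."""
    return prod(per_modifier_values)


# Numbers from the worked example: n(h_q, m) = 2, sum_w n(w, m) = 3, p(h_q | D) = 2/3.
# lam = 0.5 is an arbitrary illustrative choice, not a value given in the text.
p_mitchell_given_works = smoothed(2, 3, 2 / 3, lam=0.5)
second_relevance = relevance_over_modifiers([p_mitchell_given_works])
```

Because both interpolated terms equal 2/3 in this example, the result is 2/3 for any smoothing factor between 0 and 1.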
  • In another implementation, the terminal device may perform context association on the first classification word (core word) in the query information according to the first pre-stored document, to obtain the processed first classification word (processed core word) in the query information.
  • The relevance b is then calculated from the processed first classification word (processed core word) in the query information and the fourth classification word in the information of the candidate entity.
  • Optionally, the relevance b can be used as the second relevance.
  • Optionally, a relevance smoothing algorithm may also be used to calculate the relevance b.
  • Specifically, the relevance b, written p_b(M_t | h_q), can be obtained by formula (3).
  • ctx(h_q) is the processed core word, or the set of processed core words, obtained by context association on the core word h_q.
  • For example, context association is performed on the core word in the query information (Work, the first classification word) by using the first pre-stored document, giving the processed core word ctx(Work) = {Mitchell, Charles, Mitchell}.
  • The relevance b can then be p(m | h_q) = p(Mitchell | work) = (1 − λ) · n(h_q, m) / Σ_{w ∈ ctx(h_q)} n(w, m) + λ · p(h_q | D), where λ is the probability smoothing factor.
  • Alternatively, the relevance a and the relevance b may be processed according to a set operation rule to obtain the second relevance.
  • The s relevances include a third relevance p(h_t | M_q), which indicates the relevance between the second classification word (modifier) in the query information and the third classification word (core word) in the candidate entity.
  • Specifically, the following three implementations exist.
  • In a first implementation, the terminal device (specifically, the second matcher in the device) performs context association on the second classification word (modifier) in the query information according to a second pre-stored document, to obtain the processed second classification word (processed modifier) in the query information.
  • The relevance c is then calculated from the processed second classification word (processed modifier) in the query information and the third classification word (core word) in the information of the candidate entity.
  • Optionally, the relevance c can be used as the third relevance.
  • Context association here means extracting, from the second pre-stored document, the k words before and/or the l words after the second classification word in the query information.
  • The first pre-stored document and the second pre-stored document are both documents customized on the user side or the system side, and may be the same or different.
  • i, j, k, and l are positive integers set on the user side or the system side; they may be the same or different, which is not limited in this application.
  • For the context association and the second pre-stored document, refer to the related description in the foregoing embodiment; details are not repeated here.
  • Optionally, a relevance smoothing algorithm may also be used to calculate the relevance c.
  • Specifically, the relevance c, written p_c(h_t | M_q), can be obtained by formula (4).
  • λ (standing in for a symbol lost from this text) is the probability smoothing factor used in the relevance smoothing algorithm.
  • m is one modifier in the query information (a second classification word), and M_q is the set of all modifiers in the query information.
  • D is the second pre-stored document.
  • ∏ denotes a product.
  • n(h_t, m) is the number of co-occurrences of h_t and the modifier m.
  • w is any word in ctx(m).
  • ctx(m) is the processed modifier, or the set of processed modifiers, obtained by context association on the modifier m.
  • h_i denotes the core word of i (first or third classification word).
  • p(h_t | M_q) in this application may also be regarded as equal to p(M_q | h_t).
  • The relevance c can be: [expression not reproduced in this text]
  • In a second implementation, the terminal device (specifically, the second matcher in the device) performs context association on the third classification word (core word) in the information of the candidate entity according to the second pre-stored document, to obtain the processed third classification word (processed core word) of the candidate entity.
  • The relevance d is then calculated from the second classification word (modifier) in the query information and the processed third classification word in the information of the candidate entity. Optionally, the relevance d may be used as the third relevance.
  • Optionally, a relevance smoothing algorithm may also be used to calculate the relevance d.
  • Specifically, the relevance d, written p_d(h_t | M_q), can be obtained by formula (5).
  • ctx(h_t) is the processed core word, or the set of processed core words, obtained by context association on the core word h_t.
  • For example, context association is performed on the core word in the information of the candidate entity (buildings, the third classification word) by using the second pre-stored document, giving the processed core word ctx(buildings) = {Mitchell}.
  • The relevance d is then p_d(h_t | M_q) = p(M_q | h_t) = p(m | h_t) = p(Charles | building) = (1 − λ) · n(h_t, m) / Σ_w n(w, m) + λ · p(h_t | D), where λ is the probability smoothing factor.
  • Alternatively, the relevance c and the relevance d may be processed according to a set operation rule to obtain the third relevance.
  • The s relevances include a fourth relevance p(M_t | M_q), which indicates the relevance between the second classification word (modifier) in the query information and the fourth classification word (modifier) in the information of the candidate entity, that is, between the modifiers in the query information and the modifiers in the candidate entity.
  • In a first implementation, the terminal device may expand the second classification word (modifier) in the query information, for example by expanding it with attribute words, to obtain the processed second classification word (processed modifier) in the query information.
  • The relevance e is then calculated from the processed second classification word (processed modifier) in the query information and the fourth classification word (modifier) in the information of the candidate entity.
  • Optionally, the relevance e may be used as the fourth relevance.
  • Specifically, the relevance e, written p_e(M_t | M_q), can be obtained by formula (6).
  • M_e is the set of all expanded modifiers.
  • m is any word in M_e.
  • M_q is the set of all modifiers in the query information (after expansion, also referred to here as M_e).
  • M_t is the set of all modifiers in the information of the candidate entity.
  • n(m, w) is the number of co-occurrences of the modifiers m and w.
  • Likewise, n(w_i, w_j) is the number of co-occurrences of the modifiers w_i and w_j.
  • h_i denotes the core word of i (first or third classification word).
  • M_i denotes the modifiers of i (second or fourth classification word).
  • i is q or t, where q represents the query information and t represents the information of the candidate entity.
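The co-occurrence counts n(m, w) and n(w_i, w_j) used in these definitions could be obtained along the following lines. The text does not say over which unit co-occurrence is counted, so this sketch assumes a sentence-level count over pre-tokenized sentences; the function name `cooccurrence` and the toy sentences are hypothetical.

```python
def cooccurrence(sentences, a, b):
    """Count the sentences (lists of tokens) in which both a and b appear."""
    return sum(1 for tokens in sentences if a in tokens and b in tokens)


# Toy tokenized sentences (placeholders, not taken from any real document).
toy_sentences = [
    ["Charles", "Mitchell"],
    ["Charles", "x"],
    ["Mitchell", "Charles", "y"],
]
n_charles_mitchell = cooccurrence(toy_sentences, "Charles", "Mitchell")
```

A window-based or document-level count would be equally compatible with the description; the sentence-level unit is only one possible choice.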
  • For example, take the query information Works by Charles.
  • The modifier in the query information is Charles.
  • The terminal device may use the third matcher to expand the modifier Charles in the query information with attribute words, obtaining the expanded modifiers of the query information.
  • For example, the first entity library includes descriptions of Charles's attributes, such as Scottish male architect and the Mitchell building's architect.
  • The expanded modifier set for Charles is then M_e = {Scottish, male, Mitchell}.
  • Accordingly, p(M_t | M_q) can be expressed by p(M_t | M_e).
  • In a second implementation, the terminal device may expand the fourth classification word (modifier) in the information of the candidate entity, for example by expanding it with attribute words, to obtain the processed fourth classification word (processed modifier) of the candidate entity.
  • The relevance f is then calculated from the second classification word (modifier) in the query information and the processed fourth classification word (processed modifier) of the candidate entity.
  • Optionally, the relevance f may be used as the fourth relevance.
  • Specifically, the relevance f, written p_f(M_t | M_q), can be obtained by formula (7).
  • M_e is the set of all expanded modifiers in the information of the candidate entity.
  • m is any word in M_e.
  • M_q is the set of all modifiers in the query information.
  • M_t is the set of all modifiers in the information of the candidate entity (after expansion, it may also be regarded as M_e).
  • n(m, w) is the number of co-occurrences of the modifiers m and w.
  • Likewise, n(w_i, w_j) is the number of co-occurrences of the modifiers w_i and w_j.
  • h_i denotes the core word of i (first or third classification word).
  • M_i denotes the modifiers of i (second or fourth classification word).
  • i is q or t, where q represents the query information and t represents the information of the candidate entity.
  • For example, the information of the candidate entity is Joseph Mitchell buildings.
  • Assume that the modifier of the candidate entity (a fourth classification word in the candidate entity) is Mitchell; the fourth relevance between the modifier in the query information (Charles, the second classification word) and the modifier in the candidate entity is to be calculated.
  • The relevance f is then p_f(M_t | M_q), obtained by formula (7).
  • Alternatively, the relevance e and the relevance f may be processed according to a set operation rule to obtain the fourth relevance.
  • In the same way, the terminal device may calculate the s relevances of each candidate entity from the third and fourth classification words of that candidate entity and the first and second classification words in the query information; details are not repeated here.
  • The terminal device may process the s relevances of a candidate entity according to a set operation rule to obtain the target relevance of that candidate entity.
  • The set operation rule is an algorithm customized on the user side or the system side, such as multiplication, addition, or exponentiation.
  • Alternatively, the relevance with the largest value among the s relevances may be selected as the target relevance; this is not limited in this application.
  • Taking the case in which the s relevances include p(h_t | h_q), p(M_t | h_q), p(h_t | M_q), and p(M_t | M_q) as an example, the terminal device can calculate the target relevance p(q | t) from these four relevances.
  • p(q | t) indicates the relevance between the information t of the candidate entity and the query information q.
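Combining the s relevances of one candidate entity into a target relevance could be sketched as below. The rule names and the function `target_relevance` are hypothetical, the sample scores are invented for illustration, and only three of the rules mentioned in the text (multiplication, addition, and taking the largest value) are shown.

```python
from math import prod

# Hypothetical rule names mapped to the operations mentioned in the text.
RULES = {
    "multiply": prod,
    "add": sum,
    "max": max,
}


def target_relevance(relevances, rule="multiply"):
    """Combine the s relevances of one candidate entity (1 <= s <= 4)."""
    if not 1 <= len(relevances) <= 4:
        raise ValueError("expected between 1 and 4 relevances")
    return RULES[rule](relevances)


# Invented sample values for the first to fourth relevances.
sample = [1.0, 0.5, 0.5, 0.25]
combined = target_relevance(sample, "multiply")
```

Which rule suits a given deployment is left open by the text; multiplication penalizes any single low relevance heavily, whereas the maximum ignores all but the strongest signal.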
  • The information of the target entity may be selected according to the w target relevances and used as the search result corresponding to the query information.
  • The target relevance indicates the relevance between the candidate entity (that is, the information of the candidate entity) and the query information.
  • The target entity is one or more of the w candidate entities.
  • For example, the target entity is an entity among the w candidate entities whose target relevance exceeds a preset first threshold.
  • In one implementation, the terminal device may directly select, from the w target relevances of the w candidate entities, the candidate entities whose target relevance exceeds the preset first threshold as the target entities.
  • In another implementation, the terminal device may sort the w target relevances of the w candidate entities in a preset order, for example from largest to smallest, and then select the top m target relevances; the m candidate entities corresponding to these m target relevances are used as the target entities.
  • m is a positive integer set on the user side or the system side, such as 1 or 5.
  • The terminal device may use the information of the selected candidate entities (that is, the information of the target entities) as the search result corresponding to the query information.
  • Optionally, the information of the target entity may also be displayed to the user for viewing.
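The two selection strategies above (a threshold and a top-m cut) can be illustrated with the following sketch. The function names and the sample scores are hypothetical; the threshold comparison is strict because the description says the relevance "exceeds" the threshold.

```python
def select_by_threshold(target_relevance, threshold):
    """Return the entities whose target relevance exceeds the threshold."""
    return [name for name, score in target_relevance.items() if score > threshold]


def select_top_m(target_relevance, m):
    """Return the m entities with the largest target relevance, best first."""
    ranked = sorted(target_relevance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:m]]


# Invented scores for three candidate entities from FIG. 3A.
scores = {
    "Joseph Mitchell buildings": 0.2,
    "Charles Rennie Mackintosh buildings": 0.9,
    "William Fowler railway stations": 0.4,
}
above_threshold = select_by_threshold(scores, 0.3)
top_one = select_top_m(scores, 1)
```

The threshold variant may return no entity at all, while the top-m variant always returns up to m entities regardless of how low their relevance is.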
  • To implement the foregoing functions, the terminal device includes corresponding hardware structures and/or software modules for executing the respective functions.
  • In combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the technical solutions of the embodiments of the present invention.
  • In the embodiments of the present invention, the terminal device may be divided into functional units according to the foregoing method examples.
  • For example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit.
  • The integrated unit can be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division into units in the embodiments of the present invention is schematic and is only a logical function division; other divisions are possible in an actual implementation.
  • In the case of integrated units, FIG. 4 shows a possible structural diagram of the terminal device involved in the above embodiments.
  • The terminal device 700 includes a processing unit 702 and a communication unit 703.
  • The processing unit 702 is configured to control and manage the actions of the terminal device 700.
  • For example, the processing unit 702 supports the terminal device 700 in performing steps S102-S108 of FIG. 2, and/or other steps of the techniques described herein.
  • The communication unit 703 supports communication between the terminal device 700 and other devices, for example communication with a server to obtain the information of the w candidate entities included in the first entity library, and/or other steps of the techniques described herein.
  • The terminal device 700 may further include a storage unit 701 for storing program code and data of the terminal device 700.
  • The processing unit 702 can be a processor or a controller, for example a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure.
  • The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
  • The communication unit 703 can be a communication interface, a transceiver, a transceiver circuit, or the like, where communication interface is a collective term and can include one or more interfaces, such as an interface between the terminal device and other devices.
  • The storage unit 701 can be a memory.
  • When the processing unit 702 is a processor, the communication unit 703 is a communication interface, and the storage unit 701 is a memory, the terminal device according to the embodiment of the present invention may be the terminal device shown in FIG. 5.
  • The terminal device 710 includes a processor 712, a communication interface 713, and a memory 711.
  • Optionally, the terminal device 710 may further include a bus 714.
  • The communication interface 713, the processor 712, and the memory 711 may be connected to each other through the bus 714.
  • The bus 714 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • The bus 714 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.
  • The terminal devices shown in FIG. 4 and FIG. 5 may further include a display unit, which may specifically be a display screen (not shown).
  • The display screen is used to display search results (information of the target entity).
  • The steps of the method or algorithm described in connection with the disclosure of the embodiments of the present invention may be implemented in hardware, or by a processor executing software instructions.
  • The software instructions can consist of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium.
  • The storage medium can also be an integral part of the processor.
  • The processor and the storage medium can be located in an ASIC. Additionally, the ASIC can be located in a network device.
  • The processor and the storage medium can also exist as discrete components in the terminal device.
  • The foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

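An entity search method, a related device, and a computer storage medium. The method includes: determining a first classification word and a second classification word included in query information (S102); determining, according to a first entity library and the first classification word and the second classification word included in the query information, s relevances corresponding to each of w candidate entities, where the information of each candidate entity includes a third classification word and a fourth classification word, and a relevance indicates the relevance between a classification word in the query information and a classification word in the candidate entity; and determining, according to the s relevances corresponding to each of the w candidate entities, information of a target entity corresponding to the query information (S108), where the target entity is an entity among the w candidate entities. The method addresses the low matching rate or low accuracy of result search that expression differences cause in the prior art, thereby improving search accuracy.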

Description

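Entity search method, related device, and computer storage medium
Technical Field
The present invention relates to the field of Internet technologies, and in particular to an entity search method, a related device, and a computer storage medium.
Background
With the development of the Internet, the volume of text data keeps growing. Early search engines were mainly text-based, in particular using keyword-matching search. Keyword matching lacks a deep understanding of word meaning and has a low matching degree; feedback is presented as text, so users must look for the answer in the returned text, which makes for a poor user experience.
To solve the above problems, entity search methods have been proposed, which aim to find the entity the user is querying for (the result answer) and present it to the user. In practice, current entity search methods also mainly rely on keyword-matching search. In keyword matching, a match succeeds only when the query keyword and the entity keyword are identical, so most synonyms and near-synonyms fail to match; that is, words with the same or similar meaning cannot be matched. In actual searches, different users describe the same thing differently, which is referred to as an expression difference (conceptual gap), and this leads to a low matching rate or low accuracy.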
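Summary
The embodiments of the present invention disclose an entity search method, a related device, and a computer storage medium, which can solve the problem in the prior art of a low matching rate or low accuracy in result search caused by expression differences.
In a first aspect, an embodiment of the present invention provides an entity search method, the method including:
A terminal device determines a first classification word and a second classification word included in query information, where the first classification word is the word in the query information that expresses the type of the query result, and the second classification word is a word in the query information other than the first classification word;
According to a first entity library and the first classification word and the second classification word included in the query information, s relevances corresponding to each of w candidate entities are determined; the first entity library includes the information of each of the w candidate entities, the information of a candidate entity includes a third classification word and a fourth classification word, the third classification word and the first classification word belong to the same class, and the fourth classification word and the second classification word belong to the same class; a relevance indicates the relevance between a classification word in the query information and a classification word in the candidate entity; w and s are positive integers, and s is less than or equal to 4;
According to the s relevances corresponding to each of the w candidate entities, information of a target entity corresponding to the query information is determined, where the target entity is an entity among the w candidate entities.
In some possible embodiments, the first classification word includes the core word in the query information that expresses the type of the query result, and the second classification word includes the modifiers in the query information other than the first classification word and stop words.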
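In some possible embodiments, the s relevances include a first relevance, which indicates the relevance between the first classification word in the query information and the third classification word in the candidate entity,
and determining, according to the first entity library and the first classification word and the second classification word included in the query information, the s relevances corresponding to each of the w candidate entities includes:
classifying the w candidate entities in the first entity library to obtain processed candidate entities, where the classification merges the candidate entities whose third classification words express the same or similar meanings into one processed candidate entity;
determining, according to the first classification word in the query information and the third classification word in the processed candidate entities, the first relevance corresponding to each of the w candidate entities.
In some possible embodiments, the s relevances include a second relevance, which indicates the relevance between the first classification word in the query information and the fourth classification word in the candidate entity,
and determining, according to the first entity library and the first classification word and the second classification word included in the query information, the s relevances corresponding to each of the w candidate entities includes any one of the following:
determining, according to the processed first classification word in the query information and the fourth classification word included in each of the w candidate entities, a relevance a corresponding to each of the w candidate entities, and using it as the second relevance corresponding to each of the w candidate entities;
determining, according to the first classification word in the query information and the processed fourth classification word included in each of the w candidate entities, a relevance b corresponding to each of the w candidate entities, and using it as the second relevance corresponding to each of the w candidate entities;
determining, according to the relevance a and the relevance b corresponding to each of the w candidate entities, the second relevance corresponding to each of the w candidate entities;
where the processed first classification word is obtained by performing context association on the first classification word in the query information according to a first pre-stored document, the processed fourth classification word is obtained by performing context association on the fourth classification word included in the candidate entity according to the first pre-stored document, and the context association extracts, from the first pre-stored document, the i words before and/or the j words after the first classification word or the fourth classification word, where i and j are positive integers.
In some possible embodiments, the relevance a or the relevance b is determined according to a relevance smoothing algorithm, which is used to mitigate the bias of the first classification word in the query information or the fourth classification word in the candidate entity in the first pre-stored document.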
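In some possible embodiments, the s relevances include a third relevance, which indicates the relevance between the second classification word in the query information and the third classification word in the candidate entity,
and determining, according to the first entity library and the first classification word and the second classification word included in the query information, the s relevances corresponding to each of the w candidate entities includes any one of the following:
determining, according to the processed second classification word in the query information and the third classification word included in each of the w candidate entities, a relevance c corresponding to each of the w candidate entities, and using it as the third relevance corresponding to each of the w candidate entities;
determining, according to the second classification word in the query information and the processed third classification word included in each of the w candidate entities, a relevance d corresponding to each of the w candidate entities, and using it as the third relevance corresponding to each of the w candidate entities;
determining, according to the relevance c and the relevance d corresponding to each of the w candidate entities, the third relevance corresponding to each of the w candidate entities;
where the processed second classification word is obtained by performing context association on the second classification word in the query information according to a second pre-stored document, the processed third classification word is obtained by performing context association on the third classification word included in the candidate entity according to the second pre-stored document, and the context association extracts, from the second pre-stored document, the k words before and/or the l words after the second classification word or the third classification word, where k and l are positive integers.
In some possible embodiments, the relevance c or the relevance d is determined according to a relevance smoothing algorithm, which is used to mitigate the bias of the second classification word in the query information or the third classification word in the candidate entity in the second pre-stored document.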
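In some possible embodiments, the s relevances include a fourth relevance, which indicates the relevance between the second classification word in the query information and the fourth classification word in the candidate entity,
and determining, according to the first entity library and the first classification word and the second classification word included in the query information, the s relevances corresponding to each of the w candidate entities includes any one of the following:
determining, according to the expanded second classification word in the query information and the fourth classification word included in each of the w candidate entities, a relevance e corresponding to each of the w candidate entities, and using it as the fourth relevance corresponding to each of the w candidate entities;
determining, according to the second classification word in the query information and the expanded fourth classification word included in each of the w candidate entities, a relevance f corresponding to each of the w candidate entities, and using it as the fourth relevance corresponding to each of the w candidate entities;
determining, according to the relevance e and the relevance f corresponding to each of the w candidate entities, the fourth relevance corresponding to each of the w candidate entities;
where the expanded second classification word is obtained by expanding the second classification word in the query information with attribute words, and the expanded fourth classification word is obtained by expanding the fourth classification word in the candidate entity with attribute words.
In some possible embodiments, determining, according to the s relevances corresponding to each of the w candidate entities, the information of the target entity corresponding to the query information includes:
determining, according to the s relevances corresponding to each of the w candidate entities, a target relevance corresponding to each of the w candidate entities;
determining, according to the target relevance corresponding to each of the w candidate entities, the information of the target entity corresponding to the query information, where the target entity is an entity among the w candidate entities whose target relevance is greater than or equal to a first threshold.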
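In a second aspect, an embodiment of the present invention provides a terminal device, including functional units for performing the method described in the first aspect.
In a third aspect, an embodiment of the present invention provides another terminal device, including a memory and a processor coupled to the memory; the memory is configured to store instructions, and the processor is configured to execute the instructions and to communicate with the first camera and the second camera; when executing the instructions, the processor performs the method described in the first aspect.
In some possible embodiments, the terminal device further includes a display coupled to the processor, and the display is configured to display the information of the target entity (the search result) under the control of the processor.
In some possible embodiments, the terminal device further includes a communication interface that communicates with the processor and is configured to communicate with other devices (such as network devices) under the control of the processor.
In a fourth aspect, a computer-readable storage medium is provided, which stores program code for entity search. The program code includes instructions for performing the method described in the first aspect.
By implementing the embodiments of the present invention, the problem in the prior art of a low matching rate or low accuracy in result search caused by expression differences can be solved.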
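Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below.
FIG. 1 is a schematic diagram of a network framework according to an embodiment of the present invention.
FIG. 2 is a schematic flowchart of an entity search method according to an embodiment of the present invention.
FIG. 3A and FIG. 3B are schematic structural diagrams of two first entity libraries according to an embodiment of the present invention.
FIG. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
FIG. 5 is a schematic structural diagram of another terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described in detail below with reference to the drawings of the present invention.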
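The network framework to which this application applies is introduced first. FIG. 1 is a schematic diagram of a network framework according to an embodiment of the present invention. As shown in FIG. 1, the network framework 100 includes an entity search component 12 and an application service component 14. Optionally, it may further include a document search component 16. Specifically:
The entity search component 12 includes an entity library 120 and a matcher 122. The entity library 120 contains information of one or more entities; the information of an entity is descriptive information about that entity, such as its name, identifier, and attribute information. Optionally, it may also include relationship information between any two entities, as detailed below. The matcher 122 includes one or more matchers. A matcher may be used to calculate the relevance (that is, the matching degree) between the query information entered by the user and the information of an entity in the entity library. Optionally, the calculated relevance may be fed back to the application service component 14, so that it can later be determined from the relevance whether the information of the entity is the search result (result answer) corresponding to the query information.
Specifically, the matcher 122 in this application may be designed to include three matchers: a first matcher, a second matcher, and a third matcher. The first matcher calculates the relevance (that is, the matching degree) between the first classification word (such as a core word) in the query information and the third classification word (such as a core word) in the information of an entity in the entity library. The second matcher calculates the relevance between the second classification word (such as a modifier) in the query information and the third classification word (such as a core word) in the information of an entity in the entity library; optionally, it may also calculate the relevance between the first classification word (such as a core word) in the query information and the fourth classification word (such as a modifier) in the information of an entity in the entity library. The third matcher calculates the relevance between the second classification word (such as a modifier) in the query information and the fourth classification word (such as a modifier) in the information of an entity in the entity library. The first classification word in the query information and the third classification word in the information of the entity belong to the same type/class; for example, both are core words expressing the type of the query result. The second classification word in the query information and the fourth classification word in the information of the entity belong to the same type/class; for example, both are modifiers that qualify the type of the query result. The classification words in the query information (the first and second classification words), the classification words in the information of the entity (the third and fourth classification words), and the function of each of the three matchers (that is, how the corresponding relevance is calculated) are described in detail later in this application and are not detailed here.
It should be noted that the three matchers above are only an example and do not constitute a limitation. In practice, the functions of the three matchers may be integrated into one matcher, or split across more matchers that cooperate; for example, the two functions of the second matcher may be split into two matchers. This is not detailed or limited in this application.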
所述文档搜索组件16包括文档库160以及搜索服务162。其中,所述文档库160包括一个或多个文档,该文档包括有待查询信息对应的搜索结果。所述搜索服务162用于根据用户输入的查询信息从所述文档库中查询出对应的搜索结果,并将所述搜索结果反馈给应用服务组件14。或者,所述搜索服务162可用于计算用户输入的查询信息和文档库中各文档之间的相关度(匹配度),将该相关度反馈给应用服务组件14,便于该组件14根据该相关度确定该文档是否为所述查询信息对应的搜索结果。
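The application service component 14 is configured to present the search result (answer) corresponding to the query information so that the user can view it. Specifically, the application service component 14 may include a display module 140 and, optionally, a ranking module 142 and a feedback model module 144. The display module 140 presents to the user the search result corresponding to the query information. The ranking module 142 sorts the received relevance values in a set order, for example from highest to lowest, so that the display module 140 can determine which search results to present from the ranking produced by the ranking module 142. Optionally, the ranking module 142 may also filter the received relevance values, for example by discarding values below a certain threshold (such as 40%).
Taking the relevance fed back by the entity search component as an example, the entity search component 12 may feed back to the application service component 14 (specifically, the ranking module 142) the relevance between the query information and the information of each entity in the entity library. The ranking module then sorts the relevance values that exceed a certain threshold from highest to lowest. The display module 140 may then present to the user, in the order given by the ranking module, the information of the entities corresponding to these relevance values (that is, the search results), so that the user can view the search results for the query information more intuitively and efficiently.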
所述反馈模型模块144可用于收集用户的反馈信息,该反馈信息可用于对上述实体搜索组件或文档搜索组件中搜索到的候选结果(具体为候选实体的信息或文档)进行再过滤或排序等等。例如去除用户反馈不好或用户点击率(查看率)较低的候选结果等。所述反馈信息具体可为用户以文档形式反馈的信息,例如查询信息对应的搜索结果是否合适、解决了相关的问题等;也可是用户的点击信息,例如用户是否点击查看该查询信息对应的搜索结果等。
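In a product implementation, the network framework proposed in this application may provide the foregoing functions as a web service (REST) or through an application programming interface (API). The network framework may be deployed on a corresponding terminal device. The terminal device includes, but is not limited to, user equipment (UE), a server, a mobile phone, a tablet personal computer, a personal digital assistant (PDA), a mobile internet device (MID), a wearable device, or another device with a network communication function.
Based on the network framework shown in FIG. 1, the following describes embodiments of the entity search method of this application. FIG. 2 is a schematic flowchart of an entity search method provided by an embodiment of the present invention. The entity search method shown in FIG. 2 includes the following steps: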
步骤S102、终端设备确定查询信息中所包括的第一分类词和第二分类词,其中,所述第一分类词为所述查询信息中表述查询结果的类型的词语,所述第二分类词为除所述第一分类词之外的词语。
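In this application, the terminal device may determine the query binary structure contained in the query information. The query binary structure consists of two kinds of classification words: a first classification word and a second classification word. The first classification word may be a word in the user's query information that expresses the type of the query result, for example a head word or keyword. The second classification word may be any other word in the query information, that is, a word that does not express the type of the query result, for example a modifier that modifies or restricts the first classification word. This application does not limit how many words the first classification word and the second classification word each contain; each may be one or more words. For ease of description, the following uses the first classification word as the head word and the second classification word as the modifier.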
例如,用户输入的查询信息为“某电视剧中的A场景篇在哪里取景”。可知,该查询信息中的第一分类词为“哪里”,第二分类词为“某电视剧”、“A场景篇”以及“取景”。
步骤S104、所述终端设备确定第一实体库中w个候选实体各自所包括的第三分类词和第四分类词,其中,所述第三分类词和所述第一分类词属于同一分类,所述第四分类词和所述第二分类词属于同一分类,w为正整数。
具体的,第一实体库包括有w个候选实体中每个候选实体的信息,w为正整数。所述候选实体的信息为用于描述所述候选实体的描述信息,例如该候选实体的名称、标识以及属性等等。例如实体为“姚明”,则该实体的信息可包括有姚明的身高、体重、出生日期以及户籍所在地等属性信息。可选的,所述第一实体库中还可包括有关系信息,该关系信息为用于描述任意两个候选实体之间的关系的信息,例如第一候选实体为第二候选实体的父节点或子节点等关系。
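The information of a candidate entity contains the entity binary structure of that candidate entity, which consists of two kinds of classification words: a third classification word and a fourth classification word. The third classification word is a word in the information of the candidate entity that expresses the type of the search result. The fourth classification word is any word in the information of the candidate entity other than the third classification word, that is, a word that does not describe the type of the search result. In other words, the third classification word corresponds to the first classification word and belongs to the same type/category, and the fourth classification word corresponds to the second classification word and belongs to the same type/category.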
步骤S106、所述终端设备根据所述查询信息中的第一分类词和第二分类词以及所述w个候选实体各自所包括的第三分类词和第四分类词,确定所述w个候选实体各自对应的s个相关度,其中,所述相关度用于指示所述查询信息中的分类词和所述候选实体中的分类词之间的相关度,s为小于等于4的正整数。
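Based on the third and fourth classification words in the entity binary structure of each candidate entity and the first and second classification words in the query binary structure of the query information, the terminal device may compute s relevance values between the information of that candidate entity and the query information, where s is a positive integer less than or equal to 4. Each relevance value indicates the relevance between a first target classification word in the query information and a second target classification word in the candidate entity. The first target classification word is the first classification word or the second classification word of the query information, and the second target classification word is the third classification word or the fourth classification word of the candidate entity.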
以所述第一分类词和所述第三分类词为核心词,所述第二分类词和所述第四分类词为修饰词为例,s为小于等于4的正整数。所述s个相关度包括第一相关度至第四相关度中的任一个或多个。其中,第一相关度用于指示所述查询信息中的核心词和所述候选实体中的核心词之间的相关度。第二相关度用于指示所述查询信息中的核心词和所述候选实体中的修饰词之间的相关度。第三相关度用于指示所述查询信息中的修饰词和所述候选实体中的核心词之间的相关度。第四相关度用于指示所述查询信息中的修饰词和所述候选实体中的修饰词之间的相关度。关于上述四个相关度如何计算获得,将在下文进行详细阐述。
步骤S108、所述终端设备根据所述w个候选实体各自对应的s个相关度,确定所述查询信息对应的目标实体的信息,所述目标实体为所述w个候选实体中的实体。
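The terminal device may compute a target relevance for each candidate entity from the s relevance values of that candidate entity, and thereby obtain the target relevance of each of the w candidate entities. From the w candidate entities, it then selects those whose target relevance exceeds a certain threshold (such as 80%) as target entities, and uses the information of the target entities as the search result corresponding to the query information. Optionally, the information of the target entities may be displayed to the user. This application does not limit the number of target entities; there may be one or more. How the target relevance is computed is described in detail below.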
下面介绍本申请涉及的一些具体实施例和可选实施例。
步骤S102中,终端设备获取用户输入的查询信息。接着,可利用开源工具对所述查询信息进行预处理,以获得所述查询信息中包括的查询二元结构。即通过开源工具提取所述查询信息中所包括的第一分类词和第二分类词。关于所述第一分类词和所述第二分类词可参见前述实施例中的相关阐述,这里不再赘述。
所述预处理包括有二元结构识别处理(即对第一分类词和第二分类词的识别处理)。可选的,所述预处理还可包括但不限于以下处理中的任一项或多项的组合:分词处理(词语划分处理)、去停用词处理以及语义拓展处理等等,关于所述预处理这里不做详述。
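The preprocessing described above (word segmentation, stop-word removal, and identification of the binary structure) can be sketched as follows. This is a minimal illustration only: the source says open-source tools perform this step but does not name them, so the whitespace tokenizer, the `STOP_WORDS` set, and the `HEAD_WORDS` vocabulary of result-type words are all hypothetical stand-ins.

```python
# Hypothetical stop-word list and head-word vocabulary; the source does not
# specify either, so these are assumptions made for illustration.
STOP_WORDS = {"by", "the", "of", "in", "a", "an"}
HEAD_WORDS = {"works", "buildings", "stations", "where"}


def split_query(query):
    """Return (head_words, modifiers) for a whitespace-separated query."""
    # Word segmentation: lowercase and split on whitespace.
    tokens = [tok.lower() for tok in query.split()]
    # Stop-word removal.
    tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    # Binary-structure identification: words naming the result type are
    # head words (first classification words); the rest are modifiers
    # (second classification words).
    heads = [tok for tok in tokens if tok in HEAD_WORDS]
    modifiers = [tok for tok in tokens if tok not in HEAD_WORDS]
    return heads, modifiers


heads, modifiers = split_query("Works by Charles")
print(heads, modifiers)
```

A real system would replace the vocabulary lookup with a parser or tagger; the sketch only shows the shape of the output, one list of head words and one list of modifiers.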
步骤S104中,终端设备可获取第一实体库。其中,所述第一实体库可为用户侧或系统侧预先定义好的数据库,该数据库中包括有一个或多个候选实体的信息。关于所述候选实体的类型以及相关信息,本申请不做限定。例如所述数据库中包括有电影类型的实体,服装类型的实体以及其他领域或类型的实体等。
在可选实施例中,所述第一实体库与所述查询信息中的第一分类词关联,即所述第一实体库是根据所述查询信息中的第一分类词确定的。具体的,终端设备可根据所述查询信息中的第一分类词确定与之对应的第一实体库,该第一实体库中包括的所有候选实体所属的类型(即分类)和所述查询信息中第一分类词对应表述的类型相同。例如,所述第一分类词为用于表示地址的“哪里”,则所述第一实体库中包括的所有候选实体均为表示地址的实体,或者与表示地址的实体存在关联的其他实体。本申请下文以所述第一实体库包括w个候选实体的信息为例,进行相关内容的阐述。w为正整数。
进一步地,终端设备可利用开源工具对所述第一实体库中的每个候选实体的信息进行预处理,从而获得所述第一实体库中每个候选实体各自所包括的第三分类词和第四分类词。关于所述预处理、所述第三分类词以及所述第四分类词具体可参见前文实施例中的相关阐述,这里不再赘述。
步骤S106中,终端设备可根据每个候选实体的信息所包括的第三分类词和第四分类词以及所述查询信息中所包括的第一分类词和第二分类词,计算/确定该候选实体的信息与所述查询信息之间存在关联的s个相关度。其中,s为小于等于4的正整数。
即,终端设备可根据所述w个候选实体各自所包括的第三分类词和第四分类词以及所述查询信息中所包括的第一分类词和第二分类词,确定所述w个候选实体各自对应的s个相关度。
下面本申请以一个候选实体包括第三分类词和第四分类词为例,所述第一分类词和所述第三分类词为核心词,所述第二分类词和所述第四分类词为修饰词,具体阐述S106(根据所述候选实体的信息所包括的第三分类词和第四分类词以及所述查询信息中所包括的第一分类词和第二分类词确定所述候选实体对应的s个相关度)的实施方式。
在一些实施方式中,所述s个相关度包括第一相关度,该第一相关度p(h t|h q)用于指示所述查询信息中的第一分类词和所述候选实体中的第三分类词之间的相关度。即第一相关度用于指示查询中的核心词和所述候选实体中的核心词之间的相关度。
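Because the information of the candidate entities in the first entity library is too fine-grained or specific, word matching against the first classification word (head word) of the query information has a low success rate. This application may therefore first generalize the information of the candidate entities in the first entity library, and then match the third classification word (head word) of the candidate entities in the generalized first entity library against the first classification word (head word) of the query information, which improves the success rate or accuracy of search matching.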
具体的,终端设备(具体可为设备中的第一匹配器)可先将所述第一实体库中包括的w个候选实体的信息进行分类(归类)处理,得到第一实体库中w’个处理后的候选实体。其中,所述分类处理为将表述相同或相似词义的第三分类词所对应的候选实体合并为一个处理后的候选实体,即将具有相同或相似核心词的候选实体合并为一个处理后的候选实体。
进一步地,再利用所述第一实体库中w’个处理后的候选实体中的第三分类词(核心词)和所述查询信息中的第一分类词(核心词),计算所述第一相关度。具体可采用如下公式(1)计算所述第一相关度p(h t|h q)。
Figure PCTCN2019086197-appb-000001
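Here, q denotes the query information and t denotes the information of a candidate entity (in this case, the information of a processed candidate entity). h_i denotes the head word of i (the first or third classification word), and M_i denotes the modifiers of i (the second or fourth classification words), where i = q or t. |H(h_t)| denotes the number of child nodes of h_t, and |S(h_t)| denotes the number of all successor nodes of h_t (child nodes and further descendants). Parent, child, grandparent, and descendant nodes are all relationship information between two candidate entities in the first entity library; an example is given below.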
举例来说,如图3A示出一种建筑物类型的第一实体库。该第一实体库中包括三层节点信息,每层中包括有一个或多个节点(即一个或多个候选实体的信息)。如图3A中,第一层中包括有两个候选实体的信息,分别为:苏格兰人的作品(works by Scottish people)以及苏格兰建筑物(Scottish architecture)。第二层中包括有一个候选实体的信息,为苏格兰建造师设计的建筑物(structures by Scottish architects)。第三层中包括有四个候选实体的信息,分别为罗伯特.罗安德.安德森的楼房(Robert Roward Anderson buildings)、威廉.富勒的火车站(William Fowler railway stations)、查尔斯.仑尼.麦金托什的楼房(Charles Rennie Mackintosh buildings)以及约瑟夫.米切尔的楼房(Joseph Mitchell buildings)。其中,图示还表述/包括有实体间的关系信息。具体的,上一层中的节点(即候选实体的信息)为下一层节点(候选实体的信息)的父节点。反之,下一层中的节点为上一层节点的子节点。例如,第一层中的works by Scottish people为第二层中structures by Scottish architects的父节点。此外,父节点以上的节点可称为下一层节点的祖父节点,子节点以下的节点可称为上一层节点的子孙节点。例如图示中,第一层中的works by Scottish people为第三层中Robert Roward Anderson buildings的祖父节点等。
相应地,终端设备可利用第一匹配器(header-header)从所述第一实体库中提取出与所述查询信息中的核心词存在关联的相关节点进行合并,假设这里的相关节点为图3A中的所有节点。为实现所述第一实体库的在线构建,可采用随机游走(random walk)方法提取相关节点以及相关节点间的关系(图示可为节点之间存在的线条),进而再将这些节点中具有相同核心词的节点进行合并(即合并具有相同核心词的候选实体),从而得到泛化后的第一实体库。具体如图3B示出泛化后的第一实体库。如图3B该第一实体库同样包括三层节点信息。其中,第一层所包括的处理后的候选实体(即该候选实体的核心词)分别为:著作(works)以及建筑物(architecture)。第二层包括处理后的候选实体为建筑物(structures)。第三层包括处理后的候选实体分别为:楼房(buildings)以及站点(stations)。
进一步地,可利用泛化后所述第一实体库中处理后的候选实体的核心词和查询信息中的核心词来计算获得第一相关度。具体的,可采用上述公式(1)来计算所述第一相关度P(h t|h q)。
例如,用户输入的查询信息为查尔斯的著作(Works by Charles)。由上述S102可知,查询信息中的核心词为:著作(Works),修饰词为:查尔斯(Charles)。引用上述图3A和3B例子,假设依据所述查询信息中的核心词选定到图3A所示的第一实体库,且选定图3A中的候选实体的信息为约瑟夫.米切尔的楼房(Joseph Mitchell buildings)来计算它们之间的第一相关度。其中,该候选实体的信息中的核心词为楼房(buildings),修饰词为约瑟夫(Joseph)和米切尔(Mitchell)。相应地,如上所述对图3A所示的第一实体库中候选实体的信息进行泛化,得到如图3B中泛化后第一实体库中处理后的候选实体的信息(具体为候选实体的核心词)。
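As FIG. 3B shows, the head word of the candidate entity corresponding to Joseph Mitchell buildings in FIG. 3B is buildings. Formula (1) is then used to compute the first relevance between the head word of the query information and the head word of the candidate entity. According to formula (1) and FIG. 3B, the head word buildings (h_t) of the candidate entity is a descendant node of the head word Works (h_q) of the query information, so the first relevance between them is P(h_t|h_q) = P(buildings|Works) = 1.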
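The generalization from FIG. 3A to FIG. 3B and the descendant check used in this example can be sketched as follows. The edge list is read off the textual description of FIG. 3A (each upper-layer node is taken to be a parent of the next-layer nodes), and the `head_word` heuristic is an assumption made for this illustration. Formula (1) itself is not reproduced; the sketch covers only the case stated in the example, where a descendant head word yields a first relevance of 1.

```python
from collections import defaultdict

# Parent -> child edges as described for FIG. 3A (assumed from the text).
EDGES_3A = [
    ("works by Scottish people", "structures by Scottish architects"),
    ("Scottish architecture", "structures by Scottish architects"),
    ("structures by Scottish architects", "Robert Roward Anderson buildings"),
    ("structures by Scottish architects", "William Fowler railway stations"),
    ("structures by Scottish architects", "Charles Rennie Mackintosh buildings"),
    ("structures by Scottish architects", "Joseph Mitchell buildings"),
]


def head_word(name):
    """Hypothetical heuristic: the word before 'by', else the last word."""
    tokens = name.lower().split()
    if "by" in tokens:
        return tokens[tokens.index("by") - 1]
    return tokens[-1]


def generalize(edges):
    """Merge nodes that share a head word; return head -> set of child heads."""
    children = defaultdict(set)
    for parent, child in edges:
        children[head_word(parent)].add(head_word(child))
    return dict(children)


def is_descendant(children, ancestor, node):
    """True if `node` is reachable from `ancestor` in the merged hierarchy."""
    stack, seen = list(children.get(ancestor, ())), set()
    while stack:
        current = stack.pop()
        if current == node:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return False


generalized = generalize(EDGES_3A)
# In the example, "buildings" is a descendant of "works", which the text
# states gives a first relevance of 1.
print(is_descendant(generalized, "works", "buildings"))
```

Merging by head word collapses the four third-layer entities into two nodes (buildings and stations), which matches the three-layer structure described for FIG. 3B.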
在又一些可能的实施方式中,所述s个相关度包括第二相关度,该第二相关度p(M t|h q)用于指示所述查询信息中的第一分类词和所述候选实体中的第四分类词之间的相关度,即指示所述查询信息中的核心词和所述候选实体中的修饰词之间的相关度。具体存在如下三种实现方式。
第一种方式中,终端设备(具体可为设备中的第二匹配器header-modifier)可根据第一预存文档对所述候选实体的信息中的第四分类词(修饰词)进行上下文关联,得到该候选实体的信息中处理后的第四分类词(即处理后的修饰词)。进一步,再根据所述查询信息中的第一分类词(即核心词)和所述候选实体中处理后的第四分类词(即处理后的修饰词),计算获得相关度a。可选的,可将该相关度a作为所述第二相关度。
所述上下文关联可以是指在所述第一预存文档中提取临近所述候选实体的信息中的第四分类词(修饰词)的前i个词语和/或后j个词语,以对应得到该候选实体的信息中的处理后的第四分类词。其中,i和j为用户侧或系统侧自定义设置的正整数,它们可以相同,也可不同,本申请不做限定。所述第一预存文档可为用户侧或系统侧预先存储在所述终端设备中的文档,也可为从服务器侧获取的文档。该文档可为与所述查询信息相关的说明文档等,也可为对所述候选实体或所述候选实体对应的第一实体库进行相关说明的文档等等,本申请不做详述和限定。相应地,所述第一预存文档的数量本申请也不做限定。
在可选实施例中,由于第一预存文档存在的差异性较大,例如某些文档中第一分类词(即查询信息中的核心词)出现的次数较多,另一些文档中第一分类词出现的次数较少,这样容易导致相关度a的计算准确度不高。因此本申请还可采用相关度平滑算法,来计算所述查询信息中的第一分类词(核心词)和所述候选实体中处理后的第四分类词(即处理后的修饰词)之间的相关度a(或第二相关度)。具体的,可采用如下公式(2)计算获得相关度a(p a(M t|h q))。
Figure PCTCN2019086197-appb-000002
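Here, λ is the probability smoothing factor used by the relevance smoothing algorithm, m is one modifier of the candidate entity (one fourth classification word), and M_t is the set of all modifiers of the candidate entity. D denotes the first pre-stored document. ∏ denotes a product. n(h_q, m) is the number of times h_q and the modifier m occur together. w is any word in ctx(m), where ctx(m) is the processed modifier obtained by context association on the modifier m, or the set of such processed modifiers. h_i denotes the head word of i (the first or third classification word), and M_i denotes the modifiers of i (the second or fourth classification words), where i = q or t. q denotes the query information and t denotes the information of the candidate entity.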
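The image of formula (2) is not reproduced in this text. Read together with the parameter definitions above and the worked example that follows, it plausibly takes the form below. This is a reconstruction under that assumption, not the published figure.

```latex
p_a(M_t \mid h_q) = \prod_{m \in M_t} p(m \mid h_q),
\qquad
p(m \mid h_q) = (1-\lambda)\,\frac{n(h_q, m)}{\sum_{w \in \mathrm{ctx}(m)} n(w, m)} + \lambda\, p(h_q \mid D)
```

The first term is the co-occurrence estimate and the second term is the smoothing prior, so a head word that never co-occurs with a modifier still receives a non-zero relevance whenever it appears in the pre-stored documents.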
举例来说,引用上述用户输入的查询信息为查尔斯的著作(Works by Charles)以及图3A的例子。候选实体的信息为约瑟夫.米切尔的楼房(Joseph Mitchell buildings)。这里假设利用该候选实体的修饰词(即该候选实体中的某个第四分类词)为米切尔(Mitchell),来计算查询信息中的核心词(Works,即查询信息中的第一分类词)与候选实体中的修饰词(Mitchell)之间的第二相关度。
具体的,假设第一预存文档有如下三份,d1:Mitchell is work by Charles…,d2:Joseph Mitchell building…,d3:Mitchell Work…。则利用第一预存文档对候选实体中的修饰词(Mitchell)进行上下文关联后,获得处理后的修饰词ctx(Mitchell)={work,building,work…}。为减小第一预存文档的选文偏差度,本申请可采用相关度平滑算法(也可称为概率平滑算法),计算出查询信息中的核心词和候选实体的实体二元结构中处理后的修饰词之间的第二相关度,具体可如上述公式(2)所示。
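Assume λ = 0.5 in this example. Formula (2) then gives n(h_q, m) = n(work, Mitchell) = 2. Σ_w n(w, m) is the number of times the words in the processed modifier occur together with the modifier m, which is 3 in this example, so the probability that the modifier m and the head word of the query information occur together in the first pre-stored documents is n(h_q, m)/Σ_w n(w, m) = 2/3. p(h_q|D) is the probability that h_q occurs in the first pre-stored documents; in this example, work occurs with probability 2/3. Accordingly, relevance a is p(m|h_q) = p(Mitchell|work) = (1-λ)·n(h_q, m)/Σ_w n(w, m) + λ·p(h_q|D) = (1-0.5)×2/3 + 0.5×2/3 = 2/3.
In the second approach, the terminal device (specifically, the second matcher in the device) may perform context association on the first classification word (head word) of the query information based on the first pre-stored document, to obtain the processed first classification word (processed head word) of the query information. It then computes relevance b from the processed first classification word of the query information and the fourth classification word in the information of the candidate entity. Optionally, relevance b may be used as the second relevance. For context association, see the description in the foregoing embodiment; details are not repeated here.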
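The arithmetic of this example can be checked with a short sketch. It assumes that n(x, y) counts the pre-stored documents containing both words, that ctx(m) is the set of distinct words immediately following m after stop-word removal, and that p(h_q|D) is the fraction of documents containing h_q. These readings are chosen because they reproduce the counts given in the example (2, 3, and 2/3); the source does not define them this precisely. Evaluating (1-0.5)×2/3 + 0.5×2/3 with exact fractions gives 2/3.

```python
from fractions import Fraction

STOP_WORDS = {"is", "by", "the", "a", "of"}  # hypothetical stop-word list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def context(word, docs, before=0, after=1):
    """Distinct words within the given window around `word` in any document."""
    ctx = set()
    for tokens in docs:
        for idx, tok in enumerate(tokens):
            if tok == word:
                ctx.update(tokens[max(0, idx - before):idx])
                ctx.update(tokens[idx + 1:idx + 1 + after])
    return ctx


def cooccur(a, b, docs):
    """Number of documents containing both words."""
    return sum(1 for tokens in docs if a in tokens and b in tokens)


def smoothed_relevance(head, modifier, docs, lam=Fraction(1, 2)):
    """(1 - lam) * n(head, m) / sum_w n(w, m) + lam * p(head | D)."""
    denom = sum(cooccur(w, modifier, docs) for w in context(modifier, docs))
    estimate = Fraction(cooccur(head, modifier, docs), denom) if denom else Fraction(0)
    prior = Fraction(sum(1 for tokens in docs if head in tokens), len(docs))
    return (1 - lam) * estimate + lam * prior


def modifier_set_relevance(head, modifiers, docs, lam=Fraction(1, 2)):
    """Product of the per-modifier relevances over all entity modifiers."""
    result = Fraction(1)
    for modifier in modifiers:
        result *= smoothed_relevance(head, modifier, docs, lam)
    return result


DOCS = [
    tokenize("Mitchell is work by Charles"),
    tokenize("Joseph Mitchell building"),
    tokenize("Mitchell Work"),
]
print(smoothed_relevance("work", "mitchell", DOCS))  # 2/3
```

Using `Fraction` keeps the result exact, which makes it easy to compare against hand calculations such as the one in the example.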
在可选实施例中,为减小第一预存文档的选文偏差度(即提高相关度b的计算准确度),本申请同样可采用相关度平滑算法来计算相关度b。具体的,可采用如下公式(3)计算获得相关度b(p b(M t|h q))。
Figure PCTCN2019086197-appb-000003
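Here, ctx(h_q) is the processed head word obtained by context association on the head word h_q, or the set of such processed head words. For the meanings of the other parameters in formula (3), see the description of formula (2) above; details are not repeated here.
For example, continuing the example from the first approach, context association on the head word of the query information (Work, the first classification word of the query information) using the first pre-stored documents yields the processed head word ctx(Work) = {Mitchell, Charles, Mitchell}. Relevance b may then be p(m|h_q) = p(Mitchell|work) = (1-λ)·n(h_q, m)/Σ_{w∈ctx(h_q)} n(w, m) + λ·p(h_q|D) = (1-0.5)×2/3 + 0.5×2/3 = 2/3.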
第三种实现方式,终端设备在利用上述第一种以及第二种实现方式,计算获得相关度a和相关度b后,可按照设定运算规则,对所述相关度a和相关度b进行处理以得到所述第二相关度。所述设定运算规则为用户侧或系统侧自定义设置的运算法则,例如加法、减法、除法、乘法、数值取最大等等,本申请不做限定。以设定运算规则为数值取最大的运算法则为例,则第二相关度=相关度a∨相关度b=p a(M t|h q)∨p b(M t|h q)。
在又一些可能的实施例中,所述s个相关度包括第三相关度,该第三相关度p(h t|M q)用于指示所述查询信息中的第二分类词(修饰词)和所述候选实体的信息中的第三分类词(核心词)之间的相关度,即指示所述查询信息中的修饰词和所述候选实体的信息中的核心词之间的相关度。具体存在如下三种实施方式。
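In the first implementation, the terminal device (specifically, the second matcher in the device) performs context association on the second classification word (modifier) of the query information based on a second pre-stored document, to obtain the processed second classification word (processed modifier) of the query information. It then computes relevance c from the processed second classification word of the query information and the third classification word (head word) in the information of the candidate entity. Optionally, relevance c may be used as the third relevance.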
所述上下文关联可以是指在所述第二预存文档中提取临近所述查询信息中第二分类词的前k个词语和/或后l个词语。其中,k和l可为用户侧或系统侧自定义设置的正整数。本申请中,所述第一预存文档和所述第二预存文档均为用户侧或系统侧自定义的文档,它们可以相同,也可不同。i,j,k以及l可为用户侧或系统侧自定义设置的正整数,它们可以相同,也可不同,本申请不做限定。关于所述上下文关联以及所述第二预存文档可参见前述实施例中的相关阐述,这里不做赘述。
在可选实施例中,为减小第二预存文档的选文偏差度(即提高相关度c的计算准确度),本申请同样可采用相关度平滑算法来计算相关度c。具体的,可采用如下公式(4)计算获得相关度c(p c(h t|M q))。
Figure PCTCN2019086197-appb-000004
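Here, λ is the probability smoothing factor used by the relevance smoothing algorithm, m is one modifier of the query information (one second classification word), and M_q is the set of all modifiers of the query information. D denotes the second pre-stored document. ∏ denotes a product. n(h_t, m) is the number of times h_t and the modifier m occur together. w is any word in ctx(m), where ctx(m) is the processed modifier obtained by context association on the modifier m, or the set of such processed modifiers. h_i denotes the head word of i (the first or third classification word), and M_i denotes the modifiers of i (the second or fourth classification words), where i = q or t. q denotes the query information and t denotes the information of the candidate entity. ∝ denotes proportionality. Optionally, p(h_t|M_q) may also be treated as equal to p(M_q|h_t) in this application; this is not limited here.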
举例来说,引用上述公式(2)的例子,对查询信息中的修饰词Charles进行上下文关联后,得到的ctx(Charles)={work}。相应地,相关度c可为:
p_c(h_t|M_q) ∝ p_c(M_q|h_t) = p(m|h_t)
= p(Charles|building)
= (1-λ)·n(h_t, m)/Σ_w n(w, m) + λ·p(h_t|D)
= (1-0.5)×0 + 0.5×0
= 0
第二种实施方式中,终端设备(具体可为设备中的第二匹配器)根据第二预存文档对候选实体的信息中的第三分类词(核心词)进行上下文关联,得到该候选实体的信息中处理后的第三分类词(即处理后的核心词)。进一步,再利用所述查询信息中的第二分类词(修饰词)和所述候选实体的信息中处理后的第三分类词,计算获得相关度d。可选的,可将相关度d作为第三相关度。
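In an optional embodiment, to reduce the selection bias of the second pre-stored document (that is, to improve the accuracy of relevance d), this application may likewise use the relevance smoothing algorithm to compute relevance d. Specifically, relevance d (p_d(h_t|M_q)) may be computed with formula (5) below.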
Figure PCTCN2019086197-appb-000005
其中,ctx(h t)是指对核心词h t进行上下文关联后所获得的处理后的核心词,或者由处理后的核心词所组成的集合。关于公式(5)中涉及的其他参数含义可参见前述公式(4)中的相关阐述,这里不再赘述。
举例来说,引用上述第一种实施方式所示例子,利用第二预存文档对候选实体的信息中的核心词(buildings,即第三分类词)进行上下文关联后,获得处理后的核心词ctx(buildings)={Mitchell}。则相关度d为:p d(h t|M q)∝p(M q|h t)=p(m|h t)=p(Charles|building)=(1-λ)n(h t,m)/Σ wn(w,m)+λp(h t|D)=(1-0.5)x0+0.5x0=0。
第三种实现方式,终端设备在利用上述第一种以及第二种实现方式,计算获得相关度c和相关度d后,可按照设定运算规则,对所述相关度c和相关度d进行处理以得到所述第三相关度。所述设定运算规则为用户侧或系统侧自定义设置的运算法则,例如加法、减法、除法、乘法、数值取最大等等,本申请不做限定。以设定运算规则为数值取最大的运算法则为例,则第三相关度=相关度c∨相关度d=p c(h t|M q)∨p d(h t|M q)。
在又一些实施例中,所述s个相关度包括第四相关度,该第四相关度p(M t|M q)用于指示所述查询信息中的第二分类词(修饰词)与所述候选实体的信息中的第四分类词(修饰词)之间的相关度,即指示所述查询信息中的修饰词和所述候选实体中的修饰词之间的相关度。具体存在以下三种实施方式。
第一种实施方式中,终端设备(具体可为设备中的第三匹配器)可对所述查询信息中的第二分类词(修饰词)进行拓展,例如针对该第二分类词进行属性词语的拓展等,以得到该查询信息中处理后的第二分类词(即处理后的修饰词)。然后,再利用所述查询信息中处理后的第二分类词(处理后的修饰词)和所述候选实体的信息中的第四分类词(修饰词),计算获得相关度e。可选的,可将相关度e作为所述第四相关度。具体的,可采用如下公式(6)计算获得相关度e(p e(M t|M q))。
Figure PCTCN2019086197-appb-000006
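Here, w and w_i are any words in M_q, and w_j is any word in M_t. M_e is the set of all expanded modifiers of the query information, and m is any word in M_e. M_q is the set of all modifiers of the query information (here also regarded as M_e). M_t is the set of all modifiers in the information of the candidate entity. n(m, w) is the number of co-occurrences of the modifiers m and w, where m and w are the same word. n(w_i, w_j) is the number of co-occurrences of the modifiers w_i and w_j. h_i denotes the head word of i (the first or third classification word), and M_i denotes the modifiers of i (the second or fourth classification words), where i = q or t. q denotes the query information and t denotes the information of the candidate entity.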
举例来说,引用上述查询信息为查尔斯的著作(Works by Charles)的例子。查询信息中的修饰词为查尔斯(Charles)。则终端设备可通过第三匹配器对该查询信息中的修饰词Charles进行属性词语的拓展,得到查询信息中拓展后的修饰词。例如这里第一实体库中包括有关于Charles的属性信息的描述,如苏格兰男性建筑师Scottish male architect以及米切尔楼房的建筑师(Mitchell building’s architect)。相应的,这里针对Charles拓展后的修饰词Me={Scottish,male,Mitchell}。
进一步地,基于Me以及候选实体中的修饰词Mitchell,计算它们之间的相关度e。具体为:p e(M t|M q)可以用p(M t|M e)表示,即p({Joseph,Mitchell}|{Scottish,male,Mitchell})=n(Mitchell,Mitchell)/Σn(w i,w j)=1/6。
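The ratio in this example can be sketched as a count of identical pairs over all pairs. This assumes that each (entity modifier, expanded query modifier) pair contributes 1 to the denominator and that only identical words contribute to the numerator, which reproduces the 1/6 of the example (2 × 3 = 6 pairs, one match). The function name and the case-insensitive comparison are choices made for this illustration.

```python
from fractions import Fraction
from itertools import product


def modifier_overlap(entity_modifiers, expanded_query_modifiers):
    """Share of (entity, query) modifier pairs whose two words are identical."""
    pairs = list(product(entity_modifiers, expanded_query_modifiers))
    if not pairs:
        return Fraction(0)
    # Count pairs in which both words are the same, ignoring case.
    matches = sum(1 for a, b in pairs if a.lower() == b.lower())
    return Fraction(matches, len(pairs))


entity_mods = ["Joseph", "Mitchell"]             # M_t in the example
expanded_mods = ["Scottish", "male", "Mitchell"]  # M_e in the example
print(modifier_overlap(entity_mods, expanded_mods))  # 1/6
```

Expanding the query modifier with attribute words is what makes the match possible here: Charles and Mitchell share no surface form, but the attribute description of Charles mentions Mitchell.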
第二种实现方式中,终端设备(具体可为设备中的第三匹配器)可对所述候选实体的信息中的第四分类词(修饰词)进行拓展,例如针对所述第四分类词进行属性词语的拓展等,以对应得到所述候选实体中处理后的第四分类词(即处理后的修饰词)。进一步,再利用所述查询信息中的第二分类词(修饰词)和所述候选实体中处理后的第四分类词(处理后的修饰词),计算获得相关度f。可选的,可将相关度f作为所述第四相关度。
具体的,可采用如下公式(7)计算获得相关度f(p f(M t|M q))。
Figure PCTCN2019086197-appb-000007
其中,w和w i属于M q中的任意词语。w j属于M t中的任意词语。M e是指候选实体的信息中拓展后的所有修饰词(即它们组成的集合)。m为M e中的任意词。M q是指查询信息中所有的修饰词(或它们组成的集合)。M t是指候选实体的信息中所有的修饰词(或它们组成的集合,这里也可视为M e)。n(m,w)是指修饰词m和w同时出现的数量,且修饰词m和w相同。n(w i,w j)是指修饰词w i以及w j同时出现的数量。h i表示i的核心词(第一分类词或第三分类词)。M i表示i的修饰词(第二分类词或第四分类词)。i=q或t。q表示查询信息,t表示候选实体的信息。
举例来说,上述公式(6)的例子,候选实体的信息为约瑟夫.米切尔的楼房(Joseph Mitchell buildings)。这里假设利用该候选实体的修饰词(即该候选实体中的某个第四分类词)为米切尔(Mitchell),来计算查询信息中的修饰词(Charles,即查询信息中的第二分类词)与候选实体中的修饰词(Mitchell)之间的第四相关度。具体的,本例中针对Mitchell拓展后的修饰词M e={Joseph}。则相关度f为p f(M t|M q),可用p(M e|M q)表示:p({Joseph}|{Joseph,Mitchell})=n(Joseph,Joseph)/Σn(w i,w j)=1/2。
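In the third implementation, after computing relevance e and relevance f with the first and second implementations above, the terminal device may process relevance e and relevance f according to a set operation rule to obtain the fourth relevance. The set operation rule is defined by the user or the system, for example addition, subtraction, division, multiplication, or taking the maximum; this application does not limit it. Taking the maximum as the rule, for example, fourth relevance = relevance e ∨ relevance f = p_e(M_t|M_q) ∨ p_f(M_t|M_q).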
需要说明的是,按照上述S106具体实施方式的阐述原理,终端设备可根据每个候选实体中的第三分类词和第四分类词以及所述查询信息中的第一分类词和第二分类词,计算出每个候选实体各自对应的s个相关度,这里不再赘述。
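Accordingly, in step S108, after obtaining the s relevance values of each candidate entity, the terminal device may process the s relevance values of that candidate entity according to a set operation rule to obtain its target relevance. The set operation rule is defined by the user or the system, for example multiplication, addition, or a product of powers. Alternatively, the largest of the s relevance values may be chosen as the target relevance; this application does not limit the rule.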
示例性地,以所述s个相关度包括第一相关度p(h t|h q)、第二相关度p(M t|h q)、第三相关度p(h t|M q)以及第四相关度p(M t|M q)为例,终端设备可采用如下公式(8)计算获得该候选实体对应的目标相关度p(q|t)。
Figure PCTCN2019086197-appb-000008
其中,该目标相关度p(q|t)用于指示所述候选实体的信息t与所述查询信息q之间的相关度。α j(j=1,2,3或者4)的取值在0-1之间,其表示在候选实体中第j相关度占整体(s个相关度)中的比重。α j取值越大,表示比重越重。当某部分分类词的相关度不考虑,则对应的α j取值为0,本申请不做详述。
进一步地,在终端设备计算获得w个候选实体各自对应的目标相关度后,可根据w个目标相关度选择到目标实体的信息,作为所述查询信息对应的搜索结果。其中,该目标相关度用于指示候选实体(即候选实体的信息)和查询信息之间的相关度。该目标实体为所述w个候选实体中的一个或多个实体。
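In an optional embodiment, the target entity is a candidate entity among the w candidate entities whose target relevance exceeds a preset first threshold. In a specific implementation, the terminal device may directly select, based on the w target relevance values of the w candidate entities, the candidate entities whose target relevance exceeds the preset first threshold as the target entities. Alternatively, the terminal device may sort the w target relevance values in a preset order, for example from highest to lowest, select the m largest target relevance values, and use the m corresponding candidate entities as the target entities. m is a positive integer defined by the user or the system, for example 1 or 5.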
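Step S108 can be sketched as two small functions: one that combines the s relevance values of a candidate entity into a target relevance, and one that selects target entities by threshold or by top-m ranking. The image of formula (8) is not reproduced in this text, so the three combination rules below (product of powers, weighted sum, maximum) are assumptions drawn from the operation rules the description lists, and the function names and sample scores are hypothetical.

```python
import math


def target_relevance(scores, weights, rule="power_product"):
    """Combine per-component relevances using weights alpha_j in [0, 1]."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if rule == "power_product":
        # A weight of 0 turns the factor into 1, so that component is ignored.
        return math.prod(p ** a for p, a in zip(scores, weights))
    if rule == "weighted_sum":
        return sum(a * p for p, a in zip(scores, weights))
    if rule == "max":
        return max(scores)
    raise ValueError(f"unknown rule: {rule}")


def select_targets(target_scores, threshold=None, top_m=None):
    """Rank candidates by target relevance, then filter and/or truncate."""
    ranked = sorted(target_scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(name, score) for name, score in ranked if score >= threshold]
    if top_m is not None:
        ranked = ranked[:top_m]
    return [name for name, _ in ranked]


# Hypothetical target relevances for three candidate entities.
candidates = {"entity_a": 0.9, "entity_b": 0.5, "entity_c": 0.85}
print(select_targets(candidates, threshold=0.8))
print(select_targets(candidates, top_m=1))
```

The threshold comparison uses "greater than or equal to", following the wording of the claims; switching to a strict comparison is a one-character change if the "exceeds" wording of the description is preferred.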
相应地,终端设备可将选取出的候选实体的信息(即目标实体的信息)作为所述查询信息对应的搜索结果。可选的,还可将所述目标实体的信息展示给用户查看。
通过实施本发明实施例,能够解决现有技术中由于表述差异导致搜索匹配率较低、准确率较低等问题,从而提升了搜索匹配的成功率以及准确率。
上述主要从终端设备的角度出发对本发明实施例提供的方案进行了介绍。可以理解的是,终端设备为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。结合本发明中所公开的实施例描述的各示例的单元及算法步骤,本发明实施例能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。本领域技术人员可以对每个特定的应用来使用不同的方法来实现所描述的功能,但是这种实现不应认为超出本发明实施例的技术方案的范围。
本发明实施例可以根据上述方法示例对设备进行功能单元的划分,例如,可以对应各个功能划分各个功能单元,也可以将两个或两个以上的功能集成在一个处理单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。需要说明的是,本发明实施例中对单元的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
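Where integrated units are used, FIG. 4 shows a possible schematic structure of the terminal device in the foregoing embodiments. The terminal device 700 includes a processing unit 702 and a communication unit 703. The processing unit 702 controls and manages the actions of the terminal device 700. The processing unit 702 supports the terminal device 700 in performing steps S102-S108 in FIG. 2 and/or other steps of the techniques described herein. The communication unit 703 supports communication between the terminal device 700 and other devices, for example communication with a server to obtain the information of the w candidate entities in the first entity library, and/or other steps of the techniques described herein.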
终端设备700还可以包括存储单元701,用于存储终端设备700的程序代码和数据。
其中,处理单元702可以是处理器或控制器,例如可以是中央处理器(英文:Central Processing Unit,CPU),通用处理器,数字信号处理器(英文:Digital Signal Processor,DSP),专用集成电路(英文:Application-Specific Integrated Circuit,ASIC),现场可编程门阵列(英文:Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本发明公开内容所描述的各种示例性的逻辑方框,模块和电路。所述处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,DSP和微处理器的组合等等。通信单元703可以是通信接口、收发器、收发电路等,其中,通信接口是统称,可以包括一个或多个接口,例如终端设备与其他设备之间的接口。存储单元701可以是存储器。
当处理单元702为处理器,通信单元703为通信接口,存储单元701为存储器时,本发明实施例所涉及的终端设备可以为图5所示的终端设备。
参阅图5所示,该终端设备710包括:处理器712、通信接口713、存储器77。可选地,终端设备710还可以包括总线714。其中,通信接口713、处理器712以及存储器77可以通过总线714相互连接;总线714可以是外设部件互连标准(英文:Peripheral Component Interconnect,简称PCI)总线或扩展工业标准结构(英文:Extended Industry Standard Architecture,简称EISA)总线等。所述总线714可以分为地址总线、数据总线、控制总线等。为便于表示,图5中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
可选的,如图4和5所示的终端设备还可包括显示单元,所述显示单元具体可为显示屏,图未示。所述显示屏用于显示搜索结果(目标实体的信息)。
上述图4或图5所示的终端设备的具体实现还可以对应参照前述方法实施例的相应描述,此处不再赘述。
结合本发明实施例公开内容所描述的方法或者算法的步骤可以硬件的方式来实现,也可以是由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于随机存取存储器(英文:Random Access Memory,RAM)、闪存、只读存储器(英文:Read Only Memory,ROM)、可擦除可编程只读存储器(英文:Erasable Programmable ROM,EPROM)、电可擦可编程只读存储器(英文:Electrically EPROM,EEPROM)、寄存器、硬盘、移动硬盘、只读光盘(CD-ROM)或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于网络设备中。当然,处理器和存储介质也可以作为分立组件存在于终端设备中。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (12)

  1. 一种实体搜索方法,其特征在于,所述方法包括:
    确定查询信息中所包括的第一分类词和第二分类词,所述第一分类词为所述查询信息中表述查询结果的类型的词语,所述第二分类词为所述查询信息中除所述第一分类词之外的词语;
    根据第一实体库以及所述查询信息中所包括的第一分类词和第二分类词,确定w个候选实体各自对应的s个相关度;其中,所述第一实体库包括所述w个候选实体中每个候选实体的信息,所述候选实体的信息中包括有第三分类词和第四分类词,所述第三分类词和所述第一分类词属于同一分类,所述第四分类词和所述第二分类词属于同一分类;所述相关度用于指示所述查询信息中的分类词和所述候选实体中的分类词之间的相关度,w和s均为正整数,且s小于等于4;
    根据所述w个候选实体各自对应的s个相关度,确定所述查询信息对应的目标实体的信息,所述目标实体为所述w个候选实体中的实体。
  2. 根据权利要求1所述的方法,其特征在于,所述第一分类词包括所述查询信息中表述查询结果的类型所对应的核心词,所述第二分类词包括所述查询信息中除所述第一分类词和停用词之外的修饰词。
  3. 根据权利要求1或2所述的方法,其特征在于,所述s个相关度包括第一相关度,用于指示所述查询信息中的第一分类词和所述候选实体中的第三分类词之间的相关度,
    所述根据第一实体库以及所述查询信息中所包括的第一分类词和第二分类词,确定w个候选实体各自对应的s个相关度包括:
    将所述第一实体库中的w个候选实体进行分类处理,得到处理后的候选实体;所述分类处理为将表述相同或相似词义的第三分类词所对应的候选实体合并为一个处理后的候选实体;
    根据所述查询信息中的第一分类词和所述处理后的候选实体中的第三分类词,确定所述w个候选实体各自对应的第一相关度。
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,所述s个相关度包括第二相关度,用于指示所述查询信息中的第一分类词和所述候选实体中的第四分类词之间的相关度,
    所述根据第一实体库以及所述查询信息中所包括的第一分类词和第二分类词,确定w个候选实体各自对应的s个相关度包括以下中的任一项:
    根据所述查询信息中处理后的第一分类词和所述w个候选实体中各自所包括的第四分类词,确定所述w个候选实体各自对应的相关度a,并作为所述w个候选实体各自对应的第二相关度;
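    determining, based on the first classification word in the query information and the processed fourth classification word included in each of the w candidate entities, a relevance b for each of the w candidate entities, and using it as the second relevance of each of the w candidate entities;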
    根据所述w个候选实体各自对应的相关度a以及所述w个候选实体各自对应的相关度b,确定所述w个候选实体各自对应的第二相关度;
    其中,所述处理后的第一分类词为根据第一预存文档对所述查询信息中的第一分类词进行上下文关联处理后得到的,所述处理后的第四分类词为根据第一预存文档对所述候选实体中所包括的第四分类词进行上下文关联处理后得到的,所述上下文关联处理为在所述第一预存文档中提取临近所述第一分类词或者所述第四分类词的前i个词语和/或后j个词语,其中,i和j均为正整数。
  5. 根据权利要求4所述的方法,其特征在于,所述相关度a或者所述相关度b是根据相关度平滑算法确定的,所述相关度平滑算法用于缓减所述查询信息中的第一分类词或者所述候选实体中的第四分类词在所述第一预存文档中的偏差度。
  6. 根据权利要求1-5中任一项所述的方法,其特征在于,所述s个相关度包括第三相关度,用于指示所述查询信息中的第二分类词和所述候选实体中的第三分类词之间的相关度,
    所述根据第一实体库以及所述查询信息中所包括的第一分类词和第二分类词,确定w个候选实体各自对应的s个相关度包括以下中的任一项:
    根据所述查询信息中处理后的第二分类词和所述w个候选实体中各自所包括的第三分类词,确定所述w个候选实体各自对应的相关度c,并作为所述w个候选实体各自对应的第三相关度;
    根据所述查询信息中的第二分类词和所述w个候选实体中各自所包括的处理后的第三分类词,确定所述w个候选实体各自对应的相关度d,并作为所述w个候选实体各自对应的第三相关度;
    根据所述w个候选实体各自对应的相关度c以及所述w个候选实体各自对应的相关度d,确定所述w个候选实体各自对应的第三相关度;
    其中,所述处理后的第二分类词为根据第二预存文档对所述查询信息中的第二分类词进行上下文关联处理后得到的,所述处理后的第三分类词为根据第二预存文档对所述候选实体中所包括的第三分类词进行上下文关联处理后得到的,所述上下文关联处理为在所述第二预存文档中提取临近所述第二分类词或者所述第三分类词的前k个词语和/或后l个词语,其中,k和l均为正整数。
  7. 根据权利要求6所述的方法,其特征在于,所述相关度c或者所述相关度d是根据相关度平滑算法确定的,所述相关度平滑算法用于缓减所述查询信息中的第二分类词或者所述候选实体中的第三分类词在所述第二预存文档中的偏差度。
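  8. The method according to any one of claims 1-7, wherein the s relevance values include a fourth relevance, which indicates the relevance between the second classification word in the query information and the fourth classification word in the candidate entity,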
    所述根据第一实体库以及所述查询信息中所包括的第一分类词和第二分类词,确定w个候选实体各自对应的s个相关度包括以下中的任一项:
    根据所述查询信息中拓展后的第二分类词和所述w个候选实体中各自所包括的第四分类词,确定所述w个候选实体各自对应的相关度e,并作为所述w个候选实体各自对应的第四相关度;
    根据所述查询信息中的第二分类词和所述w个候选实体中各自所包括的拓展后的第四分类词,确定所述w个候选实体各自对应的相关度f,并作为所述w个候选实体各自对应的第四相关度;
    根据所述w个候选实体各自对应的相关度e和所述w个候选实体各自对应的相关度f,确定所述w个候选实体各自对应的第四相关度;
    其中,所述拓展后的第二分类词为将所述查询信息中的第二分类词进行属性词语的拓展后得到的,所述拓展后的第四分类词为将所述候选实体中的第四分类词进行属性词语的拓展后得到的。
  9. 根据权利要求1-8中任一项所述的方法,其特征在于,所述根据所述w个候选实体各自对应的s个相关度,确定所述查询信息对应的目标实体的信息包括:
    根据所述w个候选实体各自对应的s个相关度,确定所述w个候选实体各自对应的目标相关度;
    根据所述w个候选实体各自对应的目标相关度,确定所述查询信息对应的目标实体的信息,所述目标实体为所述w个候选实体中目标相关度大于或等于第一阈值所对应的实体。
  10. 一种终端设备,其特征在于,包括存储器及与所述存储器耦合的处理器;所述存储器用于存储指令,所述处理器用于执行所述指令;其中,所述处理器执行所述指令时执行如上权利要求1-9中任一项所述的方法。
  11. 根据权利要求10所述的终端设备,其特征在于,所述终端设备还包括与所述处理器耦合的显示器,所述显示器用于在所述处理器的控制下显示目标实体的信息。
  12. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至9任一项所述方法。
PCT/CN2019/086197 2018-05-09 2019-05-09 实体搜索方法、相关设备及计算机存储介质 WO2019214679A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/093,210 US11636143B2 (en) 2018-05-09 2020-11-09 Entity search method, related device, and computer storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810440583.8A CN110472058B (zh) 2018-05-09 2018-05-09 实体搜索方法、相关设备及计算机存储介质
CN201810440583.8 2018-05-09

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/093,210 Continuation US11636143B2 (en) 2018-05-09 2020-11-09 Entity search method, related device, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2019214679A1 true WO2019214679A1 (zh) 2019-11-14

Family

ID=68467198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/086197 WO2019214679A1 (zh) 2018-05-09 2019-05-09 实体搜索方法、相关设备及计算机存储介质

Country Status (3)

Country Link
US (1) US11636143B2 (zh)
CN (1) CN110472058B (zh)
WO (1) WO2019214679A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460177A (zh) * 2020-03-27 2020-07-28 北京奇艺世纪科技有限公司 影视类表情搜索方法、装置、存储介质、计算机设备

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113495984A (zh) * 2020-03-20 2021-10-12 华为技术有限公司 一种语句检索方法以及相关装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090254527A1 (en) * 2008-04-08 2009-10-08 Korea Institute Of Science And Technology Information Multi-Entity-Centric Integrated Search System and Method
CN103064838A (zh) * 2011-10-19 2013-04-24 阿里巴巴集团控股有限公司 数据搜索方法和装置
CN104102723A (zh) * 2014-07-21 2014-10-15 百度在线网络技术(北京)有限公司 搜索内容提供方法和搜索引擎
CN107133259A (zh) * 2017-03-22 2017-09-05 北京晓数聚传媒科技有限公司 一种搜索方法和装置
CN107330120A (zh) * 2017-07-14 2017-11-07 三角兽(北京)科技有限公司 询问应答方法、询问应答装置及计算机可读存储介质

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004005103A (ja) * 2002-05-31 2004-01-08 Toshiba Corp 類似文書検索装置および類似文書検索方法
CN102063468B (zh) * 2010-12-03 2014-04-16 百度在线网络技术(北京)有限公司 一种用于确定查询序列的查询类别的设备及其方法
CN105956137B (zh) * 2011-11-15 2019-10-01 阿里巴巴集团控股有限公司 一种搜索方法、搜索装置及一种搜索引擎系统
CN102902806B (zh) * 2012-10-17 2016-02-10 深圳市宜搜科技发展有限公司 一种利用搜索引擎进行查询扩展的方法及系统
CN103970761B (zh) * 2013-01-28 2018-05-01 阿里巴巴集团控股有限公司 一种商品数据搜索方法及装置
US20140280050A1 (en) * 2013-03-14 2014-09-18 Fujitsu Limited Term searching based on context
CN106033466A (zh) * 2015-03-20 2016-10-19 华为技术有限公司 数据库查询的方法和设备
CN105975596A (zh) * 2016-05-10 2016-09-28 上海珍岛信息技术有限公司 一种搜索引擎查询扩展的方法及系统
US9785717B1 (en) * 2016-09-29 2017-10-10 International Business Machines Corporation Intent based search result interaction
US11157564B2 (en) * 2018-03-02 2021-10-26 Thoughtspot, Inc. Natural language question answering systems

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460177A (zh) * 2020-03-27 2020-07-28 北京奇艺世纪科技有限公司 影视类表情搜索方法、装置、存储介质、计算机设备
CN111460177B (zh) * 2020-03-27 2023-12-15 北京奇艺世纪科技有限公司 影视类表情搜索方法、装置、存储介质、计算机设备

Also Published As

Publication number Publication date
US11636143B2 (en) 2023-04-25
CN110472058A (zh) 2019-11-19
US20210056130A1 (en) 2021-02-25
CN110472058B (zh) 2023-03-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19800369

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19800369

Country of ref document: EP

Kind code of ref document: A1