CN110555208B - Ambiguity elimination method and device in information query and electronic equipment - Google Patents

Ambiguity elimination method and device in information query and electronic equipment

Info

Publication number
CN110555208B
Authority
CN
China
Prior art keywords
knowledge base
ambiguous
ambiguous word
information
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810564777.9A
Other languages
Chinese (zh)
Other versions
CN110555208A (en
Inventor
方瑞玉
罗震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN201810564777.9A
Publication of CN110555208A
Application granted
Publication of CN110555208B
Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application discloses an ambiguity elimination method and device in information query and electronic equipment, wherein the ambiguity elimination method in the information query comprises the following steps: receiving customer demand information; acquiring ambiguous words and the context of the ambiguous words from the customer requirement information; querying a candidate entity corresponding to the ambiguous word in a knowledge base; obtaining, in the knowledge base, structural information related to the candidate entity; and obtaining target information corresponding to the customer demand information in the knowledge base according to the ambiguous words, the context, the candidate entities and the structural information. The target information of the technical scheme is obtained according to the ambiguous words, the context of the ambiguous words in the customer requirement information, the candidate entities corresponding to the ambiguous words obtained in the knowledge base and the structural information of the candidate entities in the knowledge base, so that the information query efficiency and the user experience are improved.

Description

Ambiguity elimination method and device in information query and electronic equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for disambiguating in information query, an electronic device, and a readable storage medium.
Background
In existing information query technology, customer requirement information may contain ambiguous words, which tend to introduce a large amount of interfering information and lower the accuracy with which the user obtains the target information. Taking online food ordering as an example, if the customer requirement information is "pizza from two beef", the feedback returned to the customer may include "Master Kang beef noodles", "five-square beef patty", "spiced beef" and "tender beef pizza". In some of this feedback, "beef" is a food-material word, as in "Master Kang beef noodles"; in other feedback, "beef" is a food word, as in "spiced beef". Only "tender beef pizza" is target information and the rest is interfering information, so "beef" is an ambiguous word.
At present, in information query, the ambiguous word "beef" and/or the context in which it appears in the customer requirement information is generally used as a feature to search the knowledge base. Other entities associated with the ambiguous word "beef" are obtained, and the semantic information formed by the ambiguous word and the retrieved entities is used as the query result. Because this search relies only on the ambiguous word and/or its context, the structural information of the knowledge base is ignored and the query result contains a large amount of interfering information. As a result, customer requirement information that contains ambiguous words cannot accurately yield the target information, which degrades the user experience.
Disclosure of Invention
The embodiment of the application aims to provide an ambiguity eliminating method, an ambiguity eliminating device, electronic equipment and a readable storage medium in information inquiry, and the accuracy of obtaining target information from a knowledge base is improved.
In order to achieve the above object, an embodiment of the present application provides a disambiguation method in information query, including:
receiving customer demand information;
acquiring ambiguous words and the context of the ambiguous words from the customer requirement information;
querying a candidate entity corresponding to the ambiguous word in a knowledge base;
obtaining, in the knowledge base, structural information of the candidate entity;
and obtaining target information corresponding to the customer demand information in the knowledge base according to the ambiguous words, the context, the candidate entities and the structural information.
Correspondingly, to achieve the above object, an embodiment of the present application provides an apparatus for disambiguation in information query, including:
the receiving module is used for receiving the customer requirement information;
the analysis module is used for acquiring ambiguous words and the contexts of the ambiguous words from the customer requirement information;
the candidate entity confirmation module is used for inquiring a candidate entity corresponding to the ambiguous word in a knowledge base;
the structured information confirmation module is used for obtaining the structured information of the candidate entity in the knowledge base;
and the searching module is used for obtaining target information corresponding to the customer requirement information in the knowledge base according to the ambiguous word, the context, the candidate entity and the structural information.
In order to achieve the above object, an embodiment of the present application further provides a disambiguation method in information query, including:
sending customer requirement information; the client requirement information comprises ambiguous words and contexts of the ambiguous words;
receiving target information corresponding to the customer demand information; the target information is obtained according to the ambiguous word, the context, a candidate entity corresponding to the ambiguous word obtained in a knowledge base, and structural information of the candidate entity in the knowledge base.
Correspondingly, to achieve the above object, an embodiment of the present application further provides an ambiguity resolution apparatus in information query, including:
the sending module is used for sending the customer requirement information; the client requirement information comprises ambiguous words and contexts of the ambiguous words;
the receiving module is used for receiving target information corresponding to the customer demand information; the target information is obtained according to the ambiguous word, the context, a candidate entity corresponding to the ambiguous word obtained in a knowledge base, and structural information of the candidate entity in the knowledge base.
In order to achieve the above object, an electronic device according to an embodiment of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the customer requirement information processing method disclosed in the embodiment of the present invention.
To achieve the above object, an embodiment of the present application further provides a readable storage medium on which a computer program is stored; when executed, the computer program implements the steps of the customer demand information processing method disclosed in the embodiments of the present invention.
As can be seen from the above, compared with the prior art, according to the technical scheme, the client requirement information is received first, the ambiguous word and the context of the ambiguous word are obtained from the client requirement information, the candidate entity corresponding to the ambiguous word is queried in the knowledge base, the structural information of the candidate entity in the knowledge base is obtained by taking the candidate entity as a center, and the knowledge base is searched according to the ambiguous word, the context, the candidate entity and the structural information, so that the target information corresponding to the client requirement information can be accurately obtained, and the information query efficiency and the user experience are improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of an example knowledge base;
FIG. 2 is a schematic diagram of an application scenario of the present disclosure;
FIG. 3 is a schematic diagram of interaction between a client and a server in a general application scenario according to the present technical solution;
FIG. 4 is a second schematic view of an application scenario of the present invention;
FIG. 5 is a flowchart of a method for disambiguating information queries according to an embodiment of the present invention;
FIG. 6 is a diagram of one embodiment of a knowledge base;
FIG. 7 is a second schematic diagram of a knowledge base according to an embodiment of the invention;
FIG. 8 is a second flowchart of a method for disambiguating information queries according to an embodiment of the present invention;
FIG. 9 is a functional block diagram of an apparatus for disambiguation in querying information according to an embodiment of the present application;
FIG. 10 is a second functional block diagram of an apparatus for disambiguation in querying information according to an embodiment of the present application;
FIG. 11 is a schematic view of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application shall fall within the scope of protection of the present application.
According to the embodiment of the invention, a method and a device for disambiguation in information query, electronic equipment and a readable storage medium are provided.
In this technical solution, it should be understood that:
The knowledge base was introduced by Google to enhance the functionality of its search engine. In essence, the knowledge base is intended to describe the various entities that exist in the real world and the relationships between them, which together constitute a huge semantic network graph. As shown in FIG. 1, nodes represent entities and edges represent relationships.
In a knowledge base, an entity refers to something that is distinguishable and exists independently, such as a person, a city, a plant or a commodity. Everything in the world is composed of specific things, and these are referred to as entities, such as "china", "usa" and "japan" in FIG. 1. An entity may also refer to a collection of things of the same nature, such as a country, a nation, a book or a computer. Further, an entity may also refer to a name, description or explanation of a thing or a collection of things, which may be expressed by text, images, audio, video and so on. The entity is the most basic element in the knowledge base, and different relationships exist among different entities.
In the knowledge base, a relationship is formalized in two ways. On the one hand it is a function that maps k × k entity nodes to a boolean value. On the other hand it is an edge that points from an entity to its attribute value. Different relationship types correspond to edges for different types of attributes. "area", "population" and "capital" in FIG. 1 are several different attributes. An attribute value is the value of a specific attribute of an object, for example 9.6 million square kilometers. In the knowledge base, an attribute value is itself one kind of entity: a description or explanation of a thing or a set of things.
Based on the above definitions, triples are the general representation of the knowledge base, i.e., $G = (E, R, S)$, where $E = \{e_1, e_2, \ldots, e_{|E|}\}$ is the set of entities in the knowledge base and contains $|E|$ different entities; $R = \{r_1, r_2, \ldots, r_{|R|}\}$ is the set of relations in the knowledge base and contains $|R|$ different relations; and $S \subseteq E \times R \times E$ represents the set of triples in the knowledge base. The basic form of a triple is (entity 1-relationship-entity 2). Each entity may be identified by a globally unique ID, and a relationship may be represented by an edge connecting two entities, describing the association between them. In the example knowledge base of FIG. 1, china is an entity, beijing is an entity, and china-capital-beijing is a sample (entity 1-relationship-entity 2) triple. Beijing is an entity, population is an attribute and therefore also a kind of relationship, and 2069.3 is an attribute value and therefore another entity. Beijing-population-2069.3 is another sample (entity 1-relationship-entity 2) triple.
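The triple representation above can be sketched as a small data structure. This is a minimal illustration, not the patent's implementation: the `Triple` type, the `neighbors` helper and the choice to derive the entity and relation sets from the triple set are all assumptions made for the example. Only the two sample triples named in the text are included.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str      # entity 1
    relation: str  # relationship (or attribute)
    tail: str      # entity 2 (or attribute value)


# The two sample triples described for FIG. 1.
S = {
    Triple("china", "capital", "beijing"),
    Triple("beijing", "population", "2069.3"),
}

# Assumption: E and R are derived from S here; a real knowledge base
# would normally store them explicitly with globally unique IDs.
E = {t.head for t in S} | {t.tail for t in S}
R = {t.relation for t in S}
G = (E, R, S)


def neighbors(entity, triples=S):
    """Return the (relation, tail) pairs of all edges that leave `entity`."""
    return sorted((t.relation, t.tail) for t in triples if t.head == entity)
```

Calling `neighbors("china")` returns the single outgoing edge of that node, which is the kind of lookup the later steps rely on when they collect triples around a candidate entity.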
As mentioned above, the knowledge base is intended to describe various entities and their relationships existing in the real world, which constitute a huge semantic network diagram. For the technical scheme, whether ambiguous words exist in the customer requirement information needs to be determined. If no ambiguous words exist in the customer requirement information, the target information can be accurately obtained according to the customer requirement information according to the existing mature algorithm. And if the customer requirement information contains ambiguous words, searching a knowledge base according to the ambiguous words in the customer requirement information, and excavating target information to be actually inquired by the customer requirement information.
FIG. 2 shows one application scenario of the present technical solution. In this scenario, a user enters customer requirement information in the search bar of an APP interface on a mobile intelligent terminal, for example "pizza from two beef". When the user clicks the search button, the mobile intelligent terminal generates a query request instruction and sends it to the server terminal through the communication network. As shown in fig. 3, the server terminal parses the query request instruction to obtain the customer requirement information. Through information processing it determines that "beef" is an ambiguous word, searches the knowledge base according to the ambiguous word "beef", and obtains the entities that match it from the nodes of the knowledge base; the matched entities are called candidate entities. In this application scenario, the knowledge base is constructed from the information of merchants registered with the APP, and the entities and edges in the knowledge base express that merchant information, which includes but is not limited to store names and information on the products sold by the stores. Taking each candidate entity as a center, the server terminal extracts the triples related to the candidate entities from the knowledge base and assembles them into a triple set, which is a structured semantic representation of all the candidate entities. For each relationship type, a matrix is constructed from the triples of that type in the triple set. The matrix is an n × n square matrix, where n is the number of entities in the entity set of the knowledge base.
An element value $a_{ij} = 0$ indicates that the triple $(\text{entity}_i, \text{relationship type}, \text{entity}_j)$ formed by the i-th entity and the j-th entity in the entity set of the knowledge base is not present in the triple set; $a_{ij} = 1$ indicates that this triple is present in the triple set. Here i is both the row number of the matrix and the number of the entity in the entity set of the knowledge base, and j is both the column number of the matrix and the number of the entity in the entity set of the knowledge base. It can be seen that the constructed matrix characterizes the structural semantic information of the entities in the knowledge base and the relationships between them. By processing the customer requirement information with the ambiguous word, each candidate entity corresponding to the ambiguous word, and the matrix, the result information corresponding to the customer requirement information can be obtained accurately when the knowledge base is searched, which improves information query efficiency and user experience.
As shown in fig. 2, after the server finds the merchant information corresponding to the customer demand information, it sends the merchant information to the mobile intelligent terminal. The APP interface of the mobile intelligent terminal lists pizza restaurants that sell pizza made with beef as a food material. Merchant information such as "Master Kang private-kitchen beef noodles" does not appear in the displayed list.
It should be noted that fig. 2 only illustrates a common application scenario, in practice, the server terminal may also obtain the query request of the user in a voice acquisition manner, process the query request instruction in the same manner, search out the target information from the knowledge base, transmit the target information to the client by the server, and transmit the target information to the user in a voice manner, where the application scenario is shown in fig. 4.
In addition, fig. 2 and fig. 4 are schematic diagrams of application scenarios of the meal ordering APP, and for the technical solution, the knowledge base may be established according to different applications and search objects. Such as: in order to facilitate users to obtain more accurate public resources at a search entrance, the knowledge base is constructed based on the whole internet to draw useful information. According to different application scenes, aiming at different knowledge bases, the scheme can effectively eliminate ambiguity of ambiguous words in the client demand information, and target information actually queried by the user is matched from the knowledge bases.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible based on the above description of the application scenarios, the present invention is further described in detail with reference to the following examples and accompanying drawings.
The embodiment of the application provides an ambiguity elimination method in information query. The method can be applied to a server. Specifically, the server may be a back-end business server of a website capable of providing query services, for example Dianping or Meituan. In this embodiment, the server may be an electronic device having data operation, storage and network interaction functions; it may also be software that runs in the electronic device to support data processing, storage and network interaction. The number of servers is not particularly limited in the present embodiment. The server may be one server, several servers, or a server cluster formed by several servers.
FIG. 5 is a flowchart of a disambiguation method in information query according to an embodiment of the present invention. The method comprises the following steps:
Step 501: customer demand information is received.
In this embodiment, the user inputs the customer requirement information in the search bar of the APP interface, for example "pizza from two beef". Clicking the search button triggers the generation of a query request command. Alternatively, the client acquires the customer requirement information by voice capture and generates a query request command from it. The query request command is transmitted to the server over the network, and the server parses it to obtain the customer requirement information. The technical scheme uses the customer requirement information to search the knowledge base for the corresponding target information and provides that information to the client.
Step 502: and acquiring ambiguous words and the context of the ambiguous words from the customer requirement information.
In this embodiment, the customer requirement information is first processed by a word segmenter, and a semantic recognition algorithm then determines which of the segmented words are ambiguous words.
In this embodiment, the segmenter may be an open-source segmenter, such as the "word" segmenter or the IK segmenter; the segmentation algorithm may also be improved according to the application at hand to optimize the segmentation result. The semantic recognition algorithm can be a conventional algorithm. The word segmentation algorithm and the semantic recognition algorithm are not the focus of this technical scheme and are not described in detail here.
Step 503: and querying a candidate entity corresponding to the ambiguous word in a knowledge base.
In this embodiment, the knowledge base is built according to the application. For example, for a meal ordering APP, a knowledge base can be established from registered merchant information, the products sold by merchants and users' word-of-mouth reviews. In the knowledge base, the nodes can be information such as store names, product names, price information and taste information, and the edges can represent relationship information between entity nodes, such as food material and taste. The ordering APP generates customer demand information, the server searches the knowledge base for target information according to that information, and the target information is displayed on the ordering APP interface.
In this embodiment, the corresponding candidate entities are looked up in the entity set of the knowledge base according to the determined ambiguous word. Taking the customer requirement information "pizza from two beef" mentioned above as an example, the ambiguous word is determined to be "beef", and the ambiguous word is mapped to nodes of the knowledge base by fuzzy matching. The information corresponding to the matched nodes includes but is not limited to "tender beef pizza", "spiced beef", "five-square beef patty", "cheese steak" and "beef noodles", and this information is collected into the candidate entity set. The entities in the candidate entity set are the entities corresponding to the ambiguous word; in this embodiment the candidate entity set is { tender beef pizza, spiced beef, five-square beef patty, cheese steak, beef noodles }.
In this embodiment, obtaining all candidate entities corresponding to the ambiguous word from the knowledge base is not limited to the fuzzy matching algorithm; the candidate entities may also be obtained from the entity set of the knowledge base with other conventional, general-purpose matching algorithms. The matching algorithm is not the focus of the present solution and is not described in detail here.
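As a rough illustration of Step 503, the sketch below uses plain case-insensitive substring matching as a stand-in for the unspecified fuzzy matching algorithm. The function name `find_candidates`, the English entity names and the decision to skip the node that equals the ambiguous word itself are assumptions made for the example. Note that on these English glosses a substring test does not pick up "cheese steak", which the text lists as a candidate, so a looser matcher would be needed to reproduce the full candidate set.

```python
# Entity names are English glosses of the nodes described for FIG. 6.
ENTITY_NAMES = [
    "beef", "spiced beef", "tender beef pizza", "cheese steak",
    "beef noodles", "salty", "spicy", "five-square beef patty",
    "Pizza Hut", "Master Kang noodle restaurant",
]


def find_candidates(mention, entity_names):
    """Return entity names that contain the mention, in input order.

    Assumption: the node whose name equals the mention is skipped,
    mirroring the candidate set given in the text.
    """
    needle = mention.lower()
    return [
        name for name in entity_names
        if needle in name.lower() and name.lower() != needle
    ]


candidates = find_candidates("beef", ENTITY_NAMES)
# candidates == ["spiced beef", "tender beef pizza",
#                "beef noodles", "five-square beef patty"]
```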
Step 504: obtaining, in the knowledge base, structured information of the candidate entities.
In this embodiment, each candidate entity in the candidate entity set determined in the above step is taken as a center, the triples that have a direct or indirect relationship with the candidate entity are extracted from all the triples in the knowledge base by random-walk traversal, and the extracted triples are recorded. For example, the recorded results are: (tender beef pizza, food material, beef), (tender beef pizza, taste, spicy), (five-square beef patty, food material, beef), (cheese steak, food material, beef), (beef noodles, taste, salty). The triples recorded while traversing all the candidate entities form the triple set.
In this embodiment, the depth and extent of the random walk may be controlled by parameters. In practical application, parameters are adjusted according to information searching conditions, so that the obtained triple set can well represent the structural semantics of all candidate entities, and a solid foundation is laid for accurate searching.
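The random-walk extraction can be sketched as follows. This is only one plausible reading of the description: the parameter names `depth` and `walks_per_seed` stand in for the unspecified "depth and extent" parameters, and letting a walk cross an edge in either direction is an assumption. The five triples are the ones listed in the text.

```python
import random


def random_walk_triples(triples, seeds, depth=2, walks_per_seed=20, rng=None):
    """Collect triples reached by short random walks started at each seed entity."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    # Index every triple under both endpoints so a walk can cross an
    # edge in either direction (an assumption of this sketch).
    edges = {}
    for head, relation, tail in triples:
        edges.setdefault(head, []).append((head, relation, tail))
        edges.setdefault(tail, []).append((head, relation, tail))
    found = set()
    for seed in seeds:
        for _ in range(walks_per_seed):   # "extent": number of walks per candidate
            current = seed
            for _ in range(depth):        # "depth": number of hops per walk
                if current not in edges:
                    break
                head, relation, tail = rng.choice(edges[current])
                found.add((head, relation, tail))
                current = tail if current == head else head
    return found


KB = [
    ("tender beef pizza", "food material", "beef"),
    ("tender beef pizza", "taste", "spicy"),
    ("five-square beef patty", "food material", "beef"),
    ("cheese steak", "food material", "beef"),
    ("beef noodles", "taste", "salty"),
]
```

With `depth=1` only triples that touch a candidate directly are collected; larger values also pull in indirectly related triples, for example other dishes that share the food material "beef".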
In the technical scheme, after the triple sets related to the candidate entities are obtained, a matrix is constructed according to the triple sets related to the candidate entities under the same relationship type, and the matrix is a structural information model of the entities related to the candidate entities and the mutual relationship.
In the present embodiment, the matrix is an n × n square matrix. An element value $a_{ij} = 0$ indicates that the triple $(\text{entity}_i, \text{relationship type}, \text{entity}_j)$ formed by the i-th entity and the j-th entity in the entity set of the knowledge base is not present in the triple set; $a_{ij} = 1$ indicates that this triple is present in the triple set. Here n is the number of entities in the entity set of the knowledge base, i is both the row number of the matrix and the number of the entity in the entity set of the knowledge base, and j is both the column number of the matrix and the number of the entity in the entity set of the knowledge base.
FIG. 6 is a schematic diagram of a knowledge base according to an embodiment of the present invention. The knowledge base illustrated in fig. 6 has 10 entity nodes, and the information set corresponding to the nodes is { beef, spiced beef, tender beef pizza, cheese steak, beef noodles, salty, spicy, five-square beef patty, Pizza Hut, Master Kang private-kitchen noodle restaurant }. For the knowledge base illustrated in FIG. 6, in the ideal state each entity node has a relationship with the other nine entity nodes. The triples extracted by random walk are recorded to form a triple set. The triple set is { (tender beef pizza, food material, beef), (tender beef pizza, taste, spicy), (five-square beef patty, food material, beef), (cheese steak, food material, beef), (beef noodles, taste, salty) }. The triple set has two relation types, food material and taste. One matrix is constructed from the triple set for each relation type, giving two matrices, each a square matrix of order 10. An element value $a_{ij} = 0$ indicates that the triple $(\text{entity}_i, \text{relationship type}, \text{entity}_j)$ formed by the i-th entity and the j-th entity in the entity set of the knowledge base is not present in the triple set; $a_{ij} = 1$ indicates that this triple is present in the triple set.
When the relation type is food material, the first matrix $M_1$ is expressed as:
Figure BDA0001684248870000081
When the relation type is taste, the second matrix $M_2$ is expressed as:
Figure BDA0001684248870000082
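The construction of one 0/1 matrix per relation type can be sketched as below. The entity order follows the node list given for FIG. 6, which is an assumption about how the entities are numbered; the code also treats triples as directed (row = entity 1, column = entity 2) and uses 0-based indices, whereas the text numbers rows and columns from 1. The matrices produced here are therefore an illustration of the rule, not a transcription of the patent's figures.

```python
# Entities in the order listed for FIG. 6 (English glosses; order is assumed).
ENTITIES = [
    "beef", "spiced beef", "tender beef pizza", "cheese steak",
    "beef noodles", "salty", "spicy", "five-square beef patty",
    "Pizza Hut", "Master Kang noodle restaurant",
]

TRIPLES = [
    ("tender beef pizza", "food material", "beef"),
    ("tender beef pizza", "taste", "spicy"),
    ("five-square beef patty", "food material", "beef"),
    ("cheese steak", "food material", "beef"),
    ("beef noodles", "taste", "salty"),
]


def relation_matrices(entities, triples):
    """Build one n x n 0/1 matrix per relation type from a triple set."""
    index = {name: k for k, name in enumerate(entities)}
    n = len(entities)
    matrices = {}
    for head, relation, tail in triples:
        matrix = matrices.setdefault(relation, [[0] * n for _ in range(n)])
        matrix[index[head]][index[tail]] = 1  # a_ij = 1: triple is present
    return matrices


M = relation_matrices(ENTITIES, TRIPLES)
M1 = M["food material"]  # three non-zero entries, all in the "beef" column
M2 = M["taste"]          # two non-zero entries
```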
Step 505: and obtaining target information corresponding to the customer demand information in the knowledge base according to the ambiguous words, the context, the candidate entities and the structural information.
In this embodiment, still taking the customer requirement information "pizza from two beef" as an example, "beef" is the ambiguous word, and the remaining information is split into the characters other than the ambiguous word: "come", "two", "servings", "pi" and "sa" (the last two being the two characters of "pizza"). Taking the ambiguous word as the origin, the distance vector of each character from the origin is determined: the distance vector of "come" is -3, that of "two" is -2, that of "servings" is -1, that of "pi" is 2, and that of "sa" is 3.
In this embodiment, the characters on both sides of the ambiguous word and the distance vectors from those characters to the ambiguous word are converted into vectors in an n-dimensional space through an Embedding layer. The expression of the vector in the n-dimensional space is as follows:
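The distances in this example can be reproduced with a small helper. The sketch assumes that the ambiguous word occupies two character slots in the original query and that the origin is its first character; under that assumption the offsets come out as -3, -2, -1, 2 and 3. The function name and the English glosses used for the characters are illustrative only.

```python
def relative_positions(chars, start, length):
    """Offset of each context character from the first character of the ambiguous word.

    `start` is the index of the ambiguous word's first character and
    `length` is the number of characters it spans; its own characters
    are left out of the result.
    """
    return [
        (ch, k - start)
        for k, ch in enumerate(chars)
        if not (start <= k < start + length)
    ]


# English glosses, one per character of the query; "beef" spans two slots.
chars = ["come", "two", "servings", "beef[0]", "beef[1]", "pi", "sa"]
offsets = relative_positions(chars, start=3, length=2)
# offsets == [("come", -3), ("two", -2), ("servings", -1), ("pi", 2), ("sa", 3)]
```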
Figure BDA0001684248870000083
where $v_i$ is the vector of the i-th character in the set of characters of the customer requirement information other than the ambiguous word, $p_i$ is the vector of the distance vector from that i-th character to the ambiguous word, and
Figure BDA0001684248870000084
denotes that the two vectors are joined by the concat function. n may take the value 128 or 256. The resulting vectors are processed by a convolutional neural network to obtain the context feature vector of the ambiguous word in the customer requirement information.
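The embedding, concatenation and convolution steps can be sketched in plain Python. Everything specific here is an assumption made for illustration: the tiny embedding size, the randomly initialized lookup tables and filters (a trained model would learn them), the filter width, and the use of a sigmoid followed by max pooling over positions. The patent does not disclose its actual network configuration.

```python
import math
import random

rng = random.Random(0)
DIM = 4  # size of each embedding; the text uses n = 128 or 256 for the joined vector


def make_table(keys, dim):
    """Hypothetical embedding table with random initial values."""
    return {key: [rng.uniform(-1, 1) for _ in range(dim)] for key in keys}


context = [("come", -3), ("two", -2), ("servings", -1), ("pi", 2), ("sa", 3)]
char_table = make_table([ch for ch, _ in context], DIM)
pos_table = make_table([pos for _, pos in context], DIM)

# concat: each input row is v_i followed by p_i
rows = [char_table[ch] + pos_table[pos] for ch, pos in context]


def conv1d_maxpool(rows, filters, width):
    """Slide each filter over `width` consecutive rows, apply a sigmoid,
    then keep the maximum response over all positions."""
    features = []
    for weights in filters:
        best = 0.0
        for start in range(len(rows) - width + 1):
            window = [x for row in rows[start:start + width] for x in row]
            z = sum(w * x for w, x in zip(weights, window))
            best = max(best, 1.0 / (1.0 + math.exp(-z)))
        features.append(best)
    return features


WIDTH, N_FILTERS = 3, 6
filters = [
    [rng.uniform(-1, 1) for _ in range(WIDTH * 2 * DIM)]
    for _ in range(N_FILTERS)
]
context_vector = conv1d_maxpool(rows, filters, WIDTH)  # one value per filter
```

The output has one component per filter, so the number of filters sets the size of the context feature vector regardless of how many characters surround the ambiguous word.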
Here, a Convolutional Neural Network (CNN) is a feed-forward neural network whose artificial neurons respond to a portion of the surrounding units within their coverage area, and it performs well on large-scale image processing. It includes convolutional layers and pooling layers. In general, the basic structure of a convolutional neural network includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer and extracts the features of that local receptive field; once a local feature is extracted, its positional relationship to the other features is also determined. The second is the feature mapping layer: each computation layer of the network is composed of multiple feature maps, each feature map is a plane, and all neurons on the plane share the same weights. The feature mapping structure uses a sigmoid function, whose influence function kernel is small, as the activation function of the convolutional network, so that the feature maps are shift-invariant. In addition, since the neurons on one feature map share weights, the number of free parameters of the network is reduced. Each convolutional layer in the convolutional neural network is followed by a computation layer for local averaging and secondary extraction, which reduces the feature resolution. Convolutional neural networks are used primarily to recognize two-dimensional patterns that are invariant to displacement, scaling, and other forms of distortion.
Because the feature detection layers of a convolutional neural network learn from the training data, using a convolutional neural network avoids explicit feature extraction; the features are learned implicitly from the training data. Moreover, because the neurons on the same feature map share the same weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which the neurons are fully interconnected. With its special structure of locally shared weights, the convolutional neural network has unique advantages in speech recognition and image processing, and its layout is closer to that of an actual biological neural network. Weight sharing reduces the complexity of the network; in particular, because an image with a multi-dimensional input vector can be fed directly into the network, the complexity of data reconstruction during feature extraction and classification is avoided.
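As a minimal sketch of the context encoding described above, the following Python code concatenates each character vector v_i with its distance vector p_i, slides convolution filters over the resulting sequence, and max-pools over positions. The vector values, filter weights, window width, and the function names `concat` and `conv1d_max_pool` are illustrative assumptions; the embodiment does not specify them, and in a real system they would be learned during training.

```python
from typing import List

Vector = List[float]


def concat(v: Vector, p: Vector) -> Vector:
    """Join a character vector v_i and its distance vector p_i end to end."""
    return v + p


def conv1d_max_pool(inputs: List[Vector], filters: List[List[Vector]]) -> Vector:
    """Slide each filter over the sequence and keep its maximum response.

    inputs:  one concatenated vector per context character
    filters: each filter is a window of weight vectors (hypothetical values)
    """
    features = []
    for filt in filters:
        width = len(filt)
        best = float("-inf")
        for start in range(len(inputs) - width + 1):
            window = inputs[start:start + width]
            # Dot product between the filter and the current window
            act = sum(w * x
                      for fw, xv in zip(filt, window)
                      for w, x in zip(fw, xv))
            best = max(best, act)
        features.append(best)
    return features


# Toy input: three context characters, 2-d character vectors, 1-d distance vectors
char_vectors = [[0.1, 0.2], [0.4, 0.3], [0.0, 0.5]]
dist_vectors = [[-2.0], [-1.0], [1.0]]  # signed offset to the ambiguous word (assumed encoding)
sequence = [concat(v, p) for v, p in zip(char_vectors, dist_vectors)]

filters = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # width-2 filters with assumed weights
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
]
context_feature = conv1d_max_pool(sequence, filters)
```

Each filter contributes one component of the context feature vector, so the dimensionality of the output equals the number of filters regardless of how long the customer requirement information is.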
In this embodiment, the candidate entity set corresponding to the ambiguous word "beef" is {tender beef pizza, spiced beef, five-square beef patty, cheese steak, beef noodles}. The ambiguous word "beef" and the 5 candidate entities in its candidate entity set are processed by the convolutional neural network to obtain a semantic vector of the ambiguous word "beef" and a semantic vector of each candidate entity. A cross operation is then performed on the semantic vector of the ambiguous word and the semantic vector of each candidate entity to obtain the character string similarity between the ambiguous word and that candidate entity. That is, 5 character string similarities are obtained by the expression below.
The string similarity expression is:
Figure BDA0001684248870000091
where $E_{sim}$ is the character string similarity, $E_{entity}$ is the semantic vector of a candidate entity corresponding to the ambiguous word, $E_{mention}$ is the semantic vector of the ambiguous word, and $B$ is a weight matrix.
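The formula image itself is not reproduced in this text, so its exact form is unknown. One common way to combine two vectors through a weight matrix is the bilinear form, and the sketch below assumes that form purely for illustration. The function name `bilinear_similarity`, the vectors, and the weights are all hypothetical.

```python
from typing import Dict, List


def bilinear_similarity(e_mention: List[float],
                        b: List[List[float]],
                        e_entity: List[float]) -> float:
    """Compute e_mention^T * B * e_entity (an assumed form of the cross operation)."""
    # First B * e_entity, then the dot product with e_mention
    b_e = [sum(w * x for w, x in zip(row, e_entity)) for row in b]
    return sum(m * y for m, y in zip(e_mention, b_e))


e_mention = [1.0, 0.0]            # hypothetical semantic vector of the ambiguous word
b = [[1.0, 0.0], [0.0, 1.0]]      # identity weights reduce this to a plain dot product
candidates: Dict[str, List[float]] = {
    "beef noodles": [0.5, 0.5],   # hypothetical semantic vectors of two candidates
    "cheese steak": [0.25, 1.0],
}
similarities = {name: bilinear_similarity(e_mention, b, vec)
                for name, vec in candidates.items()}
```

One similarity value is produced per candidate entity, which matches the count of 5 similarities for 5 candidates in the embodiment.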
In this embodiment, two matrices are constructed as described above, denoted as the first matrix $M_1$ and the second matrix $M_2$. A weight vector $[w_1, w_2]$ is set and used to perform a linear transformation on the first matrix $M_1$ and the second matrix $M_2$, and the results of all the linear transformations are processed by the concat function to obtain the semantic representation, in the knowledge base, of all candidate entities corresponding to the ambiguous word. The expression of this semantic representation is as follows:
Figure BDA0001684248870000092
Feature extraction is then performed, through a multilayer neural network, on the semantic representation in the knowledge base of all candidate entities corresponding to the ambiguous word, to obtain the semantic vectors in the knowledge base of all candidate entities corresponding to the ambiguous word.
Here, a multilayer neural network (Multi-Layer Perceptron, MLP) is a feed-forward artificial neural network that maps a set of input vectors to a set of output vectors. It can be viewed as a directed graph consisting of multiple layers of nodes, each layer fully connected to the next. Except for the input nodes, each node is a neuron (or processing unit) with a nonlinear activation function. A supervised learning method known as the back-propagation algorithm is commonly used to train multilayer neural networks. The multilayer neural network is a generalization of the perceptron and overcomes the perceptron's inability to classify data that is not linearly separable.
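The following Python sketch illustrates one possible reading of the steps above: each relation matrix is scaled by its weight, the scaled matrices are concatenated row by row, and a single fully connected layer extracts a vector per entity. Treating the linear transformation as scalar scaling, the side-by-side concatenation, the tanh activation, and all numeric values are assumptions made for illustration, not details taken from the embodiment.

```python
import math
from typing import List

Matrix = List[List[float]]


def scale(m: Matrix, w: float) -> Matrix:
    """Linear transformation assumed here: multiply every element by the weight."""
    return [[w * x for x in row] for row in m]


def concat_rows(matrices: List[Matrix]) -> Matrix:
    """Concatenate matrices side by side, so row i gathers entity i's entries for every relation."""
    return [sum((m[i] for m in matrices), []) for i in range(len(matrices[0]))]


def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """One fully connected layer with a tanh activation."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


m1 = [[0.0, 1.0], [0.0, 0.0]]   # toy relation matrix M_1
m2 = [[0.0, 0.0], [1.0, 0.0]]   # toy relation matrix M_2
w1, w2 = 0.5, 2.0               # hypothetical weight vector [w_1, w_2]

representation = concat_rows([scale(m1, w1), scale(m2, w2)])

hidden_w = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]  # assumed layer weights
hidden_b = [0.0, 0.0]
entity_vectors = [dense(row, hidden_w, hidden_b) for row in representation]
```

In a trained system the layer weights would be learned by back-propagation rather than fixed by hand as they are here.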
In this embodiment, the context feature vector of the ambiguous word in the customer requirement information, the character string similarity between the ambiguous word and each candidate entity, and the semantic vectors of all candidate entities corresponding to the ambiguous word in the knowledge base are processed by the multilayer neural network to obtain the semantic matching degree between the ambiguous word and each candidate entity. Since the candidate entity set above contains 5 candidate entities, 5 semantic matching degrees are finally obtained. For the candidate entity set {tender beef pizza, spiced beef, five-square beef patty, cheese steak, beef noodles}, the knowledge base is screened according to the semantic matching degrees, and the direct semantics and indirect semantics of each candidate entity in the knowledge base are extracted. In this example, the semantic matching degree of the candidate entity "tender beef pizza" is the largest. In the knowledge base illustrated in fig. 6, the entity nodes directly connected by an edge to the entity node "tender beef pizza" are "spicy" and "must-win", and the entity node "must-win" is in turn directly connected by an edge to the entity node "cheese steak". In this way, the semantic information expressed by each candidate entity node is determined and sent to the client in descending order of semantic matching degree. The client displays the semantic information expressed by the entity nodes in a list on the APP interface in descending order of semantic matching degree, or announces it to the customer by voice in the same order.
In practice, a semantic matching degree threshold can be set, and the server sends to the client the semantic information of those candidate entity nodes in the knowledge base whose semantic matching degree is greater than the threshold. The threshold can be adjusted according to actual conditions.
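The thresholding and descending-order ranking can be sketched in a few lines of Python. The function name `select_candidates`, the threshold of 0.5, and every score below are hypothetical; the embodiment states only that "tender beef pizza" has the largest semantic matching degree.

```python
from typing import Dict, List, Tuple


def select_candidates(scores: Dict[str, float],
                      threshold: float) -> List[Tuple[str, float]]:
    """Keep candidates whose matching degree exceeds the threshold, highest first."""
    kept = [(name, score) for name, score in scores.items() if score > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical semantic matching degrees for the five candidate entities
scores = {
    "tender beef pizza": 0.91,
    "spiced beef": 0.40,
    "five-square beef patty": 0.35,
    "cheese steak": 0.72,
    "beef noodles": 0.66,
}
ranked = select_candidates(scores, threshold=0.5)
```

Raising the threshold shortens the list sent to the client, while lowering it trades precision for recall.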
To obtain a more accurate semantic matching degree, on the basis of the above technical scheme, a big data approach can additionally be used to statistically analyze the user's historical meal ordering information and extract the user's historical behavior features, such as user preferences. The context feature vector of the ambiguous word in the customer requirement information, the character string similarity between the ambiguous word and each candidate entity, the semantic vectors of all candidate entities corresponding to the ambiguous word in the knowledge base, and the historical behavior features of the user are then input into the multilayer neural network for processing, to obtain a more accurate semantic matching degree between the ambiguous word and each candidate entity.
It should be noted that the above manner of obtaining a more accurate semantic matching degree is only an example and is not exhaustive. Those skilled in the art, having understood the spirit of the technical solution of the present application, may make other modifications or changes on its basis; as long as the functions realized and the technical effects achieved are similar to those of the present application, they shall fall within the protection scope of the present application.
Fig. 7 is a second schematic diagram of the knowledge base according to the embodiment of the invention. In this embodiment, the knowledge base is built from the book information registered in a library. In the knowledge base, the nodes can be information such as title, author, subject, content, character, and abstract, and the edges can represent relationship information between entity nodes, such as subject and plot.
Query information is input on an application program (APP), the query information being: Lin Qingxia. In this query information, "Lin Qingxia" can be either an author or a character in a novel, so "Lin Qingxia" is an ambiguous word. The APP generates a query request instruction according to the input query information and sends it to the server through the network; the server parses the query information according to the query request instruction and uses it to search the knowledge base for target information, which is displayed on the APP interface.
In this embodiment, after the ambiguous word is determined, the corresponding candidate entities are searched for in the entity set of the knowledge base. The ambiguous word in the query information of this embodiment is "Lin Qingxia". The ambiguous word is mapped to nodes of the knowledge base by fuzzy matching; the information corresponding to the matched nodes includes, but is not limited to, "The Story of the Appearance of Lin Qingxia" and "Everlasting Lin Qingxia", and this information is collected as the candidate entity set. Taking each candidate entity in the candidate entity set as a center, the triples that have a direct or indirect relation with the candidate entity are extracted from all the triples of the knowledge base by random walk traversal, and the extracted triples are recorded. For example, the recorded result is: (Everlasting Lin Qingxia, subject, prose), (prose, subject, Cloud Elimination), (The Story of the Appearance of Lin Qingxia, subject, Inside and Outside the Window), (The Story of the Appearance of Lin Qingxia, plot, Xu Ke). The triples recorded by traversing all the candidate entities form a triple set. According to the recorded result, there are two relation types in the triple set, so two matrices need to be constructed, denoted $A_1$ and $A_2$ respectively. A linear transformation is performed on matrix $A_1$ and matrix $A_2$, and the results of all the linear transformations are processed by the concat function to obtain the semantic representation, in the knowledge base, of all candidate entities corresponding to the ambiguous word; feature extraction is then performed on this semantic representation through a multilayer neural network to obtain the semantic vectors, in the knowledge base, of all candidate entities corresponding to the ambiguous word.
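The random walk traversal that collects directly and indirectly related triples around a candidate entity can be sketched as follows. The function name `random_walk_triples`, the number of walks, the step limit, the fixed seed, and the choice to follow edges only from head to tail are assumptions; the triples are modeled on the book example, with one unrelated triple added to show that it is not collected.

```python
import random
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def random_walk_triples(triples: List[Triple], start: str,
                        walks: int = 20, max_steps: int = 3,
                        seed: int = 0) -> Set[Triple]:
    """Collect the triples visited by short random walks starting from one entity."""
    rng = random.Random(seed)
    outgoing: Dict[str, List[Triple]] = {}
    for triple in triples:
        outgoing.setdefault(triple[0], []).append(triple)

    collected: Set[Triple] = set()
    for _ in range(walks):
        node = start
        for _ in range(max_steps):
            options = outgoing.get(node)
            if not options:
                break  # dead end: no outgoing edge from this node
            step = rng.choice(options)
            collected.add(step)
            node = step[2]  # move to the tail entity
    return collected


knowledge_base: List[Triple] = [
    ("Everlasting Lin Qingxia", "subject", "prose"),
    ("prose", "subject", "Cloud Elimination"),
    ("The Story of the Appearance of Lin Qingxia", "subject", "Inside and Outside the Window"),
    ("The Story of the Appearance of Lin Qingxia", "plot", "Xu Ke"),
    ("unrelated book", "subject", "cooking"),  # hypothetical extra triple
]
around_everlasting = random_walk_triples(knowledge_base, "Everlasting Lin Qingxia")
around_story = random_walk_triples(knowledge_base, "The Story of the Appearance of Lin Qingxia")
```

The first walk reaches ("prose", "subject", "Cloud Elimination") through the intermediate node "prose", which is the kind of indirect relation the embodiment describes.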
The semantic matching degree between the ambiguous word "Lin Qingxia" and each candidate entity is obtained using the semantic vectors, in the knowledge base, of all candidate entities corresponding to the ambiguous word. Since the candidate entity set above contains 2 candidate entities, 2 semantic matching degrees are finally obtained. For the candidate entity set {The Story of the Appearance of Lin Qingxia, Everlasting Lin Qingxia}, the knowledge base is screened according to the semantic matching degrees, and the direct semantics and indirect semantics of each candidate entity in the knowledge base are extracted. In this example, the semantic matching degree of the candidate entity "Everlasting Lin Qingxia" is the largest, and that of the candidate entity "The Story of the Appearance of Lin Qingxia" is the second largest. In the knowledge base illustrated in fig. 7, the entity nodes directly connected by an edge to the entity node "Everlasting Lin Qingxia" are "The Story of the Appearance of Lin Qingxia" and "prose", and the entity node directly connected by an edge to the entity node "The Story of the Appearance of Lin Qingxia" is "Inside and Outside the Window". Another node connected by an edge to the entity node "prose" is "Cloud Elimination". In this way, the semantic information expressed by each candidate entity node is determined and sent to the client in descending order of semantic matching degree. The display interface of the APP recommends three books, namely "Everlasting Lin Qingxia", "Inside and Outside the Window", and "Cloud Elimination".
As can be seen from the meal ordering application and the book query application, in this technical scheme customer requirement information is received, the ambiguous word and its context are obtained from the customer requirement information, the candidate entities corresponding to the ambiguous word are queried in the knowledge base, and, taking each candidate entity as a center, the triples related to the candidate entities are extracted from the knowledge base; the obtained triples form a triple set, which is a structured semantic representation of all the candidate entities. A matrix is constructed from the triples of the same relation type in the triple set. The matrix is a structured information model of the entities and interrelationships associated with the candidate entities, and each element of the matrix is either 0 or 1. An element value of 1 indicates that a triple formed by the i-th entity and the j-th entity in the entity set of the knowledge base exists in the triple set; an element value of 0 indicates that no such triple exists in the triple set. It can be seen that the element values of the matrix are determined by the triples corresponding to the candidate entities. When searching according to the ambiguous word, the context, the candidate entities, and the structured information, the target information corresponding to the customer requirement information can be obtained accurately, which improves information query efficiency and user experience.
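The construction of one 0/1 matrix per relation type can be sketched as follows, using the triples recorded in the book query example. The function name `build_relation_matrices` and the ordering of the entity list are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def build_relation_matrices(entities: List[str],
                            triples: List[Triple]) -> Dict[str, List[List[int]]]:
    """Build one n x n matrix per relation type; a_ij = 1 if (entity_i, relation, entity_j) exists."""
    index = {name: i for i, name in enumerate(entities)}
    n = len(entities)
    matrices: Dict[str, List[List[int]]] = {}
    for head, relation, tail in triples:
        if relation not in matrices:
            matrices[relation] = [[0] * n for _ in range(n)]
        matrices[relation][index[head]][index[tail]] = 1
    return matrices


entities = [
    "Everlasting Lin Qingxia",
    "prose",
    "Cloud Elimination",
    "The Story of the Appearance of Lin Qingxia",
    "Inside and Outside the Window",
    "Xu Ke",
]
triple_set: List[Triple] = [
    ("Everlasting Lin Qingxia", "subject", "prose"),
    ("prose", "subject", "Cloud Elimination"),
    ("The Story of the Appearance of Lin Qingxia", "subject", "Inside and Outside the Window"),
    ("The Story of the Appearance of Lin Qingxia", "plot", "Xu Ke"),
]
matrices = build_relation_matrices(entities, triple_set)
```

With two relation types, "subject" and "plot", the result holds two matrices, matching the two matrices constructed in the book query example.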
Fig. 8 is a second flowchart of a disambiguation method in information query according to an embodiment of the present invention. The method is applied to the client. The client may be an electronic device for generating a query request. Specifically, the client may be, for example, a desktop computer, a tablet computer, a notebook computer, a smart phone, an intelligent assistant such as Siri, a smart wearable device, a shopping guide terminal, a television, or another device capable of capturing the user's voice or text input. Alternatively, the client may be software that runs on the electronic device. Specifically, the client may be a browser on the electronic device, and the browser may include an access portal provided for a website service platform. The website service platform may be, for example, Dianping or Meituan, and the access portal may be a search input field of the website service platform. The client can also be an application that is provided by the website service platform and runs on an intelligent terminal. The disambiguation method in information query comprises the following steps:
Step 801): sending customer requirement information; the customer requirement information comprises an ambiguous word and the context of the ambiguous word;
Step 802): receiving target information corresponding to the customer requirement information; the target information is obtained according to the ambiguous word, the context, a candidate entity corresponding to the ambiguous word obtained in a knowledge base, and structured information of the candidate entity in the knowledge base.
Fig. 9 is a functional block diagram of a disambiguation apparatus in information query according to an embodiment of the present application. The apparatus comprises:
a receiving module 901, configured to receive customer requirement information;
the parsing module 902 is configured to obtain ambiguous words and contexts of the ambiguous words from the customer requirement information;
a candidate entity confirmation module 903, configured to query a knowledge base for candidate entities corresponding to the ambiguous word;
a structured information confirmation module 904, configured to obtain the structured information of the candidate entity in the knowledge base;
a searching module 905, configured to obtain, in the knowledge base, target information corresponding to the customer requirement information according to the ambiguous word, the context, the candidate entity, and the structured information.
In this embodiment, the structured information confirmation module includes:
the triple set unit is used for traversing the triples of the knowledge base according to the candidate entities to obtain a triple set related to the candidate entities;
and the structured information model unit is used for constructing a matrix by the triples with the same relation type in the triple set to obtain the structured information model of the candidate entity.
In this embodiment, the matrix constructed by the structured information model unit is an $n \times n$ square matrix. An element value $a_{ij}$ of 0 in the matrix indicates that a triple formed by the i-th entity and the j-th entity in the entity set of the knowledge base does not exist in the triple set; an element value $a_{ij}$ of 1 indicates that a triple formed by the i-th entity and the j-th entity in the entity set of the knowledge base exists in the triple set. Here n is the number of entities in the entity set of the knowledge base, i denotes both the row number of the matrix and the index of an entity in the entity set of the knowledge base, and j denotes both the column number of the matrix and the index of an entity in the entity set of the knowledge base.
In this embodiment, the search module includes:
the context feature vector unit is used for determining a context feature vector of the ambiguous word in the customer requirement information according to the context;
the first semantic vector confirming unit is used for confirming the character string of the ambiguous word and the character string of each candidate entity corresponding to the ambiguous word, confirming the semantic vector of the ambiguous word according to the character string of the ambiguous word, and confirming the semantic vector of each candidate entity by utilizing the character string of each candidate entity;
the second semantic vector confirming unit is used for determining semantic vectors of all candidate entities corresponding to the ambiguous words in a knowledge base by using the matrix;
a semantic matching degree unit, configured to determine a semantic matching degree between each candidate entity and the ambiguous word according to a context feature vector of the ambiguous word in the customer requirement information, a semantic vector of the ambiguous word, a semantic vector of each candidate entity, and semantic vectors of all candidate entities corresponding to the ambiguous word in a knowledge base;
and the screening unit is used for screening target information corresponding to the customer requirement information from the knowledge base according to the semantic matching degree between each candidate entity and the ambiguous word.
In this embodiment, the second semantic vector confirming unit includes:
the semantic expression subunit is used for processing the matrix through a concat function after linear transformation to obtain semantic expression of all candidate entities corresponding to the ambiguous words in a knowledge base;
and the feature extraction subunit is used for performing feature extraction on semantic representations of all candidate entities corresponding to the ambiguous word in the knowledge base to obtain semantic vectors of all candidate entities corresponding to the ambiguous word in the knowledge base.
In this embodiment, the semantic matching unit includes:
the cross operation subunit is used for performing cross operation on the semantic vector of the ambiguous word and the semantic vector of each candidate entity to obtain the similarity of the character strings;
and the neural network processing subunit is used for processing the context feature vector of the ambiguous word in the customer requirement information, the character string similarity between the ambiguous word and each candidate entity, and the semantic vectors of all candidate entities corresponding to the ambiguous word in a knowledge base by a neural network to obtain the semantic matching degree between the ambiguous word and each candidate entity.
In this embodiment, the context feature vector unit includes:
the word segmentation processing unit is used for processing the context to obtain characters or words of information except for ambiguous words in the customer requirement information and distance vectors from the characters or words to the ambiguous words;
and the convolution processing subunit is used for processing the characters or the words and the distance vectors from the characters or the words to the ambiguous words through a convolution neural network to obtain the context feature vectors of the ambiguous words in the customer requirement information.
In this embodiment, the word segmentation processing unit uses an Embedding layer to convert the characters or words, and the distances from the characters or words to the ambiguous word, into vectors in a multidimensional space.
In this embodiment, the candidate entity confirmation module queries and obtains all candidate entities corresponding to ambiguous words in the knowledge base by using a fuzzy matching method.
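The embodiment does not say which fuzzy matching algorithm is used. As a minimal sketch, the code below treats an entity as a candidate when its name contains the ambiguous word or is sufficiently similar to it according to the standard-library `difflib.SequenceMatcher`. The function name `fuzzy_candidates`, the 0.6 cutoff, and the entity "fried rice" are assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def fuzzy_candidates(mention: str, entity_names: List[str],
                     min_ratio: float = 0.6) -> List[str]:
    """Return entity names that contain the mention or closely resemble it."""
    matches = []
    for name in entity_names:
        ratio = SequenceMatcher(None, mention, name).ratio()
        if mention in name or ratio >= min_ratio:
            matches.append(name)
    return matches


entity_names = [
    "tender beef pizza",
    "spiced beef",
    "beef noodles",
    "cheese steak",
    "fried rice",      # hypothetical non-matching entity
]
found = fuzzy_candidates("beef", entity_names)
```

Purely string-based matching does not reach an entity such as "cheese steak", which the embodiment also lists as a candidate for "beef"; covering such cases would need alias or synonym data that this sketch does not model.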
In this embodiment, the triple set unit forms the triple set by extracting, from all the triples in the knowledge base, the triples related to the candidate entities in a random walk traversal manner.
Fig. 10 is a second functional block diagram of a disambiguation apparatus in information query according to an embodiment of the present application. The apparatus comprises:
a sending module 1001, configured to send customer requirement information; the customer requirement information comprises an ambiguous word and the context of the ambiguous word;
a receiving module 1002, configured to receive target information corresponding to the customer requirement information; the target information is obtained according to the ambiguous word, the context, a candidate entity corresponding to the ambiguous word obtained in a knowledge base, and structured information of the candidate entity in the knowledge base.
Fig. 11 is a schematic view of an electronic device according to an embodiment of the present application. The electronic device comprises a memory, a processor, and a computer program that is stored in the memory and can run on the processor, wherein the processor implements the customer requirement information processing method described above when executing the computer program.
For the specific functions implemented by the memory and the processor of the electronic device provided in the embodiments of the present specification, reference may be made to the foregoing embodiments in the present specification; the technical effects of the foregoing embodiments can be achieved, and details are not described herein again.
In this embodiment, the memory may include a physical device for storing information; typically, the information is digitized and then stored in a medium using an electrical, magnetic, or optical method. The memory according to this embodiment may include: devices that store information using electrical energy, such as RAM, ROM, and USB flash drives; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, and bubble memories; and devices that store information optically, such as CDs or DVDs. Of course, there are other types of memory, such as quantum memory and graphene memory.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
An embodiment of the present application further provides a readable storage medium on which a computer program is stored, where the computer program, when executed, implements the steps of the customer requirement information processing method described above.
Therefore, the technical scheme can accurately obtain the target information corresponding to the customer demand information, and improves the information query efficiency and the user experience.
In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a Programmable Logic Device (PLD) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer "integrates" a digital system onto a PLD by programming it, without needing a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Furthermore, nowadays, instead of manually fabricating integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained by programming the method flow into an integrated circuit, with only a little logic programming, using the hardware description languages described above.
Those skilled in the art will also appreciate that, in addition to implementing the client and server purely as computer-readable program code, the method steps can be logically programmed so that the client and server implement the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a client and server may therefore be regarded as hardware components, and the means included in them for performing the various functions may also be regarded as structures within the hardware components. The means for performing the functions may even be regarded both as software modules implementing the method and as structures within the hardware components.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present application may be embodied, in whole or in part, in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, both for the embodiments of the client and the server, reference may be made to the introduction of embodiments of the method described above.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Although the present application has been described in terms of embodiments, those of ordinary skill in the art will recognize that there are numerous variations and permutations of the present application without departing from the spirit of the application, and it is intended that the appended claims encompass such variations and permutations without departing from the spirit of the application.

Claims (20)

1. A disambiguation method in information query, comprising:
receiving customer requirement information;
acquiring ambiguous words and the context of the ambiguous words from the customer requirement information;
querying a candidate entity corresponding to the ambiguous word in a knowledge base;
obtaining, in the knowledge base, structured information of the candidate entity;
obtaining target information corresponding to the customer requirement information in the knowledge base according to the ambiguous word, the context, the candidate entity and the structured information;
the step of obtaining the structured information of the candidate entity comprises:
traversing the triples of the knowledge base according to the candidate entities to obtain a triple set related to the candidate entities;
constructing a matrix by the triples of the same relation type in the triple set to obtain a structured information model of the candidate entity;
the step of obtaining target information corresponding to the customer requirement information in the knowledge base according to the ambiguous word, the context, the candidate entity, and the structured information includes:
determining a context feature vector of the ambiguous word in the customer requirement information according to the context;
determining a character string of the ambiguous word and a character string of each candidate entity corresponding to the ambiguous word, determining a semantic vector of the ambiguous word according to the character string of the ambiguous word, and determining the semantic vector of each candidate entity by using the character string of each candidate entity;
determining semantic vectors of all candidate entities corresponding to the ambiguous words in a knowledge base by using the matrix;
determining semantic matching degree between each candidate entity and the ambiguous word according to the context feature vector of the ambiguous word in the customer requirement information, the semantic vector of the ambiguous word, the semantic vector of each candidate entity and the semantic vectors of all candidate entities corresponding to the ambiguous word in a knowledge base;
screening target information corresponding to the customer requirement information from the knowledge base according to the semantic matching degree between each candidate entity and the ambiguous word;
the determining the semantic matching degree between each candidate entity and the ambiguous word according to the context feature vector of the ambiguous word in the customer requirement information, the semantic vector of the ambiguous word, the semantic vector of each candidate entity, and the semantic vector of all candidate entities corresponding to the ambiguous word in a knowledge base includes:
performing cross operation on the semantic vector of the ambiguous word and the semantic vector of each candidate entity to obtain the character string similarity between the ambiguous word and each candidate entity;
and inputting the context feature vector of the ambiguous word in the customer requirement information, the character string similarity between the ambiguous word and each candidate entity, the semantic vector of all candidate entities corresponding to the ambiguous word in a knowledge base and the historical behavior feature of the user into a multi-layer neural network for processing, so as to obtain the semantic matching degree between the ambiguous word and each candidate entity.
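The final scoring step of claim 1 can be sketched as follows. This is only an illustration under stated assumptions: the claim does not define the "cross operation", so cosine similarity is used as a stand-in, and the multi-layer neural network is reduced to a two-layer perceptron with caller-supplied weights. All function and parameter names are hypothetical.

```python
import math

def cross_similarity(u, v):
    """Cosine similarity, an assumed stand-in for the claim's 'cross operation'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mlp_score(features, w1, b1, w2, b2):
    """Two-layer perceptron: ReLU hidden layer, sigmoid output in (0, 1)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + bias)
              for row, bias in zip(w1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))

def matching_degree(context_vec, word_vec, entity_vec, kb_vec, history_vec, params):
    """Concatenate the four inputs named in the claim and score them.

    context_vec: context feature vector of the ambiguous word
    word_vec / entity_vec: string-derived semantic vectors
    kb_vec: semantic vector of the candidate entity in the knowledge base
    history_vec: user historical behavior features
    params: (w1, b1, w2, b2), hypothetical trained weights
    """
    similarity = cross_similarity(word_vec, entity_vec)
    features = context_vec + [similarity] + kb_vec + history_vec
    return mlp_score(features, *params)
```

Scoring every candidate entity this way and keeping the highest-scoring one corresponds to the screening step.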
2. The method of claim 1, wherein the matrix is an n x n square matrix; when the value of an element a_ij in the matrix is 0, the triple formed by the i-th entity and the j-th entity in the entity set of the knowledge base does not exist in the triple set; when the value of an element a_ij in the matrix is 1, the triple formed by the i-th entity and the j-th entity in the entity set of the knowledge base exists in the triple set; wherein n is the number of entities in the entity set of the knowledge base, i denotes both the row number of the matrix and the index of an entity in the entity set of the knowledge base, and j denotes both the column number of the matrix and the index of an entity in the entity set of the knowledge base.
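A minimal sketch of the matrix described in claim 2, assuming one n x n 0/1 matrix is built per relation type (one reading of "triples of the same relation type" in claim 1). The entity and relation names in the usage example are invented for illustration.

```python
def build_relation_matrices(entities, triples):
    """Return {relation: n x n matrix} where matrix[i][j] == 1 if and only if
    the triple (entities[i], relation, entities[j]) is in the triple set."""
    index = {entity: i for i, entity in enumerate(entities)}
    n = len(entities)
    matrices = {}
    for head, relation, tail in triples:
        if relation not in matrices:
            matrices[relation] = [[0] * n for _ in range(n)]
        # Assumes every head and tail is a member of `entities`.
        matrices[relation][index[head]][index[tail]] = 1
    return matrices

# Hypothetical usage
entities = ["apple_fruit", "apple_inc", "juice"]
triples = [
    ("apple_fruit", "ingredient_of", "juice"),
    ("apple_inc", "sells", "juice"),
]
matrices = build_relation_matrices(entities, triples)
```

Row i of each matrix records which entities the i-th entity is linked to under that relation, which is the structured information later turned into a semantic vector.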
3. The method of claim 1, wherein the step of using the matrix to determine semantic vectors of all candidate entities corresponding to the ambiguous word in a knowledge base comprises:
after linear transformation, the matrix is processed by a concat function to obtain semantic representations of all candidate entities corresponding to the ambiguous words in a knowledge base;
and performing feature extraction on semantic representations of all candidate entities corresponding to the ambiguous words in a knowledge base to obtain semantic vectors of all candidate entities corresponding to the ambiguous words in the knowledge base.
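The two steps of claim 3 can be illustrated as below. The sketch assumes that the candidate entity's row in each relation matrix is linearly projected, that the projections are concatenated in a fixed relation order, and that "feature extraction" is a further dense layer with a tanh activation. The claim fixes none of these details, and the weights shown are hypothetical.

```python
import math

def linear(vec, weights):
    """Project vec (length n) with weights (n rows, d columns) to length d."""
    d = len(weights[0])
    return [sum(vec[i] * weights[i][k] for i in range(len(vec))) for k in range(d)]

def entity_kb_representation(entity_idx, relation_matrices, projections):
    """Linear transformation per relation type, then concatenation."""
    parts = []
    for relation in sorted(relation_matrices):  # fixed order keeps the layout stable
        row = relation_matrices[relation][entity_idx]
        parts.extend(linear(row, projections[relation]))
    return parts

def extract_features(representation, weights):
    """Assumed feature extractor: one dense layer followed by tanh."""
    return [math.tanh(x) for x in linear(representation, weights)]
```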
4. The method of claim 1, wherein determining the semantic matching degree between each candidate entity and the ambiguous word according to the context feature vector of the ambiguous word in the customer requirement information, the semantic vector of the ambiguous word, the semantic vector of each candidate entity, and the semantic vectors of all candidate entities corresponding to the ambiguous word in a knowledge base comprises:
performing cross operation on the semantic vector of the ambiguous word and the semantic vector of each candidate entity to obtain the similarity of character strings;
and processing the context feature vector of the ambiguous word in the customer requirement information, the character string similarity between the ambiguous word and each candidate entity and the semantic vector of all candidate entities corresponding to the ambiguous word in a knowledge base by a neural network to obtain the semantic matching degree between the ambiguous word and each candidate entity.
5. The method of claim 1, wherein determining the context feature vector of the ambiguous word in the customer requirement information comprises:
processing the context to obtain characters or words of information except for ambiguous words in the customer requirement information and distance vectors from the characters or words to the ambiguous words;
and processing the characters or the words and the distance vectors from the characters or the words to the ambiguous words through a convolutional neural network to obtain context feature vectors of the ambiguous words in the customer requirement information.
6. The method of claim 5, wherein the character or word and the distance vector of the character or word to the ambiguous word are converted to a vector of a multidimensional space by an Embedding layer.
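Claims 5 and 6 can be sketched as follows, assuming the "distance" of a token is its signed offset from the ambiguous word, the Embedding layer is a plain lookup table, and the convolutional network is a single 1-D convolution with max pooling over time. The tables and kernels are hypothetical placeholders for trained parameters.

```python
def context_with_distances(tokens, ambiguous_idx):
    """Pair every token except the ambiguous word with its signed offset."""
    return [(tok, i - ambiguous_idx)
            for i, tok in enumerate(tokens) if i != ambiguous_idx]

def embed(pairs, token_table, distance_table):
    """Embedding lookup: concatenate the token vector and the distance vector."""
    return [token_table[tok] + distance_table[dist] for tok, dist in pairs]

def conv_max_pool(rows, kernels, width=2):
    """Slide each kernel over windows of `width` rows and keep the maximum.

    Assumes len(rows) >= width and len(kernel) == width * len(rows[0]).
    """
    features = []
    for kernel in kernels:
        best = float("-inf")
        for start in range(len(rows) - width + 1):
            window = [x for row in rows[start:start + width] for x in row]
            best = max(best, sum(k * x for k, x in zip(kernel, window)))
        features.append(best)
    return features
```

The list returned by `conv_max_pool` plays the role of the context feature vector, with one entry per kernel.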
7. The method according to any one of claims 1 to 6, wherein all candidate entities corresponding to the ambiguous word are searched in the knowledge base by fuzzy matching.
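One possible form of the fuzzy matching in claim 7 is shown below, assuming candidates are found by case-insensitive substring containment plus an edit-similarity fallback from Python's standard `difflib` module. The claim does not name a matching algorithm, and the entity names are invented.

```python
import difflib

def find_candidates(word, entity_names, cutoff=0.6):
    """Return entity names that contain `word` or are close to it in spelling."""
    lowered = {name.lower(): name for name in entity_names}
    needle = word.lower()
    # Substring containment catches "apple" inside "apple (fruit)".
    hits = [name for key, name in lowered.items() if needle in key]
    # Similarity-ratio matching catches misspellings such as "bananna".
    close = difflib.get_close_matches(
        needle, list(lowered), n=max(1, len(lowered)), cutoff=cutoff
    )
    for key in close:
        if lowered[key] not in hits:
            hits.append(lowered[key])
    return hits
```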
8. The method of claim 1, wherein the triple set is composed of triples extracted from all triples of the knowledge base by random-walk traversal.
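A minimal sketch of a random-walk traversal that collects a triple set for a candidate entity. It assumes the walk follows only outgoing (head to tail) edges and restarts from the candidate entity for each walk. The step and walk counts are hypothetical parameters not given in the claim.

```python
import random

def random_walk_triples(triples, start, steps, walks=1, seed=None):
    """Collect the triples visited by `walks` random walks of up to `steps` hops."""
    rng = random.Random(seed)
    outgoing = {}
    for triple in triples:
        outgoing.setdefault(triple[0], []).append(triple)
    collected = set()
    for _ in range(walks):
        node = start
        for _ in range(steps):
            choices = outgoing.get(node)
            if not choices:  # dead end: this walk stops early
                break
            triple = rng.choice(choices)
            collected.add(triple)
            node = triple[2]  # move to the tail entity
    return collected
```

Sampling in this way keeps the triple set local to the candidate entity rather than covering the whole knowledge base.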
9. A method for disambiguating in information queries, comprising:
sending customer requirement information; the customer requirement information comprises an ambiguous word and the context of the ambiguous word;
receiving target information corresponding to the customer demand information; the target information is obtained according to the ambiguous word, the context, a candidate entity corresponding to the ambiguous word obtained in a knowledge base, and structured information of the candidate entity in the knowledge base;
the step of obtaining the structural information of the candidate entity in the knowledge base comprises the following steps:
traversing the triples of the knowledge base according to the candidate entities to obtain a triple set related to the candidate entities;
constructing a matrix by the triples of the same relation type in the triple set to obtain a structured information model of the candidate entity;
the specific steps of obtaining the target information comprise:
determining a context feature vector of the ambiguous word in the customer requirement information according to the context;
determining a character string of the ambiguous word and a character string of each candidate entity corresponding to the ambiguous word, determining a semantic vector of the ambiguous word according to the character string of the ambiguous word, and determining the semantic vector of each candidate entity by using the character string of each candidate entity;
determining semantic vectors of all candidate entities corresponding to the ambiguous words in a knowledge base by using the matrix;
determining semantic matching degree between each candidate entity and the ambiguous word according to the context feature vector of the ambiguous word in the customer requirement information, the semantic vector of the ambiguous word, the semantic vector of each candidate entity and the semantic vectors of all candidate entities corresponding to the ambiguous word in a knowledge base;
screening target information corresponding to the customer requirement information from the knowledge base according to the semantic matching degree between each candidate entity and the ambiguous word;
the step of obtaining the semantic matching degree specifically includes:
performing cross operation on the semantic vector of the ambiguous word and the semantic vector of each candidate entity to obtain the character string similarity between the ambiguous word and each candidate entity;
and inputting the context feature vector of the ambiguous word in the customer requirement information, the character string similarity between the ambiguous word and each candidate entity, the semantic vector of all candidate entities corresponding to the ambiguous word in a knowledge base and the historical behavior feature of the user into a multi-layer neural network for processing, so as to obtain the semantic matching degree between the ambiguous word and each candidate entity.
10. An apparatus for disambiguation in an information query, comprising:
the receiving module is used for receiving the customer requirement information;
the analysis module is used for acquiring ambiguous words and the contexts of the ambiguous words from the customer requirement information;
the candidate entity confirmation module is used for inquiring a candidate entity corresponding to the ambiguous word in a knowledge base;
the structured information confirmation module is used for obtaining the structured information of the candidate entity in the knowledge base;
the search module is used for obtaining target information corresponding to the customer demand information in the knowledge base according to the ambiguous word, the context, the candidate entity and the structured information;
the structured information confirmation module includes:
the triple set unit is used for traversing the triples of the knowledge base according to the candidate entities to obtain a triple set related to the candidate entities;
the structured information model unit is used for constructing a matrix by the triples with the same relation type in the triple set to obtain a structured information model of the candidate entity;
the search module comprises:
the context feature vector unit is used for determining a context feature vector of the ambiguous word in the customer requirement information according to the context;
the first semantic vector confirming unit is used for confirming the character string of the ambiguous word and the character string of each candidate entity corresponding to the ambiguous word, confirming the semantic vector of the ambiguous word according to the character string of the ambiguous word, and confirming the semantic vector of each candidate entity by utilizing the character string of each candidate entity;
the second semantic vector confirming unit is used for determining semantic vectors of all candidate entities corresponding to the ambiguous words in a knowledge base by using the matrix;
a semantic matching degree unit, configured to determine a semantic matching degree between each candidate entity and the ambiguous word according to a context feature vector of the ambiguous word in the customer requirement information, a semantic vector of the ambiguous word, a semantic vector of each candidate entity, and semantic vectors of all candidate entities corresponding to the ambiguous word in a knowledge base;
the screening unit is used for screening target information corresponding to the customer requirement information from the knowledge base according to the semantic matching degree between each candidate entity and the ambiguous word;
the semantic matching degree unit is specifically configured to perform cross operation on the semantic vector of the ambiguous word and the semantic vector of each candidate entity to obtain a character string similarity between the ambiguous word and each candidate entity; and inputting the context feature vector of the ambiguous word in the customer requirement information, the character string similarity between the ambiguous word and each candidate entity, the semantic vector of all candidate entities corresponding to the ambiguous word in a knowledge base and the historical behavior feature of the user into a multi-layer neural network for processing, so as to obtain the semantic matching degree between the ambiguous word and each candidate entity.
11. The apparatus of claim 10, wherein the matrix constructed by the structured information model unit is an n x n square matrix; when the value of an element a_ij in the matrix is 0, the triple formed by the i-th entity and the j-th entity in the entity set of the knowledge base does not exist in the triple set; when the value of an element a_ij in the matrix is 1, the triple formed by the i-th entity and the j-th entity in the entity set of the knowledge base exists in the triple set; wherein n is the number of entities in the entity set of the knowledge base, i denotes both the row number of the matrix and the index of an entity in the entity set of the knowledge base, and j denotes both the column number of the matrix and the index of an entity in the entity set of the knowledge base.
12. The apparatus of claim 10, wherein the second semantic vector validation unit comprises:
the semantic expression subunit is used for processing the matrix through a concat function after linear transformation to obtain semantic expression of all candidate entities corresponding to the ambiguous words in a knowledge base;
and the feature extraction subunit is used for performing feature extraction on semantic representations of all candidate entities corresponding to the ambiguous word in the knowledge base to obtain semantic vectors of all candidate entities corresponding to the ambiguous word in the knowledge base.
13. The apparatus of claim 10, wherein the semantic matching unit comprises:
the cross operation subunit is used for performing cross operation on the semantic vector of the ambiguous word and the semantic vector of each candidate entity to obtain the similarity of the character strings;
and the neural network processing subunit is used for processing the context feature vector of the ambiguous word in the customer requirement information, the character string similarity between the ambiguous word and each candidate entity, and the semantic vectors of all candidate entities corresponding to the ambiguous word in a knowledge base by a neural network to obtain the semantic matching degree between the ambiguous word and each candidate entity.
14. The apparatus of claim 10, wherein the context feature vector unit comprises:
the word segmentation processing unit is used for processing the context to obtain characters or words of information except for ambiguous words in the customer requirement information and distance vectors from the characters or words to the ambiguous words;
and the convolution processing subunit is used for processing the characters or the words and the distance vectors from the characters or the words to the ambiguous words through a convolution neural network to obtain the context feature vectors of the ambiguous words in the customer requirement information.
15. The apparatus of claim 14, wherein the word segmentation processing unit converts the character or word, and the distance vector from the character or word to the ambiguous word, into vectors of a multidimensional space through an Embedding layer.
16. The apparatus according to any one of claims 10 to 15, wherein the candidate entity confirmation module queries the knowledge base by fuzzy matching to obtain all candidate entities corresponding to ambiguous words.
17. The apparatus of claim 10, wherein the triple set unit constructs the triple set from triples extracted from all triples of the knowledge base by random-walk traversal.
18. An apparatus for disambiguation in an information query, comprising:
the sending module is used for sending the customer requirement information; the customer requirement information comprises an ambiguous word and the context of the ambiguous word;
the receiving module is used for receiving target information corresponding to the customer demand information; the target information is obtained according to the ambiguous word, the context, a candidate entity corresponding to the ambiguous word obtained in a knowledge base, and structured information of the candidate entity in the knowledge base;
the target information is obtained by: determining a context feature vector of the ambiguous word in the customer requirement information according to the context; determining a character string of the ambiguous word and a character string of each candidate entity corresponding to the ambiguous word; determining a semantic vector of the ambiguous word according to the character string of the ambiguous word; determining a semantic vector of each candidate entity according to the character string of each candidate entity; determining semantic vectors of all candidate entities corresponding to the ambiguous word in a knowledge base according to a matrix; determining a semantic matching degree between each candidate entity and the ambiguous word according to the context feature vector of the ambiguous word in the customer requirement information, the semantic vector of the ambiguous word, the semantic vector of each candidate entity, and the semantic vectors of all candidate entities corresponding to the ambiguous word in a knowledge base; and screening the target information from the knowledge base according to the semantic matching degree between each candidate entity and the ambiguous word;
the semantic matching degree is obtained by performing cross operation on the semantic vector of the ambiguous word and the semantic vector of each candidate entity to obtain the character string similarity between the ambiguous word and each candidate entity, and inputting the context feature vector of the ambiguous word in the customer requirement information, the character string similarity between the ambiguous word and each candidate entity, the semantic vectors of all candidate entities corresponding to the ambiguous word in a knowledge base and the historical behavior feature of the user into a multi-layer neural network for processing;
the structured information of the candidate entity in the knowledge base is obtained by traversing the triples of the knowledge base according to the candidate entity to obtain a triple set related to the candidate entity, and constructing a matrix from the triples of the same relation type in the triple set to obtain a structured information model of the candidate entity.
19. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the customer demand information processing method according to claim 9 or any one of claims 1 to 8 when executing the computer program.
20. A readable storage medium on which a computer program is stored, wherein the computer program when executed implements the steps of the customer demand information processing method of claim 9 or any one of claims 1 to 8.
CN201810564777.9A 2018-06-04 2018-06-04 Ambiguity elimination method and device in information query and electronic equipment Active CN110555208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810564777.9A CN110555208B (en) 2018-06-04 2018-06-04 Ambiguity elimination method and device in information query and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810564777.9A CN110555208B (en) 2018-06-04 2018-06-04 Ambiguity elimination method and device in information query and electronic equipment

Publications (2)

Publication Number Publication Date
CN110555208A (en) 2019-12-10
CN110555208B (en) 2021-11-19

Family

ID=68736037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810564777.9A Active CN110555208B (en) 2018-06-04 2018-06-04 Ambiguity elimination method and device in information query and electronic equipment

Country Status (1)

Country Link
CN (1) CN110555208B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988921A (en) * 2019-12-13 2021-06-18 北京四维图新科技股份有限公司 Method and device for identifying map information change
CN113010633B (en) * 2019-12-20 2023-01-31 海信视像科技股份有限公司 Information interaction method and equipment
CN111274806B (en) * 2020-01-20 2020-11-06 医惠科技有限公司 Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record
CN111767453A (en) * 2020-06-09 2020-10-13 上海森亿医疗科技有限公司 Query instruction generation method, device, equipment and storage medium based on semantic network
CN113761218B (en) * 2021-04-27 2024-05-10 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for entity linking
CN113704416B (en) * 2021-10-26 2022-03-04 深圳市北科瑞声科技股份有限公司 Word sense disambiguation method and device, electronic equipment and computer-readable storage medium
CN115828915B (en) * 2022-09-07 2023-08-22 北京百度网讯科技有限公司 Entity disambiguation method, device, electronic equipment and storage medium
CN115500472A (en) * 2022-09-14 2022-12-23 元盛食品制造(上海)有限公司 Preparation method of quick-fried cheese beef cake

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514150A (en) * 2012-06-21 2014-01-15 富士通株式会社 Method and device for recognizing ambiguous words with combinatorial ambiguities
CN104809176A (en) * 2015-04-13 2015-07-29 中央民族大学 Entity relationship extracting method of Zang language
CN107463615A (en) * 2017-07-03 2017-12-12 天津科技大学 Method is recommended based on the real-time place to go of context and user interest in open network
CN107622126A (en) * 2017-09-28 2018-01-23 联想(北京)有限公司 The method and apparatus sorted out to the solid data in data acquisition system
CN108280061A (en) * 2018-01-17 2018-07-13 北京百度网讯科技有限公司 Text handling method based on ambiguity entity word and device

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9665643B2 (en) * 2011-12-30 2017-05-30 Microsoft Technology Licensing, Llc Knowledge-based entity detection and disambiguation
US8751505B2 (en) * 2012-03-11 2014-06-10 International Business Machines Corporation Indexing and searching entity-relationship data
EP2932404A4 (en) * 2012-12-12 2016-08-10 Google Inc Providing search results based on a compositional query
CN103914543B (en) * 2014-04-03 2017-12-26 北京百度网讯科技有限公司 Search result shows method and apparatus
EP3143516A1 (en) * 2014-05-12 2017-03-22 Google, Inc. Disambiguation of queries implicit to multiple entities
CN106202382B (en) * 2016-07-08 2019-06-14 南京柯基数据科技有限公司 Link instance method and system
CN106228245B (en) * 2016-07-21 2018-09-04 电子科技大学 Infer the knowledge base complementing method with tensor neural network based on variation
CN106503148B (en) * 2016-10-21 2019-05-31 东南大学 A kind of table entity link method based on multiple knowledge base
CN106951684B (en) * 2017-02-28 2020-10-09 北京大学 Method for entity disambiguation in medical disease diagnosis record
CN107102989B (en) * 2017-05-24 2020-09-29 南京大学 Entity disambiguation method based on word vector and convolutional neural network
CN107358315A (en) * 2017-06-26 2017-11-17 深圳市金立通信设备有限公司 A kind of information forecasting method and terminal
CN108073570A (en) * 2018-01-04 2018-05-25 焦点科技股份有限公司 A kind of Word sense disambiguation method based on hidden Markov model

Also Published As

Publication number Publication date
CN110555208A (en) 2019-12-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant