CN116975198A - Information query method, device, equipment and medium - Google Patents


Info

Publication number
CN116975198A
Authority
CN
China
Prior art keywords
query
target
entity
index
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310268968.1A
Other languages
Chinese (zh)
Inventor
邹红建
冯帅
王馨苇
王巍琦
杨子翰
方高林
何秀强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tenpay Payment Technology Co Ltd
Original Assignee
Tenpay Payment Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tenpay Payment Technology Co Ltd
Priority to CN202310268968.1A
Publication of CN116975198A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to an information query method, apparatus, device, and medium. It belongs to the field of artificial intelligence technology and can be applied to the financial field. The method comprises: performing entity recognition on an input query text according to a target knowledge graph to obtain a target query entity in a target field, the target knowledge graph being constructed in advance based on data in the target field; generating a query expression based on the target query entity; and recalling, from documents stored in a pre-constructed index library, a target document satisfying the query expression based on index entities in the target field in the index library, the index entities comprising entities identified from the documents stored in the index library based on the target knowledge graph. The method can improve the accuracy of information queries.

Description

Information query method, device, equipment and medium
Technical Field
The present application relates to artificial intelligence technology, and in particular, to a method, apparatus, device, and medium for querying information.
Background
With the development of computer technology, information query technology has emerged. Information query is a technical means of retrieving the information a user needs: the user inputs a query text and obtains information matching the query text from a database. In the conventional technology, the query text input by the user is segmented into words, and the corresponding information is queried from the database based on the resulting words.
However, for information queries in some specialized fields, the conventional information query method carries only a limited amount of query information, so it does not accurately understand the user's query intention, and its recall results depend heavily on literal matching. As a result, the query is not intelligent and query accuracy is low. For example, directly applying the conventional information query method to financial information queries in a financial scenario yields low query accuracy.
Disclosure of Invention
In view of the above technical problems, it is necessary to provide an information query method, apparatus, device, and medium that can improve the accuracy of information queries in a target field.
In a first aspect, the present application provides an information query method, where the method includes:
performing entity recognition on the input query text according to the target knowledge graph to obtain a target query entity in the target field; the target knowledge graph is constructed in advance based on data in the target field;
generating a query expression based on the target query entity;
recalling, from documents stored in a pre-constructed index library, a target document satisfying the query expression based on index entities in the target field in the index library;
wherein the index entities comprise entities identified from the documents stored in the index library based on the target knowledge graph.
In a second aspect, the present application provides an information query apparatus, the apparatus comprising:
the identification module is used for carrying out entity identification on the input query text according to the target knowledge graph to obtain a target query entity in the target field; the target knowledge graph is constructed in advance based on data in the target field;
the generation module is used for generating a query expression based on the target query entity;
the recall module is used for recalling target documents meeting the query expression from documents stored in the index library based on index entities in the target field in the index library constructed in advance;
the index entity comprises an entity identified from the documents stored in the index library based on the target knowledge graph.
In one embodiment, the recognition module is further configured to segment the query text according to a plurality of text feature dimensions, so as to obtain sub-texts corresponding to the text feature dimensions respectively; matching the sub-texts corresponding to the text feature dimensions with the entities in the target knowledge graph to obtain candidate query entities; and de-duplicating each candidate query entity to obtain at least one target query entity in the target field.
In one embodiment, the identification module is further configured to deduplicate each candidate query entity to obtain at least one deduplicated query entity; obtaining at least one reference sub-text obtained by carrying out boundary segmentation on the query text; and taking the query entity which is matched with the reference sub-text and subjected to the de-duplication as a target query entity in the target field.
In one embodiment, the recognition module is further configured to input the query text into an entity recognition model and perform entity recognition on the query text through the entity recognition model to obtain at least one target query entity in the target field; the entity recognition model is trained in advance on sample query texts, and the sample query texts include entities in the target knowledge graph.
In one embodiment, the generating module is further configured to determine, according to the entity types corresponding to the target query entities, target query weights corresponding to the target query entities respectively; and combining the target query entities according to the target query weights to generate a query expression.
In one embodiment, the generating module is further configured to determine, for each target query entity, a first query weight corresponding to an entity type to which the target query entity belongs; performing weight estimation on the target query entity through a pre-trained entity weighting model to obtain a second query weight corresponding to the target query entity; and fusing the first query weight and the second query weight to obtain the target query weight corresponding to the target query entity.
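The weight fusion described above can be illustrated with a minimal sketch. The text only states that the first (type-based) and second (model-estimated) query weights are fused; the linear interpolation, the `alpha` parameter, the `TYPE_WEIGHTS` table, and its values are all assumptions made for illustration, and the model-estimated weight is passed in as a plain number rather than computed by a real entity weighting model.

```python
# Hypothetical prior weights per entity type (illustrative values only).
TYPE_WEIGHTS = {"stock": 1.0, "fund": 0.9, "sector": 0.6, "event": 0.4}


def fuse_weights(entity_type, model_weight, alpha=0.5, default_type_weight=0.5):
    """Fuse the type-based (first) weight with the model-estimated (second) weight.

    Linear interpolation is an assumption; the source only says the two
    weights are fused into a target query weight.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    first = TYPE_WEIGHTS.get(entity_type, default_type_weight)
    return alpha * first + (1.0 - alpha) * model_weight


# A "stock" entity whose (hypothetical) model-estimated weight is 0.8.
target_weight = fuse_weights("stock", model_weight=0.8)
```

Other fusion rules, such as a product or a maximum, would fit the description equally well; interpolation is used here only because it keeps the result in the same range as its inputs.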
In one embodiment, the recall module is further configured to determine an index entity in the target domain corresponding to each document stored in the index repository; and selecting a document of which the corresponding index entity accords with the query expression from the documents stored in the index library, obtaining a target document and recalling the target document.
In one embodiment, the recall module is further configured to select, from the documents stored in the index library, documents whose corresponding index entities conform to the query expression to obtain candidate documents; determine the semantic relatedness between each candidate document and the query text; and, according to the semantic relatedness, determine from the candidate documents the target documents that are strongly semantically related to the query text and recall them.
In one embodiment, each target query entity corresponds to a target query weight, and each index entity corresponds to a target index weight. The recall module is further configured to determine, for each candidate document, the index entities of the candidate document that are the same as target query entities in the query text to obtain common entities, and to determine the semantic relatedness between the candidate document and the query text according to the target index weights and the target query weights of the common entities. The semantic relatedness depends on both the target index weight and the target query weight of each common entity.
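As a rough illustration of scoring by common entities, the sketch below sums, over the entities shared by the query and a candidate document, the product of the target query weight and the target index weight. The sum-of-products formula and the function name `semantic_relatedness` are assumptions; the text only says that the semantic relatedness (relativity) depends on both weights of each common entity.

```python
def semantic_relatedness(query_weights, index_weights):
    """Score one candidate document against the query via their common entities.

    query_weights: entity -> target query weight
    index_weights: entity -> target index weight for this document
    The sum of weight products is an assumed scoring rule.
    """
    common = query_weights.keys() & index_weights.keys()
    return sum(query_weights[e] * index_weights[e] for e in common)


query = {"A": 0.9, "B": 0.5}
document = {"A": 0.8, "C": 0.7}
score = semantic_relatedness(query, document)  # only "A" is a common entity
```

Candidate documents could then be ranked by this score, with those above a chosen threshold treated as strongly related.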
In one embodiment, the recall module is further configured to recall, from the documents stored in the index repository, the target document satisfying the query expression based on the index entity and associated index information in the target domain in the index repository that is pre-constructed; the associated index information is information which has an associated relation with the index entity in the target knowledge graph.
In one embodiment, the associated index information includes at least one of heterogeneous entities, tag entities, or entity description information; the heterogeneous entity is an entity which has an association relation with the index entity in the target knowledge graph and does not belong to the same entity type with the index entity; the tag entity is an entity used for representing the personalized characteristics of the index entity in the target knowledge graph; the entity description information is determined based on the description text for describing the index entity in the target knowledge graph.
In one embodiment, the apparatus further comprises:
the sorting module is used for sorting the recalled target documents through a document ranking model to obtain a document ranking result for display; the document ranking result represents the semantic relatedness between each target document and the query text; the document ranking model is trained in advance on a plurality of groups of sample data; each group of sample data comprises a sample query text and a sample target document; the sample query text and the sample target document include entities in the target knowledge graph; and the entities included in the sample query text have an association relationship with the entities included in the sample target document.
In a third aspect, the present application provides a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments of the application when the computer program is executed.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, performs steps in method embodiments of the present application.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method embodiments of the application.
According to the above information query method, apparatus, device, medium, and computer program product, entity recognition is performed on the input query text according to the target knowledge graph to obtain the target query entity in the target field, and a query expression is generated based on the target query entity. Because the target knowledge graph is constructed in advance based on data in the target field and contains rich general knowledge of the target field, the query expression generated based on the target query entity can accurately represent the user's query intention. A target document satisfying the query expression is then recalled from the documents stored in the pre-constructed index library based on the index entities in the target field in the index library. Because the index entities are entities identified from the documents stored in the index library based on the target knowledge graph, they accurately represent the semantics of the documents in the target field, so target documents that match the user's real query intention can be recalled based on the index entities in the target field.
Drawings
FIG. 1 is an application environment diagram of a method of information query in one embodiment;
FIG. 2 is a flow chart of a method for querying information in one embodiment;
FIG. 3 is a diagram of a financial knowledge graph in the financial domain, according to one embodiment;
FIG. 4 is a flow diagram of entity identification in one embodiment;
FIG. 5 is a schematic diagram of a flow chart of weight calculation in one embodiment;
FIG. 6 is a flow diagram of document recall in one embodiment;
FIG. 7 is a flowchart of a method for querying information in another embodiment;
FIG. 8 is a block diagram of an information query apparatus in one embodiment;
FIG. 9 is a block diagram of an information query apparatus according to another embodiment;
FIG. 10 is an internal structural view of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The information query method provided by the application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or located on the cloud or on other servers. The terminal 102 may be, but is not limited to, a desktop computer, notebook computer, smart phone, tablet computer, Internet of Things device, or portable wearable device. The Internet of Things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, or the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, network security services such as cloud security and host security, CDNs, big data, and artificial intelligence platforms. The terminal 102 and the server 104 may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.
The terminal 102 can acquire a query text input by a user and send the query text to the server 104. The server 104 can perform entity recognition on the input query text according to a target knowledge graph to obtain a target query entity in the target field. The server 104 may then generate a query expression based on the target query entity and recall, from documents stored in a pre-constructed index library, a target document satisfying the query expression based on the index entities in the target field in the index library. The index entities comprise entities identified from the documents stored in the index library based on the target knowledge graph.
It will be appreciated that the server 104 may sort the recalled target documents and send the sorted target documents to the terminal 102 for presentation. The server 104 may also send the recalled target documents directly to the terminal 102 for presentation. This embodiment is not limited in this respect, and the application scenario in FIG. 1 is only a schematic illustration and is not limiting.
It should be noted that the information query method in some embodiments of the present application uses artificial intelligence technology. For example, the target query entity in the target field may be obtained by using artificial intelligence technology to perform entity recognition on the input query text, and the index entities in the target field may likewise be obtained by using artificial intelligence technology to perform entity recognition on the documents stored in the index library. To aid understanding, artificial intelligence refers to using a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines can perceive, reason, and make decisions. The application performs entity recognition on the query text and the documents based on artificial intelligence technology, which can improve the accuracy of entity recognition in the target field.
In one embodiment, as shown in FIG. 2, an information query method is provided. This embodiment is described by taking the method as applied to the server 104 in FIG. 1 as an example. The method includes the following steps:
Step 202, performing entity recognition on the input query text according to the target knowledge graph to obtain a target query entity in the target field; the target knowledge graph is constructed in advance based on data in the target field.
A knowledge graph is a graph that shows the development process and structural relationships of knowledge; it describes knowledge resources using visualization technology and shows the interrelations among pieces of knowledge. It is understood that a knowledge graph includes a plurality of entities and the relationships between those entities. An entity is something that is distinguishable and exists independently, such as a person, a city, a plant, or a commodity. Everything in the world consists of concrete things, and these are what entities refer to. Entities are the most basic elements of a knowledge graph, and different relationships exist between different entities. In the knowledge graph, each entity is represented by a node, and each relationship between entities is represented by an edge. The target knowledge graph is a knowledge graph constructed based on data in the target field and is used to describe the interrelations among the various pieces of knowledge in the target field. The query text is text input by the user to query information; it may be a character, a word, or a sentence. The target query entity is an entity that is identified from the query text and belongs to the target field. It will be appreciated that the target query entity is one of the entities in the target knowledge graph.
Specifically, the server may acquire a target knowledge graph previously constructed based on data in the target field. The terminal may acquire a query text input by the user and send the query text to the server. The server can receive the query text sent by the terminal, and perform entity recognition on the query text input by the user according to the target knowledge graph to obtain a target query entity in the target field. It can be understood that the target query entity identified from the query text belongs to knowledge in the target field, and can help understand the real query intention of the user in the target field, so that the query accuracy of the information in the target field is improved.
In one embodiment, the server may segment the query text to obtain a plurality of sub-texts, match each sub-text with an entity in the target knowledge-graph, and use the sub-text matched with the entity in the target knowledge-graph as the target query entity. It is understood that matching means that the sub-text is identical to at least part of the content of the entity in the target knowledge-graph. For example, if the sub-text is ABC and the entity in the target knowledge-graph is ABC, the sub-text is matched with the entity in the target knowledge-graph. If the sub-text is ABC and the entity in the target knowledge-graph is ABCD, the sub-text is also matched with the entity in the target knowledge-graph.
In one embodiment, the server may match each sub-text against the entities in the target knowledge graph to obtain matched sub-texts, and then filter out the matched sub-texts that are non-core terms; the remaining matched sub-texts are taken as the target query entities. It is understood that non-core terms are terms that have no substantial meaning in the target field. For example, suppose the term with substantial meaning in the target field is ABCD and the matched sub-text is AB. AB has no substantial meaning in the target field, so it is a non-core term and the server may filter it out. In other words, although AB matches an entity in the target knowledge graph, it is not identified as a target query entity because it is a non-core term.
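The matching and filtering steps above can be sketched as follows. This is a minimal illustration that treats "matching" as the sub-text being equal to at least part of an entity name, as in the ABC/ABCD example; the entity set, the non-core term list, and the function names are hypothetical.

```python
KG_ENTITIES = {"ABC", "ABCD"}  # hypothetical entity names in the target knowledge graph
NON_CORE_TERMS = {"AB"}        # hypothetical list of non-core terms


def matches_entity(sub_text, entities):
    """A sub-text matches if it equals at least part of some entity name."""
    return bool(sub_text) and any(sub_text in name for name in entities)


def recognize(sub_texts, entities=KG_ENTITIES, non_core=NON_CORE_TERMS):
    """Keep matched sub-texts, then drop those that are non-core terms."""
    matched = [s for s in sub_texts if matches_entity(s, entities)]
    return [s for s in matched if s not in non_core]


print(recognize(["ABC", "AB", "XYZ"]))  # prints ['ABC']
```

Here "AB" matches the entity "ABCD" but is removed by the non-core filter, while "XYZ" matches no entity at all.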
In one embodiment, the information query method of the present application further includes: acquiring first graph construction data from a pre-constructed private database, where the private database is a database that has a privacy attribute and stores data in the target field; acquiring second graph construction data from a pre-constructed public database, where the public database is a database that has a public attribute and stores data in the target field; and constructing a graph based on the first graph construction data and the second graph construction data to obtain the target knowledge graph.
It will be appreciated that a private database is a database that is not disclosed externally. For example, a company's internal database is private to that company: it is used only inside the company and is not shared externally. A public database is a database that is disclosed externally; data obtained through portal websites, such as news, announcements, and community comments, is derived from public databases. Constructing the target knowledge graph from both the first graph construction data from the private database and the second graph construction data from the public database makes the entities and relations in the target knowledge graph richer and more comprehensive and ensures the authority and completeness of the target knowledge graph. Performing information queries based on such a complete target knowledge graph can further improve query accuracy.
To facilitate further understanding of the target knowledge graph, take the case where the target field includes the financial field: FIG. 3 shows a financial knowledge graph pre-constructed based on data in the financial field. The entities in the financial knowledge graph include foreign exchange, fund company, fund manager, fund, bond, sector 1, sector 2, the listed company of stock 1, senior executive, stakeholder, index, business 1, business 2, the listed company of stock 2, the listed company of stock 3, event, comment, futures, and so on. Two entities connected by an edge have an association relationship. For example, an edge connects the "fund manager" and "fund company" entities, indicating an association between them, namely that the fund manager works at the fund company. As another example, an edge connects the "fund manager" and "fund" entities, indicating an association between them, namely that the fund manager is responsible for managing the fund. It can be understood that applying the information query method of the application to information query scenarios in the financial field can improve the accuracy of information queries in that field.
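A knowledge graph of this kind can be sketched as an adjacency map built from (head, relation, tail) triples drawn from a private source and a public source. The triple format, the relation labels "works at" and "manages", and the `build_graph` helper are assumptions for illustration; the source does not specify a storage format.

```python
from collections import defaultdict


def build_graph(*triple_sources):
    """Merge (head, relation, tail) triples from several sources into one graph.

    Each node maps to a set of (relation, tail) edges, so a triple that
    appears in more than one source is stored only once.
    """
    graph = defaultdict(set)
    for triples in triple_sources:
        for head, relation, tail in triples:
            graph[head].add((relation, tail))
            graph.setdefault(tail, set())  # make sure the tail is a node too
    return graph


# Hypothetical triples from a private and a public database.
private_triples = [("fund manager", "works at", "fund company")]
public_triples = [
    ("fund manager", "manages", "fund"),
    ("fund manager", "works at", "fund company"),  # duplicate of the private triple
]

graph = build_graph(private_triples, public_triples)
print(sorted(graph["fund manager"]))
# prints [('manages', 'fund'), ('works at', 'fund company')]
```

Storing edges in sets is one simple way to deduplicate overlapping facts when the two data sources are merged.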
Step 204, generating a query expression based on the target query entity.
The query expression is a logical expression that characterizes the user's query intention. It will be appreciated that the query expression includes the target query entities identified from the query text and the logical operators between those target query entities.
For example, if the target query entities include A, B, C, and D, the query expression may be A and B and (C or D). Here, "and" and "or" are the logical operators between target query entities. An "and" between two target query entities indicates that both must be present; an "or" between two target query entities indicates that at least one of them must be present. The query expression A and B and (C or D) therefore means that the information to be queried must contain both A and B and, in addition, at least one of C or D.
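One possible in-memory form of such an expression is a nested tuple whose first element is the operator. The sketch below, which is an illustrative representation rather than the one used in the application, evaluates A and B and (C or D) against a set of entities.

```python
def evaluate(expr, entities):
    """Evaluate a nested ("and" | "or", operand, ...) expression.

    A bare string is an entity name and is true when that entity is present
    in the given set of entities.
    """
    if isinstance(expr, str):
        return expr in entities
    op, *operands = expr
    if op == "and":
        return all(evaluate(o, entities) for o in operands)
    if op == "or":
        return any(evaluate(o, entities) for o in operands)
    raise ValueError(f"unknown operator: {op!r}")


# A and B and (C or D)
QUERY = ("and", "A", "B", ("or", "C", "D"))

print(evaluate(QUERY, {"A", "B", "C"}))  # prints True
print(evaluate(QUERY, {"A", "B"}))       # prints False
```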
In one embodiment, the server may obtain the weight corresponding to each target query entity and combine the target query entities according to the weights to obtain the query expression. The weight of a target query entity characterizes both the importance of that entity and the user's intent preference in the query. Both the importance of the target query entity and the strength of the user's intent to query information related to it are positively correlated with the weight: the higher the weight of a target query entity, the more important the entity is, and the more strongly the user wants to query information related to it. Therefore, a query expression obtained by combining the target query entities according to their weights represents the user's query intention more accurately, so the accuracy of the information query can be further improved when the query is performed based on that expression.
In one embodiment, the server may also randomly combine the target query entities to obtain the query expression.
Step 206, recalling, from documents stored in a pre-constructed index library, a target document satisfying the query expression based on index entities in the target field in the index library; the index entities comprise entities identified from the documents stored in the index library based on the target knowledge graph.
The index library is used for storing a plurality of documents, and each document corresponds to index entities. An index entity is an entity that serves as index information in the index library; through the index entities, the target documents satisfying the query expression can be quickly located in the index library. Satisfying the query expression means that the dependency relationship among the index entities corresponding to a document conforms to the dependency relationship among entities required by the query expression.
In one embodiment, the server may obtain at least one document, perform entity recognition on each document based on the target knowledge graph to obtain the index entities in the target field, and store each index entity in the index library in association with its corresponding document. For each document, the index entities identified from it serve as index information for querying that document, so the document can be quickly found in the index library through its index entities. It will be appreciated that, after determining the query expression, the server can quickly determine and recall, from the documents stored in the index library, the target documents that satisfy the query expression based on the index entities in the target field in the index library.
For example, suppose the query expression is A and B and (C or D), the index entities corresponding to document 1 include A, B, and C, those corresponding to document 2 include A, B, and D, those corresponding to document 3 include A, B, C, and D, those corresponding to document 4 include B, C, and D, and those corresponding to document 5 include A and B. The dependency relationship among entities required by the query expression A and B and (C or D) is that A and B are both present and at least one of C or D is also present. Only the index entities of document 1, document 2, and document 3 conform to the query expression; those of document 4 and document 5 do not. The server may therefore take document 1, document 2, and document 3 as the target documents satisfying the query expression.
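The recall in this example can be reproduced with a small inverted index that maps each index entity to the set of documents containing it, so that "and" becomes set intersection and "or" becomes set union. The nested-tuple expression form and the helper names are assumptions for illustration; the sketch also assumes every operator has at least one operand.

```python
from collections import defaultdict

# Index entities per document, taken from the example above.
DOCS = {
    "document 1": {"A", "B", "C"},
    "document 2": {"A", "B", "D"},
    "document 3": {"A", "B", "C", "D"},
    "document 4": {"B", "C", "D"},
    "document 5": {"A", "B"},
}


def build_index(docs):
    """Build an inverted index: index entity -> set of document ids."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for entity in entities:
            index[entity].add(doc_id)
    return index


def recall(expr, index):
    """Return the ids of documents whose index entities satisfy expr."""
    if isinstance(expr, str):
        return set(index.get(expr, set()))
    op, *operands = expr
    postings = [recall(o, index) for o in operands]
    if op == "and":
        return set.intersection(*postings)
    if op == "or":
        return set.union(*postings)
    raise ValueError(f"unknown operator: {op!r}")


QUERY = ("and", "A", "B", ("or", "C", "D"))  # A and B and (C or D)
print(sorted(recall(QUERY, build_index(DOCS))))
# prints ['document 1', 'document 2', 'document 3']
```

Working on posting sets avoids scanning every document for every query, which is the usual reason for indexing by entity in the first place.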
In the information query method, the target query entity in the target field is obtained by performing entity recognition on the input query text according to the target knowledge graph, and the query expression is generated based on the target query entity. The target knowledge graph is constructed in advance based on data in the target field and contains rich general knowledge of the target field, so that a query expression generated based on the target query entity can accurately represent the query intention of the user. The target document meeting the query expression is then recalled from the documents stored in the index library based on the index entities in the target field in the index library constructed in advance. Because the index entities are entities identified, based on the target knowledge graph, from the documents stored in the index library, they can accurately represent the semantics of the documents in the target field, so that target documents conforming to the real query intention of the user can be recalled based on the index entities in the target field.
In one embodiment, performing entity recognition on an input query text according to a target knowledge graph to obtain a target query entity in a target field, including: segmenting the query text according to a plurality of text feature dimensions to obtain sub-texts corresponding to the text feature dimensions respectively; matching the sub-texts corresponding to the text feature dimensions with the entities in the target knowledge graph to obtain candidate query entities; and de-duplicating each candidate query entity to obtain at least one target query entity in the target field.
Wherein the text feature dimension is a dimension for describing a text personalization feature. By way of example, the text feature dimensions may include the dimension of characters, the dimension of words, and the dimension of pinyin letters. It can be understood that the query text can be segmented according to the dimension of characters, the dimension of words and the dimension of pinyin letters respectively, so as to obtain the sub-texts respectively corresponding to each of these dimensions. A sub-text is text cut from the query text, and it is understood that the query text may include a plurality of sub-texts.
Specifically, the server may determine a plurality of text feature dimensions, and segment the input query text according to the plurality of text feature dimensions, to obtain sub-texts corresponding to the text feature dimensions respectively. The server can match the sub-texts corresponding to the text feature dimensions with the entities in the target knowledge graph to obtain candidate query entities. Because repeated sub-texts may exist in the sub-texts corresponding to the text feature dimensions, there may be repeated candidate query entities in the candidate query entities obtained by matching, and in order to avoid the repetition of the candidate query entities, the server may deduplicate each candidate query entity to obtain at least one target query entity in the target field.
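The segment, match and de-duplicate steps above can be sketched as follows. This is a minimal illustration under stated assumptions: only two dimensions are shown (single characters and contiguous substrings standing in for word-level segmentation), the pinyin dimension is omitted, matching is a hash-set lookup, and all function names and the abstract text "abcab" are hypothetical.

```python
def split_chars(text):
    """Character dimension: every single character is a sub-text."""
    return list(text)


def split_ngrams(text, min_len=2, max_len=3):
    """Stand-in for a word dimension: all contiguous substrings of 2..3 characters."""
    return [
        text[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(text) - n + 1)
    ]


def candidate_entities(text, graph_entities):
    """Match the sub-texts of every dimension against the knowledge-graph entities."""
    candidates = []
    for split in (split_chars, split_ngrams):
        candidates.extend(s for s in split(text) if s in graph_entities)
    return candidates


def deduplicate(candidates):
    """Remove repeated candidates while keeping first-seen order."""
    return list(dict.fromkeys(candidates))


graph_entities = {"ab", "c", "bca"}  # hypothetical entities of the knowledge graph
candidates = candidate_entities("abcab", graph_entities)
targets = deduplicate(candidates)
print(candidates)  # ['c', 'ab', 'ab', 'bca']
print(targets)     # ['c', 'ab', 'bca']
```

The repeated candidate `ab` arises because the same sub-text occurs twice in the query text, which is the situation the de-duplication step is meant to handle.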
In one embodiment, the server may perform deduplication on each candidate query entity to obtain at least one deduplicated query entity, and directly use the deduplicated query entity as a target query entity in the target domain.
In one embodiment, the server matches the sub-text corresponding to each text feature dimension with the entity in the target knowledge graph, and specifically may be implemented by using any matching method of hash (hash) table matching, prefix tree matching, suffix tree matching, and the like.
In one embodiment, the server may correct misspelled words while matching the sub-texts corresponding to the text feature dimensions with the entities in the target knowledge graph to obtain the candidate query entities. Specifically, the server may match the sub-text corresponding to the dimension of pinyin letters with the entities in the target knowledge graph. If the text in the query text to which this sub-text corresponds differs from the text of an entity in the target knowledge graph but has the same pinyin, the server may correct the text in the query text so that the corrected text is the same as the text of the entity in the target knowledge graph.
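The pinyin-based correction can be sketched as below. The table `TOY_PINYIN` is a hand-written, hypothetical stand-in for a real pinyin converter and covers only the three characters used in the example; the function names are likewise assumptions.

```python
# Toy pinyin table (assumption): a real system would use a full converter.
TOY_PINYIN = {"基": "ji", "金": "jin", "今": "jin"}


def to_pinyin(text):
    """Convert text to a space-separated pinyin string using the toy table."""
    return " ".join(TOY_PINYIN.get(ch, ch) for ch in text)


def correct_by_pinyin(sub_text, graph_entities):
    """Return the graph entity with the same pinyin as sub_text, if any."""
    if sub_text in graph_entities:
        return sub_text  # already spelled as in the knowledge graph
    by_pinyin = {to_pinyin(entity): entity for entity in graph_entities}
    return by_pinyin.get(to_pinyin(sub_text), sub_text)


# "基今" is a misspelling that shares its pinyin ("ji jin") with the entity "基金" (fund).
corrected = correct_by_pinyin("基今", {"基金"})
print(corrected)  # 基金
```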
In the above embodiment, the candidate query entity is obtained by segmenting the query text according to the plurality of text feature dimensions and matching the sub-text corresponding to each text feature dimension obtained by segmentation with the entity in the target knowledge graph, so that the situation that the entity cannot be comprehensively matched with the entity in the target knowledge graph due to single text feature dimension can be avoided, and thus the entity matching accuracy can be improved, and the identification accuracy of the target query entity is improved. Furthermore, by performing deduplication on each candidate query entity, at least one target query entity in the target field is obtained, the situation that the matched entity is redundant repeatedly can be avoided, and therefore information query efficiency can be improved.
In one embodiment, deduplicating each candidate query entity to obtain at least one target query entity in the target domain, including: performing deduplication on each candidate query entity to obtain at least one deduplicated query entity; obtaining at least one reference sub-text obtained by carrying out boundary segmentation on a query text; and taking the query entity matched with the reference sub-text after the duplication removal as a target query entity in the target field.
The reference sub-text is a sub-text that serves as a reference. It can be understood that the reference sub-text has higher accuracy than the sub-texts obtained by segmenting the query text according to a plurality of text feature dimensions, so the correctness of those sub-texts, and therefore the correctness of the deduplicated query entities, can be verified against the reference sub-text.
Specifically, the server may perform deduplication on each candidate query entity to obtain at least one deduplicated query entity. The server may input the query text into a trained text segmentation model to segment the query text boundaries through the text segmentation model to obtain at least one reference sub-text. Further, the server may use the deduplicated query entity that matches the reference sub-text as a target query entity in the target domain. It will be appreciated that the matching deduplicated query entity and reference sub-text have the same text content.
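The verification against reference sub-texts can be sketched as a simple filter. The list `reference_sub_texts` is a hypothetical output of a text segmentation model for the abstract query text "abcab"; no real model is invoked here.

```python
def verify_against_reference(deduplicated, reference_sub_texts):
    """Keep only the deduplicated entities whose text equals some reference sub-text."""
    reference = set(reference_sub_texts)
    return [entity for entity in deduplicated if entity in reference]


deduplicated = ["c", "ab", "bca"]        # deduplicated query entities
reference_sub_texts = ["ab", "c", "ab"]  # hypothetical boundary segmentation of "abcab"
targets = verify_against_reference(deduplicated, reference_sub_texts)
print(targets)  # ['c', 'ab']
```

Here `bca` is discarded because it straddles the boundaries given by the reference segmentation, which is the kind of segmentation ambiguity this step is intended to remove.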
In the above embodiment, at least one de-duplicated query entity is obtained by de-duplicating each candidate query entity, and the de-duplicated query entity matched with the reference sub-text is used as the target query entity in the target field. Because the reference sub-text is obtained by performing boundary segmentation on the query text, it can be used to verify whether the boundaries of the de-duplicated query entities are accurate, so that segmentation ambiguity in the finally obtained target query entity can be avoided and the information query accuracy is further improved.
In one embodiment, performing entity recognition on an input query text according to a target knowledge graph to obtain a target query entity in a target field, including: inputting the query text into an entity recognition model to perform entity recognition on the query text through the entity recognition model to obtain at least one target query entity in the target field; the entity recognition model is obtained by training a sample query text in advance; the sample query text comprises entities under the target knowledge graph.
Wherein the entity recognition model is a neural network model for recognizing an entity from text. Sample query text is sample data used to train an entity recognition model.
Specifically, the server may input the query text to the entity recognition model, so as to perform entity recognition on the input query text through the entity recognition model, thereby obtaining at least one target query entity in the target field. It can be appreciated that, since the entity recognition model is trained by the sample query text including the entities under the target knowledge graph, the trained entity recognition model can accurately recognize the target query entities under the target domain from the query text.
In the above embodiment, the training process of the entity recognition model refers to the rich knowledge in the target knowledge graph, so that the entity recognition model has more accurate recognition capability on the data in the target field. And carrying out entity recognition on the query text through the entity recognition model, so that a more accurate target query entity in the target field can be obtained, and the accuracy of entity recognition in the target field is improved. In addition, the entity recognition model is used for recognizing the target query entity in the target field of the query text, so that the target query entity in the target field can be rapidly recognized, and the recognition efficiency of the target query entity is improved.
In one embodiment, the target query entity can be obtained by performing the multi-dimensional segmentation query on the query text, or can be obtained by identifying the entity identification model, or can be obtained by combining the two methods. It can be understood that if the target query entity is obtained by combining the two modes, a specific implementation manner may be to perform deduplication on the query entity obtained by multi-dimensional segmentation and the query entity obtained by the identification of the entity identification model, so as to obtain the final target query entity. It can be understood that by combining the two recognition modes of multi-dimensional segmentation recognition and entity recognition model recognition, the condition of missing recognition caused by single recognition mode can be avoided, and a more accurate target query entity in the target field can be obtained, so that the accuracy of entity recognition in the target field is further improved.
In one embodiment, as shown in fig. 4, the server may combine entity recognition by multi-dimensional segmentation and query with entity recognition through the entity recognition model, to jointly recognize the entities belonging to the target field in the query text or in a document. Specifically, the query text and the documents in the index library are taken as target texts to be recognized. The server may preprocess the target text, for example, by performing format conversion, Chinese-English conversion, case conversion, and the like, to obtain a preprocessed target text. The server may segment the preprocessed target text according to a plurality of text feature dimensions (namely multi-granularity segmentation) to obtain sub-texts corresponding to the text feature dimensions respectively, match the sub-texts corresponding to the text feature dimensions with entities in the target knowledge graph (namely multi-granularity query) to obtain candidate entities, and de-duplicate the candidate entities to obtain at least one de-duplicated entity. The server may obtain at least one reference sub-text obtained by performing boundary segmentation on the target text, and use the de-duplicated entity matching the reference sub-text as a first entity in the target field (namely disambiguation). The server may also input the preprocessed target text into the entity recognition model, so as to perform entity recognition on the target text through the entity recognition model and obtain at least one second entity in the target field. It can be understood that, once the first entity and the second entity are obtained in these two ways, the server may de-duplicate the first entity obtained by multi-dimensional segmentation and the second entity recognized by the entity recognition model, so as to obtain the final target entity.
It can be understood that if the target text is the query text, the target entity is the target query entity corresponding to the query text. If the target text is a document, the target entity is an index entity corresponding to the document.
In one embodiment, generating a query expression based on a target query entity includes: determining target query weights corresponding to the target query entities according to the entity types corresponding to the target query entities respectively; and combining the target query entities according to the target query weights to generate a query expression.
Specifically, the entity types are types to which the entity belongs, and each entity type corresponds to a weight value. It will be appreciated that the expert in the target area may give a weight corresponding to each entity type for indicating the importance of each entity type in the target area. Furthermore, the server may determine the target query weights corresponding to the target query entities according to the entity types corresponding to the target query entities, and combine the target query entities according to the target query weights to generate the query expression.
It can be appreciated that the generated query expression can characterize the importance of each target query entity in the query process, i.e., it is more in line with the user's query intent. For example, if the target query entities include A, B, C and D, the target query weight corresponding to A is 0.4, the target query weight corresponding to B is 0.3, the target query weight corresponding to C is 0.2, and the target query weight corresponding to D is 0.1, the query expression generated by the server by combining the target query entities according to the target query weights may be A and B and (C or D). This query expression can be understood as requiring that the queried information include A and B and at least one of C or D.
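One way the weights could be turned into such an expression is sketched below. The rule used here is an assumption made only for illustration, since the text does not state the combination rule: entities whose weight reaches a hypothetical `threshold` are required (joined by "and"), and the remaining entities form one optional group (joined by "or").

```python
def build_expression(weights, threshold=0.3):
    """Build a boolean query string from entity weights (assumed thresholding rule)."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    required = [entity for entity in ranked if weights[entity] >= threshold]
    optional = [entity for entity in ranked if weights[entity] < threshold]
    parts = list(required)
    if optional:
        # Low-weight entities are grouped: at least one of them must be present.
        parts.append("(" + " or ".join(optional) + ")")
    return " and ".join(parts)


weights = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}  # weights from the example above
expression = build_expression(weights)
print(expression)  # A and B and (C or D)
```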
In one embodiment, for each target query entity, the server may determine a query weight corresponding to the entity type to which the target query entity belongs, and directly use the determined query weight as the target query weight corresponding to the target query entity.
In one embodiment, the entity recognition model can label the entity type of the entity in the sample query text in the training process, so that the entity recognition model can not only recognize the target query entity from the query text, but also recognize the entity type of the target query entity.
In the above embodiment, since the entity type of the target query entity may represent the importance degree of the target query entity to a certain extent, the accuracy of the target query weight of the target query entity may be improved by determining the target query weight corresponding to each target query entity according to the entity type corresponding to each target query entity. Furthermore, by combining the target query entities according to the target query weights, a query expression reflecting the query intention of the user can be generated, thereby further improving the accuracy of information query.
In one embodiment, determining the target query weights corresponding to the target query entities according to the entity types corresponding to the target query entities, respectively, includes: determining a first query weight corresponding to an entity type to which each target query entity belongs according to each target query entity; carrying out weight pre-estimation on the target query entity through a pre-trained entity weighting model to obtain a second query weight corresponding to the target query entity; and fusing the first query weight and the second query weight to obtain the target query weight corresponding to the target query entity.
Wherein the entity weighting model is a neural network model for weighting an entity. The first query weight is a weight of a target query entity determined based on the entity type. The second query weight is obtained by carrying out weight estimation on the target query entity based on the entity weighting model. It can be understood that the first query weight and the second query weight are weight values of the target query entity obtained through different weight determination modes, and the first query weight and the second query weight are fused, so that more accurate target query weight can be obtained.
Specifically, for each target query entity, the server may determine a first query weight corresponding to the entity type to which the target query entity belongs. The server can input each target query entity into a pre-trained entity weighting model so as to perform weight estimation on the target query entity through the pre-trained entity weighting model and obtain a second query weight corresponding to the target query entity. Furthermore, the server may fuse the first query weight and the second query weight to obtain a target query weight corresponding to the target query entity. It can be appreciated that the target query weight fully considers the weight values of the target query entities obtained by different weight determination modes, and has higher accuracy than the first query weight and the second query weight.
In one embodiment, the server may multiply the first query weight and the second query weight to obtain a target query weight corresponding to the target query entity.
In one embodiment, the server may average the first query weight and the second query weight to obtain a target query weight corresponding to the target query entity.
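The two fusion options just described (multiplication and averaging) can be sketched as follows. The input values are hypothetical; in the method, the first weight would come from the entity type and the second from the entity weighting model.

```python
def fuse_by_product(first_weight, second_weight):
    """Fuse the type-based weight and the model-estimated weight by multiplication."""
    return first_weight * second_weight


def fuse_by_mean(first_weight, second_weight):
    """Fuse the type-based weight and the model-estimated weight by averaging."""
    return (first_weight + second_weight) / 2


first_weight = 0.4   # hypothetical weight assigned to the entity type
second_weight = 0.5  # hypothetical weight estimated by the entity weighting model
product_weight = fuse_by_product(first_weight, second_weight)
mean_weight = fuse_by_mean(first_weight, second_weight)
print(product_weight, mean_weight)
```

Multiplication penalizes an entity that either source considers unimportant, while averaging lets a high weight from one source partly compensate for a low weight from the other.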
In the above embodiment, the weight of the target query entity can be determined from multiple dimensions together by fusing the first query weight obtained based on the entity type and the second query weight obtained based on the entity weighting model, so that the singleness of determining the dimensions by the entity weight is avoided, and the accuracy of the target query weight of the target query entity is further improved.
In one embodiment, recall, from documents stored in an index repository, a target document satisfying a query expression based on an index entity in a target field in a pre-built index repository, comprising: determining index entities in the target fields corresponding to the documents stored in the index library respectively; and selecting a document of which the corresponding index entity accords with the query expression from the documents stored in the index library, obtaining a target document and recalling the target document.
Specifically, the index library stores a plurality of documents, each document corresponds to an index entity, and the index entity can be used as index information for querying the documents. It can be understood that the server can determine the index entity in the target field corresponding to each document stored in the index library, and select the document corresponding to the index entity meeting the query expression from the documents stored in the index library, so as to obtain the target document and recall the target document.
In one embodiment, the server may select a document corresponding to the index entity conforming to the query expression from the documents stored in the index repository, and take the document conforming to the query expression directly as the target document and recall.
In one embodiment, the server may select a document corresponding to the index entity conforming to the query expression from among the documents stored in the index repository, and randomly select a portion of the documents from among the documents conforming to the query expression as target documents and recall.
In the above embodiment, since the index entity has the relation of association storage with the corresponding document, the index entity can quickly select the document of which the corresponding index entity accords with the query expression from the documents stored in the index library, so as to obtain the target document, and the efficiency of document query is improved.
In one embodiment, selecting a document of which the corresponding index entity accords with the query expression from the documents stored in the index library, obtaining a target document and recalling the target document, and the method comprises the following steps: selecting a document of which the corresponding index entity accords with the query expression from the documents stored in the index library to obtain candidate documents; determining semantic relativity between each candidate document and the query text; and determining target documents which are strongly related to the query text semantics from the candidate documents according to the semantic relatedness and recalling.
The semantic relatedness is the relatedness between the semantics of the candidate document and the semantics of the query text. It can be appreciated that the higher the semantic relatedness is, the stronger the relatedness between the semantics of the candidate document and the semantics of the query text, i.e. the higher the semantic similarity of the semantics of the candidate document and the query text is.
Specifically, the server may select, from the documents stored in the index repository, the documents corresponding to the index entity conforming to the query expression, obtain candidate documents, and determine semantic relatedness between each candidate document and the query text. It can be understood that the semantic relativity between each candidate document and the query text is different, so that the recalled document better accords with the intention of the user query, the server can determine the target document which is strongly related to the query text semantic from each candidate document according to the semantic relativity and recall the target document, and the accuracy of document recall can be improved.
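Selecting the strongly related documents can be sketched as a filter and sort over relevance scores. The scores, the `min_relevance` threshold and the optional `top_k` cut-off are all hypothetical; the text does not define how "strongly related" is decided.

```python
def select_targets(relevance, min_relevance=0.2, top_k=None):
    """Return candidate documents ordered by relevance, keeping only strong matches."""
    ranked = sorted(relevance.items(), key=lambda item: item[1], reverse=True)
    strong = [doc for doc, score in ranked if score >= min_relevance]
    return strong if top_k is None else strong[:top_k]


# Hypothetical semantic relevance between each candidate document and the query text.
relevance = {"candidate 1": 0.26, "candidate 2": 0.09, "candidate 3": 0.21}
targets = select_targets(relevance)
print(targets)  # ['candidate 1', 'candidate 3']
```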
In one embodiment, the computer device may calculate the semantic relevance between each candidate document and the query text based on the weights corresponding to the index entities of the candidate documents and the target query weights corresponding to the target query entities of the query text.
In the above embodiment, the semantic relevance between each candidate document and the query text is determined. The semantic relativity between the candidate documents and the query text can represent the degree that the candidate documents meet the query intention of the user, and the higher the semantic relativity is, the more the candidate documents meet the query intention of the user, so that the target documents which are strongly related to the query text semantics are determined from the candidate documents according to the semantic relativity, and the accuracy of the document query can be further improved.
In one embodiment, each target query entity corresponds to a target query weight; each index entity corresponds to a target index weight; determining semantic relatedness between each candidate document and the query text, comprising: for each candidate document, determining index entities which are the same as target query entities in the query text from index entities corresponding to the candidate documents to obtain common entities; determining semantic relativity between candidate documents and query texts according to target index weights and target query weights corresponding to the common entities; the semantic relativity is related to the target index weight of the common entity and the size of the target query weight.
The target index weight is a weight corresponding to the index entity, and it can be understood that the target index weight can represent the importance degree of the index entity on the related document. For further understanding, it is illustrated that if a certain index entity appears in a document more frequently, it is illustrated that the index entity has higher importance to the document, and may be given higher weight. The common entity is an entity which exists in the index entity and the target query entity at the same time, and it can be understood that the common entity is both the index entity and the target query entity, and it can be further understood that if the entity contents of the index entity and the target query entity are the same, the index entity and the target query entity are the common entity.
Specifically, for each candidate document, the server may determine, from index entities corresponding to the candidate documents, index entities that are the same as target query entities in the query text, obtain a common entity, and determine a semantic relevance between the candidate document and the query text according to a target index weight and a target query weight corresponding to the common entity. It can be understood that the target index weight corresponding to the common entity is the target index weight corresponding to the index entity determined to be the common entity. The target query weight corresponding to the common entity is the target query weight corresponding to the target query entity determined to be the common entity. The magnitude of the semantic relatedness is positively related to the size of the target index weight and the target query weight of the common entity. It can be appreciated that the greater the target index weight and target query weight of a common entity, the higher the semantic relevance between the corresponding candidate document and the query text, and the smaller the target index weight and target query weight of the common entity, the lower the semantic relevance between the corresponding candidate document and the query text.
For example, the index entities corresponding to candidate document 1 include A, B, C and D, and the target index weights corresponding to A, B, C and D are 0.2, 0.4, 0.3 and 0.1, respectively. The index entities corresponding to candidate document 2 include A, B, C and E, and the target index weights corresponding to A, B, C and E are 0.1, 0.1, 0.1 and 0.7, respectively. The target query entities in the query text include A, B, C and F, and the target query weights corresponding to A, B, C and F are 0.4, 0.3, 0.2 and 0.1, respectively; A, B and C are the common entities. Since the target index weights of A, B and C corresponding to candidate document 1 are greater than the target index weights of A, B and C corresponding to candidate document 2, the semantic relevance between candidate document 1 and the query text is higher than the semantic relevance between candidate document 2 and the query text, and candidate document 1 is recalled preferentially.
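The comparison in this example can be sketched numerically. The scoring formula is an assumption: the text only requires that relevance grow with the target index weight and target query weight of the common entities, and the sum of their products is one function with that property. The index weights of A, B and C for candidate document 2 are taken as 0.1 each, which is also an assumption; only the ordering of the two documents matters here.

```python
def semantic_relevance(index_weights, query_weights):
    """Sum of index weight x query weight over the common entities (assumed formula)."""
    common = index_weights.keys() & query_weights.keys()
    return sum(index_weights[entity] * query_weights[entity] for entity in common)


query_weights = {"A": 0.4, "B": 0.3, "C": 0.2, "F": 0.1}
doc1_weights = {"A": 0.2, "B": 0.4, "C": 0.3, "D": 0.1}
doc2_weights = {"A": 0.1, "B": 0.1, "C": 0.1, "E": 0.7}  # A, B, C values assumed

score1 = semantic_relevance(doc1_weights, query_weights)  # approximately 0.26
score2 = semantic_relevance(doc2_weights, query_weights)  # approximately 0.09
print(score1 > score2)  # True
```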
In one embodiment, the target query weights corresponding to the target query entities and the target index weights corresponding to the index entities may be determined by the weight calculation logic shown in fig. 5. Specifically, the target query entity and the index entity are both referred to as entities, and the target query weight of the target query entity and the target index weight of the index entity are both referred to as target weights. For each entity, the server may determine a first weight corresponding to the entity type to which the entity belongs. The server may input each entity to a pre-trained entity weighting model, so as to perform weight estimation on the entity through the pre-trained entity weighting model and obtain a second weight corresponding to the entity. Furthermore, the server may fuse the first weight and the second weight to obtain a target weight corresponding to the entity. It can be appreciated that the target weight fully considers the weight values of the entity obtained by different weight determination modes, and has higher accuracy than the first weight or the second weight alone. It can be understood that, in the case that the entity is a target query entity, the target weight is the target query weight corresponding to the target query entity. In the case that the entity is an index entity, the target weight is the target index weight corresponding to the index entity.
In the above embodiment, the size of the target index weight and the target query weight of the common entity may affect the semantic relevance between the candidate document and the query text. It will be appreciated that the higher the target query weight for a common entity, the more desirable the user will be to obtain documents that are strongly associated with that common entity. And the higher the target index weight of the common entity is, the stronger the relevance between the document corresponding to the common entity and the common entity is. Therefore, the semantic relativity between the candidate documents and the query text is determined through the target index weight and the target query weight corresponding to the common entity, so that the accuracy of the semantic relativity can be improved, and the accuracy of the information query can be further improved.
In one embodiment, recall, from documents stored in an index repository, a target document satisfying a query expression based on an index entity in a target field in a pre-built index repository, comprising: recall, from documents stored in an index library, a target document satisfying a query expression based on an index entity and associated index information in a target field in the index library constructed in advance; the associated index information is information which has an associated relation with the index entity in the target knowledge graph.
The information related to the index entity may specifically be information obtained based on an entity having an edge connected to the index entity in the target knowledge graph, and it may be understood that the related index information is information obtained based on an expansion of the index entity in the target knowledge graph, and documents related to the index entity may be recalled through the expanded related index information, so as to improve recall rate of the documents. The associated index information may be the entity itself connected to the index entity by the edge, or may be a part of text content corresponding to the entity. For example, if the index entity is a and the entity connected to a in the target knowledge graph is BCD, the associated index information may be BCD itself or a part of BCD, for example, the associated index information may be B.
Specifically, for each document in the index library constructed in advance, the document corresponds to the index entity and the associated index information in the target field, it can be understood that the index entity and the associated index information can be used as index information for querying the document, and the document can be rapidly queried through the index entity and the associated index information. It can be understood that the server can quickly recall the target document meeting the query expression from the documents stored in the index library based on the index entity and the associated index information in the target field in the index library, so that the document recall accuracy is further improved.
For example, if the target query entity corresponding to the query text input by the user is B, the index entity corresponding to the document is A, and the associated index information is B, the document may be recalled to the user through its associated index information B. It can be understood that, although the index entity A corresponding to the document does not directly match the target query entity B corresponding to the query text, by expanding the entity B, which has an association relationship with the entity A in the target knowledge graph, into the index information of the document, the document can be recalled by matching the target query entity B with the associated index information B, and the recall rate of the document is improved.
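The effect of associated index information on recall can be sketched as follows. The adjacency map `graph_edges`, the document name and the function names are hypothetical, and a query is simplified to a single target query entity.

```python
def index_terms(index_entities, graph_edges, expand):
    """Return the terms a document is indexed under, optionally expanded via the graph."""
    terms = set(index_entities)
    if expand:
        for entity in index_entities:
            # Add entities connected by an edge to the index entity.
            terms |= graph_edges.get(entity, set())
    return terms


def recall(query_entity, documents, graph_edges, expand):
    """Return documents whose (possibly expanded) index terms contain the query entity."""
    return [
        name
        for name, entities in documents.items()
        if query_entity in index_terms(entities, graph_edges, expand)
    ]


graph_edges = {"A": {"B"}}     # in the knowledge graph, A is connected to B
documents = {"document": {"A"}}  # the document's only index entity is A

without_expansion = recall("B", documents, graph_edges, expand=False)
with_expansion = recall("B", documents, graph_edges, expand=True)
print(without_expansion)  # []
print(with_expansion)     # ['document']
```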
In the above embodiment, on the basis that the index entity in the target field exists in the index library, the index field is extended in the index library, that is, the associated index information is added, so that the index information of the document is richer, and therefore, the target document meeting the query expression is recalled from the document stored in the index library based on the index entity in the target field and the associated index information in the index library, and the query accuracy of the document can be further improved.
In one embodiment, the associated index information includes at least one of heterogeneous entities, tag entities, or entity description information; the heterogeneous entity is an entity which has an association relation with the index entity in the target knowledge graph and does not belong to the same entity type with the index entity; the label entity is an entity for representing personalized characteristics of the index entity in the target knowledge graph; the entity description information is determined based on the description text for describing the index entity in the target knowledge graph.
To illustrate the heterogeneous entity: suppose the index entity is "company A", whose entity type is "company", and the entities having an association relationship with "company A" in the target knowledge graph include "company B" and "product C". The entity type of "company B" is also "company", while the entity type of "product C" is "product". The heterogeneous entity of "company A" is therefore "product C", which does not belong to the same entity type as "company A" (i.e., its entity type is not "company").
To illustrate the tag entity: suppose the index entity is "fund A", and the entity "Golden Bull fund A" in the target knowledge graph characterizes a personalized feature of "fund A", namely that fund A has won the Golden Bull award. The tag entity of "fund A" is then "Golden Bull fund A".
To illustrate the entity description information: suppose the index entity is "company A", and the description text used to describe "company A" in the target knowledge graph is a brief introduction to the company, for example "company A was founded in XXX, its founder is Zhang San, and its business mainly includes business B and business C". The entity description information of "company A" may be the description text itself, or a keyword extracted from the description text, for example "Zhang San".
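The three kinds of associated index information illustrated above can be gathered for one index entity roughly as follows. This is a minimal sketch under assumed data structures: the `Entity` class, its fields, and the function `associated_index_info` are hypothetical and are not defined in the application.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical knowledge-graph node used only for this illustration."""
    name: str
    entity_type: str
    description: str = ""
    tags: list = field(default_factory=list)


def associated_index_info(entity, neighbors):
    """Collect heterogeneous entities, tag entities and description for one index entity."""
    # Heterogeneous entities: associated entities of a different entity type.
    heterogeneous = [n.name for n in neighbors if n.entity_type != entity.entity_type]
    return {
        "heterogeneous_entities": heterogeneous,
        "tag_entities": list(entity.tags),
        "entity_description": entity.description,
    }


company_a = Entity(
    "company A",
    "company",
    description="company A mainly runs business B and business C",
)
company_b = Entity("company B", "company")
product_c = Entity("product C", "product")

info = associated_index_info(company_a, [company_b, product_c])
print(info["heterogeneous_entities"])
```

Here "company B" is filtered out because it shares the entity type "company" with the index entity, leaving "product C" as the only heterogeneous entity.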
In one embodiment, as shown in FIG. 6, the server may perform entity recognition on the input query text according to the target knowledge graph to obtain a target query entity in the target field (i.e., query text understanding), and generate a query expression based on the target query entity. The server may then recall, from the documents stored in the index library, target documents satisfying the query expression based on the index entities in the target field in the pre-constructed index library. Specifically, the recall may be based on both the index entities and the associated index information in the target field, where the associated index information includes at least one of heterogeneous entities, tag entities, or entity description information. Because the index fields in the index library are extended with associated index information in addition to the index entities corresponding to the documents, the index information of the documents is richer and the recall rate of the documents is improved. Furthermore, the server may input the recalled target documents into the document ranking model, which sorts them to obtain a document sorting result for display. This improves the display effect of the documents and lets the user efficiently obtain target documents that match the query intention.
In the above embodiment, the associated index information includes at least one of heterogeneous entities, tag entities or entity description information, which improves the richness of the associated index information. Therefore, based on the index entity and the associated index information in the target field in the index library, the target document meeting the query expression is recalled from the documents stored in the index library, and the query accuracy of the documents can be further improved.
In one embodiment, the method further comprises: sorting the recalled target documents through a document ranking model to obtain a document sorting result for display; the document sorting result is used for representing the semantic relatedness between each target document and the query text; the document ranking model is obtained by training in advance on a plurality of sets of sample data; each set of sample data includes a sample query text and a sample target document; the sample query text and the sample target document both include entities in the target knowledge graph; the entities included in the sample query text have an association relationship with the entities included in the sample target document.
Wherein the document ranking model is a statistical model or a deep neural network model for ranking documents. The sample data is data for training a document ranking model. The sample query text is query text for training a document ranking model. It will be appreciated that the sample query text used to train the document ranking model may be the same as or different from the sample query text used to train the entity recognition model described above. The sample target document is a document used to train a document ranking model.
Specifically, the server may obtain multiple sets of sample data, each including a sample query text and a sample target document, and train the document ranking model on the sample data to obtain a trained document ranking model. The server may then input the query text entered by the user and the recalled target documents into the trained document ranking model, which sorts the recalled target documents to obtain a document sorting result for display. The document sorting result represents the semantic relatedness between each target document and the query text. The document sorting result includes a document ranking list, and the higher the semantic relatedness between a target document and the query text, the higher that target document is placed in the list. This improves the display effect of the documents and lets the user efficiently obtain target documents that match the query intention.
In the above embodiment, the training process of the document ranking model draws on the rich knowledge in the target knowledge graph, so that the document ranking model can accurately identify entities in the target field. Sorting the recalled target documents through the document ranking model to obtain a document sorting result for display can therefore improve the information display effect.
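The sorting step can be sketched as follows. The application uses a trained statistical or deep neural network model to score documents; the token-overlap function `overlap_score` below is only a stand-in assumed for illustration, and `rank_documents` is a hypothetical name.

```python
def overlap_score(query, document):
    """Stand-in relevance score: fraction of query tokens that appear in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


def rank_documents(query, documents, score=overlap_score):
    """Sort recalled documents from most to least related to the query text."""
    return sorted(documents, key=lambda doc: score(query, doc), reverse=True)


recalled = ["fund A annual report", "company B news", "fund A wins award"]
ranking = rank_documents("fund A award", recalled)
for position, doc in enumerate(ranking, start=1):
    print(position, doc)
```

Any scoring callable with the same `(query, document)` signature, including a wrapper around a trained ranking model, could be passed as `score` in place of the stand-in.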
As shown in fig. 7, in one embodiment, an information query method is provided, and this embodiment is described by taking the application of the method to the server 104 in fig. 1 as an example, where the method specifically includes the following steps:
Step 702, segmenting an input query text according to a plurality of text feature dimensions to obtain sub-texts corresponding to the text feature dimensions respectively;
Step 704, matching the sub-texts corresponding to the text feature dimensions with the entities in the target knowledge graph to obtain candidate query entities; the target knowledge graph is constructed in advance based on data in the target field;
Step 706, performing deduplication on each candidate query entity to obtain at least one deduplicated query entity;
Step 708, obtaining at least one reference sub-text obtained by carrying out boundary segmentation on the query text, and taking the deduplicated query entity matched with the reference sub-text as a target query entity in the target field.
In one embodiment, the server may also input the query text into an entity recognition model, so as to perform entity recognition on the query text through the entity recognition model and obtain at least one target query entity in the target field; the entity recognition model is obtained by training in advance on sample query texts; the sample query texts include entities in the target knowledge graph. The target query entity may thus be obtained by the multi-dimensional segmentation of the query text described above, by recognition with the entity recognition model, or by combining the two approaches. If the two approaches are combined, one specific implementation is to deduplicate the query entities obtained by multi-dimensional segmentation together with the query entities recognized by the entity recognition model, so as to obtain the final target query entities.
Step 710, determining, for each target query entity, a first query weight corresponding to the entity type to which the target query entity belongs;
Step 712, performing weight estimation on the target query entity through a pre-trained entity weighting model to obtain a second query weight corresponding to the target query entity;
Step 714, fusing the first query weight and the second query weight to obtain a target query weight corresponding to the target query entity, and combining the target query entities according to the target query weights to generate a query expression.
Step 716, determining index entities and associated index information in the target fields corresponding to the documents stored in the index library.
The indexing entity comprises an entity identified from documents stored in an index library based on a target knowledge graph; the associated index information is information which has an associated relation with the index entity in the target knowledge graph.
In one embodiment, the associated index information includes at least one of heterogeneous entities, tag entities, or entity description information; the heterogeneous entity is an entity which has an association relation with the index entity in the target knowledge graph and does not belong to the same entity type with the index entity; the label entity is an entity for representing personalized characteristics of the index entity in the target knowledge graph; the entity description information is determined based on the description text for describing the index entity in the target knowledge graph.
Step 718, selecting, from the documents stored in the index library, the documents whose corresponding index entity and associated index information conform to the query expression, to obtain candidate documents;
Step 720, determining the semantic relatedness between each candidate document and the query text, determining, from the candidate documents according to the semantic relatedness, the target documents that are strongly semantically related to the query text, and recalling them.
Step 722, sorting all recalled target documents through a document sorting model to obtain a document sorting result for display; and the document ordering result is used for representing the semantic relativity between each target document and the query text.
The document ordering model is obtained by training a plurality of groups of sample data in advance; each set of sample data includes sample query text and sample target documents; the sample query text and the sample target document comprise entities under the target knowledge graph; the entities included in the sample query text have an association with the entities included in the sample target document.
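Steps 702 to 708 above can be sketched as follows. This is an illustrative sketch only: this passage does not say which text feature dimensions are used, so word 1-grams and 2-grams are assumed here, the reference sub-texts are passed in as if produced by an external boundary segmenter, and all names (`KG_ENTITIES`, `word_ngrams`, `recognize_entities`) are hypothetical.

```python
# Hypothetical entity names taken from a toy target knowledge graph.
KG_ENTITIES = {"fund", "fund A", "fund B"}


def word_ngrams(tokens, n):
    """Return all contiguous n-word sub-texts of the token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recognize_entities(query, reference_subtexts, kg_entities=KG_ENTITIES, dimensions=(1, 2)):
    """Return target query entities for a query text (assumed realisation of steps 702-708)."""
    tokens = query.split()
    # Steps 702 and 704: segment per dimension and match sub-texts against the graph.
    candidates = [
        sub
        for n in dimensions
        for sub in word_ngrams(tokens, n)
        if sub in kg_entities
    ]
    # Step 706: deduplicate while preserving first-seen order.
    deduplicated = list(dict.fromkeys(candidates))
    # Step 708: keep only entities that match a boundary-segmented reference sub-text.
    references = set(reference_subtexts)
    return [entity for entity in deduplicated if entity in references]


targets = recognize_entities(
    "fund A versus fund B",
    reference_subtexts=["fund A", "versus", "fund B"],
)
print(targets)
```

In this toy run the candidate "fund" is matched twice and deduplicated, then dropped in the last step because it is not one of the reference sub-texts, leaving "fund A" and "fund B".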
The application also provides an application scenario to which the information query method is applied. Specifically, the information query method can be applied to information query in the financial field. In this scenario, the target field includes the financial field, the target knowledge graph includes a financial knowledge graph, and the documents include financial documents. Specifically, the server may segment the input query text according to a plurality of text feature dimensions to obtain sub-texts corresponding to the text feature dimensions respectively; match the sub-texts corresponding to the text feature dimensions with the entities in the financial knowledge graph to obtain candidate query entities, the financial knowledge graph being constructed in advance based on data in the financial field; deduplicate the candidate query entities to obtain at least one deduplicated query entity; obtain at least one reference sub-text obtained by carrying out boundary segmentation on the query text; and use the deduplicated query entity matched with the reference sub-text as a target query entity in the financial field.
The server may also input the query text into an entity recognition model, so as to perform entity recognition on the query text through the entity recognition model and obtain at least one target query entity in the financial field; the entity recognition model is obtained by training in advance on sample query texts; the sample query texts include entities in the financial knowledge graph. The target query entity may thus be obtained by the multi-dimensional segmentation of the query text described above, by recognition with the entity recognition model, or by combining the two approaches. If the two approaches are combined, one specific implementation is to deduplicate the query entities obtained by multi-dimensional segmentation together with the query entities recognized by the entity recognition model, so as to obtain the final target query entities.
For each target query entity, the server can determine a first query weight corresponding to the entity type to which the target query entity belongs; carrying out weight pre-estimation on the target query entity through a pre-trained entity weighting model to obtain a second query weight corresponding to the target query entity; and fusing the first query weight and the second query weight to obtain the target query weight corresponding to the target query entity. And combining the target query entities according to the target query weights to generate a query expression.
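The weight fusion and query expression generation described above can be sketched as follows. The fusion rule and the expression syntax are not specified in this passage, so a weighted average and an `OR`-joined string with `^weight` suffixes are assumed purely for illustration; `TYPE_WEIGHTS`, `fuse_weights`, `build_query_expression`, and `toy_model_weight` are hypothetical names, and the toy lookup stands in for the pre-trained entity weighting model.

```python
# Hypothetical first query weights, keyed by entity type.
TYPE_WEIGHTS = {"fund": 0.9, "company": 0.6}


def fuse_weights(first, second, alpha=0.5):
    """Fuse the two query weights with a weighted average (assumed fusion rule)."""
    return alpha * first + (1 - alpha) * second


def toy_model_weight(name):
    """Stand-in for the pre-trained entity weighting model."""
    return {"Fund A": 0.7, "Company B": 0.2}.get(name, 0.5)


def build_query_expression(entities, model_weight, alpha=0.5):
    """Combine (name, entity_type) pairs into a weighted query expression string."""
    weighted = []
    for name, entity_type in entities:
        first = TYPE_WEIGHTS.get(entity_type, 0.5)  # first query weight, from entity type
        second = model_weight(name)                 # second query weight, from the model
        weighted.append((name, fuse_weights(first, second, alpha)))
    # Put the most important entities first.
    weighted.sort(key=lambda item: item[1], reverse=True)
    return " OR ".join(f'"{name}"^{weight:.2f}' for name, weight in weighted)


expression = build_query_expression(
    [("Company B", "company"), ("Fund A", "fund")],
    toy_model_weight,
)
print(expression)
```

With the toy values, "Fund A" receives the fused weight 0.80 and "Company B" 0.40, so "Fund A" leads the generated expression.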
The server may determine the index entities and associated index information in the financial field that correspond to each document stored in the index library; select, from the documents stored in the index library, the documents whose corresponding index entity and associated index information conform to the query expression, to obtain candidate financial documents; determine the semantic relatedness between each candidate financial document and the query text; and determine, from the candidate financial documents according to the semantic relatedness, the target financial documents that are strongly semantically related to the query text, and recall them. The index entity comprises an entity identified from the documents stored in the index library based on the financial knowledge graph; the associated index information is information having an association relationship with the index entity in the financial knowledge graph. The associated index information comprises at least one of heterogeneous entities, tag entities, or entity description information; the heterogeneous entity is an entity which has an association relationship with the index entity in the financial knowledge graph and does not belong to the same entity type as the index entity; the tag entity is an entity used for representing personalized characteristics of the index entity in the financial knowledge graph; the entity description information is determined based on the description text for describing the index entity in the financial knowledge graph.
The server can sort all recalled target financial documents through a document sorting model to obtain a document sorting result for display; and the document ordering result is used for representing the semantic relativity between each target financial document and the query text. The document ordering model is obtained by training a plurality of groups of sample data in advance; each set of sample data includes sample query text and sample target financial documents; the sample query text and the sample target financial document comprise entities under a financial knowledge graph; the entities included in the sample query text have an association with the entities included in the sample target financial document. The information query method can improve the query accuracy of the financial document in the financial field.
The application further provides another application scenario, in which the information query method is applied to information query in the medical field. In this scenario, the target field includes the medical field, the target knowledge graph includes a medical knowledge graph, and the documents include medical documents. Specifically, the server can perform entity recognition on the input query text according to the medical knowledge graph to obtain a target query entity in the medical field, and generate a query expression based on the target query entity. The server may then recall, from the documents stored in the index library, the target medical documents satisfying the query expression based on the index entities in the medical field in the pre-constructed index library. The information query method can thereby improve the query accuracy of medical documents in the medical field.
The application further provides another application scenario, in which the information query method is applied to information query in the communication field. In this scenario, the target field includes the communication field, the target knowledge graph includes a communication knowledge graph, and the documents include communication documents. Specifically, the server can perform entity recognition on the input query text according to the communication knowledge graph to obtain a target query entity in the communication field, and generate a query expression based on the target query entity. The server may then recall, from the documents stored in the index library, the target communication documents satisfying the query expression based on the index entities in the communication field in the pre-constructed index library. The information query method can thereby improve the query accuracy of communication documents in the communication field.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence, they are not necessarily performed in that sequence. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed one after another but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, an information query apparatus 800 is provided, which may employ software modules or hardware modules, or a combination of both, as part of a computer device, and specifically includes:
the recognition module 802 is configured to perform entity recognition on the input query text according to the target knowledge graph, so as to obtain a target query entity in the target field; the target knowledge graph is constructed in advance based on data in the target field;
a generating module 804, configured to generate a query expression based on the target query entity;
a recall module 806, configured to recall, from documents stored in the index library, a target document that satisfies the query expression based on an index entity in a target field in the index library that is constructed in advance;
the indexing entity comprises an entity identified from documents stored in an index base based on a target knowledge graph.
In one embodiment, the recognition module 802 is further configured to segment the query text according to a plurality of text feature dimensions, to obtain sub-texts corresponding to the text feature dimensions respectively; matching the sub-texts corresponding to the text feature dimensions with the entities in the target knowledge graph to obtain candidate query entities; and de-duplicating each candidate query entity to obtain at least one target query entity in the target field.
In one embodiment, the identifying module 802 is further configured to deduplicate each candidate query entity to obtain at least one deduplicated query entity; obtaining at least one reference sub-text obtained by carrying out boundary segmentation on a query text; and taking the query entity matched with the reference sub-text after the duplication removal as a target query entity in the target field.
In one embodiment, the recognition module 802 is further configured to input the query text into an entity recognition model, so as to perform entity recognition on the query text through the entity recognition model, thereby obtaining at least one target query entity in the target domain; the entity recognition model is obtained by training a sample query text in advance; the sample query text comprises entities under the target knowledge graph.
In one embodiment, the generating module 804 is further configured to determine, according to the entity types corresponding to the target query entities, target query weights corresponding to the target query entities respectively; and combining the target query entities according to the target query weights to generate a query expression.
In one embodiment, the generating module 804 is further configured to determine, for each target query entity, a first query weight corresponding to an entity type to which the target query entity belongs; carrying out weight pre-estimation on the target query entity through a pre-trained entity weighting model to obtain a second query weight corresponding to the target query entity; and fusing the first query weight and the second query weight to obtain the target query weight corresponding to the target query entity.
In one embodiment, recall module 806 is further configured to determine an index entity under the target domain to which each document stored in the index repository corresponds; and selecting a document of which the corresponding index entity accords with the query expression from the documents stored in the index library, obtaining a target document and recalling the target document.
In one embodiment, the recall module 806 is further configured to select, from the documents stored in the index repository, the documents corresponding to the index entity that conform to the query expression, and obtain candidate documents; determining semantic relativity between each candidate document and the query text; and determining target documents which are strongly related to the query text semantics from the candidate documents according to the semantic relatedness and recalling.
In one embodiment, each target query entity corresponds to a target query weight, and each index entity corresponds to a target index weight; the recall module 806 is further configured to determine, for each candidate document, the index entities corresponding to the candidate document that are the same as target query entities in the query text, to obtain common entities; and to determine the semantic relatedness between the candidate document and the query text according to the target index weights and target query weights corresponding to the common entities; the semantic relatedness depends on the magnitudes of the target index weights and the target query weights of the common entities.
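The common-entity computation above can be sketched as follows. The passage only states that the semantic relatedness depends on the two weights of each common entity, so the sum of products used here, the threshold for "strongly related", and the names `semantic_relatedness` and `recall_strongly_related` are all assumptions made for illustration.

```python
def semantic_relatedness(query_weights, index_weights):
    """Score a candidate document from the weights of entities shared with the query."""
    # Common entities: index entities that are also target query entities.
    common = query_weights.keys() & index_weights.keys()
    # Assumed scoring rule: sum of (target query weight * target index weight).
    return sum(query_weights[e] * index_weights[e] for e in common)


def recall_strongly_related(query_weights, candidates, threshold=0.3):
    """Return ids of candidate documents whose score reaches the assumed threshold."""
    scored = {
        doc_id: semantic_relatedness(query_weights, index_weights)
        for doc_id, index_weights in candidates.items()
    }
    ordered = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, score in ordered if score >= threshold]


query_weights = {"Fund A": 0.8, "Company B": 0.4}
candidates = {
    "doc1": {"Fund A": 0.9, "Company C": 0.5},
    "doc2": {"Company B": 0.5},
    "doc3": {"Fund A": 0.5, "Company B": 0.5},
}
recalled = recall_strongly_related(query_weights, candidates)
print(recalled)
```

In this toy run doc2 shares only the low-weight entity "Company B" with the query and falls below the threshold, so only doc1 and doc3 are recalled.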
In one embodiment, recall module 806 is further configured to recall, from documents stored in the index repository, a target document that satisfies the query expression based on the index entity and associated index information in the pre-constructed index repository under the target domain; the associated index information is information which has an associated relation with the index entity in the target knowledge graph.
In one embodiment, the associated index information includes at least one of heterogeneous entities, tag entities, or entity description information; the heterogeneous entity is an entity which has an association relation with the index entity in the target knowledge graph and does not belong to the same entity type with the index entity; the label entity is an entity for representing personalized characteristics of the index entity in the target knowledge graph; the entity description information is determined based on the description text for describing the index entity in the target knowledge graph.
In one embodiment, referring to fig. 9, the information query apparatus 800 may further include:
the sorting module 808 is configured to sort the recalled target documents through the document sorting model, so as to obtain a document sorting result for display; the document ordering result is used for representing semantic relativity between each target document and the query text; the document ordering model is obtained by training a plurality of groups of sample data in advance; each set of sample data includes sample query text and sample target documents; the sample query text and the sample target document comprise entities under the target knowledge graph; the entities included in the sample query text have an association with the entities included in the sample target document.
According to the information query device, the target query entity in the target field is obtained by carrying out entity identification on the input query text according to the target knowledge graph, and the query expression is generated based on the target query entity. The target knowledge graph is constructed in advance based on data in the target field and contains rich general knowledge in the target field, so that a query expression generated based on a target query entity can accurately represent the query intention of a user. And recalling the target document meeting the query expression from the documents stored in the index library based on the index entity in the target field in the index library constructed in advance. Because the index entity is an entity identified from the documents stored in the index library based on the target knowledge graph, the semantics of the documents in the target field can be accurately represented, so that the target documents conforming to the real query intention of the user can be recalled based on the index entity in the target field.
The modules in the information query apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of querying information.
It will be appreciated by those skilled in the art that the structure shown in FIG. 10 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this description.
The above embodiments illustrate only several implementations of the application. They are described in specific detail but are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (16)

1. An information query method, the method comprising:
performing entity recognition on the input query text according to the target knowledge graph to obtain a target query entity in the target field; the target knowledge graph is constructed in advance based on data in the target field;
generating a query expression based on the target query entity;
recalling, from documents stored in a pre-constructed index library, a target document satisfying the query expression, based on index entities in the target field in the index library;
wherein the index entity comprises an entity identified from the documents stored in the index library based on the target knowledge graph.
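As an informal illustration only, and not part of the claim language, the three steps of claim 1 can be sketched in Python. The toy knowledge graph, the substring-based recognizer, and the function names recognize_entities, build_index, generate_query_expression, and recall are all hypothetical, and the query expression is assumed to be a plain OR over the query entities.

```python
from __future__ import annotations

# Hypothetical toy knowledge graph for a payments domain: entity -> entity type.
KNOWLEDGE_GRAPH = {
    "credit card": "product",
    "refund": "operation",
    "transfer limit": "rule",
}


def recognize_entities(text: str) -> list[str]:
    """Return knowledge-graph entities found in the text (naive substring match)."""
    lowered = text.lower()
    return [entity for entity in KNOWLEDGE_GRAPH if entity in lowered]


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Index each document by the knowledge-graph entities identified in it."""
    return {doc_id: set(recognize_entities(body)) for doc_id, body in documents.items()}


def generate_query_expression(query_entities: list[str]) -> set[str]:
    """Assumption: the expression is a plain OR over the query entities."""
    return set(query_entities)


def recall(expression: set[str], index: dict[str, set[str]]) -> list[str]:
    """Recall documents whose index entities share at least one expression term."""
    return sorted(doc_id for doc_id, entities in index.items() if entities & expression)


documents = {
    "d1": "How to request a refund on a credit card",
    "d2": "Daily transfer limit explained",
    "d3": "Customer service opening hours",
}
index = build_index(documents)
expression = generate_query_expression(recognize_entities("Credit card refund"))
recalled = recall(expression, index)
```

Because both the query and the documents are reduced to entities of the same knowledge graph, matching happens on domain entities rather than on raw words.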
2. The method of claim 1, wherein performing entity recognition on the input query text according to the target knowledge graph to obtain the target query entity in the target field comprises:
segmenting the query text according to a plurality of text feature dimensions to obtain sub-texts corresponding to the text feature dimensions respectively;
matching the sub-texts corresponding to the text feature dimensions with the entities in the target knowledge graph to obtain candidate query entities;
and de-duplicating each candidate query entity to obtain at least one target query entity in the target field.
3. The method of claim 2, wherein de-duplicating each candidate query entity to obtain at least one target query entity in the target field comprises:
performing deduplication on each candidate query entity to obtain at least one deduplicated query entity;
obtaining at least one reference sub-text obtained by performing boundary segmentation on the query text;
and taking each de-duplicated query entity that matches a reference sub-text as a target query entity in the target field.
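The segmentation, matching, de-duplication, and boundary filtering of claims 2 and 3 might look roughly like the following Python sketch. It is an illustration under stated assumptions: the two text feature dimensions are taken to be character n-grams and word n-grams, boundary segmentation is approximated by runs of whole words, and the entity set and all function names are hypothetical.

```python
import re

# Hypothetical entities taken from a target knowledge graph.
ENTITIES = {"card", "credit card", "limit"}


def char_ngrams(text, max_len=12):
    """Character dimension: every substring of up to max_len characters."""
    return {
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, min(len(text), i + max_len) + 1)
    }


def word_ngrams(text, max_words=3):
    """Word dimension: every run of up to max_words consecutive words."""
    words = text.split()
    return {
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, min(len(words), i + max_words) + 1)
    }


def candidate_entities(text):
    """Match sub-texts from each dimension against the knowledge-graph entities."""
    candidates = []
    for segment in (char_ngrams, word_ngrams):
        candidates.extend(sub for sub in segment(text) if sub in ENTITIES)
    return candidates  # may contain the same entity once per dimension


def boundary_segments(text):
    """Reference sub-texts: runs of whole words (assumed boundary segmentation)."""
    words = re.findall(r"\w+", text)
    return {
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    }


def target_entities(text):
    """De-duplicate the candidates, then keep those aligned with word boundaries."""
    text = text.lower()
    deduplicated = sorted(set(candidate_entities(text)))
    references = boundary_segments(text)
    return [entity for entity in deduplicated if entity in references]
```

In this sketch the query "discard limit" yields the character-level candidate "card" from inside "discard", and the boundary check drops it because no reference sub-text equals "card".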
4. The method of claim 1, wherein performing entity recognition on the input query text according to the target knowledge graph to obtain the target query entity in the target field comprises:
inputting the query text into an entity recognition model to perform entity recognition on the query text through the entity recognition model to obtain at least one target query entity in the target field;
the entity recognition model is trained in advance on sample query texts; and each sample query text comprises entities in the target knowledge graph.
5. The method of claim 1, wherein the generating a query expression based on the target query entity comprises:
determining target query weights respectively corresponding to the target query entities according to entity types respectively corresponding to the target query entities;
and combining the target query entities according to the target query weights to generate a query expression.
6. The method of claim 5, wherein determining the target query weights respectively corresponding to the target query entities according to the entity types respectively corresponding to the target query entities comprises:
for each target query entity, determining a first query weight corresponding to the entity type to which the target query entity belongs;
performing weight estimation on the target query entity through a pre-trained entity weighting model to obtain a second query weight corresponding to the target query entity;
and fusing the first query weight and the second query weight to obtain the target query weight corresponding to the target query entity.
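The weighting of claims 5 and 6 can be illustrated with a short Python sketch. The claims do not specify the fusion rule or how weights shape the expression, so the linear interpolation in fuse, the must/should split in build_expression, the type weight table, and the length-based stand-in for the pre-trained entity weighting model are all assumptions made for illustration.

```python
from __future__ import annotations

# Hypothetical first query weights, one per entity type.
TYPE_WEIGHTS = {"product": 0.9, "operation": 0.6, "modifier": 0.2}


def model_weight(entity: str) -> float:
    """Stand-in for the pre-trained entity weighting model (second query weight).

    A real system would call a trained model; this placeholder simply scores
    longer entities higher, capped at 1.0.
    """
    return min(len(entity) / 20.0, 1.0)


def fuse(first: float, second: float, alpha: float = 0.5) -> float:
    """Assumed fusion rule: linear interpolation of the two weights."""
    return alpha * first + (1 - alpha) * second


def target_weights(entities: dict[str, str]) -> dict[str, float]:
    """Map each query entity (entity -> type) to its fused target query weight."""
    return {
        entity: fuse(TYPE_WEIGHTS.get(entity_type, 0.1), model_weight(entity))
        for entity, entity_type in entities.items()
    }


def build_expression(weights: dict[str, float], must_threshold: float = 0.5) -> dict[str, list[str]]:
    """Assumed combination rule: heavy entities are required, light ones optional."""
    return {
        "must": sorted(e for e, w in weights.items() if w >= must_threshold),
        "should": sorted(e for e, w in weights.items() if w < must_threshold),
    }


weights = target_weights(
    {"credit card": "product", "refund": "operation", "quick": "modifier"}
)
expression = build_expression(weights)
```

Splitting the entities into required and optional groups is one possible way to let the target query weights shape the expression; a weighted OR would be an equally plausible reading.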
7. The method of claim 1, wherein recalling, from the documents stored in the pre-constructed index library, a target document satisfying the query expression, based on the index entities in the target field in the index library, comprises:
determining the index entities in the target field that respectively correspond to the documents stored in the index library;
and selecting, from the documents stored in the index library, a document whose corresponding index entity satisfies the query expression, to obtain a target document, and recalling the target document.
8. The method of claim 7, wherein selecting, from the documents stored in the index library, a document whose corresponding index entity satisfies the query expression, to obtain a target document, and recalling the target document comprises:
selecting, from the documents stored in the index library, documents whose corresponding index entities satisfy the query expression, to obtain candidate documents;
determining semantic relatedness between each candidate document and the query text;
and determining, according to the semantic relatedness, target documents that are strongly semantically related to the query text from among the candidate documents, and recalling the target documents.
9. The method of claim 8, wherein each of the target query entities corresponds to a target query weight; each index entity corresponds to a target index weight;
the determining the semantic relatedness between each candidate document and the query text comprises the following steps:
for each candidate document, determining, from the index entities corresponding to the candidate document, an index entity that is the same as a target query entity in the query text, to obtain a common entity;
determining the semantic relatedness between the candidate document and the query text according to the target index weight and the target query weight corresponding to the common entity;
wherein the semantic relatedness is related to the target index weight and the target query weight of the common entity.
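For illustration, the relatedness computation of claims 8 and 9 can be sketched in Python. The claims only require that the relatedness depend on both weights of each common entity; summing the product of the two weights and applying a fixed threshold for "strongly related" are assumptions, as are the function names and sample weights.

```python
def semantic_relatedness(query_weights, index_weights):
    """Score one candidate document against the query.

    Assumed formula: sum, over the common entities, of the target query weight
    multiplied by the target index weight.
    """
    common_entities = query_weights.keys() & index_weights.keys()
    return sum(query_weights[e] * index_weights[e] for e in common_entities)


def recall_strongly_related(query_weights, candidates, threshold=0.3):
    """Keep candidates scoring at or above the threshold, best first."""
    scores = {
        doc_id: semantic_relatedness(query_weights, index_weights)
        for doc_id, index_weights in candidates.items()
    }
    kept = [doc_id for doc_id, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda doc_id: -scores[doc_id])


# Hypothetical weights: entity -> target query weight / target index weight.
query_weights = {"credit card": 0.7, "refund": 0.5}
candidates = {
    "d1": {"credit card": 0.8, "refund": 0.6},
    "d2": {"refund": 0.4, "fee": 0.9},
}
targets = recall_strongly_related(query_weights, candidates)
```

Here "d2" shares only "refund" with the query, so its score falls below the threshold and it is not recalled.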
10. The method of claim 1, wherein recalling, from the documents stored in the pre-constructed index library, a target document satisfying the query expression, based on the index entities in the target field in the index library, comprises:
recalling, from the documents stored in the index library, target documents satisfying the query expression, based on the index entities in the target field and associated index information in the pre-constructed index library;
the associated index information is information which has an association relation with the index entity in the target knowledge graph.
11. The method of claim 10, wherein the associated index information comprises at least one of heterogeneous entities, tag entities, or entity description information;
the heterogeneous entity is an entity which has an association relation with the index entity in the target knowledge graph and does not belong to the same entity type with the index entity;
the tag entity is an entity used for representing the personalized characteristics of the index entity in the target knowledge graph;
the entity description information is determined based on the description text for describing the index entity in the target knowledge graph.
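As a rough Python sketch of claims 10 and 11, the associated index information for an index entity can be read off the knowledge graph and added to the terms under which a document is indexed. The graph layout (type, related, tags, description), the sample entities, and the function names are hypothetical, and only heterogeneous entities and tag entities are added as index terms here.

```python
# Hypothetical knowledge-graph nodes for a wealth-management domain.
GRAPH = {
    "fund A": {
        "type": "fund",
        "related": ["manager Li", "fund B"],
        "tags": ["low risk"],
        "description": "a money market fund",
    },
    "fund B": {"type": "fund", "related": ["fund A"], "tags": [], "description": "an equity fund"},
    "manager Li": {"type": "person", "related": ["fund A"], "tags": [], "description": "a fund manager"},
}


def associated_index_info(entity):
    """Collect the three kinds of associated index information for one index entity."""
    node = GRAPH[entity]
    # Heterogeneous entities: related entities whose type differs from this one.
    heterogeneous = [r for r in node["related"] if GRAPH[r]["type"] != node["type"]]
    return {
        "heterogeneous": heterogeneous,
        "tags": list(node["tags"]),
        "description": node["description"],
    }


def index_terms(document_entities):
    """Index terms for a document: its index entities plus associated entities."""
    terms = set(document_entities)
    for entity in document_entities:
        info = associated_index_info(entity)
        terms.update(info["heterogeneous"])
        terms.update(info["tags"])
    return terms


terms = index_terms(["fund A"])
```

With this expansion, a query that only mentions "low risk" or "manager Li" can still reach a document that mentions only "fund A".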
12. The method according to any one of claims 1 to 11, further comprising:
ordering the recalled target documents through a document ordering model to obtain a document ordering result for display;
the document ordering result is used for representing the semantic relatedness between each target document and the query text;
the document ordering model is trained in advance on a plurality of sets of sample data; each set of the sample data comprises a sample query text and a sample target document; the sample query text and the sample target document comprise entities in the target knowledge graph; and the entity included in the sample query text has an association relation with the entity included in the sample target document.
13. An information query apparatus, the apparatus comprising:
the identification module is used for carrying out entity identification on the input query text according to the target knowledge graph to obtain a target query entity in the target field; the target knowledge graph is constructed in advance based on data in the target field;
the generation module is used for generating a query expression based on the target query entity;
the recall module is used for recalling target documents meeting the query expression from documents stored in the index library based on index entities in the target field in the index library constructed in advance;
the index entity comprises an entity identified from the documents stored in the index library based on the target knowledge graph.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 12 when the computer program is executed.
15. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 12.
16. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 12.
CN202310268968.1A 2023-03-14 2023-03-14 Information query method, device, equipment and medium Pending CN116975198A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310268968.1A CN116975198A (en) 2023-03-14 2023-03-14 Information query method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310268968.1A CN116975198A (en) 2023-03-14 2023-03-14 Information query method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116975198A true CN116975198A (en) 2023-10-31

Family

ID=88482066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310268968.1A Pending CN116975198A (en) 2023-03-14 2023-03-14 Information query method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116975198A (en)

Similar Documents

Publication Publication Date Title
US20220261427A1 (en) Methods and system for semantic search in large databases
WO2020057022A1 (en) Associative recommendation method and apparatus, computer device, and storage medium
US9104979B2 (en) Entity recognition using probabilities for out-of-collection data
US8577882B2 (en) Method and system for searching multilingual documents
CN108304444B (en) Information query method and device
US20120117051A1 (en) Multi-modal approach to search query input
WO2021120627A1 (en) Data search matching method and apparatus, computer device, and storage medium
CN111680173A (en) CMR model for uniformly retrieving cross-media information
CN110637316B (en) System and method for prospective object identification
CN110929125B (en) Search recall method, device, equipment and storage medium thereof
WO2020114100A1 (en) Information processing method and apparatus, and computer storage medium
JP7451747B2 (en) Methods, devices, equipment and computer readable storage media for searching content
CA3138556A1 (en) Apparatuses, storage medium and method of querying data based on vertical search
WO2023108980A1 (en) Information push method and device based on text adversarial sample
CN112883030A (en) Data collection method and device, computer equipment and storage medium
CN115795030A (en) Text classification method and device, computer equipment and storage medium
CN112989010A (en) Data query method, data query device and electronic equipment
CN114330335A (en) Keyword extraction method, device, equipment and storage medium
Toba et al. Enhanced unsupervised person name disambiguation to support alumni tracer study
CN116822491A (en) Log analysis method and device, equipment and storage medium
CN116975198A (en) Information query method, device, equipment and medium
Ma et al. API prober–a tool for analyzing web API features and clustering web APIs
CN113779248A (en) Data classification model training method, data processing method and storage medium
JP2011159100A (en) Successive similar document retrieval apparatus, successive similar document retrieval method and program
CN110930189A (en) Personalized marketing method based on user behaviors

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40099401; Country of ref document: HK)