CN117093601A - Recall method, device, equipment and medium for structured data - Google Patents

Publication number
CN117093601A
Authority
CN
China
Prior art keywords
structured data
recall
data set
query text
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311117801.1A
Other languages
Chinese (zh)
Inventor
甘露
张新运
张建兵
陈亮辉
孙珂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311117801.1A priority Critical patent/CN117093601A/en
Publication of CN117093601A publication Critical patent/CN117093601A/en
Pending legal-status Critical Current

Abstract

The disclosure provides a recall method, device, equipment and medium for structured data, and relates to the field of artificial intelligence, in particular to the field of data processing. The method comprises: determining a structured data table corresponding to a query text in a structured database, wherein the structured data table comprises a plurality of pieces of structured data; acquiring a plurality of types of keywords corresponding to the query text, and performing multi-way recall on the structured data in the structured data table based on a multi-way recall strategy corresponding to each type of keyword, to obtain a first structured data set; performing semantic recall on the query text and the structured data in the structured data table, and performing semantic filtering on the structured data obtained by the semantic recall, to obtain a second structured data set; and acquiring a target structured data set according to the first structured data set and the second structured data set. By combining keyword recall with semantic recall, the application can improve the efficiency and accuracy of structured data retrieval.

Description

Recall method, device, equipment and medium for structured data
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to the field of data processing, and specifically to a recall method, device, equipment and medium for structured data.
Background
Structured data refers to data stored in a well-defined, standardized format, typically in tabular form with predefined fields and data types, and is suitable for use in a variety of scenarios. For example, in public security scenarios, structured data includes data on people, trains, planes, hotels, cases, police, etc., which is stored in a database in the form of relational tables. In the related art, keyword search is mostly used to recall related structured data from a database, but because matching information may be scattered across different field names and field values, this approach has low accuracy.
Disclosure of Invention
The present disclosure provides a recall method, apparatus, device and medium for structured data.
According to one aspect of the disclosure, a recall method for structured data is provided, comprising: determining a structured data table corresponding to a query text in a structured database, wherein the structured data table contains a plurality of pieces of structured data; acquiring a plurality of types of keywords corresponding to the query text, and performing multi-way recall on the structured data in the structured data table based on a multi-way recall strategy corresponding to each type of keyword, to obtain a first structured data set; performing semantic recall on the query text and the structured data in the structured data table, and performing semantic filtering on the structured data obtained by the semantic recall, to obtain a second structured data set; and acquiring a target structured data set according to the first structured data set and the second structured data set.
In the recall method provided by the application, keyword recall uses a multi-way recall strategy, which can find the structured data related to the query more accurately and improve the accuracy and relevance of the query result. Semantic recall can take broader context and semantic relevance into account, improving the relevance of the retrieval result. Combined with semantic filtering, the recalled structured data can be further screened so that the result better matches user expectations. By combining keyword recall with semantic recall, the application can improve the efficiency and accuracy of structured data retrieval.
According to another aspect of the present disclosure, there is provided a recall device for structured data, including: a determining module configured to determine a structured data table corresponding to a query text in a structured database, where the structured data table includes a plurality of pieces of structured data; a keyword recall module configured to acquire a plurality of types of keywords corresponding to the query text, and perform multi-way recall on the structured data in the structured data table based on a multi-way recall strategy corresponding to each type of keyword, to obtain a first structured data set; a semantic recall module configured to perform semantic recall on the query text and the structured data in the structured data table, and perform semantic filtering on the structured data obtained by the semantic recall, to obtain a second structured data set; and a data acquisition module configured to acquire a target structured data set according to the first structured data set and the second structured data set.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the recall method of structured data described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the recall method of structured data described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the recall method of structured data described above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram illustrating an exemplary embodiment of a method of recall of structured data, in accordance with the present application.
FIG. 2 is a schematic diagram of an exemplary embodiment of another structured data recall method shown in the present application.
FIG. 3 is a schematic diagram of an exemplary embodiment of another structured data recall method shown in the present application.
FIG. 4 is a schematic diagram of the architecture of a structured data recall method according to the present application.
FIG. 5 is a schematic diagram of a structured data recall device according to the present application.
Fig. 6 is a schematic diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Artificial intelligence (AI) is the discipline of studying how to make a computer simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and it involves technologies at both the hardware level and the software level. Artificial intelligence software technologies generally include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies, and the like.
Data processing: data is a representation of facts, concepts or instructions that can be processed by manual or automated means. Once data is interpreted and given meaning, it becomes information. Data processing is the collection, storage, retrieval, processing, transformation, and transmission of data. Its basic purpose is to extract and derive data that is valuable and meaningful to particular people from large amounts of data that may be disorganized and hard to understand.
FIG. 1 is a schematic diagram of an exemplary embodiment of a structured data recall method according to the present application, as shown in FIG. 1, comprising the steps of:
S101, determining a structured data table corresponding to the query text in the structured database, wherein the structured data table comprises a plurality of pieces of structured data.
Text entered by the user is received as query text, and the association between the query text and the structured database table is matched using related algorithms or rules based on the content and semantics of the query text.
For example, the intent of the query text may be understood by comparing keywords in the query text to field names, table descriptions, etc. of the database tables, or using natural language processing techniques, such as entity recognition, relational extraction, etc. to determine the structured data tables to which the query text corresponds in the structured database.
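The keyword-comparison variant of this step can be sketched as follows. This is a minimal illustration only: the table catalog, field names, descriptions and the token-overlap score are all hypothetical, and the source does not prescribe a specific matching rule.

```python
import re

# Hypothetical table catalog; names, fields and descriptions are illustrative only.
TABLES = {
    "hotel_stays": {"fields": ["guest_name", "hotel", "check_in_date"],
                    "description": "hotel check-in records"},
    "train_trips": {"fields": ["passenger_name", "train_number", "departure_date"],
                    "description": "train travel records"},
}

def tokenize(text):
    """Lowercase the text and split it on every non-alphanumeric character."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}

def select_table(query, tables=TABLES):
    """Return the table whose field names and description share the most tokens with the query."""
    query_tokens = tokenize(query)

    def overlap(name):
        meta = tables[name]
        table_tokens = tokenize(" ".join(meta["fields"]) + " " + meta["description"])
        return len(query_tokens & table_tokens)

    return max(tables, key=overlap)

print(select_table("which hotel did Zhang San check in to"))
```

A trained intent classifier could replace `select_table` without changing the rest of the flow, since later steps only need the chosen table.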
S102, acquiring a plurality of types of keywords corresponding to the query text, and carrying out multi-way recall on the structured data in the structured data table based on a multi-way recall strategy corresponding to each type of keywords so as to acquire a first structured data set.
Various types of keywords are extracted from the query text; common keyword extraction methods include word segmentation, part-of-speech tagging, named entity recognition and the like. For each keyword type, a different recall strategy may be formulated to obtain the relevant structured data. The recall strategies for the different keyword types are executed in turn, the structured data obtained is taken as first structured data, and all the first structured data together form the first structured data set.
S103, carrying out semantic recall on the query text and the structured data in the structured data table, and carrying out semantic filtering on the structured data obtained by the semantic recall to obtain a second structured data set generated after filtering.
The query text and the pieces of structured data contained in the structured data table are input into a trained semantic recall model to obtain the pieces of structured data output by the semantic recall model. These pieces of structured data and the query text are then input together into a trained semantic filtering model to obtain the pieces of structured data output by the semantic filtering model. The pieces of structured data output by the semantic filtering model are taken as second structured data, and all the second structured data together form the second structured data set.
Optionally, the semantic recall model uses a large pre-trained language model as its base, combined with a two-tower deep neural network (DNN). The model can exploit the semantic representation capability of the pre-trained model and evaluate the similarity between the query text and the pieces of structured data through the two-tower DNN, thereby implementing text recall. The large pre-trained language model may be BERT (Bidirectional Encoder Representations from Transformers). The two-tower DNN is trained in a pairwise manner: one tower processes the query text and the other processes the pieces of structured data contained in the structured data table. Each tower comprises a multi-layer deep neural network that extracts a feature representation of its input text; the outputs of the two towers are combined through a connection layer, and a similarity score between them is calculated.
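To make the pairwise objective concrete, the sketch below scores a query against a relevant and an irrelevant row and applies a margin-based hinge loss. The source only states that training is pairwise; the cosine similarity, the hinge loss, the margin value and the toy vectors (which stand in for the two towers' outputs) are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pairwise_hinge_loss(query_vec, pos_vec, neg_vec, margin=0.2):
    """Zero once the positive row outscores the negative row by at least `margin`."""
    return max(0.0, margin - cosine(query_vec, pos_vec) + cosine(query_vec, neg_vec))

# Toy vectors standing in for tower outputs (hypothetical values).
query = [1.0, 0.0]
relevant = [0.9, 0.1]
irrelevant = [0.0, 1.0]

print(pairwise_hinge_loss(query, relevant, irrelevant))  # correctly ordered pair
print(pairwise_hinge_loss(query, irrelevant, relevant))  # mis-ordered pair is penalized
```

In a real two-tower model the loss gradient would update both towers; here only the loss computation is shown.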
Optionally, the semantic filtering model uses a large pre-trained language model as its base, combined with a single-tower deep neural network (DNN). The model can exploit the semantic representation capability of the pre-trained model and, through the single-tower DNN, compute a matching score between the query text and each piece of structured data output by the semantic recall model, thereby implementing filtering. The large pre-trained language model may be BERT (Bidirectional Encoder Representations from Transformers). The single-tower DNN is trained in a pointwise manner; that is, each query text paired with a piece of structured data output by the semantic recall model is treated as an independent training sample. For each training sample, the deep neural network performs classification or regression to output a predicted matching score. When training the semantic filtering model, counterexamples among the pieces of structured data recalled by the semantic recall model can be used to construct part of the negative samples.
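The pointwise sample construction described here might look like the following sketch. The row format, the `relevant_ids` label source and the function name are hypothetical; the only idea taken from the text is that each (query, row) pair is an independent sample and that recalled-but-irrelevant rows serve as negatives.

```python
def build_pointwise_samples(query, recalled_rows, relevant_ids):
    """Label each (query, row) pair independently: 1 if known relevant, else 0.

    Rows that the recall model returned but that are not labeled relevant
    become negative samples for the filtering model.
    """
    samples = []
    for row in recalled_rows:
        label = 1 if row["id"] in relevant_ids else 0
        samples.append({"query": query, "row_text": row["text"], "label": label})
    return samples

# Hypothetical recall output and relevance labels.
recalled = [
    {"id": 1, "text": "Zhang San, Grand Hotel, 2023-08-01"},
    {"id": 2, "text": "Li Si, City Inn, 2023-08-02"},
]
samples = build_pointwise_samples("Zhang San hotel stay", recalled, relevant_ids={1})
print([s["label"] for s in samples])
```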
S104, acquiring a target structured data set according to the first structured data set and the second structured data set.
The first structured data set and the second structured data set are merged to obtain a merged structured data set; all the structured data contained in the merged structured data set is sorted, and the top P items after sorting are taken as target structured data; the target structured data set consists of all the target structured data.
When sorting the structured data contained in the merged structured data set, a heuristic strategy can be used, or a click-through rate (CTR) estimation model based on click logs can be adopted.
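A minimal sketch of the merge-and-rank step follows, using a simple heuristic in place of a CTR model. The "one vote per recall path" score and the integer row ids are assumptions chosen for illustration; the source does not specify the heuristic.

```python
def merge_and_rank(first_set, second_set, p):
    """Union two recall results keyed by row id, then keep the top-p by a heuristic score."""
    scores = {}
    for row_id in first_set:
        scores[row_id] = scores.get(row_id, 0) + 1  # one vote from keyword recall
    for row_id in second_set:
        scores[row_id] = scores.get(row_id, 0) + 1  # one vote from semantic recall
    # Sort by votes (descending), then by row id for a deterministic order.
    ranked = sorted(scores, key=lambda row_id: (-scores[row_id], row_id))
    return ranked[:p]

print(merge_and_rank([3, 1, 2], [2, 4], p=3))
```

Rows found by both paths rank first under this heuristic; a learned ranking model could be substituted by replacing the `scores` computation.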
The embodiment of the application provides a recall method for structured data, comprising: determining a structured data table corresponding to a query text in a structured database, wherein the structured data table comprises a plurality of pieces of structured data; acquiring a plurality of types of keywords corresponding to the query text, and performing multi-way recall on the structured data in the structured data table based on a multi-way recall strategy corresponding to each type of keyword, to obtain a first structured data set; performing semantic recall on the query text and the structured data in the structured data table, and performing semantic filtering on the structured data obtained by the semantic recall, to obtain a second structured data set; and acquiring a target structured data set according to the first structured data set and the second structured data set. In the application, keyword recall uses a multi-way recall strategy, which can find the structured data related to the query more accurately and improve the accuracy and relevance of the query result. Semantic recall can take broader context and semantic relevance into account, improving the relevance of the retrieval result. Combined with semantic filtering, the recalled structured data can be further screened so that the result better matches user expectations.
FIG. 2 is a schematic diagram of an exemplary embodiment of another structured data recall method of the present application, as shown in FIG. 2, comprising the steps of:
S201, determining a structured data table corresponding to the query text in the structured database, wherein the structured data table comprises a plurality of pieces of structured data.
The query text input by a user is acquired. Optionally, the query text is obtained from a request made through a user interface, an application programming interface (API), or any other suitable manner.
Intent recognition is performed on the query text. Intent recognition aims to determine the primary intent or purpose of the user's query; a trained classification model that receives the query text as input and outputs one or more intent categories may be used. The structured data table corresponding to the query text in the structured database is determined according to the intent recognition result, laying a foundation for the subsequent structured data recall.
S202, extracting attributes of the query text, and obtaining extracted keywords as attribute keywords.
Optionally, when extracting the attribute of the query text, the query text may be preprocessed, including removing stop words, punctuation marks and other irrelevant characters, and performing word segmentation and other operations.
Attributes are typically specific information related to the query text, such as location, time, people, products, etc. In the application, a pre-trained model or rule can be used for entity identification and relation extraction, or a sequence labeling method can be used for attribute extraction, so that extracted keywords are obtained as attribute keywords.
S203, obtaining the remaining query text other than the attribute keywords in the query text, and performing importance analysis on the remaining query text to determine important keywords and non-important keywords.
Further, the attribute keywords are removed from the query text using text processing techniques (e.g., regular expressions or natural language processing libraries), and what is left is taken as the remaining query text.
Importance analysis is performed on the remaining query text to determine which keywords are important and relevant to the query intent, and which are less important. Based on the result, the keywords in the remaining query text are divided into important and non-important keywords.
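Steps S202 and S203 can be sketched with regular expressions and a small low-importance word list. Both the attribute patterns and the word list are hypothetical stand-ins: the source leaves the attribute extractor and the importance analysis unspecified, and a trained sequence labeler or term-weighting model could take their place.

```python
import re

# Hypothetical attribute patterns and low-importance word list, for illustration only.
ATTRIBUTE_PATTERNS = {
    "date": r"\b\d{4}-\d{2}-\d{2}\b",
    "id_number": r"\b\d{6,}\b",
}
LOW_IMPORTANCE = {"the", "a", "of", "in", "on", "for", "who", "please", "find"}

def classify_keywords(query):
    """Split a query into attribute, important and non-important keywords."""
    attributes = []
    remaining = query
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        for match in re.findall(pattern, remaining):
            attributes.append((name, match))
        # Remove the attribute keywords so only the remaining query text is analyzed.
        remaining = re.sub(pattern, " ", remaining)
    tokens = re.findall(r"[A-Za-z]+", remaining.lower())
    important = [t for t in tokens if t not in LOW_IMPORTANCE]
    non_important = [t for t in tokens if t in LOW_IMPORTANCE]
    return {"attribute": attributes, "important": important, "non_important": non_important}

result = classify_keywords("find hotel guests on 2023-08-01")
print(result)
```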
S204, taking the attribute keywords, the important keywords and the non-important keywords as the plurality of types of keywords.
S205, performing multi-way recall on the structured data in the structured data table based on the multi-way recall strategy corresponding to each type of keyword, to obtain a first structured data set.
A multi-way recall strategy corresponding to each type of keyword is determined, wherein the multi-way recall strategy comprises at least two of the following matching strategies: exact matching, fuzzy matching, regular-expression matching, and numerical comparison.
For each type of keyword, multi-way recall is performed on the structured data in the structured data table according to the multi-way recall strategy, and the first structured data recalled by each path for that type of keyword is obtained. The first structured data set is generated from all the first structured data corresponding to all types of keywords. Because each recall strategy analyzes and recalls data from a different angle, the multi-way recall strategy increases recall coverage and the recall rate, captures more potential matches, and provides a more comprehensive result.
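The four matching strategies and their union can be sketched as below. The rows, field names, fuzzy threshold and the assignment of strategies to keyword types are hypothetical; fuzzy matching here uses the standard library's `difflib.SequenceMatcher` as one possible similarity measure, not necessarily the one intended by the source.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rows from a structured data table.
ROWS = [
    {"id": 1, "name": "Zhang San", "hotel": "Grand Hotel", "age": 34},
    {"id": 2, "name": "Zhang Sen", "hotel": "City Inn", "age": 52},
    {"id": 3, "name": "Li Si", "hotel": "Grand Hotel", "age": 41},
]

def exact_match(rows, field, value):
    return {r["id"] for r in rows if r[field] == value}

def fuzzy_match(rows, field, value, threshold=0.8):
    return {r["id"] for r in rows
            if SequenceMatcher(None, r[field].lower(), value.lower()).ratio() >= threshold}

def regex_match(rows, field, pattern):
    return {r["id"] for r in rows if re.search(pattern, r[field])}

def numeric_compare(rows, field, low, high):
    return {r["id"] for r in rows if low <= r[field] <= high}

def multi_way_recall(rows, recall_paths):
    """Run every recall path and union the row ids they return."""
    recalled = set()
    for strategy, args in recall_paths:
        recalled |= strategy(rows, *args)
    return recalled

paths = [
    (exact_match, ("hotel", "Grand Hotel")),  # e.g. an attribute keyword
    (fuzzy_match, ("name", "Zhang San")),     # e.g. an important keyword
    (numeric_compare, ("age", 50, 60)),       # e.g. a numeric attribute
]
first_set = multi_way_recall(ROWS, paths)
print(sorted(first_set))
```

Each path contributes rows the others may miss (a misspelled name, a numeric range), which is the coverage benefit the paragraph above describes.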
S206, carrying out semantic recall on the query text and the structured data in the structured data table, and carrying out semantic filtering on the structured data obtained by the semantic recall to obtain a second structured data set generated after filtering.
The query text and the pieces of structured data contained in the structured data table are input into a trained semantic recall model to obtain the pieces of structured data output by the semantic recall model. These pieces of structured data and the query text are then input together into a trained semantic filtering model to obtain the pieces of structured data output by the semantic filtering model. The pieces of structured data output by the semantic filtering model are taken as second structured data, and all the second structured data together form the second structured data set.
Optionally, the semantic recall model adopts a large pre-trained language model base plus a pairwise two-tower deep neural network (DNN).
Optionally, the semantic filtering model adopts a large pre-trained language model base plus a pointwise single-tower deep neural network (DNN). When training the semantic filtering model, counterexamples among the pieces of structured data recalled by the semantic recall model can be used to construct part of the negative samples.
S207, acquiring a target structured data set according to the first structured data set and the second structured data set.
In one possible implementation, the structured data present in both the first structured data set and the second structured data set is taken as target structured data, and the target structured data set consists of all the target structured data. The two data sets are compared and the data they share is extracted as the target structured data. Because this data appears in both data sets, this approach can ensure that the extracted data has higher accuracy and reliability, and can thus increase the integrity of the data and avoid losing important information that may be present in one of the data sets.
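The intersection variant reduces to a set membership test. In this sketch the row ids are hypothetical, and preserving the order of the first set is a design choice made here, not something the source requires.

```python
def intersect_recall(first_set, second_set):
    """Keep rows found by both recall paths, preserving the order of the first set."""
    in_second = set(second_set)
    return [row_id for row_id in first_set if row_id in in_second]

print(intersect_recall([3, 1, 2], [2, 4, 3]))
```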
In another possible implementation, the first structured data set and the second structured data set are merged to obtain a merged structured data set; all the structured data contained in the merged structured data set is sorted, and the top P items after sorting are taken as target structured data; the target structured data set consists of all the target structured data. When sorting the structured data contained in the merged structured data set, a heuristic strategy can be used, or a click-through rate (CTR) estimation model based on click logs can be adopted. Merging the first structured data set and the second structured data set yields a larger data set, which increases the data volume and coverage and provides more comprehensive information. Sorting all the structured data in the merged set places the most relevant or valuable data first according to a specific sorting rule or algorithm, improving the quality and relevance of the target structured data.
In this embodiment of the application, keyword recall and semantic recall are combined. Keyword recall uses a multi-way recall strategy, which can find the structured data related to the query more accurately and improve the accuracy and relevance of the query result. Semantic recall can take broader context and semantic relevance into account, improving the relevance of the retrieval result. Combined with semantic filtering, the recalled structured data can be further screened so that the result better matches user expectations. Together, these improve the efficiency and accuracy of structured data retrieval.
FIG. 3 is a schematic diagram of an exemplary embodiment of another structured data recall method of the present application, as shown in FIG. 3, comprising the steps of:
S301, determining a structured data table corresponding to the query text in the structured database, wherein the structured data table comprises a plurality of pieces of structured data.
S302, acquiring a plurality of types of keywords corresponding to the query text, and carrying out multi-way recall on the structured data in the structured data table based on a multi-way recall strategy corresponding to each type of keywords so as to acquire a first structured data set.
For the specific implementation of steps S301 to S302, reference may be made to the specific description of the relevant parts in the above embodiments, and the detailed description is omitted here.
S303, acquiring a first query text vector corresponding to the query text, and acquiring a candidate text vector corresponding to each piece of structured data in the structured data table.
A text vector corresponding to the query text is acquired as the first query text vector.
A text vector corresponding to each piece of structured data in the structured data table is acquired as a candidate text vector. Each piece of structured data corresponds to one candidate text vector.
S304, obtaining the vector similarity between the first query text vector and each candidate text vector.
S305, sorting the vector similarities in descending order, and determining the top N vector similarities as target similarities according to the sorting result.
S306, taking the structured data corresponding to the target similarity as initial second structured data.
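Steps S303 to S306 amount to a top-N nearest-neighbor lookup, sketched below. Cosine similarity is assumed as the similarity measure (the source does not name one), and the row ids and two-dimensional vectors are toy values standing in for real text embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_n_by_similarity(query_vec, candidate_vecs, n):
    """Return ids of the n candidates with the highest similarity to the query."""
    scored = [(cosine(query_vec, vec), row_id) for row_id, vec in candidate_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # descending similarity
    return [row_id for _, row_id in scored[:n]]

# Toy candidate vectors, one per piece of structured data (hypothetical values).
candidates = {"r1": [1.0, 0.0], "r2": [0.7, 0.7], "r3": [0.0, 1.0]}
initial_second = top_n_by_similarity([1.0, 0.1], candidates, n=2)
print(initial_second)
```

For large tables an approximate nearest-neighbor index would typically replace the full sort, but the selection logic is the same.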
S307, obtaining a text vector to be filtered corresponding to each piece of initial second structured data, and obtaining a second query text vector corresponding to the query text.
A text vector corresponding to each piece of initial second structured data is acquired as a text vector to be filtered.
A text vector corresponding to the query text is acquired as the second query text vector.
Both the first query text vector and the second query text vector correspond to the query text, but they can be obtained through different natural language processing techniques and models, so the two vectors may differ.
S308, obtaining the matching score between the second query text vector and each text vector to be filtered.
The second query text vector is matched against each text vector to be filtered to obtain the matching score for each pair.
S309, sorting the matching scores in descending order, and determining the top M matching scores as target matching scores according to the sorting result.
It will be appreciated that the value of M is less than the value of N.
S310, taking the initial second structured data corresponding to the target matching scores as the second structured data obtained after filtering.
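The filtering stage (S307 to S310) re-scores the N recalled rows and keeps the best M. In the sketch below the scoring function is a token-overlap stand-in for the trained matching model, and the function names and sample rows are hypothetical.

```python
def token_overlap_score(query, row_text):
    """Stand-in for a trained matching model: fraction of query tokens found in the row."""
    query_tokens = set(query.lower().split())
    row_tokens = set(row_text.lower().split())
    return len(query_tokens & row_tokens) / len(query_tokens) if query_tokens else 0.0

def filter_top_m(query, initial_rows, score_fn, m):
    """Re-score the N recalled rows with `score_fn` and keep the best m (m < N)."""
    if m >= len(initial_rows):
        raise ValueError("m must be smaller than the number of recalled rows")
    ranked = sorted(initial_rows, key=lambda row: score_fn(query, row), reverse=True)
    return ranked[:m]

rows = ["grand hotel beijing", "city inn shanghai", "grand plaza"]
second_set = filter_top_m("grand hotel", rows, token_overlap_score, m=2)
print(second_set)
```

Because the filter only sees the N rows that survived recall, a heavier and more precise scoring model is affordable at this stage.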
S311, acquiring a target structured data set according to the first structured data set and the second structured data set.
For the specific implementation of step S311, reference may be made to the specific description of the relevant parts in the above embodiment, and the detailed description is omitted here.
In this embodiment of the application, keyword recall and semantic recall are combined. Keyword recall uses a multi-way recall strategy, which can find the structured data related to the query more accurately and improve the accuracy and relevance of the query result. Semantic recall can take broader context and semantic relevance into account, improving the relevance of the retrieval result. Combined with semantic filtering, the recalled structured data can be further screened so that the result better matches user expectations. Together, these improve the efficiency and accuracy of structured data retrieval.
FIG. 4 is a schematic diagram of the architecture of a structured data recall method of the present application. As shown in FIG. 4, intent recognition is first performed on the query text. The structured data table corresponding to the query text in the structured database is determined according to the intent recognition result, laying a foundation for the subsequent structured data recall.
In term recall, attribute extraction is performed on the query text, and the extracted keywords are taken as attribute keywords. The remaining query text other than the attribute keywords is obtained, and importance analysis is performed on it to determine important keywords and non-important keywords. The attribute keywords, important keywords, and non-important keywords are taken as the plurality of types of keywords. Multi-way recall is performed on the structured data in the structured data table based on the multi-way recall strategy corresponding to each type of keyword, to obtain the first structured data set.
In semantic recall, the query text and the pieces of structured data contained in the structured data table are input into a trained semantic recall model to obtain the pieces of structured data output by the semantic recall model. These pieces of structured data and the query text are then input together into a trained semantic filtering model to obtain the pieces of structured data output by the semantic filtering model. The pieces of structured data output by the semantic filtering model are taken as second structured data, and all the second structured data together form the second structured data set.
The first structured data set and the second structured data set are merged to obtain a merged structured data set; all the structured data contained in the merged structured data set is sorted, and the top P items after sorting are taken as target structured data; the target structured data set consists of all the target structured data.
FIG. 5 is a schematic diagram of a structured data recall device according to the present application. As shown in FIG. 5, the structured data recall device 500 comprises a determining module 501, a keyword recall module 502, a semantic recall module 503, and a data acquisition module 504, wherein:
The determining module 501 is configured to determine a structured data table corresponding to the query text in the structured database, where the structured data table includes a plurality of pieces of structured data.
The keyword recall module 502 is configured to obtain multiple types of keywords corresponding to the query text, and perform multi-way recall on the structured data in the structured data table based on the multi-way recall strategy corresponding to each type of keyword, so as to obtain a first structured data set.
The semantic recall module 503 is configured to perform semantic recall on the query text and the structured data in the structured data table, and perform semantic filtering on the structured data obtained by the semantic recall, so as to obtain a second structured data set generated after filtering.
The data acquisition module 504 is configured to acquire a target structured data set according to the first structured data set and the second structured data set.
In the device, keyword recall uses a multi-way recall strategy, so that structured data related to the query can be found more accurately, which can improve the accuracy and relevance of the query results. Semantic recall can take wider context and semantic relevance into account, which improves the relevance of the retrieval results. Combined with semantic filtering, the recalled structured data can be further screened so that the results better match user expectations. By combining keyword recall and semantic recall, the application can improve the efficiency and accuracy of structured data retrieval.
Further, the keyword recall module 502 is further configured to: perform attribute extraction on the query text and take the extracted keywords as attribute keywords; obtain the remaining query text other than the attribute keywords, perform importance analysis on the remaining query text, and determine important keywords and non-important keywords; and take the attribute keywords, important keywords, and non-important keywords as the multiple types of keywords.
Further, the keyword recall module 502 is further configured to: determine the multi-way recall strategy corresponding to each type of keyword, wherein the multi-way recall strategy comprises at least two of the following matching strategies: exact matching, fuzzy matching, regular-expression matching, and numerical comparison; for each type of keyword, perform multi-way recall on the structured data in the structured data table according to the multi-way recall strategy, and obtain the first structured data recalled by each path for that type of keyword; and generate the first structured data set from all the first structured data corresponding to all types of keywords.
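The four matching strategies named above can be sketched as independent recall paths whose results are unioned. This is only an illustration under assumptions: the table `ROWS`, its field names, the fuzzy threshold of 0.8, and the use of Python's `difflib.SequenceMatcher` as the fuzzy matcher are not specified by the disclosure.

```python
import re
from difflib import SequenceMatcher

# Hypothetical table rows; the field names are illustrative only.
ROWS = [
    {"id": 1, "name": "red sneakers", "price": 59},
    {"id": 2, "name": "red sneaker", "price": 120},
    {"id": 3, "name": "blue sandals", "price": 35},
]


def exact_match(rows, field, keyword):
    """Recall rows whose field value equals the keyword exactly."""
    return {r["id"] for r in rows if str(r[field]) == keyword}


def fuzzy_match(rows, field, keyword, threshold=0.8):
    """Recall rows whose field value is close to the keyword.

    The difflib ratio is an assumed stand-in for a real fuzzy matcher.
    """
    return {
        r["id"]
        for r in rows
        if SequenceMatcher(None, str(r[field]), keyword).ratio() >= threshold
    }


def regex_match(rows, field, pattern):
    """Recall rows whose field value matches a regular expression."""
    return {r["id"] for r in rows if re.search(pattern, str(r[field]))}


def numeric_compare(rows, field, upper_bound):
    """Recall rows whose numeric field does not exceed an upper bound."""
    return {r["id"] for r in rows if r[field] <= upper_bound}


def multi_way_recall(rows, routes):
    """Run every (strategy, args) route and union the recalled row ids."""
    recalled = set()
    for strategy, args in routes:
        recalled |= strategy(rows, *args)
    return recalled
```

A caller might, for instance, route attribute keywords to `numeric_compare` and important keywords to `exact_match` and `fuzzy_match`; which strategies go with which keyword type is a design decision the text leaves open.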
Further, the semantic recall module 503 is further configured to: obtain a first query text vector corresponding to the query text, and obtain a candidate text vector corresponding to each piece of structured data in the structured data table; obtain the vector similarity between the first query text vector and each candidate text vector; sort the vector similarities in descending order, and determine the top N vector similarities as target similarities according to the sorting result; and take the structured data corresponding to the target similarities as initial second structured data, and perform semantic filtering on the initial second structured data to obtain a second structured data set generated after filtering.
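The top-N selection by vector similarity can be sketched as follows. The disclosure does not name a similarity measure, so cosine similarity is assumed here; the function names and the dictionary layout mapping row ids to vectors are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_by_similarity(query_vector, candidate_vectors, n):
    """Return the ids of the n candidates most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vector, vec), row_id)
        for row_id, vec in candidate_vectors.items()
    ]
    # Sort in descending order of similarity, as the text describes.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row_id for _, row_id in scored[:n]]
```

For a large table, an approximate nearest-neighbor index would usually replace the full scan shown here; the exhaustive loop is kept only to make the ordering step explicit.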
Further, the semantic recall module 503 is further configured to: obtain a text vector to be filtered corresponding to each piece of initial second structured data, and obtain a second query text vector corresponding to the query text; obtain a matching score between the second query text vector and each text vector to be filtered; sort the matching scores in descending order, and determine the top M matching scores as target matching scores according to the sorting result; and take the initial second structured data corresponding to the target matching scores as the second structured data obtained after filtering.
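The filtering stage can be sketched as a second ranking pass with a pluggable scorer. The disclosure only says that a matching score is obtained, so the `score_fn` parameter and the placeholder `dot_product` scorer below are assumptions; a trained semantic filter model would presumably supply the real score.

```python
from typing import Callable, Sequence


def semantic_filter(
    query_vector: Sequence[float],
    vectors_to_filter: dict,
    score_fn: Callable[[Sequence[float], Sequence[float]], float],
    m: int,
) -> list:
    """Keep the ids of the m candidates with the highest matching score."""
    ranked = sorted(
        vectors_to_filter,
        key=lambda row_id: score_fn(query_vector, vectors_to_filter[row_id]),
        reverse=True,
    )
    return ranked[:m]


def dot_product(u, v):
    """Placeholder scorer, assumed for illustration only."""
    return sum(a * b for a, b in zip(u, v))
```

Passing the scorer as an argument keeps the first-stage recall vectors and the second-stage filter vectors independent, which matches the text's use of separate first and second query text vectors.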
Further, the data acquisition module 504 is further configured to: obtain the structured data that exists in both the first structured data set and the second structured data set as target structured data; and obtain a target structured data set consisting of all the target structured data.
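Taking the records present in both sets amounts to a set intersection. A minimal sketch, assuming each record is identified by a hashable id and that preserving the order of the first set is acceptable (the disclosure does not state an ordering):

```python
def intersect_recall_sets(first_set, second_set):
    """Keep only records recalled by both the keyword and semantic paths."""
    second_ids = set(second_set)
    # Preserve the order of the first set so the output is deterministic.
    return [row_id for row_id in first_set if row_id in second_ids]
```

This variant favors precision: a record survives only if both recall paths agree on it.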
Further, the data acquisition module 504 is further configured to: merge the first structured data set and the second structured data set to obtain a merged structured data set; sort all the structured data contained in the merged structured data set, and take the top P items after sorting as target structured data; and obtain a target structured data set consisting of all the target structured data.
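The merge-then-rank alternative can be sketched as an ordered union followed by a sort. The disclosure does not say what the sort key is, so the per-record relevance `scores` mapping below is an assumption, as is the function name `merge_and_rank`.

```python
def merge_and_rank(first_set, second_set, scores, p):
    """Union two recall sets, drop duplicates, and return the top p by score."""
    # dict.fromkeys gives an order-preserving union of the two id lists.
    merged = list(dict.fromkeys([*first_set, *second_set]))
    # Records without a score sort last; the default of 0.0 is an assumption.
    merged.sort(key=lambda row_id: scores.get(row_id, 0.0), reverse=True)
    return merged[:p]
```

Compared with the intersection variant, this one favors coverage: a record recalled by only one path can still reach the target set if it ranks within the top P.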
Further, the determining module 501 is further configured to: obtain a query text input by a user; and perform intent recognition on the query text, and determine the structured data table corresponding to the query text in the structured database according to the intent recognition result.
In the technical solution of the present disclosure, the acquisition, storage, and application of the user personal information involved all comply with relevant laws and regulations and do not violate public order and good customs.
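Table selection by intent recognition can be sketched with a rule-based stand-in. The disclosure does not describe the intent recognizer, so the `INTENT_RULES` mapping, its intent labels, table names, and trigger words are all hypothetical; a trained intent classifier would likely take the place of the keyword overlap shown here.

```python
# Hypothetical mapping from intent labels to table names and trigger words.
INTENT_RULES = {
    "product_lookup": {"table": "products", "triggers": {"price", "buy", "stock"}},
    "order_status": {"table": "orders", "triggers": {"order", "shipping", "delivery"}},
}


def resolve_table(query_text, rules=INTENT_RULES):
    """Pick the table whose intent triggers overlap most with the query.

    Returns None when no trigger word appears in the query.
    """
    tokens = set(query_text.lower().split())
    best_table, best_hits = None, 0
    for rule in rules.values():
        hits = len(tokens & rule["triggers"])
        if hits > best_hits:
            best_table, best_hits = rule["table"], hits
    return best_table
```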
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above, such as the recall method of structured data. For example, in some embodiments, the recall method of structured data may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by computing unit 601, one or more steps of the structured data recall method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the recall method of structured data in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (19)

1. A recall method of structured data, comprising:
determining a structured data table corresponding to the query text in a structured database, wherein the structured data table comprises a plurality of pieces of structured data;
acquiring a plurality of types of keywords corresponding to the query text, and carrying out multi-way recall on the structured data in the structured data table based on a multi-way recall strategy corresponding to each type of keywords so as to acquire a first structured data set;
carrying out semantic recall on the query text and the structured data in the structured data table, and carrying out semantic filtering on the structured data obtained by the semantic recall to obtain a second structured data set generated after filtering;
and acquiring a target structured data set according to the first structured data set and the second structured data set.
2. The method of claim 1, wherein the obtaining the plurality of types of keywords corresponding to the query text comprises:
extracting attributes of the query text, and obtaining extracted keywords as attribute keywords;
acquiring the remaining query text other than the attribute keywords in the query text, carrying out importance analysis on the remaining query text, and determining important keywords and non-important keywords;
and taking the attribute keywords, the important keywords and the non-important keywords as the keywords of the multiple types.
3. The method of claim 2, wherein the carrying out multi-way recall on the structured data in the structured data table based on the multi-way recall strategy corresponding to each type of keyword to obtain the first structured data set comprises:
determining a multi-way recall strategy corresponding to each type of keyword, wherein the multi-way recall strategy comprises at least two matching strategies among exact matching, fuzzy matching, regular-expression matching and numerical comparison;
for each type of keyword, carrying out multi-way recall on the structured data in the structured data table according to the multi-way recall strategy, and obtaining first recalled structured data of each path corresponding to the type of keyword;
and generating the first structured data set according to all the first structured data corresponding to all the types of keywords.
4. A method according to claim 1 or 3, wherein said semantically recalling the query text and the structured data in the structured data table and semantically filtering the semantically recalled structured data to obtain a second structured data set generated after filtering, comprises:
acquiring a first query text vector corresponding to the query text, and acquiring a candidate text vector corresponding to each piece of structured data in the structured data table;
obtaining the first query text vector and the vector similarity corresponding to each candidate text vector;
sorting the vector similarities in descending order, and determining the top N vector similarities as the target similarity according to the sorting result;
and taking the structured data corresponding to the target similarity as initial second structured data, and carrying out semantic filtering on the initial second structured data to obtain a second structured data set generated after filtering.
5. The method of claim 4, wherein the semantically filtering the initial second structured data to obtain a second set of structured data generated after filtering comprises:
obtaining a text vector to be filtered corresponding to each piece of initial second structured data, and obtaining a second query text vector corresponding to the query text;
obtaining a matching score corresponding to the second query text vector and each text vector to be filtered;
sorting the matching scores in descending order, and determining the top M matching scores as a target matching score according to the sorting result;
and taking the initial second structured data corresponding to the target matching score as the second structured data obtained after filtering.
6. The method of claim 1 or 5, wherein the obtaining a target structured data set from the first structured data set and the second structured data set comprises:
acquiring structured data which simultaneously exist in the first structured data set and the second structured data set as target structured data;
and acquiring the target structured data set consisting of all the target structured data.
7. The method of claim 1 or 5, wherein the obtaining a target structured data set from the first structured data set and the second structured data set comprises:
merging the first structured data set and the second structured data set to obtain a merged structured data set generated after merging;
sorting all the structured data contained in the merged structured data set, and taking the top P items after sorting as target structured data;
and acquiring the target structured data set consisting of all the target structured data.
8. The method of claim 1, wherein the determining a structured data table for which the query text corresponds in the structured database comprises:
acquiring a query text input by a user;
and carrying out intention recognition on the query text, and determining a structured data table corresponding to the query text in a structured database according to an intention recognition result.
9. A recall device for structured data, comprising:
the determining module is used for determining a structured data table corresponding to the query text in the structured database, wherein the structured data table comprises a plurality of pieces of structured data;
the keyword recall module is used for acquiring multiple types of keywords corresponding to the query text, and carrying out multi-way recall on the structured data in the structured data table based on a multi-way recall strategy corresponding to each type of keywords so as to acquire a first structured data set;
the semantic recall module is used for carrying out semantic recall on the query text and the structured data in the structured data table, and carrying out semantic filtration on the structured data obtained by the semantic recall so as to obtain a second structured data set generated after filtration;
the data acquisition module is used for acquiring a target structured data set according to the first structured data set and the second structured data set.
10. The apparatus of claim 9, wherein the keyword recall module is further to:
extracting attributes of the query text, and obtaining extracted keywords as attribute keywords;
acquiring the remaining query text other than the attribute keywords in the query text, carrying out importance analysis on the remaining query text, and determining important keywords and non-important keywords;
and taking the attribute keywords, the important keywords and the non-important keywords as the keywords of the multiple types.
11. The apparatus of claim 10, wherein the keyword recall module is further to:
determining a multi-way recall strategy corresponding to each type of keyword, wherein the multi-way recall strategy comprises at least two matching strategies among exact matching, fuzzy matching, regular-expression matching and numerical comparison;
for each type of keyword, carrying out multi-way recall on the structured data in the structured data table according to the multi-way recall strategy, and obtaining first recalled structured data of each path corresponding to the type of keyword;
and generating the first structured data set according to all the first structured data corresponding to all the types of keywords.
12. The apparatus of claim 9 or 11, wherein the semantic recall module is further to:
acquiring a first query text vector corresponding to the query text, and acquiring a candidate text vector corresponding to each piece of structured data in the structured data table;
obtaining the first query text vector and the vector similarity corresponding to each candidate text vector;
sorting the vector similarities in descending order, and determining the top N vector similarities as the target similarity according to the sorting result;
and taking the structured data corresponding to the target similarity as initial second structured data, and carrying out semantic filtering on the initial second structured data to obtain a second structured data set generated after filtering.
13. The apparatus of claim 12, wherein the semantic recall module is further to:
obtaining a text vector to be filtered corresponding to each piece of initial second structured data, and obtaining a second query text vector corresponding to the query text;
obtaining a matching score corresponding to the second query text vector and each text vector to be filtered;
sorting the matching scores in descending order, and determining the top M matching scores as a target matching score according to the sorting result;
and taking the initial second structured data corresponding to the target matching score as the second structured data obtained after filtering.
14. The apparatus of claim 9 or 13, wherein the data acquisition module is further configured to:
acquiring structured data which simultaneously exist in the first structured data set and the second structured data set as target structured data;
and acquiring the target structured data set consisting of all the target structured data.
15. The apparatus of claim 9 or 13, wherein the data acquisition module is further configured to:
merging the first structured data set and the second structured data set to obtain a merged structured data set generated after merging;
sorting all the structured data contained in the merged structured data set, and taking the top P items after sorting as target structured data;
and acquiring the target structured data set consisting of all the target structured data.
16. The apparatus of claim 9, wherein the determining module is further configured to:
acquiring a query text input by a user;
and carrying out intention recognition on the query text, and determining a structured data table corresponding to the query text in a structured database according to an intention recognition result.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to any of claims 1-8.
CN202311117801.1A 2023-08-31 2023-08-31 Recall method, device, equipment and medium for structured data Pending CN117093601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311117801.1A CN117093601A (en) 2023-08-31 2023-08-31 Recall method, device, equipment and medium for structured data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311117801.1A CN117093601A (en) 2023-08-31 2023-08-31 Recall method, device, equipment and medium for structured data

Publications (1)

Publication Number Publication Date
CN117093601A (en) 2023-11-21

Family

ID=88780209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311117801.1A Pending CN117093601A (en) 2023-08-31 2023-08-31 Recall method, device, equipment and medium for structured data

Country Status (1)

Country Link
CN (1) CN117093601A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination