CN110866042A - Intelligent table query method and device and computer readable storage medium

Info

Publication number
CN110866042A
Authority
CN
China
Prior art keywords
query
intelligent
training
word vector
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910975458.1A
Other languages
Chinese (zh)
Other versions
CN110866042B (en)
Inventor
王建华
马琳
张晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910975458.1A priority Critical patent/CN110866042B/en
Publication of CN110866042A publication Critical patent/CN110866042A/en
Priority to PCT/CN2020/098951 priority patent/WO2021068565A1/en
Application granted granted Critical
Publication of CN110866042B publication Critical patent/CN110866042B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses an intelligent table query method, which comprises the following steps: receiving an original table set and a label set, and splitting the original table set to obtain a standard table set; performing part-of-speech coding on the label set to obtain a word vector set; inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and completing the training of the intelligent query model when the training value is smaller than a preset threshold; receiving query content from a user, extracting keywords from the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, obtaining the table set required by the query content, and outputting the table set. The invention also provides an intelligent table query device and a computer readable storage medium. The invention can realize an accurate and efficient intelligent table query function.

Description

Intelligent table query method and device and computer readable storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for intelligently querying a table, and a computer-readable storage medium.
Background
With the rapid development of the internet, the scale of data is expanding quickly, and the demand for fast data query keeps growing. At present most data are stored in table form, such as a company's daily business income data or the property registration records of a real estate company, and table-based query mainly relies on table traversal or user keyword search. Although these methods can meet query requirements to a certain extent, when the tables are large both the traversal method and the keyword search method are slow and consume a large amount of computing memory.
Disclosure of Invention
The invention provides an intelligent table query method and device and a computer readable storage medium, whose main aim is to intelligently query tables according to the query requirements of a user.
In order to achieve the above object, the present invention provides an intelligent table query method, which comprises:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, continuing to train the intelligent query model, and if the training value is smaller than the preset threshold, finishing the training of the intelligent query model;
receiving query content from a user, extracting keywords from the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
Optionally, the splitting process is to split the original table set into a user layer, a computation layer and a data layer, and combine the user layer, the computation layer and the data layer into the standard table set;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer is composed of the body of each table in the original table set;
the computing layer provides a mutual query function of the user layer and the data layer.
Optionally, the part-of-speech encoding the tag set to obtain a word vector set includes:
carrying out one-hot coding on the label set to obtain a primary word vector set;
and carrying out dimension reduction on the primary word vector set to obtain the word vector set.
Optionally, the dimension reduction comprises:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimal solution, wherein the optimal solution is the word vector set.
Optionally, the standard table set and the word vector set are input into an intelligent query model to be trained to obtain training values, including:
performing the part-of-speech coding on the user layer information in the standard table set to obtain a user layer information vector set;
inputting the user layer information vector set into the intelligent query model, and sequentially performing convolution, pooling and activation operations to obtain a prediction value set;
and performing loss calculation on the prediction value set and the word vector set to obtain the training value.
In addition, in order to achieve the above object, the present invention further provides a table intelligent query apparatus, which includes a memory and a processor, wherein the memory stores a table intelligent query program operable on the processor, and the table intelligent query program, when executed by the processor, implements the following steps:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, continuing to train the intelligent query model, and if the training value is smaller than the preset threshold, finishing the training of the intelligent query model;
receiving query content from a user, extracting keywords from the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
Optionally, the splitting process is to split the original table set into a user layer, a computation layer and a data layer, and combine the user layer, the computation layer and the data layer into the standard table set;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer is composed of the body of each table in the original table set;
the computing layer provides a mutual query function of the user layer and the data layer.
Optionally, the part-of-speech encoding the tag set to obtain a word vector set includes:
carrying out one-hot coding on the label set to obtain a primary word vector set;
and carrying out dimension reduction on the primary word vector set to obtain the word vector set.
Optionally, the dimension reduction comprises:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimal solution, wherein the optimal solution is the word vector set.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium, which stores thereon a table intelligent query program, which is executable by one or more processors to implement the steps of the table intelligent query method as described above.
According to the method, the standard table set is obtained by splitting the original table set, so that a large, monolithic original table is split into small tables that can be queried quickly. The intelligent query model is trained on the standard table set, giving it an efficient query capability. At the same time, keyword extraction is performed on the user's query content, so that what the user wants to query is identified accurately and the query is executed quickly by the intelligent query model. Therefore, the intelligent table query method, the intelligent table query device and the computer readable storage medium of the invention can realize an accurate and efficient table query function.
Drawings
Fig. 1 is a schematic flow chart of a table intelligent query method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an internal structure of the intelligent table query device according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of a table intelligent query program in the table intelligent query device according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an intelligent table query method. Fig. 1 is a schematic flow chart of a table intelligent query method according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the intelligent table query method includes:
and S1, receiving the original form set and the label set, and splitting the original form set to obtain a standard form set.
In the preferred embodiment of the present invention, the original table set consists of tables automatically generated by different business services, for example a property information table set compiled with EXCEL software by a real estate agency that collects many property listings, or a set of user consumption detail tables generated by a telecommunications company according to each user's calls, data usage and so on.
The label set is a description of each table in the original table set. Preferably, the description takes the form of a keyword combination. For the aforementioned property information table set, for example, the label set combines four types of keywords: residential complex name and unit number + property area + market price + property layout, so that one listing in the property information table set may be recorded in the label set as: Complex A, Unit 3 + 89 square meters + 1.2 million yuan + three bedrooms, one living room and two bathrooms.
Preferably, the splitting process splits the original table set into a user layer, a computation layer and a data layer to obtain the standard table set. A table consists of a table title, a table header and a table body, and the table title and table header are expressed as text; for example, the title of a product sales statistics table is 'Product Sales Statistics', and its header contains fields such as product number, product name, specification and packaging. The table titles and table headers of the original table set are therefore extracted to form the user layer. The table body stores the data of the whole table, so the table bodies of the original table set are extracted to form the data layer. Once a table has been split into the user layer and the data layer, a query relationship must be established between the two layers; this is the function of the computation layer. The computation layer can adopt a first-order multivariate linear index query, where the linear query is:
y = a1·x1 + a2·x2 + … + an·xn
where y is the data in the data layer, x1, x2, …, xn are the text information of the user layer, and a1, a2, …, an are index coefficients taking the value 0 or 1. For example, to query the property information priced at 1.3 million yuan, y is set to 130 (in units of ten thousand yuan) and all user-layer text information matching that price is solved for in reverse, i.e. the corresponding table titles and table headers in the original table set are returned: the tables titled Complex A, Complex B and Complex E contain listings priced at 1.3 million, and the headers show which listings under the title of Complex A are priced at 1.3 million.
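For illustration only, the following is a minimal Python sketch of how such a computation layer could behave; the layer contents, field names and helper function are hypothetical and are not taken from the patent.

```python
# Minimal sketch of the computation layer's linear index query. The layer
# contents, field names and helper function below are hypothetical.

# User layer: table title plus header fields of each split table.
user_layer = {
    "Complex A": ["listing_id", "area_sqm", "price_wan", "layout"],
    "Complex B": ["listing_id", "area_sqm", "price_wan", "layout"],
}

# Data layer: the table bodies (rows of values) keyed by the same titles.
data_layer = {
    "Complex A": [("A-01", 89, 130, "3BR/1LR/2BA"), ("A-02", 75, 95, "2BR/1LR/1BA")],
    "Complex B": [("B-01", 102, 130, "3BR/2LR/2BA")],
}

def linear_index_query(y, field="price_wan"):
    """Reverse-solve the linear index y = a1*x1 + ... + an*xn with 0/1 coefficients:
    return every user-layer entry (title and header) whose data rows match the value y."""
    hits = []
    for title, header in user_layer.items():
        col = header.index(field)
        matching_rows = [row for row in data_layer[title] if row[col] == y]
        if matching_rows:
            hits.append((title, header, matching_rows))
    return hits

if __name__ == "__main__":
    # Query every listing priced at 1.3 million yuan (130 * ten thousand).
    for title, header, rows in linear_index_query(130):
        print(title, header, rows)
```

In this sketch a coefficient of 1 simply corresponds to a user-layer entry whose rows participate in the match for the queried value y.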
S2, performing part-of-speech coding on the label set to obtain a word vector set.
The intelligent query model cannot effectively recognize raw text information; effective recognition requires extracting features from the text and then making a discriminative judgment. Therefore, preferably, the part-of-speech coding represents every keyword of each label in the label set by an N-dimensional vector, i.e. it converts the text information into numerical information for subsequent recognition and training by the model, where N is the number of keywords in the label set.
Further, the part-of-speech coding first performs a one-hot encoding operation on the label set to obtain a primary word vector set, and then performs dimension reduction on the primary word vector set to obtain the word vector set.
The one-hot operation is as follows:
vi = (vi1, vi2, …, viN), with vij = 1 when j = i and vij = 0 otherwise
where i denotes the keyword number, vi is the N-dimensional vector representing keyword i, vij is the j-th element of that vector, and all the vi together, for the s keywords in total, form the primary word vector set.
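Purely as an illustration (the keyword list is hypothetical), a minimal sketch of this one-hot step in Python:

```python
import numpy as np

# Hypothetical keywords taken from one label of the label set.
keywords = ["Complex A", "Unit 3", "89 sqm", "1.2 million", "three bedrooms"]

def one_hot_encode(words):
    """Return an s x N matrix (here N equals s) with a single 1 per row,
    i.e. the primary word vector set: v_ij = 1 when j == i, else 0."""
    n = len(words)
    vectors = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        vectors[i, i] = 1.0
    return vectors

primary_word_vectors = one_hot_encode(keywords)
print(primary_word_vectors)
```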
Further, the dimension reduction compresses the generated N-dimensional vectors into lower-dimensional data that are easier to compute with in subsequent model training, i.e. it finally converts the primary word vector set into the word vector set.
Preferably, the dimensionality reduction first establishes a forward probability model and a backward probability model, and then optimizes the forward probability model and the backward probability model to obtain an optimal solution, which is the word vector set.
Further, the forward probability model and the backward probability model are respectively:
p(v1, v2, …, vs) = ∏ (i = 1 to s) p(vi | v1, …, v(i-1))
p(v1, v2, …, vs) = ∏ (i = 1 to s) p(vi | v(i+1), …, vs)
Optimizing the forward probability model and the backward probability model:
max Σ (i = 1 to s) [ log p(vi | v1, …, v(i-1)) + log p(vi | v(i+1), …, vs) ]
where max denotes the optimization and the optimum is found by taking derivatives of this objective, vi is the N-dimensional vector representing keyword i, and the label set has s keywords in total. After the forward probability model and the backward probability model are optimized, the dimension of the N-dimensional vectors is reduced, the dimension reduction is completed, and the word vector set is obtained.
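The following is a minimal, simplified Python sketch of this dimension-reduction idea and is only an assumption, not the patent's exact procedure: low-dimensional keyword vectors are learned by jointly maximizing a forward and a backward next-keyword probability, and for brevity each conditional looks only at the immediately adjacent keyword.

```python
# Assumption: a simplified forward/backward probability objective, not the
# patent's exact training code. Each conditional uses only the adjacent keyword.
import torch
import torch.nn as nn

vocab = ["Complex A", "Unit 3", "89 sqm", "1.2 million", "three bedrooms"]  # hypothetical
sequences = [[0, 1, 2, 3, 4]]              # toy corpus: one label as keyword indices
s, dim = len(vocab), 8                     # s keywords, reduced dimension

emb = nn.Embedding(s, dim)                 # the low-dimensional word vectors to learn
fwd = nn.Linear(dim, s)                    # forward model: predicts v_i from v_{i-1}
bwd = nn.Linear(dim, s)                    # backward model: predicts v_i from v_{i+1}
params = list(emb.parameters()) + list(fwd.parameters()) + list(bwd.parameters())
opt = torch.optim.Adam(params, lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):
    opt.zero_grad()
    for seq in sequences:
        ids = torch.tensor(seq)
        vecs = emb(ids)
        # negative log-likelihood of the forward and backward models, summed
        loss = loss_fn(fwd(vecs[:-1]), ids[1:]) + loss_fn(bwd(vecs[1:]), ids[:-1])
        loss.backward()
    opt.step()

word_vector_set = emb.weight.detach()      # the reduced-dimension word vector set
print(word_vector_set.shape)               # torch.Size([5, 8])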
S3, inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, training of the intelligent query model continues, and if the training value is smaller than the preset threshold, training of the intelligent query model is finished.
Preferably, training to obtain the training value comprises: performing the part-of-speech coding on the user-layer information in the standard table set to obtain a user-layer information vector set; inputting the user-layer information vector set into the intelligent query model and sequentially performing convolution, pooling and activation operations to obtain a prediction value set; and performing a loss calculation on the prediction value set and the word vector set to obtain the training value.
Further, the convolution operation and the pooling operation comprise: constructing a convolution template in advance, determining a convolution stride, and computing the convolution template against the user-layer information vector set according to that stride to obtain the convolution matrix set, which completes the convolution operation; then selecting the maximum value or the average value of each matrix in the convolution matrix set to replace that matrix, which completes the pooling operation.
Further, the pre-constructed convolution template may be a standard 3 × 3 matrix. The convolution is computed by sliding the template over a feature matrix from the feature candidate area set (for example a 9 × 9 matrix) from left to right with a stride of 1: at each position the corresponding elements of the template and the covered sub-matrix are multiplied element-wise (1 × 0, 0 × 3, 1 × 1 and so on) to obtain a small result matrix, the template then moves one step to the right, and the operation is repeated until the whole matrix has been traversed. When the convolution operation is complete a large number of small matrices has therefore been generated, and the pooling operation reduces their dimensionality further, preferably by the maximization principle: each small matrix is replaced by its maximum value (3 and 7 in the original example), which completes the pooling operation. (The specific example matrices in the original are image references that did not survive extraction.)
Preferably, the convolution and pooling operations are repeated, and the final feature matrix set can be obtained after 16 times of convolution and pooling operations.
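For illustration only, here is a minimal Python sketch of the convolution-then-maximum-pooling step described above; the template and feature matrix are hypothetical stand-ins for the unrecoverable originals.

```python
import numpy as np

def conv_and_max_pool(feature, template, stride=1):
    """Slide the template over the feature matrix; at each position take the
    element-wise product with the covered patch (the 'small matrix'), then
    max-pool that product down to a single value, as described above."""
    th, tw = template.shape
    fh, fw = feature.shape
    out_h = (fh - th) // stride + 1
    out_w = (fw - tw) // stride + 1
    pooled = np.zeros((out_h, out_w), dtype=feature.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = feature[i * stride:i * stride + th, j * stride:j * stride + tw]
            small_matrix = template * patch        # convolution step (element-wise)
            pooled[i, j] = small_matrix.max()      # pooling step (maximization principle)
    return pooled

# Hypothetical 3 x 3 template and 9 x 9 feature matrix.
template = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])
feature = np.arange(81).reshape(9, 9) % 10
print(conv_and_max_pool(feature, template))        # a 7 x 7 pooled feature map
```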
Preferably, the activation operation performs probability estimation on the feature matrix set through a softmax function and selects the prediction result with the highest probability as the final prediction output. The softmax function is:
p(matrix) = e^matrix / Σ (j = 1 to k) e^(matrix_j)
where p(matrix) represents the output probability of a matrix in the feature matrix set, k represents the data volume of the feature matrix set, e is Euler's number (an infinite non-repeating decimal), and j ranges over the selectable prediction value set. In the original example one candidate feature matrix evaluates to a probability of 0.21 and another to 0.64, so the candidate with probability 0.64 is selected as the output of the feature matrix set (the specific example matrices are image references that did not survive extraction).
The loss calculation that produces the training value compares the prediction value set with the word vector set:
[loss formula: an image reference in the original that is not recoverable from this extraction]
where t is the number of word vectors, yi denotes the word vector set, and yi' denotes the prediction value set.
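For illustration only, a short Python sketch of the activation and loss steps; because the patent's exact loss formula is not recoverable here, a plain squared-error loss over the t word vectors is assumed.

```python
import numpy as np

def softmax(scores):
    """p(matrix) = e^score / sum_j e^score_j over the k candidates."""
    z = np.exp(scores - scores.max())      # subtract max for numerical stability
    return z / z.sum()

def training_value(word_vectors, predictions):
    """Assumed squared-error loss between the word vector set y and the prediction set y'."""
    y = np.asarray(word_vectors, dtype=float)
    y_pred = np.asarray(predictions, dtype=float)
    return float(np.sum((y - y_pred) ** 2))

# Toy example: pick the most probable candidate, then compute the training value.
scores = np.array([0.3, 1.4])              # scores of k = 2 candidate feature matrices
probs = softmax(scores)                    # roughly [0.25, 0.75]
print("selected candidate:", int(probs.argmax()))

y_true = np.array([[0.1, 0.9], [0.8, 0.2]])
y_hat = np.array([[0.2, 0.7], [0.6, 0.4]])
print("training value:", training_value(y_true, y_hat))
```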
S4, receiving query content from a user, extracting keywords from the query content based on a keyword extraction algorithm to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model to obtain the table set required by the query content, and outputting the table set.
Because the query content entered by a user is usually not in the specified keyword-combination form but is phrased colloquially, such as "I want to look up my phone bill for September", keyword extraction needs to be performed on the query content; here the keywords "September" and "phone bill" are extracted from "I want to look up my phone bill for September".
Preferably, the keyword extraction may adopt a traversal method, for example, all keywords in the tag set are split and deduplicated to construct a keyword vocabulary, and the query content is sequentially compared with the keyword vocabulary to complete the keyword extraction.
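As a purely illustrative sketch of the traversal method just described (the labels and query text are hypothetical):

```python
def build_vocabulary(label_set):
    """Split every label into its keywords and deduplicate them."""
    vocab = set()
    for label in label_set:
        vocab.update(part.strip() for part in label.split("+"))
    return vocab

def extract_keywords(query, vocabulary):
    """Traverse the vocabulary and keep every keyword that appears in the query."""
    return [kw for kw in vocabulary if kw and kw in query]

labels = ["September + phone bill", "October + data plan"]
vocabulary = build_vocabulary(labels)
print(extract_keywords("I want to look up my phone bill for September", vocabulary))
# -> ['September', 'phone bill'] (order depends on set iteration)
```

Splitting on "+" mirrors the keyword-combination format of the label set described earlier.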
Further, the intelligent query model obtains the table set required by the query content. The query content corresponds to user-layer information, so the function that the computation layer provides for querying the data layer from the user layer is called to fetch the matching data-layer entries, and the data layer and the user layer are recombined to obtain the table set.
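As a final illustrative sketch (again with hypothetical layer contents, not the patent's code), the recombination of predicted user-layer entries with the data layer into the output table set could look like this:

```python
# Hypothetical layers, mirroring the earlier sketches.
user_layer = {"Complex A": ["listing_id", "area_sqm", "price_wan", "layout"]}
data_layer = {"Complex A": [("A-01", 89, 130, "3BR/1LR/2BA"), ("A-02", 75, 95, "2BR/1LR/1BA")]}

def recombine(predicted_titles, field=None, value=None):
    """Rebuild output tables from the user-layer entries predicted by the model,
    filtering the data layer through the computation layer when a field/value is given."""
    tables = []
    for title in predicted_titles:
        header = user_layer[title]
        rows = data_layer[title]
        if field is not None:
            col = header.index(field)
            rows = [row for row in rows if row[col] == value]
        tables.append({"title": title, "header": header, "rows": rows})
    return tables

# Suppose the trained model predicted that "Complex A" answers the query.
print(recombine(["Complex A"], field="price_wan", value=130))
```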
The invention also provides an intelligent table query device. Fig. 2 is a schematic diagram illustrating an internal structure of the intelligent table query device according to an embodiment of the present invention.
In this embodiment, the table intelligent query apparatus 1 may be a PC (Personal Computer), a terminal device such as a smartphone, a tablet computer or a portable computer, or a server. The table intelligent query apparatus 1 at least comprises a memory 11, a processor 12, a communication bus 13 and a network interface 14.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may be an internal storage unit of the table intelligent query apparatus 1 in some embodiments, for example, a hard disk of the table intelligent query apparatus 1. The memory 11 may also be an external storage device of the table lookup apparatus 1 in other embodiments, such as a plug-in hard disk provided on the table lookup apparatus 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 11 may also include both an internal storage unit and an external storage device of the table intelligent query apparatus 1. The memory 11 may be used not only to store application software installed in the table intelligent query apparatus 1 and various types of data, such as codes of the table intelligent query program 01, but also to temporarily store data that has been output or is to be output.
The processor 12, which in some embodiments may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip, is configured to execute program code stored in the memory 11 or to process data, for example to run the table intelligent query program 01.
The communication bus 13 is used to realize connection communication between these components.
The network interface 14 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), typically used to establish a communication link between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display information processed in the form intelligent query apparatus 1 and to display a visual user interface.
While FIG. 2 shows only the table intelligent query device 1 with the components 11-14 and the table intelligent query program 01, those skilled in the art will appreciate that the structure shown in FIG. 2 does not constitute a limitation of the table intelligent query device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
In the embodiment of the apparatus 1 shown in fig. 2, the memory 11 stores a table smart query program 01; when processor 12 executes table smart query program 01 stored in memory 11, the following steps are implemented:
the method comprises the steps of firstly, receiving an original form set and a label set, and splitting the original form set to obtain a standard form set.
In the preferred embodiment of the present invention, the original table set consists of tables automatically generated by different business services, for example a property information table set compiled with EXCEL software by a real estate agency that collects many property listings, or a set of user consumption detail tables generated by a telecommunications company according to each user's calls, data usage and so on.
The label set is a description of each table in the original table set. Preferably, the description takes the form of a keyword combination. For the aforementioned property information table set, for example, the label set combines four types of keywords: residential complex name and unit number + property area + market price + property layout, so that one listing in the property information table set may be recorded in the label set as: Complex A, Unit 3 + 89 square meters + 1.2 million yuan + three bedrooms, one living room and two bathrooms.
Preferably, the splitting process splits the original table set into a user layer, a computation layer and a data layer to obtain the standard table set. A table consists of a table title, a table header and a table body, and the table title and table header are expressed as text; for example, the title of a product sales statistics table is 'Product Sales Statistics', and its header contains fields such as product number, product name, specification and packaging. The table titles and table headers of the original table set are therefore extracted to form the user layer. The table body stores the data of the whole table, so the table bodies of the original table set are extracted to form the data layer. Once a table has been split into the user layer and the data layer, a query relationship must be established between the two layers; this is the function of the computation layer. The computation layer can adopt a first-order multivariate linear index query, where the linear query is:
y = a1·x1 + a2·x2 + … + an·xn
where y is the data in the data layer, x1, x2, …, xn are the text information of the user layer, and a1, a2, …, an are index coefficients taking the value 0 or 1. For example, to query the property information priced at 1.3 million yuan, y is set to 130 (in units of ten thousand yuan) and all user-layer text information matching that price is solved for in reverse, i.e. the corresponding table titles and table headers in the original table set are returned: the tables titled Complex A, Complex B and Complex E contain listings priced at 1.3 million, and the headers show which listings under the title of Complex A are priced at 1.3 million.
Step two, performing part-of-speech coding on the label set to obtain a word vector set.
The intelligent query model cannot effectively recognize raw text information; effective recognition requires extracting features from the text and then making a discriminative judgment. Therefore, preferably, the part-of-speech coding represents every keyword of each label in the label set by an N-dimensional vector, i.e. it converts the text information into numerical information for subsequent recognition and training by the model, where N is the number of keywords in the label set.
Further, the part-of-speech coding first performs a one-hot encoding operation on the label set to obtain a primary word vector set, and then performs dimension reduction on the primary word vector set to obtain the word vector set.
The one-hot operation is as follows:
vi = (vi1, vi2, …, viN), with vij = 1 when j = i and vij = 0 otherwise
where i denotes the keyword number, vi is the N-dimensional vector representing keyword i, vij is the j-th element of that vector, and all the vi together, for the s keywords in total, form the primary word vector set.
Further, the dimension reduction compresses the generated N-dimensional vectors into lower-dimensional data that are easier to compute with in subsequent model training, i.e. it finally converts the primary word vector set into the word vector set.
Preferably, the dimensionality reduction first establishes a forward probability model and a backward probability model, and then optimizes the forward probability model and the backward probability model to obtain an optimal solution, which is the word vector set.
Further, the forward probability model and the backward probability model are respectively:
p(v1, v2, …, vs) = ∏ (i = 1 to s) p(vi | v1, …, v(i-1))
p(v1, v2, …, vs) = ∏ (i = 1 to s) p(vi | v(i+1), …, vs)
Optimizing the forward probability model and the backward probability model:
max Σ (i = 1 to s) [ log p(vi | v1, …, v(i-1)) + log p(vi | v(i+1), …, vs) ]
where max denotes the optimization and the optimum is found by taking derivatives of this objective, vi is the N-dimensional vector representing keyword i, and the label set has s keywords in total. After the forward probability model and the backward probability model are optimized, the dimension of the N-dimensional vectors is reduced, the dimension reduction is completed, and the word vector set is obtained.
Step three, inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, training of the intelligent query model continues, and if the training value is smaller than the preset threshold, training of the intelligent query model is finished.
Preferably, training to obtain the training value comprises: performing the part-of-speech coding on the user-layer information in the standard table set to obtain a user-layer information vector set; inputting the user-layer information vector set into the intelligent query model and sequentially performing convolution, pooling and activation operations to obtain a prediction value set; and performing a loss calculation on the prediction value set and the word vector set to obtain the training value.
Further, the convolution operation and the pooling operation comprise: constructing a convolution template in advance, determining a convolution stride, and computing the convolution template against the user-layer information vector set according to that stride to obtain the convolution matrix set, which completes the convolution operation; then selecting the maximum value or the average value of each matrix in the convolution matrix set to replace that matrix, which completes the pooling operation.
Further, the pre-constructed convolution template may be a standard 3 × 3 matrix. The convolution is computed by sliding the template over a feature matrix from the feature candidate area set (for example a 9 × 9 matrix) from left to right with a stride of 1: at each position the corresponding elements of the template and the covered sub-matrix are multiplied element-wise (1 × 0, 0 × 3, 1 × 1 and so on) to obtain a small result matrix, the template then moves one step to the right, and the operation is repeated until the whole matrix has been traversed. When the convolution operation is complete a large number of small matrices has therefore been generated, and the pooling operation reduces their dimensionality further, preferably by the maximization principle: each small matrix is replaced by its maximum value (3 and 7 in the original example), which completes the pooling operation. (The specific example matrices in the original are image references that did not survive extraction.)
Preferably, the convolution and pooling operations are repeated, and the final feature matrix set can be obtained after 16 times of convolution and pooling operations.
Preferably, the activation operation performs probability estimation on the feature matrix set through a softmax function and selects the prediction result with the highest probability as the final prediction output. The softmax function is:
p(matrix) = e^matrix / Σ (j = 1 to k) e^(matrix_j)
where p(matrix) represents the output probability of a matrix in the feature matrix set, k represents the data volume of the feature matrix set, e is Euler's number (an infinite non-repeating decimal), and j ranges over the selectable prediction value set. In the original example one candidate feature matrix evaluates to a probability of 0.21 and another to 0.64, so the candidate with probability 0.64 is selected as the output of the feature matrix set (the specific example matrices are image references that did not survive extraction).
The loss calculation that produces the training value compares the prediction value set with the word vector set:
[loss formula: an image reference in the original that is not recoverable from this extraction]
where t is the number of word vectors, yi denotes the word vector set, and yi' denotes the prediction value set.
Step four, receiving query content from a user, extracting keywords from the query content based on a keyword extraction algorithm to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model to obtain the table set required by the query content, and outputting the table set.
Because the query content entered by a user is usually not in the specified keyword-combination form but is phrased colloquially, such as "I want to look up my phone bill for September", keyword extraction needs to be performed on the query content; here the keywords "September" and "phone bill" are extracted from "I want to look up my phone bill for September".
Preferably, the keyword extraction may adopt a traversal method, for example, all keywords in the tag set are split and deduplicated to construct a keyword vocabulary, and the query content is sequentially compared with the keyword vocabulary to complete the keyword extraction.
Further, the intelligent query model obtains the table set required by the query content. The query content corresponds to user-layer information, so the function that the computation layer provides for querying the data layer from the user layer is called to fetch the matching data-layer entries, and the data layer and the user layer are recombined to obtain the table set.
Alternatively, in other embodiments, the form intelligent query program may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention.
For example, referring to fig. 3, a schematic diagram of program modules of a table intelligent query program in an embodiment of the table intelligent query apparatus of the present invention is shown, in this embodiment, the table intelligent query program may be divided into a data receiving and processing module 10, a part-of-speech encoding module 20, an intelligent query model training module 30, and a table query and output module 40, which exemplarily:
the data receiving and processing module 10 is configured to: and receiving an original table set and a label set, and splitting the original table set to obtain a standard table set.
The part-of-speech encoding module 20 is configured to: and performing part-of-speech coding on the tag set to obtain a word vector set.
The smart query model training 30 is used to: inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, judging the size relationship between the training value and a preset threshold value, if the training value is larger than the preset threshold value, continuing training of the intelligent query model, and if the training value is smaller than the preset threshold value, finishing training of the intelligent query model.
The table query and output 40 is used to: receiving query content of a user, extracting the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set to the intelligent query model which completes training, and obtaining and outputting a form set required by the query content.
The functions or operation steps implemented by the data receiving and processing module 10, the part-of-speech encoding module 20, the intelligent query model training module 30, the table query and output module 40 and other program modules when executed are substantially the same as those of the above embodiments, and are not described herein again.
Furthermore, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium has a table smart query program stored thereon, and the table smart query program is executable by one or more processors to implement the following operations:
and receiving an original table set and a label set, and splitting the original table set to obtain a standard table set.
And performing part-of-speech coding on the tag set to obtain a word vector set.
Inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, judging the size relationship between the training value and a preset threshold value, if the training value is larger than the preset threshold value, continuing training of the intelligent query model, and if the training value is smaller than the preset threshold value, finishing training of the intelligent query model.
Receiving query content of a user, extracting the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set to the intelligent query model which completes training, and obtaining and outputting a form set required by the query content.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for intelligent query of a form, the method comprising:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, continuing to train the intelligent query model, and if the training value is smaller than the preset threshold, finishing the training of the intelligent query model;
receiving query content of a user, extracting keywords from the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
2. The form intelligent query method of claim 1, wherein the splitting process is splitting the original form set into a user layer, a computation layer and a data layer, and composing the user layer, the computation layer and the data layer into the standard form set;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer is composed of the body of each table in the original table set;
the computing layer provides a mutual query function of the user layer and the data layer.
3. The intelligent table query method of claim 1 or 2, wherein said part-of-speech encoding said tag set to obtain a word vector set, comprises:
carrying out one-hot coding on the label set to obtain a primary word vector set;
and carrying out dimension reduction on the primary word vector set to obtain the word vector set.
4. The form intelligent query method of claim 3, wherein the dimension reduction comprises:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimal solution, wherein the optimal solution is the word vector set.
5. The table intelligent query method of claim 2, wherein inputting the standard table set and the word vector set into an intelligent query model to train to obtain training values comprises:
performing the part-of-speech coding on the user layer information in the standard table set to obtain a user layer information vector set;
inputting the user layer information vector set into the intelligent query model, and sequentially performing convolution, pooling and activation operations to obtain a prediction value set;
and performing loss calculation on the prediction value set and the word vector set to obtain the training value.
6. An apparatus for intelligent table query, the apparatus comprising a memory and a processor, the memory having stored thereon an intelligent table query program operable on the processor, the intelligent table query program when executed by the processor implementing the steps of:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, continuing to train the intelligent query model, and if the training value is smaller than the preset threshold, finishing the training of the intelligent query model;
receiving query content of a user, extracting keywords from the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
7. The apparatus for intelligent lookup of tables as claimed in claim 6 wherein said splitting is a splitting of said original set of tables into a user layer, a computation layer and a data layer, and said user layer, said computation layer and said data layer are combined into said standard set of tables;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer is composed of the body of each table in the original table set;
the computing layer provides a mutual query function of the user layer and the data layer.
8. The apparatus for intelligent table query as claimed in claim 6 or 7, wherein said part-of-speech encoding said tag set to obtain a word vector set, comprises:
carrying out one-hot coding on the label set to obtain a primary word vector set;
and carrying out dimension reduction on the primary word vector set to obtain the word vector set.
9. The apparatus for intelligent lookup of tables as claimed in claim 8 wherein said dimension reduction comprises:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimal solution, wherein the optimal solution is the word vector set.
10. A computer-readable storage medium having stored thereon a form intelligent query program, the form intelligent query program being executable by one or more processors to implement the steps of the form intelligent query method of any one of claims 1 to 5.
CN201910975458.1A 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium Active CN110866042B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910975458.1A CN110866042B (en) 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium
PCT/CN2020/098951 WO2021068565A1 (en) 2019-10-11 2020-06-29 Table intelligent query method and apparatus, electronic device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910975458.1A CN110866042B (en) 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110866042A true CN110866042A (en) 2020-03-06
CN110866042B CN110866042B (en) 2023-05-12

Family

ID=69652834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910975458.1A Active CN110866042B (en) 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN110866042B (en)
WO (1) WO2021068565A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597171A (en) * 2020-12-31 2021-04-02 平安银行股份有限公司 Table relation visualization method and device, electronic equipment and storage medium
WO2021068565A1 (en) * 2019-10-11 2021-04-15 平安科技(深圳)有限公司 Table intelligent query method and apparatus, electronic device and computer readable storage medium
CN113111864A (en) * 2021-05-13 2021-07-13 上海巽联信息科技有限公司 Intelligent table extraction algorithm based on multiple modes
CN116049354A (en) * 2023-01-28 2023-05-02 北京原子回声智能科技有限公司 Multi-table retrieval method and device based on natural language

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110282895A1 (en) * 2010-05-14 2011-11-17 Oracle International Corporation System and method for logical people groups
CN106250381A (en) * 2015-06-04 2016-12-21 微软技术许可有限责任公司 The row sequence optimized for input/output in list data
CN106874411A (en) * 2017-01-22 2017-06-20 网易(杭州)网络有限公司 The searching method and search platform of a kind of form
JP2017224240A (en) * 2016-06-17 2017-12-21 富士通株式会社 Table data search apparatus, table data search method, and table data search program
CN110222160A (en) * 2019-05-06 2019-09-10 平安科技(深圳)有限公司 Intelligent semantic document recommendation method, device and computer readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615193A (en) * 2009-07-07 2009-12-30 北京大学 A kind of based on the integrated inquiry system of encyclopaedia data extract
US20140025626A1 (en) * 2012-04-19 2014-01-23 Avalon Consulting, LLC Method of using search engine facet indexes to enable search-enhanced business intelligence analysis
US10311374B2 (en) * 2015-09-11 2019-06-04 Adobe Inc. Categorization of forms to aid in form search
CN110866042B (en) * 2019-10-11 2023-05-12 平安科技(深圳)有限公司 Intelligent query method and device for table and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110282895A1 (en) * 2010-05-14 2011-11-17 Oracle International Corporation System and method for logical people groups
CN106250381A (en) * 2015-06-04 2016-12-21 微软技术许可有限责任公司 The row sequence optimized for input/output in list data
JP2017224240A (en) * 2016-06-17 2017-12-21 富士通株式会社 Table data search apparatus, table data search method, and table data search program
CN106874411A (en) * 2017-01-22 2017-06-20 网易(杭州)网络有限公司 The searching method and search platform of a kind of form
CN110222160A (en) * 2019-05-06 2019-09-10 平安科技(深圳)有限公司 Intelligent semantic document recommendation method, device and computer readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021068565A1 (en) * 2019-10-11 2021-04-15 平安科技(深圳)有限公司 Table intelligent query method and apparatus, electronic device and computer readable storage medium
CN112597171A (en) * 2020-12-31 2021-04-02 平安银行股份有限公司 Table relation visualization method and device, electronic equipment and storage medium
CN113111864A (en) * 2021-05-13 2021-07-13 上海巽联信息科技有限公司 Intelligent table extraction algorithm based on multiple modes
CN116049354A (en) * 2023-01-28 2023-05-02 北京原子回声智能科技有限公司 Multi-table retrieval method and device based on natural language

Also Published As

Publication number Publication date
WO2021068565A1 (en) 2021-04-15
CN110866042B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN110866042B (en) Intelligent query method and device for table and computer readable storage medium
CN110427480B (en) Intelligent personalized text recommendation method and device and computer readable storage medium
CN102053992A (en) Clustering method and system
CN112380870A (en) User intention analysis method and device, electronic equipment and computer storage medium
CN111475617A (en) Event body extraction method and device and storage medium
US11599727B2 (en) Intelligent text cleaning method and apparatus, and computer-readable storage medium
CN111767375A (en) Semantic recall method and device, computer equipment and storage medium
CN110795548A (en) Intelligent question answering method, device and computer readable storage medium
CN110852785B (en) User grading method, device and computer readable storage medium
WO2020248366A1 (en) Text intention intelligent classification method and device, and computer-readable storage medium
CN113627797A (en) Image generation method and device for employee enrollment, computer equipment and storage medium
CN115062134A (en) Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN112231452A (en) Question-answering method, device, equipment and storage medium based on natural language processing
CN113656690B (en) Product recommendation method and device, electronic equipment and readable storage medium
CN112598039B (en) Method for obtaining positive samples in NLP (non-linear liquid) classification field and related equipment
CN113505273B (en) Data sorting method, device, equipment and medium based on repeated data screening
CN110765765A (en) Contract key clause extraction method and device based on artificial intelligence and storage medium
CN114358023A (en) Intelligent question-answer recall method and device, computer equipment and storage medium
CN116186295B (en) Attention-based knowledge graph link prediction method, attention-based knowledge graph link prediction device, attention-based knowledge graph link prediction equipment and attention-based knowledge graph link prediction medium
CN113360654A (en) Text classification method and device, electronic equipment and readable storage medium
CN112287140A (en) Image retrieval method and system based on big data
CN108876422A (en) For the method, apparatus of information popularization, electronic equipment and computer-readable medium
CN115525739A (en) Supply chain financial intelligent duplicate checking method, device, equipment and medium
CN114637831A (en) Data query method based on semantic analysis and related equipment thereof
CN114996566A (en) Intelligent recommendation system and method for industrial internet platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant