CN110866042A - Intelligent table query method and device and computer readable storage medium

Info

Publication number
CN110866042A
Authority
CN
China
Prior art keywords
query
intelligent
training
word vector
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910975458.1A
Other languages
Chinese (zh)
Other versions
CN110866042B (en)
Inventor
王建华
马琳
张晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910975458.1A priority Critical patent/CN110866042B/en
Publication of CN110866042A publication Critical patent/CN110866042A/en
Priority to PCT/CN2020/098951 priority patent/WO2021068565A1/en
Application granted granted Critical
Publication of CN110866042B publication Critical patent/CN110866042B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses an intelligent table query method, which comprises the following steps: receiving an original table set and a label set, and splitting the original table set to obtain a standard table set; performing part-of-speech coding on the label set to obtain a word vector set; inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and completing the training of the intelligent query model when the training value is smaller than a preset threshold; receiving query content from a user, extracting keywords from the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, obtaining the table set required by the query content, and outputting the table set. The invention also provides an intelligent table query device and a computer readable storage medium. The invention can realize an accurate and efficient intelligent table query function.

Description

Intelligent table query method and device and computer readable storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for intelligently querying a table, and a computer-readable storage medium.
Background
With the rapid development of the internet, the scale of data is expanding quickly, and the demand for fast data query keeps growing. At present most data are stored in table form, such as a company's daily business income data or the property registration records of a real estate company, and table-based query mainly relies on table traversal or user keyword search. Although these methods can meet query requirements to a certain extent, when the tables are large both the traversal method and the keyword search method are slow and consume a large amount of computing memory.
Disclosure of Invention
The invention provides an intelligent table query method and device and a computer readable storage medium, whose main aim is to intelligently query tables according to the query requirements of a user.
In order to achieve the above object, the present invention provides an intelligent table query method, which comprises:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, continuing to train the intelligent query model, and if the training value is smaller than the preset threshold, finishing the training of the intelligent query model;
receiving query content from a user, extracting keywords from the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
Optionally, the splitting process is to split the original table set into a user layer, a computation layer and a data layer, and combine the user layer, the computation layer and the data layer into the standard table set;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer is composed of the body of each table in the original table set;
the computing layer provides a mutual query function of the user layer and the data layer.
Optionally, the part-of-speech encoding the tag set to obtain a word vector set includes:
carrying out one-hot coding on the label set to obtain a primary word vector set;
and carrying out dimension reduction on the primary word vector set to obtain the word vector set.
Optionally, the dimension reduction comprises:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimal solution, wherein the optimal solution is the word vector set.
Optionally, the standard table set and the word vector set are input into an intelligent query model to be trained to obtain training values, including:
performing the part-of-speech coding on the user layer information in the standard table set to obtain a user layer information vector set;
inputting the user layer information vector set into the intelligent query model, and sequentially performing convolution, pooling and activation operations to obtain a prediction value set;
and performing loss calculation on the prediction value set and the word vector set to obtain the training value.
In addition, in order to achieve the above object, the present invention further provides a table intelligent query apparatus, which includes a memory and a processor, wherein the memory stores a table intelligent query program operable on the processor, and the table intelligent query program, when executed by the processor, implements the following steps:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, continuing to train the intelligent query model, and if the training value is smaller than the preset threshold, finishing the training of the intelligent query model;
receiving query content from a user, extracting keywords from the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
Optionally, the splitting process is to split the original table set into a user layer, a computation layer and a data layer, and combine the user layer, the computation layer and the data layer into the standard table set;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer is composed of the body of each table in the original table set;
the computing layer provides a mutual query function of the user layer and the data layer.
Optionally, the part-of-speech encoding the tag set to obtain a word vector set includes:
carrying out one-hot coding on the label set to obtain a primary word vector set;
and carrying out dimension reduction on the primary word vector set to obtain the word vector set.
Optionally, the dimension reduction comprises:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimal solution, wherein the optimal solution is the word vector set.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium, which stores thereon a table intelligent query program, which is executable by one or more processors to implement the steps of the table intelligent query method as described above.
According to the method, the standard table set is obtained by splitting the original table set, so that a large, monolithic original table is split into small tables that can be queried quickly. The intelligent query model is trained on the standard table set, giving it an efficient query capability. At the same time, keyword extraction is performed on the user's query content, so that what the user wants to query is identified accurately and the query is executed quickly by the intelligent query model. Therefore, the intelligent table query method, the intelligent table query device and the computer readable storage medium of the invention can realize an accurate and efficient table query function.
Drawings
Fig. 1 is a schematic flow chart of a table intelligent query method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an internal structure of the intelligent table query device according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of a table intelligent query program in the table intelligent query device according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an intelligent table query method. Fig. 1 is a schematic flow chart of a table intelligent query method according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the intelligent table query method includes:
and S1, receiving the original form set and the label set, and splitting the original form set to obtain a standard form set.
In the preferred embodiment of the present invention, the original table set consists of tables automatically generated by different business services, for example a property information table set compiled with EXCEL software by a real estate agency that collects many property listings, or a set of user consumption detail tables generated by a telecommunications company according to each user's calls, data usage and so on.
The label set is a description of each table in the original table set. Preferably, the description takes the form of a keyword combination. For the aforementioned property information table set, for example, the label set combines four types of keywords: residential complex name and unit number + property area + market price + property layout, so that one listing in the property information table set may be recorded in the label set as: Complex A, Unit 3 + 89 square meters + 1.2 million yuan + three bedrooms, one living room and two bathrooms.
Preferably, the splitting process splits the original table set into a user layer, a computation layer and a data layer to obtain the standard table set. A table consists of a table title, a table header and a table body, and the table title and table header are expressed as text; for example, the title of a product sales statistics table is 'Product Sales Statistics', and its header contains fields such as product number, product name, specification and packaging. The table titles and table headers of the original table set are therefore extracted to form the user layer. The table body stores the data of the whole table, so the table bodies of the original table set are extracted to form the data layer. Once a table has been split into the user layer and the data layer, a query relationship must be established between the two layers; this is the function of the computation layer. The computation layer can adopt a first-order multivariate linear index query, where the linear query is:
y = a1·x1 + a2·x2 + … + an·xn
where y is the data in the data layer, x1, x2, …, xn are the text information of the user layer, and a1, a2, …, an are index coefficients taking the value 0 or 1. For example, to query the property information priced at 1.3 million yuan, y is set to 130 (in units of ten thousand yuan) and all user-layer text information matching that price is solved for in reverse, i.e. the corresponding table titles and table headers in the original table set are returned: the tables titled Complex A, Complex B and Complex E contain listings priced at 1.3 million, and the headers show which listings under the title of Complex A are priced at 1.3 million.
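For illustration only, the following is a minimal Python sketch of how such a computation layer could behave; the layer contents, field names and helper function are hypothetical and are not taken from the patent.

```python
# Minimal sketch of the computation layer's linear index query. The layer
# contents, field names and helper function below are hypothetical.

# User layer: table title plus header fields of each split table.
user_layer = {
    "Complex A": ["listing_id", "area_sqm", "price_wan", "layout"],
    "Complex B": ["listing_id", "area_sqm", "price_wan", "layout"],
}

# Data layer: the table bodies (rows of values) keyed by the same titles.
data_layer = {
    "Complex A": [("A-01", 89, 130, "3BR/1LR/2BA"), ("A-02", 75, 95, "2BR/1LR/1BA")],
    "Complex B": [("B-01", 102, 130, "3BR/2LR/2BA")],
}

def linear_index_query(y, field="price_wan"):
    """Reverse-solve the linear index y = a1*x1 + ... + an*xn with 0/1 coefficients:
    return every user-layer entry (title and header) whose data rows match the value y."""
    hits = []
    for title, header in user_layer.items():
        col = header.index(field)
        matching_rows = [row for row in data_layer[title] if row[col] == y]
        if matching_rows:
            hits.append((title, header, matching_rows))
    return hits

if __name__ == "__main__":
    # Query every listing priced at 1.3 million yuan (130 * ten thousand).
    for title, header, rows in linear_index_query(130):
        print(title, header, rows)
```

In this sketch a coefficient of 1 simply corresponds to a user-layer entry whose rows participate in the match for the queried value y.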
S2, performing part-of-speech coding on the label set to obtain a word vector set.
The intelligent query model cannot effectively recognize raw text information; effective recognition requires extracting features from the text and then making a discriminative judgment. Therefore, preferably, the part-of-speech coding represents every keyword of each label in the label set by an N-dimensional vector, i.e. it converts the text information into numerical information for subsequent recognition and training by the model, where N is the number of keywords in the label set.
Further, the part-of-speech coding first performs a one-hot encoding operation on the label set to obtain a primary word vector set, and then performs dimension reduction on the primary word vector set to obtain the word vector set.
The one-hot operation is as follows:
vi = (vi1, vi2, …, viN), with vij = 1 when j = i and vij = 0 otherwise
where i denotes the keyword number, vi is the N-dimensional vector representing keyword i, vij is the j-th element of that vector, and all the vi together, for the s keywords in total, form the primary word vector set.
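Purely as an illustration (the keyword list is hypothetical), a minimal sketch of this one-hot step in Python:

```python
import numpy as np

# Hypothetical keywords taken from one label of the label set.
keywords = ["Complex A", "Unit 3", "89 sqm", "1.2 million", "three bedrooms"]

def one_hot_encode(words):
    """Return an s x N matrix (here N equals s) with a single 1 per row,
    i.e. the primary word vector set: v_ij = 1 when j == i, else 0."""
    n = len(words)
    vectors = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        vectors[i, i] = 1.0
    return vectors

primary_word_vectors = one_hot_encode(keywords)
print(primary_word_vectors)
```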
Further, the dimension reduction compresses the generated N-dimensional vectors into lower-dimensional data that are easier to compute with in subsequent model training, i.e. it finally converts the primary word vector set into the word vector set.
Preferably, the dimensionality reduction first establishes a forward probability model and a backward probability model, and then optimizes the forward probability model and the backward probability model to obtain an optimal solution, which is the word vector set.
Further, the forward probability model and the backward probability model are respectively:
p(v1, v2, …, vs) = ∏ (i = 1 to s) p(vi | v1, …, v(i-1))
p(v1, v2, …, vs) = ∏ (i = 1 to s) p(vi | v(i+1), …, vs)
Optimizing the forward probability model and the backward probability model:
max Σ (i = 1 to s) [ log p(vi | v1, …, v(i-1)) + log p(vi | v(i+1), …, vs) ]
where max denotes the optimization and the optimum is found by taking derivatives of this objective, vi is the N-dimensional vector representing keyword i, and the label set has s keywords in total. After the forward probability model and the backward probability model are optimized, the dimension of the N-dimensional vectors is reduced, the dimension reduction is completed, and the word vector set is obtained.
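The following is a minimal, simplified Python sketch of this dimension-reduction idea and is only an assumption, not the patent's exact procedure: low-dimensional keyword vectors are learned by jointly maximizing a forward and a backward next-keyword probability, and for brevity each conditional looks only at the immediately adjacent keyword.

```python
# Assumption: a simplified forward/backward probability objective, not the
# patent's exact training code. Each conditional uses only the adjacent keyword.
import torch
import torch.nn as nn

vocab = ["Complex A", "Unit 3", "89 sqm", "1.2 million", "three bedrooms"]  # hypothetical
sequences = [[0, 1, 2, 3, 4]]              # toy corpus: one label as keyword indices
s, dim = len(vocab), 8                     # s keywords, reduced dimension

emb = nn.Embedding(s, dim)                 # the low-dimensional word vectors to learn
fwd = nn.Linear(dim, s)                    # forward model: predicts v_i from v_{i-1}
bwd = nn.Linear(dim, s)                    # backward model: predicts v_i from v_{i+1}
params = list(emb.parameters()) + list(fwd.parameters()) + list(bwd.parameters())
opt = torch.optim.Adam(params, lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):
    opt.zero_grad()
    for seq in sequences:
        ids = torch.tensor(seq)
        vecs = emb(ids)
        # negative log-likelihood of the forward and backward models, summed
        loss = loss_fn(fwd(vecs[:-1]), ids[1:]) + loss_fn(bwd(vecs[1:]), ids[:-1])
        loss.backward()
    opt.step()

word_vector_set = emb.weight.detach()      # the reduced-dimension word vector set
print(word_vector_set.shape)               # torch.Size([5, 8])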
S3, inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, training of the intelligent query model continues, and if the training value is smaller than the preset threshold, training of the intelligent query model is finished.
Preferably, training to obtain the training value comprises: performing the part-of-speech coding on the user-layer information in the standard table set to obtain a user-layer information vector set; inputting the user-layer information vector set into the intelligent query model and sequentially performing convolution, pooling and activation operations to obtain a prediction value set; and performing a loss calculation on the prediction value set and the word vector set to obtain the training value.
Further, the convolution operation and the pooling operation comprise: constructing a convolution template in advance, determining a convolution stride, and computing the convolution template against the user-layer information vector set according to that stride to obtain the convolution matrix set, which completes the convolution operation; then selecting the maximum value or the average value of each matrix in the convolution matrix set to replace that matrix, which completes the pooling operation.
Further, the pre-constructed convolution template may be a standard 3 × 3 matrix. The convolution is computed by sliding the template over a feature matrix from the feature candidate area set (for example a 9 × 9 matrix) from left to right with a stride of 1: at each position the corresponding elements of the template and the covered sub-matrix are multiplied element-wise (1 × 0, 0 × 3, 1 × 1 and so on) to obtain a small result matrix, the template then moves one step to the right, and the operation is repeated until the whole matrix has been traversed. When the convolution operation is complete a large number of small matrices has therefore been generated, and the pooling operation reduces their dimensionality further, preferably by the maximization principle: each small matrix is replaced by its maximum value (3 and 7 in the original example), which completes the pooling operation. (The specific example matrices in the original are image references that did not survive extraction.)
Preferably, the convolution and pooling operations are repeated, and the final feature matrix set can be obtained after 16 times of convolution and pooling operations.
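For illustration only, here is a minimal Python sketch of the convolution-then-maximum-pooling step described above; the template and feature matrix are hypothetical stand-ins for the unrecoverable originals.

```python
import numpy as np

def conv_and_max_pool(feature, template, stride=1):
    """Slide the template over the feature matrix; at each position take the
    element-wise product with the covered patch (the 'small matrix'), then
    max-pool that product down to a single value, as described above."""
    th, tw = template.shape
    fh, fw = feature.shape
    out_h = (fh - th) // stride + 1
    out_w = (fw - tw) // stride + 1
    pooled = np.zeros((out_h, out_w), dtype=feature.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = feature[i * stride:i * stride + th, j * stride:j * stride + tw]
            small_matrix = template * patch        # convolution step (element-wise)
            pooled[i, j] = small_matrix.max()      # pooling step (maximization principle)
    return pooled

# Hypothetical 3 x 3 template and 9 x 9 feature matrix.
template = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])
feature = np.arange(81).reshape(9, 9) % 10
print(conv_and_max_pool(feature, template))        # a 7 x 7 pooled feature map
```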
Preferably, the activation operation performs probability estimation on the feature matrix set through a softmax function and selects the prediction result with the highest probability as the final prediction output. The softmax function is:
p(matrix) = e^matrix / Σ (j = 1 to k) e^(matrix_j)
where p(matrix) represents the output probability of a matrix in the feature matrix set, k represents the data volume of the feature matrix set, e is Euler's number (an infinite non-repeating decimal), and j ranges over the selectable prediction value set. In the original example one candidate feature matrix evaluates to a probability of 0.21 and another to 0.64, so the candidate with probability 0.64 is selected as the output of the feature matrix set (the specific example matrices are image references that did not survive extraction).
The loss calculation that produces the training value compares the prediction value set with the word vector set:
[loss formula: an image reference in the original that is not recoverable from this extraction]
where t is the number of word vectors, yi denotes the word vector set, and yi' denotes the prediction value set.
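For illustration only, a short Python sketch of the activation and loss steps; because the patent's exact loss formula is not recoverable here, a plain squared-error loss over the t word vectors is assumed.

```python
import numpy as np

def softmax(scores):
    """p(matrix) = e^score / sum_j e^score_j over the k candidates."""
    z = np.exp(scores - scores.max())      # subtract max for numerical stability
    return z / z.sum()

def training_value(word_vectors, predictions):
    """Assumed squared-error loss between the word vector set y and the prediction set y'."""
    y = np.asarray(word_vectors, dtype=float)
    y_pred = np.asarray(predictions, dtype=float)
    return float(np.sum((y - y_pred) ** 2))

# Toy example: pick the most probable candidate, then compute the training value.
scores = np.array([0.3, 1.4])              # scores of k = 2 candidate feature matrices
probs = softmax(scores)                    # roughly [0.25, 0.75]
print("selected candidate:", int(probs.argmax()))

y_true = np.array([[0.1, 0.9], [0.8, 0.2]])
y_hat = np.array([[0.2, 0.7], [0.6, 0.4]])
print("training value:", training_value(y_true, y_hat))
```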
S4, receiving query content from a user, extracting keywords from the query content based on a keyword extraction algorithm to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model to obtain the table set required by the query content, and outputting the table set.
Because the query content entered by a user is usually not in the specified keyword-combination form but is phrased colloquially, such as "I want to look up my phone bill for September", keyword extraction needs to be performed on the query content; here the keywords "September" and "phone bill" are extracted from "I want to look up my phone bill for September".
Preferably, the keyword extraction may adopt a traversal method, for example, all keywords in the tag set are split and deduplicated to construct a keyword vocabulary, and the query content is sequentially compared with the keyword vocabulary to complete the keyword extraction.
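As a purely illustrative sketch of the traversal method just described (the labels and query text are hypothetical):

```python
def build_vocabulary(label_set):
    """Split every label into its keywords and deduplicate them."""
    vocab = set()
    for label in label_set:
        vocab.update(part.strip() for part in label.split("+"))
    return vocab

def extract_keywords(query, vocabulary):
    """Traverse the vocabulary and keep every keyword that appears in the query."""
    return [kw for kw in vocabulary if kw and kw in query]

labels = ["September + phone bill", "October + data plan"]
vocabulary = build_vocabulary(labels)
print(extract_keywords("I want to look up my phone bill for September", vocabulary))
# -> ['September', 'phone bill'] (order depends on set iteration)
```

Splitting on "+" mirrors the keyword-combination format of the label set described earlier.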
Further, the intelligent query model obtains the table set required by the query content. The query content corresponds to user-layer information, so the function that the computation layer provides for querying the data layer from the user layer is called to fetch the matching data-layer entries, and the data layer and the user layer are recombined to obtain the table set.
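As a final illustrative sketch (again with hypothetical layer contents, not the patent's code), the recombination of predicted user-layer entries with the data layer into the output table set could look like this:

```python
# Hypothetical layers, mirroring the earlier sketches.
user_layer = {"Complex A": ["listing_id", "area_sqm", "price_wan", "layout"]}
data_layer = {"Complex A": [("A-01", 89, 130, "3BR/1LR/2BA"), ("A-02", 75, 95, "2BR/1LR/1BA")]}

def recombine(predicted_titles, field=None, value=None):
    """Rebuild output tables from the user-layer entries predicted by the model,
    filtering the data layer through the computation layer when a field/value is given."""
    tables = []
    for title in predicted_titles:
        header = user_layer[title]
        rows = data_layer[title]
        if field is not None:
            col = header.index(field)
            rows = [row for row in rows if row[col] == value]
        tables.append({"title": title, "header": header, "rows": rows})
    return tables

# Suppose the trained model predicted that "Complex A" answers the query.
print(recombine(["Complex A"], field="price_wan", value=130))
```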
The invention also provides an intelligent table query device. Fig. 2 is a schematic diagram illustrating an internal structure of the intelligent table query device according to an embodiment of the present invention.
In this embodiment, the table intelligent query apparatus 1 may be a PC (Personal Computer), a terminal device such as a smartphone, a tablet computer or a portable computer, or a server. The table intelligent query apparatus 1 at least comprises a memory 11, a processor 12, a communication bus 13 and a network interface 14.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may be an internal storage unit of the table intelligent query apparatus 1 in some embodiments, for example, a hard disk of the table intelligent query apparatus 1. The memory 11 may also be an external storage device of the table lookup apparatus 1 in other embodiments, such as a plug-in hard disk provided on the table lookup apparatus 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 11 may also include both an internal storage unit and an external storage device of the table intelligent query apparatus 1. The memory 11 may be used not only to store application software installed in the table intelligent query apparatus 1 and various types of data, such as codes of the table intelligent query program 01, but also to temporarily store data that has been output or is to be output.
The processor 12, which in some embodiments may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip, is configured to execute program code stored in the memory 11 or to process data, for example to run the table intelligent query program 01.
The communication bus 13 is used to realize connection communication between these components.
The network interface 14 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), typically used to establish a communication link between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display information processed in the form intelligent query apparatus 1 and to display a visual user interface.
While FIG. 2 shows only the table intelligent query device 1 with the components 11-14 and the table intelligent query program 01, those skilled in the art will appreciate that the structure shown in FIG. 2 does not constitute a limitation of the table intelligent query device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
In the embodiment of the apparatus 1 shown in fig. 2, the memory 11 stores a table smart query program 01; when processor 12 executes table smart query program 01 stored in memory 11, the following steps are implemented:
the method comprises the steps of firstly, receiving an original form set and a label set, and splitting the original form set to obtain a standard form set.
In the preferred embodiment of the present invention, the original table set consists of tables automatically generated by different business services, for example a property information table set compiled with EXCEL software by a real estate agency that collects many property listings, or a set of user consumption detail tables generated by a telecommunications company according to each user's calls, data usage and so on.
The label set is a description of each table in the original table set. Preferably, the description takes the form of a keyword combination. For the aforementioned property information table set, for example, the label set combines four types of keywords: residential complex name and unit number + property area + market price + property layout, so that one listing in the property information table set may be recorded in the label set as: Complex A, Unit 3 + 89 square meters + 1.2 million yuan + three bedrooms, one living room and two bathrooms.
Preferably, the splitting process splits the original table set into a user layer, a computation layer and a data layer to obtain the standard table set. A table consists of a table title, a table header and a table body, and the table title and table header are expressed as text; for example, the title of a product sales statistics table is 'Product Sales Statistics', and its header contains fields such as product number, product name, specification and packaging. The table titles and table headers of the original table set are therefore extracted to form the user layer. The table body stores the data of the whole table, so the table bodies of the original table set are extracted to form the data layer. Once a table has been split into the user layer and the data layer, a query relationship must be established between the two layers; this is the function of the computation layer. The computation layer can adopt a first-order multivariate linear index query, where the linear query is:
y = a1·x1 + a2·x2 + … + an·xn
where y is the data in the data layer, x1, x2, …, xn are the text information of the user layer, and a1, a2, …, an are index coefficients taking the value 0 or 1. For example, to query the property information priced at 1.3 million yuan, y is set to 130 (in units of ten thousand yuan) and all user-layer text information matching that price is solved for in reverse, i.e. the corresponding table titles and table headers in the original table set are returned: the tables titled Complex A, Complex B and Complex E contain listings priced at 1.3 million, and the headers show which listings under the title of Complex A are priced at 1.3 million.
Step two, performing part-of-speech coding on the label set to obtain a word vector set.
The intelligent query model cannot effectively recognize raw text information; effective recognition requires extracting features from the text and then making a discriminative judgment. Therefore, preferably, the part-of-speech coding represents every keyword of each label in the label set by an N-dimensional vector, i.e. it converts the text information into numerical information for subsequent recognition and training by the model, where N is the number of keywords in the label set.
Further, the part-of-speech coding first performs a one-hot encoding operation on the label set to obtain a primary word vector set, and then performs dimension reduction on the primary word vector set to obtain the word vector set.
The one-hot operation is as follows:
vi = (vi1, vi2, …, viN), with vij = 1 when j = i and vij = 0 otherwise
where i denotes the keyword number, vi is the N-dimensional vector representing keyword i, vij is the j-th element of that vector, and all the vi together, for the s keywords in total, form the primary word vector set.
Further, the dimension reduction compresses the generated N-dimensional vectors into lower-dimensional data that are easier to compute with in subsequent model training, i.e. it finally converts the primary word vector set into the word vector set.
Preferably, the dimensionality reduction first establishes a forward probability model and a backward probability model, and then optimizes the forward probability model and the backward probability model to obtain an optimal solution, which is the word vector set.
Further, the forward probability model and the backward probability model are respectively:
p(v1, v2, …, vs) = ∏ (i = 1 to s) p(vi | v1, …, v(i-1))
p(v1, v2, …, vs) = ∏ (i = 1 to s) p(vi | v(i+1), …, vs)
Optimizing the forward probability model and the backward probability model:
max Σ (i = 1 to s) [ log p(vi | v1, …, v(i-1)) + log p(vi | v(i+1), …, vs) ]
where max denotes the optimization and the optimum is found by taking derivatives of this objective, vi is the N-dimensional vector representing keyword i, and the label set has s keywords in total. After the forward probability model and the backward probability model are optimized, the dimension of the N-dimensional vectors is reduced, the dimension reduction is completed, and the word vector set is obtained.
Step three, inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, training of the intelligent query model continues, and if the training value is smaller than the preset threshold, training of the intelligent query model is finished.
Preferably, training to obtain the training value comprises: performing the part-of-speech coding on the user-layer information in the standard table set to obtain a user-layer information vector set; inputting the user-layer information vector set into the intelligent query model and sequentially performing convolution, pooling and activation operations to obtain a prediction value set; and performing a loss calculation on the prediction value set and the word vector set to obtain the training value.
Further, the convolution operation and the pooling operation comprise: constructing a convolution template in advance, determining a convolution stride, and computing the convolution template against the user-layer information vector set according to that stride to obtain the convolution matrix set, which completes the convolution operation; then selecting the maximum value or the average value of each matrix in the convolution matrix set to replace that matrix, which completes the pooling operation.
Further, the pre-constructed convolution template may be a standard 3 × 3 matrix. The convolution is computed by sliding the template over a feature matrix from the feature candidate area set (for example a 9 × 9 matrix) from left to right with a stride of 1: at each position the corresponding elements of the template and the covered sub-matrix are multiplied element-wise (1 × 0, 0 × 3, 1 × 1 and so on) to obtain a small result matrix, the template then moves one step to the right, and the operation is repeated until the whole matrix has been traversed. When the convolution operation is complete a large number of small matrices has therefore been generated, and the pooling operation reduces their dimensionality further, preferably by the maximization principle: each small matrix is replaced by its maximum value (3 and 7 in the original example), which completes the pooling operation. (The specific example matrices in the original are image references that did not survive extraction.)
Preferably, the convolution and pooling operations are repeated, and the final feature matrix set can be obtained after 16 times of convolution and pooling operations.
Preferably, the activation operation performs probability estimation on the feature matrix set through a softmax function and selects the prediction result with the highest probability as the final prediction output. The softmax function is:
p(matrix) = e^matrix / Σ (j = 1 to k) e^(matrix_j)
where p(matrix) represents the output probability of a matrix in the feature matrix set, k represents the data volume of the feature matrix set, e is Euler's number (an infinite non-repeating decimal), and j ranges over the selectable prediction value set. In the original example one candidate feature matrix evaluates to a probability of 0.21 and another to 0.64, so the candidate with probability 0.64 is selected as the output of the feature matrix set (the specific example matrices are image references that did not survive extraction).
The loss calculation that produces the training value compares the prediction value set with the word vector set:
[loss formula: an image reference in the original that is not recoverable from this extraction]
where t is the number of word vectors, yi denotes the word vector set, and yi' denotes the prediction value set.
Step four, receiving query content from a user, extracting keywords from the query content based on a keyword extraction algorithm to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model to obtain the table set required by the query content, and outputting the table set.
Because the query content entered by a user is usually not in the specified keyword-combination form but is phrased colloquially, such as "I want to look up my phone bill for September", keyword extraction needs to be performed on the query content; here the keywords "September" and "phone bill" are extracted from "I want to look up my phone bill for September".
Preferably, the keyword extraction may adopt a traversal method, for example, all keywords in the tag set are split and deduplicated to construct a keyword vocabulary, and the query content is sequentially compared with the keyword vocabulary to complete the keyword extraction.
Further, the intelligent query model obtains the table set required by the query content. The query content corresponds to user-layer information, so the function that the computation layer provides for querying the data layer from the user layer is called to fetch the matching data-layer entries, and the data layer and the user layer are recombined to obtain the table set.
Alternatively, in other embodiments, the form intelligent query program may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention.
For example, referring to fig. 3, a schematic diagram of program modules of a table intelligent query program in an embodiment of the table intelligent query apparatus of the present invention is shown, in this embodiment, the table intelligent query program may be divided into a data receiving and processing module 10, a part-of-speech encoding module 20, an intelligent query model training module 30, and a table query and output module 40, which exemplarily:
the data receiving and processing module 10 is configured to: and receiving an original table set and a label set, and splitting the original table set to obtain a standard table set.
The part-of-speech encoding module 20 is configured to: and performing part-of-speech coding on the tag set to obtain a word vector set.
The smart query model training 30 is used to: inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, judging the size relationship between the training value and a preset threshold value, if the training value is larger than the preset threshold value, continuing training of the intelligent query model, and if the training value is smaller than the preset threshold value, finishing training of the intelligent query model.
The table query and output 40 is used to: receiving query content of a user, extracting the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set to the intelligent query model which completes training, and obtaining and outputting a form set required by the query content.
The functions or operation steps implemented by the data receiving and processing module 10, the part-of-speech encoding module 20, the intelligent query model training module 30, the table query and output module 40 and other program modules when executed are substantially the same as those of the above embodiments, and are not described herein again.
Furthermore, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium has a table smart query program stored thereon, and the table smart query program is executable by one or more processors to implement the following operations:
and receiving an original table set and a label set, and splitting the original table set to obtain a standard table set.
And performing part-of-speech coding on the tag set to obtain a word vector set.
Inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, judging the size relationship between the training value and a preset threshold value, if the training value is larger than the preset threshold value, continuing training of the intelligent query model, and if the training value is smaller than the preset threshold value, finishing training of the intelligent query model.
Receiving query content of a user, extracting the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set to the intelligent query model which completes training, and obtaining and outputting a form set required by the query content.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for intelligent query of a form, the method comprising:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, continuing to train the intelligent query model, and if the training value is smaller than the preset threshold, finishing the training of the intelligent query model;
receiving query content of a user, extracting keywords from the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
2. The form intelligent query method of claim 1, wherein the splitting process is splitting the original form set into a user layer, a computation layer and a data layer, and composing the user layer, the computation layer and the data layer into the standard form set;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer is composed of the body of each table in the original table set;
the computing layer provides a mutual query function of the user layer and the data layer.
3. The intelligent table query method of claim 1 or 2, wherein said part-of-speech encoding said tag set to obtain a word vector set, comprises:
carrying out one-hot coding on the label set to obtain a primary word vector set;
and carrying out dimension reduction on the primary word vector set to obtain the word vector set.
4. The form intelligent query method of claim 3, wherein the dimension reduction comprises:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimal solution, wherein the optimal solution is the word vector set.
5. The table intelligent query method of claim 2, wherein inputting the standard table set and the word vector set into an intelligent query model to train to obtain training values comprises:
performing the part-of-speech coding on the user layer information in the standard table set to obtain a user layer information vector set;
inputting the user layer information vector set into the intelligent query model, and sequentially performing convolution, pooling and activation operations to obtain a prediction value set;
and performing loss calculation on the prediction value set and the word vector set to obtain the training value.
6. An apparatus for intelligent table query, the apparatus comprising a memory and a processor, the memory having stored thereon an intelligent table query program operable on the processor, the intelligent table query program when executed by the processor implementing the steps of:
receiving an original table set and a label set, and splitting the original table set to obtain a standard table set;
performing part-of-speech coding on the tag set to obtain a word vector set;
inputting the standard table set and the word vector set into an intelligent query model for training to obtain a training value, and comparing the training value with a preset threshold; if the training value is larger than the preset threshold, continuing to train the intelligent query model, and if the training value is smaller than the preset threshold, finishing the training of the intelligent query model;
receiving query content of a user, extracting keywords from the query content based on a keyword extraction technology to obtain a keyword set, performing part-of-speech coding on the keyword set to obtain a keyword vector set, inputting the keyword vector set into the trained intelligent query model, and obtaining and outputting the table set required by the query content.
7. The apparatus for intelligent lookup of tables as claimed in claim 6 wherein said splitting is a splitting of said original set of tables into a user layer, a computation layer and a data layer, and said user layer, said computation layer and said data layer are combined into said standard set of tables;
wherein:
the user layer consists of the table title and the table header of each table in the original table set;
the data layer is composed of the body of each table in the original table set;
the computing layer provides a mutual query function of the user layer and the data layer.
8. The apparatus for intelligent table query as claimed in claim 6 or 7, wherein said part-of-speech encoding said tag set to obtain a word vector set, comprises:
carrying out one-hot coding on the label set to obtain a primary word vector set;
and carrying out dimension reduction on the primary word vector set to obtain the word vector set.
9. The apparatus for intelligent lookup of tables as claimed in claim 8 wherein said dimension reduction comprises:
establishing a forward probability model and a backward probability model;
and optimizing the forward probability model and the backward probability model to obtain an optimal solution, wherein the optimal solution is the word vector set.
10. A computer-readable storage medium having stored thereon a form intelligent query program, the form intelligent query program being executable by one or more processors to implement the steps of the form intelligent query method of any one of claims 1 to 5.
CN201910975458.1A 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium Active CN110866042B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910975458.1A CN110866042B (en) 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium
PCT/CN2020/098951 WO2021068565A1 (en) 2019-10-11 2020-06-29 Table intelligent query method and apparatus, electronic device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910975458.1A CN110866042B (en) 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110866042A true CN110866042A (en) 2020-03-06
CN110866042B CN110866042B (en) 2023-05-12

Family

ID=69652834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910975458.1A Active CN110866042B (en) 2019-10-11 2019-10-11 Intelligent query method and device for table and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN110866042B (en)
WO (1) WO2021068565A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597171A (en) * 2020-12-31 2021-04-02 平安银行股份有限公司 Table relation visualization method and device, electronic equipment and storage medium
WO2021068565A1 (en) * 2019-10-11 2021-04-15 平安科技(深圳)有限公司 Table intelligent query method and apparatus, electronic device and computer readable storage medium
CN113111864A (en) * 2021-05-13 2021-07-13 上海巽联信息科技有限公司 Intelligent table extraction algorithm based on multiple modes
CN116049354A (en) * 2023-01-28 2023-05-02 北京原子回声智能科技有限公司 Multi-table retrieval method and device based on natural language

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110282895A1 (en) * 2010-05-14 2011-11-17 Oracle International Corporation System and method for logical people groups
CN106250381A (en) * 2015-06-04 2016-12-21 微软技术许可有限责任公司 The row sequence optimized for input/output in list data
CN106874411A (en) * 2017-01-22 2017-06-20 网易(杭州)网络有限公司 The searching method and search platform of a kind of form
JP2017224240A (en) * 2016-06-17 2017-12-21 富士通株式会社 Table data search apparatus, table data search method, and table data search program
CN110222160A (en) * 2019-05-06 2019-09-10 平安科技(深圳)有限公司 Intelligent semantic document recommendation method, device and computer readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615193A (en) * 2009-07-07 2009-12-30 北京大学 A kind of based on the integrated inquiry system of encyclopaedia data extract
US20140025626A1 (en) * 2012-04-19 2014-01-23 Avalon Consulting, LLC Method of using search engine facet indexes to enable search-enhanced business intelligence analysis
US10311374B2 (en) * 2015-09-11 2019-06-04 Adobe Inc. Categorization of forms to aid in form search
CN110866042B (en) * 2019-10-11 2023-05-12 平安科技(深圳)有限公司 Intelligent query method and device for table and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110282895A1 (en) * 2010-05-14 2011-11-17 Oracle International Corporation System and method for logical people groups
CN106250381A (en) * 2015-06-04 2016-12-21 微软技术许可有限责任公司 The row sequence optimized for input/output in list data
JP2017224240A (en) * 2016-06-17 2017-12-21 富士通株式会社 Table data search apparatus, table data search method, and table data search program
CN106874411A (en) * 2017-01-22 2017-06-20 网易(杭州)网络有限公司 The searching method and search platform of a kind of form
CN110222160A (en) * 2019-05-06 2019-09-10 平安科技(深圳)有限公司 Intelligent semantic document recommendation method, device and computer readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021068565A1 (en) * 2019-10-11 2021-04-15 平安科技(深圳)有限公司 Table intelligent query method and apparatus, electronic device and computer readable storage medium
CN112597171A (en) * 2020-12-31 2021-04-02 平安银行股份有限公司 Table relation visualization method and device, electronic equipment and storage medium
CN113111864A (en) * 2021-05-13 2021-07-13 上海巽联信息科技有限公司 Intelligent table extraction algorithm based on multiple modes
CN116049354A (en) * 2023-01-28 2023-05-02 北京原子回声智能科技有限公司 Multi-table retrieval method and device based on natural language

Also Published As

Publication number Publication date
WO2021068565A1 (en) 2021-04-15
CN110866042B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN110866042B (en) Intelligent query method and device for table and computer readable storage medium
CN110427480B (en) Intelligent personalized text recommendation method and device and computer readable storage medium
CN102053992A (en) Clustering method and system
CN112380870A (en) User intention analysis method and device, electronic equipment and computer storage medium
CN111475617A (en) Event body extraction method and device and storage medium
US11599727B2 (en) Intelligent text cleaning method and apparatus, and computer-readable storage medium
CN111767375A (en) Semantic recall method and device, computer equipment and storage medium
CN110795548A (en) Intelligent question answering method, device and computer readable storage medium
CN110852785B (en) User grading method, device and computer readable storage medium
WO2020248366A1 (en) Text intention intelligent classification method and device, and computer-readable storage medium
CN113627797A (en) Image generation method and device for employee enrollment, computer equipment and storage medium
CN115062134A (en) Knowledge question-answering model training and knowledge question-answering method, device and computer equipment
CN112231452A (en) Question-answering method, device, equipment and storage medium based on natural language processing
CN113656690B (en) Product recommendation method and device, electronic equipment and readable storage medium
CN112598039B (en) Method for obtaining positive samples in NLP (non-linear liquid) classification field and related equipment
CN113505273B (en) Data sorting method, device, equipment and medium based on repeated data screening
CN110765765A (en) Contract key clause extraction method and device based on artificial intelligence and storage medium
CN114358023A (en) Intelligent question-answer recall method and device, computer equipment and storage medium
CN116186295B (en) Attention-based knowledge graph link prediction method, attention-based knowledge graph link prediction device, attention-based knowledge graph link prediction equipment and attention-based knowledge graph link prediction medium
CN113360654A (en) Text classification method and device, electronic equipment and readable storage medium
CN112287140A (en) Image retrieval method and system based on big data
CN108876422A (en) For the method, apparatus of information popularization, electronic equipment and computer-readable medium
CN115525739A (en) Supply chain financial intelligent duplicate checking method, device, equipment and medium
CN114637831A (en) Data query method based on semantic analysis and related equipment thereof
CN114996566A (en) Intelligent recommendation system and method for industrial internet platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant